Theses Doctoral

Distributionally Robust Performance Analysis: Data, Dependence and Extremes

He, Fei

This dissertation focuses on distributionally robust performance analysis, which is an area of applied probability whose aim is to quantify the impact of model errors. Stochastic models are built to describe phenomena of interest with the intent of gaining insights or making informed decisions. Typically, however, the fidelity of these models (i.e. how closely they describe the underlying reality) may be compromised due to either the lack of information available or tractability considerations. The goal of distributionally robust performance analysis is then to quantify, and potentially mitigate, the impact of errors or model misspecifications. As such, distributionally robust performance analysis affects virtually any area in which stochastic modelling is used for analysis or decision making.
This dissertation studies various aspects of distributionally robust performance analysis. For example, we are concerned with quantifying the impact of model error in tail estimation using extreme value theory. We are also concerned with the impact of the dependence structure in risk analysis when marginal distributions of risk factors are known. In addition, we also are interested in connections recently found to machine learning and other statistical estimators which are based on distributionally robust optimization.
The first problem that we consider consists in studying the impact of model specification in the context of extreme quantiles and tail probabilities. There is a rich statistical theory that allows to extrapolate tail behavior based on limited information. This body of theory is known as extreme value theory and it has been successfully applied to a wide range of settings, including building physical infrastructure to withstand extreme environmental events and also guiding the capital requirements of insurance companies to ensure their financial solvency. Not surprisingly, attempting to extrapolate out into the tail of a distribution from limited observations requires imposing assumptions which are impossible to verify. The assumptions imposed in extreme value theory imply that a parametric family of models (known as generalized extreme value distributions) can be used to perform tail estimation. Because such assumptions are so difficult (or impossible) to be verified, we use distributionally robust optimization to enhance extreme value statistical analysis. Our approach results in a procedure which can be easily applied in conjunction with standard extreme value analysis and we show that our estimators enjoy correct coverage even in settings in which the assumptions imposed by extreme value theory fail to hold.
In addition to extreme value estimation, which is associated to risk analysis via extreme events, another feature which often plays a role in the risk analysis is the impact of dependence structure among risk factors. In the second chapter we study the question of evaluating the worst-case expected cost involving two sources of uncertainty, each of them with a specific marginal probability distribution. The worst-case expectation is optimized over all joint probability distributions which are consistent with the marginal distributions specified for each source of uncertainty. So, our formulation allows to capture the impact of the dependence structure of the risk factors. This formulation is equivalent to the so-called Monge-Kantorovich problem studied in optimal transport theory, whose theoretical properties have been studied in the literature substantially. However, rates of convergence of computational algorithms for this problem have been studied only recently. We show that if one of the random variables takes finitely many values, a direct Monte Carlo approach allows to evaluate such worst case expectation with $O(n^{-1/2})$ convergence rate as the number of Monte Carlo samples, $n$, increases to infinity.
Next, we continue our investigation of worst-case expectations in the context of multiple risk factors, not only two of them, assuming that their marginal probability distributions are fixed. This problem does not fit the mold of standard optimal transport (or Monge-Kantorovich) problems. We consider, however, cost functions which are separable in the sense of being a sum of functions which depend on adjacent pairs of risk factors (think of the factors indexed by time). In this setting, we are able to reduce the problem to the study of several separate Monge-Kantorovich problems. Moreover, we explain how we can even include martingale constraints which are often natural to consider in settings such as financial applications.
While in the previous chapters we focused on the impact of tail modeling or dependence, in the later parts of the dissertation we take a broader view by studying decisions which are made based on empirical observations. So, we focus on so-called distributionally robust optimization formulations. We use optimal transport theory to model the degree of distributional uncertainty or model misspecification. Distributionally robust optimization based on optimal transport has been a very active research topic in recent years, our contribution consists in studying how to specify the optimal transport metric in a data-driven way. We explain our procedure in the context of classification, which is of substantial importance in machine learning applications.


  • thumnail for He_columbia_0054D_14956.pdf He_columbia_0054D_14956.pdf application/pdf 1.03 MB Download File

More About This Work

Academic Units
Industrial Engineering and Operations Research
Thesis Advisors
Blanchet, Jose H.
Ph.D., Columbia University
Published Here
October 9, 2018