2020 Theses Doctoral

# Uncertainty Quantification in Data-Driven Simulation and Optimization: Statistical and Computational Efficiency

Models governing stochasticity in various systems are typically calibrated from data and are therefore subject to statistical errors and uncertainties, which can lead to inferior decision making. This thesis develops statistically and computationally efficient data-driven methods for problems in stochastic simulation and optimization to quantify and hedge against the impacts of these uncertainties.

The first half of the thesis focuses on efficient methods for tackling input uncertainty, which refers to the simulation output variability arising from the statistical noise in specifying the input models. Because the simulation noise and the input noise are convolved, existing bootstrap approaches rely on two-layer sampling and typically require substantial simulation effort. Chapter 2 investigates a subsampling framework that reduces the required effort by leveraging the form of the variance and its estimation error in terms of the data size and the sampling requirement in each layer. We show how the total required effort is reduced, and we explicitly identify the procedural specifications in our framework that guarantee relative consistency in the estimation, along with the corresponding optimal simulation budget allocations. In Chapter 3, we study an optimization-based approach to constructing confidence intervals for simulation outputs under input uncertainty. This approach computes confidence bounds from simulation runs driven by probability weights defined on the data, which are obtained by solving optimization problems under suitably posited averaged divergence constraints. We illustrate how this approach offers benefits in computational efficiency and finite-sample performance compared to the bootstrap and the delta method. Although the approach resembles distributionally robust optimization, we explain its procedural design and develop tight statistical guarantees via a generalization of the empirical likelihood method.
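As a rough illustration of the two-layer sampling described above, the following Python sketch resamples the input data in an outer layer and runs simulation replications in an inner layer, then separates the input variance from the simulation noise with an ANOVA-style correction. It is a minimal sketch under stated assumptions, not the procedure from the thesis: the toy `simulate` function, the name `two_layer_bootstrap_variance`, the parameters `n_outer`, `n_inner`, and `subsample_size`, and the `subsample_size / n` rescaling used to mimic subsampling are all illustrative choices.

```python
import numpy as np


def simulate(input_sample, rng):
    """Toy simulation (hypothetical): one noisy replication that averages
    10 draws from the empirical distribution of the given input sample."""
    draws = rng.choice(input_sample, size=10, replace=True)
    return draws.mean()


def two_layer_bootstrap_variance(data, n_outer, n_inner, rng, subsample_size=None):
    """Estimate the variance of the simulation output caused by input noise.

    Outer layer: resample the input data with replacement.
    Inner layer: run n_inner simulation replications per resample.
    If subsample_size is given, each outer resample has that size and the
    estimate is rescaled by subsample_size / n (an assumption of this sketch,
    based on the input variance decaying like 1 / data size).
    """
    n = len(data)
    s = n if subsample_size is None else subsample_size
    outputs = np.empty((n_outer, n_inner))
    for b in range(n_outer):
        resample = rng.choice(data, size=s, replace=True)
        for r in range(n_inner):
            outputs[b, r] = simulate(resample, rng)

    # Variance of the per-resample means mixes input noise and simulation noise.
    between = outputs.mean(axis=1).var(ddof=1)
    # Average within-resample variance estimates the simulation noise alone.
    within = outputs.var(axis=1, ddof=1).mean()
    # Remove the simulation-noise contribution, then rescale to data size n.
    return float((s / n) * (between - within / n_inner))


rng = np.random.default_rng(0)
data = rng.exponential(scale=1.0, size=200)
full = two_layer_bootstrap_variance(data, n_outer=200, n_inner=20, rng=rng)
sub = two_layer_bootstrap_variance(data, n_outer=200, n_inner=20, rng=rng,
                                   subsample_size=50)
print(full, sub)
```

Because the correction subtracts an estimated noise term, the estimate can come out slightly negative when `n_outer` or `n_inner` is small; in this toy setting both calls target roughly the variance of the data divided by the data size.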

The second half develops uncertainty quantification techniques for certifying solution feasibility and optimality in data-driven optimization. Regarding optimality, Chapter 4 proposes a statistical method to estimate the optimality gap of a given solution for stochastic optimization as an assessment of the solution quality. Our approach is based on bootstrap aggregating, or bagging, resampled sample average approximation (SAA). We show how this approach leads to valid statistical confidence bounds for non-smooth optimization. We also demonstrate its statistical efficiency and stability, which are especially desirable in limited-data situations. Our theory views SAA as a kernel in an infinite-order symmetric statistic. Regarding feasibility, Chapter 5 considers data-driven optimization under uncertain constraints, where solution feasibility is often ensured through a "safe" reformulation of the constraints, such that an obtained solution is guaranteed to be feasible for the oracle formulation with high confidence. Such approaches generally involve an implicit estimation of the whole feasible set, a task that can scale rapidly with the problem dimension and in turn lead to over-conservative solutions. We investigate validation-based strategies that avoid set estimation by exploiting the intrinsic low dimensionality of the set of all possible solutions output by a given reformulation. We demonstrate how our obtained solutions satisfy statistical feasibility guarantees with light dimension dependence, and how they are asymptotically optimal and can thus be regarded as the least conservative with respect to the considered reformulation classes.
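To make the bagging idea for the optimality gap concrete, the Python sketch below averages SAA optimal values over many resamples of the data and subtracts that average from the estimated cost of a candidate solution. It is only a point-estimate sketch under stated assumptions, not the thesis's method: the newsvendor cost with parameters `h` and `b`, the helper names `newsvendor_cost`, `saa_optimal_value`, and `bagged_gap_estimate`, and the choice to resample `k` points without replacement are all hypothetical, and no confidence bound is computed.

```python
import numpy as np


def newsvendor_cost(x, demand, h=1.0, b=3.0):
    """Non-smooth toy cost: h per unit of excess stock, b per unit of shortfall."""
    return h * np.maximum(x - demand, 0.0) + b * np.maximum(demand - x, 0.0)


def saa_optimal_value(sample):
    """Exact SAA optimum for the toy problem.

    The sample-average cost is convex and piecewise linear in x with
    breakpoints at the sample points, so a minimizer lies at a sample point.
    """
    candidates = np.unique(sample)
    # Rows index candidate order quantities, columns index demand scenarios.
    costs = newsvendor_cost(candidates[:, None], sample[None, :]).mean(axis=1)
    return costs.min()


def bagged_gap_estimate(x_hat, data, k, n_bags, rng):
    """Point estimate of the optimality gap of x_hat.

    Averages SAA optimal values over n_bags resamples of size k (drawn
    without replacement), which targets the expected SAA optimum, a lower
    bound on the true optimal value. The gap estimate is the average cost
    of x_hat on the full data minus this bagged value.
    """
    bagged = np.mean([
        saa_optimal_value(rng.choice(data, size=k, replace=False))
        for _ in range(n_bags)
    ])
    return float(newsvendor_cost(x_hat, data).mean() - bagged)


data = np.random.default_rng(1).exponential(scale=10.0, size=200)
gap_poor = bagged_gap_estimate(0.0, data, k=50, n_bags=100,
                               rng=np.random.default_rng(2))
gap_good = bagged_gap_estimate(14.0, data, k=50, n_bags=100,
                               rng=np.random.default_rng(2))
print(gap_poor, gap_good)
```

Reusing the same seed for both calls draws identical resamples, so the two estimates differ only through the candidate solutions' average costs on the data.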

## Files

- Qian_columbia_0054D_16070.pdf application/pdf 3.03 MB

## More About This Work

- Academic Units: Industrial Engineering and Operations Research
- Thesis Advisors: Lam, Kwai Hung Henry
- Degree: Ph.D., Columbia University
- Published Here: September 8, 2020