Theses Doctoral

Rare-Event Estimation and Calibration for Large-Scale Stochastic Simulation Models

Bai, Yuanlu

Stochastic simulation has been widely applied in many domains. More recently, however, the rapid surge of sophisticated problems such as safety evaluation of intelligent systems has posed various challenges to conventional statistical methods. Motivated by these challenges, in this thesis, we develop novel methodologies with theoretical guarantees and numerical applications to tackle them from different perspectives.

In particular, our works can be categorized into two areas: (1) rare-event estimation (Chapters 2 to 5) where we develop approaches to estimating the probabilities of rare events via simulation; (2) model calibration (Chapters 6 and 7) where we aim at calibrating the simulation model so that it is close to reality.

In Chapter 2, we study rare-event simulation for a class of problems where the target hitting sets of interest are defined via modern machine learning tools such as neural networks and random forests. We investigate an importance sampling scheme that integrates the dominating point machinery in large deviations and sequential mixed integer programming to locate the underlying dominating points. We provide efficiency guarantees and numerical demonstration of our approach.

In Chapter 3, we propose a new efficiency criterion for importance sampling, which we call probabilistic efficiency. Conventionally, an estimator is regarded as efficient if its relative error is sufficiently controlled. It is widely known that when a rare-event set contains multiple "important regions" encoded by the dominating points, importance sampling needs to account for all of them via mixing to achieve efficiency. We argue that the traditional analysis recipe could suffer from intrinsic looseness by using relative error as an efficiency criterion. Thus, we propose the new efficiency notion to tighten this gap. In particular, we show that under the standard Gartner-Ellis large deviations regime, an importance sampling that uses only the most significant dominating points is sufficient to attain this efficiency notion.

In Chapter 4, we consider the estimation of rare-event probabilities using sample proportions output by crude Monte Carlo. Due to the recent surge of sophisticated rare-event problems, efficiency-guaranteed variance reduction may face implementation challenges, which motivate one to look at naive estimators. In this chapter we construct confidence intervals for the target probability using this naive estimator from various techniques, and then analyze their validity as well as tightness respectively quantified by the coverage probability and relative half-width.

In Chapter 5, we propose the use of extreme value analysis, in particular the peak-over-threshold method which is popularly employed for extremal estimation of real datasets, in the simulation setting. More specifically, we view crude Monte Carlo samples as data to fit on a generalized Pareto distribution. We test this idea on several numerical examples. The results show that in the absence of efficient variance reduction schemes, it appears to offer potential benefits to enhance crude Monte Carlo estimates.

In Chapter 6, we investigate a framework to develop calibration schemes in parametric settings, which satisfies rigorous frequentist statistical guarantees via a basic notion that we call eligibility set designed to bypass non-identifiability via a set-based estimation. We investigate a feature extraction-then-aggregation approach to construct these sets that target at multivariate outputs. We demonstrate our methodology on several numerical examples, including an application to calibration of a limit order book market simulator.

In Chapter 7, we study a methodology to tackle the NASA Langley Uncertainty Quantification Challenge, a model calibration problem under both aleatory and epistemic uncertainties. Our methodology is based on an integration of distributionally robust optimization and importance sampling. The main computation machinery in this integrated methodology amounts to solving sampled linear programs. We present theoretical statistical guarantees of our approach via connections to nonparametric hypothesis testing, and numerical performances including parameter calibration and downstream decision and risk evaluation tasks.


  • thumnail for Bai_columbia_0054D_17747.pdf Bai_columbia_0054D_17747.pdf application/pdf 6.7 MB Download File

More About This Work

Academic Units
Industrial Engineering and Operations Research
Thesis Advisors
Lam, Henry
Ph.D., Columbia University
Published Here
May 3, 2023