2021 Theses Doctoral
Toward a scalable Bayesian workflow
A scalable Bayesian workflow needs the combination of fast but reliable computing, efficient but targeted model evaluation, and extensive but directed model building and expansion. In this thesis, I develop a sequence of methods to push the scalability frontier of the workflow.
First, I study diagnostics of Bayesian computing. Pareto smoothed importance sampling stabilizes importance weights using a generalized Pareto distribution fit to the upper tail of the distribution of the simulated importance ratios. The method, which empirically performs better than existing methods for stabilizing importance sampling estimates, includes stabilized effective sample size estimates, Monte Carlo error estimates, and convergence diagnostics. For variational inference, I propose two diagnostic algorithms. The Pareto smoothed importance sampling diagnostic gives a goodness-of-fit measurement for joint distributions, while variational simulation-based calibration assesses the average performance of point estimates. I further apply this importance sampling strategy to causal inference and develop diagnostics for covariate imbalance in observational studies.
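As a rough illustration of the quantities involved in diagnosing importance sampling, the sketch below computes self-normalized weights, an effective sample size, and a tail-shape estimate from the largest importance ratios. It is not the thesis's implementation: the function names are hypothetical, and the Hill estimator is swapped in as a simple stand-in for the generalized Pareto fit described above.

```python
import math


def normalized_weights(ratios):
    """Self-normalize positive importance ratios so they sum to one."""
    total = sum(ratios)
    return [r / total for r in ratios]


def effective_sample_size(ratios):
    """Effective sample size 1 / sum(w_i^2) of the normalized weights."""
    weights = normalized_weights(ratios)
    return 1.0 / sum(w * w for w in weights)


def hill_tail_shape(ratios, n_tail):
    """Hill estimate of the tail shape k from the n_tail largest ratios.

    Assumption: this is a simple stand-in for a generalized Pareto fit,
    used here only to show how the upper tail is examined.
    """
    ordered = sorted(ratios)
    if not 0 < n_tail < len(ordered):
        raise ValueError("n_tail must be between 1 and len(ratios) - 1")
    threshold = ordered[-n_tail - 1]  # largest ratio below the tail
    tail = ordered[-n_tail:]
    return sum(math.log(x / threshold) for x in tail) / n_tail


if __name__ == "__main__":
    # Synthetic ratios: quantiles of a Pareto distribution with shape k = 0.5.
    n = 1000
    ratios = [(1.0 - (i + 0.5) / n) ** (-0.5) for i in range(n)]
    print("effective sample size:", effective_sample_size(ratios))
    print("estimated tail shape:", hill_tail_shape(ratios, 200))
```

A heavier right tail of the ratios yields a larger shape estimate and a smaller effective sample size, which is the sense in which the tail fit can serve as a diagnostic.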
Second, I develop a solution to continuous model expansion using adaptive path sampling and tempering. This development is relevant to both model building and computing in the workflow. For the former, I provide an automated way to connect models via a geometric bridge such that a supermodel encompasses the individual models as special cases. For the latter, I use adaptive path sampling as a preferred strategy for estimating the normalizing constant and marginal density, based on which I propose two metastable sampling schemes. Continuous simulated tempering targets multimodal posterior sampling, and the implicit divide-and-conquer sampler targets funnel-shaped entropic barriers. Both schemes are highly automated and empirically perform better than existing methods for sampling from metastable distributions.
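The link between a geometric bridge and a normalizing constant can be sketched with plain (non-adaptive) path sampling on a toy problem. This is an illustration under stated assumptions, not the adaptive scheme developed in the thesis: the two unnormalized densities are zero-mean Gaussians chosen so that every bridge density q_lam = q0^(1 - lam) * q1^lam can be sampled exactly, and all function names are hypothetical. The estimate relies on the identity log(Z1 / Z0) = integral over lam in [0, 1] of E_lam[log q1 - log q0].

```python
import math
import random


def log_q0(theta):
    """Unnormalized log density of N(0, 1)."""
    return -0.5 * theta * theta


def log_q1(theta, scale):
    """Unnormalized log density of N(0, scale^2)."""
    return -0.5 * (theta / scale) ** 2


def bridge_precision(lam, scale):
    """Precision of the geometric bridge q0^(1 - lam) * q1^lam (Gaussian)."""
    return (1.0 - lam) + lam / scale ** 2


def path_sampling_log_ratio(scale, n_grid=21, n_draws=2000, seed=0):
    """Estimate log(Z1 / Z0) by averaging log q1 - log q0 along the bridge."""
    rng = random.Random(seed)
    lams = [i / (n_grid - 1) for i in range(n_grid)]
    means = []
    for lam in lams:
        sd = 1.0 / math.sqrt(bridge_precision(lam, scale))
        draws = [rng.gauss(0.0, sd) for _ in range(n_draws)]
        u = [log_q1(t, scale) - log_q0(t) for t in draws]
        means.append(sum(u) / n_draws)
    # Trapezoidal rule over the equally spaced grid of lam values.
    step = lams[1] - lams[0]
    return step * (sum(means) - 0.5 * (means[0] + means[-1]))


if __name__ == "__main__":
    scale = 2.0
    print("path sampling estimate:", path_sampling_log_ratio(scale))
    print("exact log(Z1 / Z0):   ", math.log(scale))
```

In this toy case the exact answer is log(scale), so the estimate can be checked directly. In realistic models the draws at each lam would come from a sampler rather than a closed form.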
Last, a complete Bayesian workflow distinguishes itself from a one-shot data analysis by its enthusiasm for multiple model fittings and its open-mindedness to model misspecification. I take the idea of stacking from the point estimation literature and generalize it to the combination of Bayesian predictive distributions. Using an importance-sampling-based leave-one-out approximation, stacking is computationally efficient. I compare stacking, Bayesian model averaging, and several variants in a decision theory framework. I further apply the stacking strategy to multimodal sampling in which Markov chain Monte Carlo algorithms can have difficulty moving between modes. The result from stacking is not necessarily equivalent, even asymptotically, to fully Bayesian inference, but it serves many of the same goals. Under misspecified models, stacking can give better predictive performance than full Bayesian inference, hence the multimodality can be considered a blessing rather than a curse. Furthermore, I show that stacking is most effective when the model predictive performance is heterogeneous in inputs, such that it can be further improved by hierarchical modeling. To this end, I develop hierarchical stacking, in which the model weights are input-varying yet partially pooled, and further generalize this method to incorporate discrete and continuous inputs, other structured priors, and time-series and longitudinal data. Big data need big models, big models need big model evaluation, and big model evaluation itself needs extra data collection and model building.
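A minimal sketch of stacking predictive distributions, under stated assumptions: each row of `pred_dens` holds the leave-one-out predictive densities of one held-out observation under each candidate model, and the weights are chosen on the simplex to maximize the summed log of the weighted mixture density. The fixed-point update used here is the standard EM iteration for mixture proportions, picked for brevity rather than taken from the thesis, and the function names and toy numbers are hypothetical.

```python
import math


def stacking_weights(pred_dens, n_iter=500):
    """Maximize sum_i log(sum_k w_k * p_ik) over the simplex by EM updates."""
    n_obs = len(pred_dens)
    n_models = len(pred_dens[0])
    weights = [1.0 / n_models] * n_models
    for _ in range(n_iter):
        updated = [0.0] * n_models
        for row in pred_dens:
            mix = sum(w * p for w, p in zip(weights, row))
            for k in range(n_models):
                # Share of observation i's mixture density due to model k.
                updated[k] += weights[k] * row[k] / mix
        weights = [u / n_obs for u in updated]
    return weights


def log_score(pred_dens, weights):
    """Summed log predictive density of the weighted model combination."""
    return sum(
        math.log(sum(w * p for w, p in zip(weights, row)))
        for row in pred_dens
    )


if __name__ == "__main__":
    # Toy densities: each model predicts well on a different half of the data.
    pred_dens = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
    weights = stacking_weights(pred_dens)
    print("stacking weights:", weights)
    print("stacked log score:", log_score(pred_dens, weights))
    print("model 1 alone:    ", log_score(pred_dens, [1.0, 0.0]))
```

When the models' predictive performance differs across observations, as in the toy data, the combination scores higher than either model alone. When one model dominates everywhere, the weights collapse onto that model.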
- Yao_columbia_0054D_16552.pdf application/pdf 6.47 MB
More About This Work
- Academic Units
- Thesis Advisors: Gelman, Andrew
- Ph.D., Columbia University
- Published Here: June 14, 2021