2024 Theses Doctoral

# Trade-Offs and Opportunities in High-Dimensional Bayesian Modeling

With the increasing availability of large multivariate datasets, modern parametric statistical models make increasing use of high-dimensional parameter spaces to flexibly represent complex data generating mechanisms. Yet, ceteris paribus, increases in dimensionality often carry drawbacks across the various sub-problems of data analysis, posing challenges for the data analyst who must balance model plausibility against the practical considerations of implementation. We focus here on challenges to three components of data analysis: computation, inference, and model checking. In the computational domain, we are concerned with achieving reasonable scaling of the computational complexity with the parameter dimension without sacrificing the trustworthiness of our computation.

Here, we study a particular class of algorithms, the vectorized approximate message passing (VAMP) iterations, which offer the possibility of linear per-iteration scaling with dimension. These iterations perform approximate inference for a class of Bayesian generalized linear regression models, and we demonstrate that under flexible distributional conditions, the estimation performance of these VAMP iterations can be predicted to high accuracy, with the probability of failure decaying exponentially fast in the size of the regression problem. In the realm of statistical inference, we investigate the relationship between parameter dimension and identification. We develop formal notions of weak identification and model expansion in the Bayesian setting and use these to argue for a very general tendency for dimensionality-increasing model expansion to weaken the identification of model parameters.

We draw two substantive conclusions from this formalism. First, the negative association between dimensionality and identification can be weakened or reversed when we construct prior distributions that encode sufficiently strong dependence between parameters. Second, absent such prior information, we derive bounds which indicate that decreasing identification is usually unavoidable with sufficient inflation of the dimension without increasing the severity of the third challenge we consider: the challenge that dimensionality poses to model checking.

We divide the topic of model checking into two sub-problems: fitness testing and correctness testing. Using our model expansion formalism, we show again that both of these problems tend to become more difficult as the model dimension grows. We propose two extensions of the posterior predictive 𝑝-value, namely certain conditional and joint 𝑝-values, which are designed to address these challenges for fitness and correctness testing respectively. We demonstrate, both theoretically and with examples, the potential of these 𝑝-values to allow successful model checking that scales with dimensionality.
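For readers unfamiliar with the baseline being extended, the following is a minimal sketch of a standard posterior predictive 𝑝-value, not the conditional or joint 𝑝-values proposed in the thesis. It assumes a toy model chosen purely for illustration (observations distributed as Normal(μ, 1) with a conjugate Normal(0, `prior_sd`²) prior on μ); the function name, its parameters, and the choice of test statistic are hypothetical and do not come from the thesis.

```python
import numpy as np

def posterior_predictive_pvalue(y, test_stat, prior_sd=10.0, n_draws=4000, seed=0):
    """Monte Carlo estimate of P(T(y_rep) >= T(y) | y) for a toy model.

    Assumed model (illustrative only): y_i ~ Normal(mu, 1), mu ~ Normal(0, prior_sd^2).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = y.size

    # Conjugate posterior for mu when the observation variance is known to be 1.
    post_var = 1.0 / (1.0 / prior_sd**2 + n)
    post_mean = post_var * y.sum()

    # Draw mu from the posterior, then one replicated dataset per draw.
    mu = rng.normal(post_mean, np.sqrt(post_var), size=n_draws)
    y_rep = rng.normal(mu[:, None], 1.0, size=(n_draws, n))

    # Compare the replicated test statistics with the observed one.
    t_rep = np.array([test_stat(row) for row in y_rep])
    return float(np.mean(t_rep >= test_stat(y)))

def max_abs(values):
    """Test statistic sensitive to outliers: largest absolute observation."""
    return float(np.max(np.abs(values)))

# Simulated data consistent with the assumed model, and a copy with one gross outlier.
data_rng = np.random.default_rng(1)
y_ok = data_rng.normal(0.0, 1.0, size=50)
y_outlier = y_ok.copy()
y_outlier[0] = 8.0

p_ok = posterior_predictive_pvalue(y_ok, max_abs)
p_outlier = posterior_predictive_pvalue(y_outlier, max_abs)
print(p_ok, p_outlier)
```

In this sketch a 𝑝-value near zero flags that the observed statistic is extreme relative to data replicated from the fitted model, which is the sense of fitness testing that the abstract describes as becoming harder in high dimensions.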

## Files

- Cademartori_columbia_0054D_18647.pdf (application/pdf, 1.9 MB)

## More About This Work

- Academic Units: Statistics
- Thesis Advisors: Gelman, Andrew; Rush, Cynthia
- Degree: Ph.D., Columbia University
- Published Here: August 28, 2024