2020 Theses Doctoral

# Some Statistical Models for Prediction

This dissertation examines the use of statistical models for prediction. Examples are drawn from public policy and chosen because they represent pressing problems facing U.S. governments at the local, state, and federal levels. The first five chapters provide examples where the perfunctory use of linear models, the prediction tool of choice in government, failed to produce reasonable predictions. Methodological flaws are identified, and more accurate models are proposed that draw on advances in statistics, data science, and machine learning. Chapter 1 examines skyscraper construction, where the normality assumption is violated and extreme value analysis is more appropriate. Chapters 2 and 3 examine presidential approval and voting (a leading measure of civic participation), where the non-collinearity assumption is violated and an index model is more appropriate. Chapter 4 examines changes in temperature sensitivity due to global warming, where the linearity assumption is violated and a first-hitting-time model is more appropriate. Chapter 5 examines the crime rate, where the independence assumption is violated and a block model is more appropriate. The last chapter provides an example where simple linear regression was overlooked as a sensible solution. Chapter 6 examines traffic fatalities, where the linear assumption provides a better predictor than the more popular non-linear probability model, logistic regression. A theoretical connection is established between the linear probability model, the influence score, and the predictivity.

## Files

- Auerbach_columbia_0054D_16224.pdf (application/pdf, 5.42 MB)

## More About This Work

- Academic Units: Statistics
- Thesis Advisors: Gelman, Andrew; Lo, Shaw-Hwa
- Degree: Ph.D., Columbia University
- Published Here: October 19, 2020