Academic Commons

Theses Doctoral

Some Statistical Models for Prediction

Auerbach, Jonathan Lyle

This dissertation examines the use of statistical models for prediction. Examples are drawn from public policy and chosen because they represent pressing problems facing U.S. governments at the local, state, and federal levels. The first five chapters provide examples where the perfunctory use of linear models, the prediction tool of choice in government, failed to produce reasonable predictions. Methodological flaws are identified, and more accurate models are proposed that draw on advances in statistics, data science, and machine learning. Chapter 1 examines skyscraper construction, where the normality assumption is violated and extreme value analysis is more appropriate. Chapters 2 and 3 examine presidential approval and voting (a leading measure of civic participation), where the non-collinearity assumption is violated and an index model is more appropriate. Chapter 4 examines changes in temperature sensitivity due to global warming, where the linearity assumption is violated and a first-hitting-time model is more appropriate. Chapter 5 examines the crime rate, where the independence assumption is violated and a block model is more appropriate. The last chapter provides an example where simple linear regression was overlooked even though it provides a sensible solution. Chapter 6 examines traffic fatalities, where the linearity assumption yields a better predictor than the more popular nonlinear probability model, logistic regression. A theoretical connection is established between the linear probability model, the influence score, and predictivity.

  • Auerbach_columbia_0054D_16224.pdf (application/pdf, 5.42 MB)

More About This Work

Academic Units
Thesis Advisors
Gelman, Andrew
Lo, Shaw-Hwa
Degree
Ph.D., Columbia University
Published Here
October 19, 2020