2023 Theses Doctoral
Interpretable Machine Learning for the Social Sciences: Applications in Political Science and Labor Economics
Recent advances in machine learning offer social scientists a unique opportunity to use data-driven methods to uncover insights into human behavior. However, current machine learning methods are opaque, ineffective on small social science datasets, and tailored for predicting unseen values rather than estimating parameters from data. In this thesis, we develop interpretable machine learning techniques designed to uncover latent patterns and estimate critical quantities in the social sciences.
We focus on two aspects of interpretability: explaining individual model predictions and discovering latent patterns from data. We describe a method for explaining the predictions of general, black-box sequence models. This method approximates a combinatorial objective to elucidate the decision-making processes of sequence models.
Next, we narrow our focus to domain-specific applications. In political science, we develop the text-based ideal point model, a model that quantifies political positions from text. This model marries a classical idea from political science with a Bayesian matrix factorization technique to infer meaningful structure from text. In labor economics, we adapt a model from natural language processing to analyze career trajectories. We describe a transfer learning method that can overcome the constraints posed by small survey datasets. Finally, we adapt this predictive model to estimate an important quantity in labor economics: the history-adjusted gender wage gap.
- Vafa_columbia_0054D_17906.pdf (application/pdf, 2.96 MB)
More About This Work
- Academic Units: Computer Science
- Thesis Advisors: Blei, David Meir
- Degree: Ph.D., Columbia University
- Published Here: June 28, 2023