Theses Doctoral

Interpretable Machine Learning for the Social Sciences: Applications in Political Science and Labor Economics

Vafa, Keyon

Recent advances in machine learning offer social scientists a unique opportunity to use data-driven methods to uncover insights into human behavior. However, current machine learning methods are opaque, ineffective on small social science datasets, and tailored to predicting unseen values rather than to estimating parameters from data. In this thesis, we develop interpretable machine learning techniques designed to uncover latent patterns and estimate critical quantities in the social sciences.

We focus on two aspects of interpretability: explaining individual model predictions and discovering latent patterns in data. First, we describe a method for explaining the predictions of general black-box sequence models; it approximates a combinatorial objective to elucidate how sequence models arrive at their decisions. Next, we narrow our focus to domain-specific applications. In political science, we develop the text-based ideal point model, which quantifies political positions from text.

This model marries a classical idea from political science, the ideal point, with a Bayesian matrix factorization technique to infer meaningful structure from text. In labor economics, we adapt a model from natural language processing to analyze career trajectories, and we describe a transfer learning method that can overcome the constraints posed by small survey datasets. Finally, we adapt this predictive model to estimate an important quantity in labor economics: the history-adjusted gender wage gap.
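The explanation method for sequence models summarized above approximates a combinatorial objective. One natural reading of such an objective, assumed here, is: find the smallest subset of context positions under which the model still makes the same prediction as with the full context. Searching all 2^n subsets is intractable for long contexts, so a greedy approximation grows the subset one position at a time. The sketch below is a minimal illustration under that assumption, not the thesis's implementation; `greedy_rationale`, `toy_model`, and its scores are hypothetical stand-ins for a real sequence model.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: a "model" maps a subset of context positions to a
# probability distribution over next tokens (token -> probability).
Model = Callable[[Sequence[int]], Dict[str, float]]


def greedy_rationale(model: Model, context_len: int, target: str) -> List[int]:
    """Greedily add context positions until `target` becomes the model's
    most likely prediction given only the chosen positions."""
    chosen: List[int] = []
    remaining = set(range(context_len))
    while remaining:
        probs = model(chosen)
        if max(probs, key=probs.get) == target:
            break  # the current subset already supports the prediction
        # Add the position that most increases the target's probability.
        best = max(remaining, key=lambda i: model(chosen + [i]).get(target, 0.0))
        chosen.append(best)
        remaining.remove(best)
    return sorted(chosen)


# Hypothetical toy model over the context "the capital of France is".
# Position 3 ("France") strongly supports "Paris"; position 1 ("capital") helps a little.
def toy_model(subset: Sequence[int]) -> Dict[str, float]:
    present = set(subset)
    score_paris = 1.0 + 3.0 * (3 in present) + 0.5 * (1 in present)
    score_the = 2.0
    total = score_paris + score_the
    return {"Paris": score_paris / total, "the": score_the / total}


if __name__ == "__main__":
    print(greedy_rationale(toy_model, 5, "Paris"))  # prints [3]
```

With the empty subset, the toy model prefers "the"; adding position 3 alone makes "Paris" the most likely token, so the greedy search stops with a one-token explanation. The greedy choice trades optimality for tractability: each step costs one model call per remaining position instead of one per subset.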

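The text-based ideal point model mentioned above combines ideal points with Bayesian matrix factorization. A sketch of its generative form, assuming the Poisson-factorization formulation of the published text-based ideal point model (the symbol names here are illustrative labels, not taken from this record), is:

```latex
y_{dv} \sim \operatorname{Poisson}\!\left( \sum_{k=1}^{K} \theta_{dk}\, \beta_{kv}\, \exp\!\left( x_{a_d}\, \eta_{kv} \right) \right)
```

Here $y_{dv}$ counts occurrences of word $v$ in document $d$, $\theta_{dk} \ge 0$ is the document's intensity on topic $k$, $\beta_{kv} \ge 0$ is a neutral topic, $\eta_{kv} \in \mathbb{R}$ is an ideological topic, $a_d$ is the author of document $d$, and $x_s \in \mathbb{R}$ is author $s$'s ideal point. When $x_{a_d} = 0$ the rate reduces to standard Poisson factorization, $\sum_k \theta_{dk}\beta_{kv}$; a nonzero ideal point tilts each topic's word rates up or down depending on the sign of $x_{a_d}\eta_{kv}$.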

File: Vafa_columbia_0054D_17906.pdf (application/pdf, 2.96 MB)

More About This Work

Academic Units: Computer Science
Thesis Advisors: Blei, David Meir
Degree: Ph.D., Columbia University
Published Here: June 28, 2023