Theses Doctoral

Methods for Measurement and Inference in Large-scale Systems: Applications in Public Policy and Large Language Models

Shi, Claudia

This thesis is about measurement and inference in large-scale systems where the quantities of interest cannot be directly observed. It consists of two distinct threads.

The first thread examines a well-established problem: estimating the causal effect of a public policy on a large population, such as the state-wide effect of a policy change. Since we cannot observe what would have happened without the intervention, we must rely on modeling assumptions. Synthetic control methods address this challenge by approximating a unit's counterfactual outcomes using a weighted combination of other units' observed outcomes.

However, this raises a fundamental question: when and why does this linear combination assumption actually hold? In chapter one, we develop a new conceptual framework and derive sufficient conditions for nonparametric identification for synthetic control methods. Chapter two extends this framework by deriving error bounds when the linear assumption fails and developing estimators that minimize these errors.
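
To make the weighted-combination idea concrete, here is a minimal sketch of the classic synthetic control fit: choose nonnegative donor weights that sum to one and best reproduce the treated unit's pre-intervention outcomes, then apply those weights to the donors' post-intervention outcomes. The numbers are invented, and the function names (`project_to_simplex`, `combine`, `fit_weights`) are hypothetical. The projected gradient solver is an assumption made for illustration and is not the estimator developed in the thesis.

```python
# Illustrative sketch only: toy numbers, hypothetical names, and the classic
# simplex-constrained least-squares fit (not the thesis's estimators).

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_j >= 0, sum_j w_j = 1}."""
    u = sorted(v, reverse=True)
    running_sum, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        running_sum += u_k
        candidate = (running_sum - 1.0) / k
        if u_k - candidate > 0:
            theta = candidate  # keep the threshold from the last valid k
    return [max(x - theta, 0.0) for x in v]


def combine(weights, donors):
    """Weighted combination of donor series, one value per time period."""
    periods = len(donors[0])
    return [sum(w * d[t] for w, d in zip(weights, donors)) for t in range(periods)]


def fit_weights(donors_pre, treated_pre, steps=20000):
    """Projected gradient descent on the pre-period squared error."""
    n = len(donors_pre)
    weights = [1.0 / n] * n  # start from equal weights
    # Conservative step size: 1 / (2 * squared Frobenius norm) never exceeds
    # 1 / L, where L is the Lipschitz constant of the gradient.
    step = 1.0 / (2.0 * sum(x * x for d in donors_pre for x in d))
    for _ in range(steps):
        fitted = combine(weights, donors_pre)
        residual = [y - f for y, f in zip(treated_pre, fitted)]
        grad = [-2.0 * sum(r * x for r, x in zip(residual, d)) for d in donors_pre]
        weights = project_to_simplex([w - step * g for w, g in zip(weights, grad)])
    return weights


# Toy data: three donor units, three pre-periods, two post-periods.
donors_pre = [[10.0, 12.0, 14.0], [20.0, 19.0, 21.0], [5.0, 7.0, 6.0]]
donors_post = [[16.0, 18.0], [22.0, 20.0], [8.0, 9.0]]
treated_pre = [14.0, 14.8, 16.8]   # built as 0.6 * donor 1 + 0.4 * donor 2
treated_post = [20.0, 21.0]        # observed outcomes after the policy change

weights = fit_weights(donors_pre, treated_pre)
synthetic_post = combine(weights, donors_post)  # estimated counterfactual
effect = [obs - syn for obs, syn in zip(treated_post, synthetic_post)]

print("weights:", [round(w, 3) for w in weights])
print("estimated effect:", [round(e, 3) for e in effect])
```

In this toy setup the treated unit's pre-period outcomes are an exact weighted combination of the donors by construction, so the linear assumption holds trivially. Real data offers no such guarantee, which is the gap the identification conditions and error bounds described above are meant to address.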

The second thread tackles a newer but equally challenging problem: understanding large language models. Unlike policy evaluation, where decades of methodological development provide established approaches, measuring what these models have learned presents entirely novel challenges. We cannot directly observe the beliefs or capabilities encoded within these systems, yet understanding them is crucial for their deployment. This thread approaches the measurement challenge from two angles: external evaluation and internal representation.

In chapter three, we focus on external evaluation by developing methods to measure the moral beliefs that large language models express through their outputs. We adapt survey methodology to treat AI systems as respondents and create specialized evaluation metrics and datasets. Chapter four shifts to internal representation, investigating how computational capabilities are organized within these models. Using formalized criteria and statistical hypothesis tests, it tests the hypothesis that specific abilities are executed by small subnetworks called "circuits."

Files

  • Shi_columbia_0054D_19567.pdf (application/pdf, 4.31 MB)

More About This Work

Academic Units: Computer Science
Thesis Advisors: Blei, David Meir
Degree: Ph.D., Columbia University
Published Here: October 29, 2025