Academic Commons

Theses Doctoral

Design-based, Bayesian Causal Inference for the Social-Sciences

Leavitt, Thomas

Scholars have recognized the benefits to science of Bayesian inference about the relative plausibility of competing hypotheses as opposed to, say, falsificationism, in which one either rejects or fails to reject hypotheses in isolation. Yet inference about causal effects, at least as they are conceived in the potential outcomes framework (Neyman, 1923; Rubin, 1974; Holland, 1986), has been tethered to falsificationism (Fisher, 1935; Neyman and Pearson, 1933) and has proved difficult to integrate with Bayesian inference. One reason for this difficulty is that potential outcomes are fixed quantities that are not embedded in statistical models. Significance tests about causal hypotheses in either of the traditions traceable to Fisher (1935) or Neyman and Pearson (1933) conceive potential outcomes in this way; randomness in inferences about causal effects stems entirely from a physical act of randomization, like flips of a coin or draws from an urn. Bayesian inferences, by contrast, typically depend on likelihood functions with model-based assumptions in which potential outcomes, to the extent that scholars invoke them, are conceived as outputs of a stochastic, data-generating model. In this dissertation, I develop Bayesian statistical inference for causal effects that incorporates the benefits of Bayesian scientific reasoning but does not require probability models on potential outcomes that undermine the value of randomization as the “reasoned basis” for inference (Fisher, 1935, p. 14).
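To make the design-based view concrete, the following is a minimal sketch of a Fisher-style randomization test, not code from the dissertation. It assumes a completely randomized design and the sharp null hypothesis of no effect for any unit, so the observed outcomes are held fixed and only the treatment assignment varies. The function name `randomization_p_value` and the toy data are hypothetical.

```python
from itertools import combinations


def randomization_p_value(outcomes, treated_idx):
    """Exact two-sided p-value under the sharp null of no effect for any unit.

    Outcomes are treated as fixed; the only source of randomness is which
    units the design assigns to treatment.
    """
    n = len(outcomes)
    n_treated = len(treated_idx)

    def diff_in_means(treated):
        treated = set(treated)
        t = [outcomes[i] for i in range(n) if i in treated]
        c = [outcomes[i] for i in range(n) if i not in treated]
        return sum(t) / len(t) - sum(c) / len(c)

    observed = diff_in_means(treated_idx)
    # Enumerate every assignment the design could have produced.
    null_stats = [diff_in_means(a) for a in combinations(range(n), n_treated)]
    # Small tolerance guards against floating-point ties.
    extreme = sum(abs(s) >= abs(observed) - 1e-12 for s in null_stats)
    return observed, extreme / len(null_stats)


# Toy data (hypothetical): six units, the first three assigned to treatment.
outcomes = [5, 7, 6, 2, 3, 1]
observed, p = randomization_p_value(outcomes, treated_idx=(0, 1, 2))
print(observed, p)
```

No probability model for the outcomes appears anywhere in the sketch; the reference distribution comes only from enumerating the assignments the design permits.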

In the first paper, I derive a randomization-based likelihood function in which Bayesian inference of causal effects is justified by the experimental design. I formally show that, under weak conditions on a prior distribution, as the number of experimental subjects increases indefinitely, the resulting sequence of posterior distributions converges in probability to the true causal effect. This result, typically known as the Bernstein-von Mises theorem, has been derived in the context of parametric models. Yet randomized experiments are especially credible precisely because they do not require such assumptions. Proving this result in the context of randomized experiments enables scholars to quantify how much they learn from experiments without sacrificing the design-based properties that make inferences from experiments especially credible in the first place.

Having derived a randomization-based likelihood function in the first paper, the second paper turns to the calibration of a prior distribution for a target experiment based on past experimental results. In this paper, I show that usual methods for analyzing randomized experiments are equivalent to presuming that no prior knowledge exists, which inhibits knowledge accumulation from past to future experiments. I therefore develop a methodology by which scholars can (1) turn results of past experiments into a prior distribution for a target experiment and (2) quantify the degree of learning in the target experiment after updating prior beliefs via a randomization-based likelihood function. I implement this methodology in an original audit experiment conducted in 2020 and show the amount of Bayesian learning that results relative to information from past experiments. A large amount of Bayesian learning and statistical significance do not always coincide, and learning is greatest among theoretically important subgroups of legislators for which relatively less prior information exists. The accumulation of knowledge about these subgroups, specifically Black and Latino legislators, carries implications about the extent to which descriptive representation operates not only within, but also between minority groups.
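The two steps described above can be sketched as follows. This is an illustration under strong simplifying assumptions and not the dissertation's procedure: the abstract does not specify the likelihood or the learning measure, so the sketch assumes (a) past estimates are pooled by precision weighting into a normal prior, (b) the target experiment's estimator is approximately normal, and (c) learning is measured as the Kullback-Leibler divergence from prior to posterior. All function names and numbers are hypothetical.

```python
import math


def pool_past_estimates(estimates, std_errors):
    """Precision-weighted pooling of past estimates into a normal prior (assumed)."""
    weights = [1.0 / se**2 for se in std_errors]
    mean = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    sd = math.sqrt(1.0 / sum(weights))
    return mean, sd


def normal_update(prior_mean, prior_sd, estimate, std_error):
    """Conjugate normal-normal update with a known sampling variance (assumed)."""
    prior_prec = 1.0 / prior_sd**2
    data_prec = 1.0 / std_error**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * estimate) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


def kl_normal(mean_q, sd_q, mean_p, sd_p):
    """KL(q || p) between univariate normals, in nats."""
    return (math.log(sd_p / sd_q)
            + (sd_q**2 + (mean_q - mean_p) ** 2) / (2.0 * sd_p**2)
            - 0.5)


def learning(prior_mean, prior_sd, estimate, std_error):
    """Divergence of the posterior from the prior after seeing the new estimate."""
    post_mean, post_sd = normal_update(prior_mean, prior_sd, estimate, std_error)
    return kl_normal(post_mean, post_sd, prior_mean, prior_sd)


# Hypothetical numbers: the same new result moves a diffuse prior far more
# than a tight one.
tight = learning(prior_mean=0.0, prior_sd=0.05, estimate=0.1, std_error=0.1)
diffuse = learning(prior_mean=0.0, prior_sd=0.5, estimate=0.1, std_error=0.1)
print(tight, diffuse)
```

Under these assumptions, a subgroup with little prior information (a diffuse prior) shows more learning from the same new estimate than a subgroup with a tight prior, regardless of whether the new estimate is statistically significant on its own.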

In the third paper, I turn away from randomized experiments toward observational studies, specifically the Difference-in-Differences (DID) design. I show that DID’s central assumption of parallel trends poses a neglected problem for causal inference: counterfactual uncertainty, due to the inability to observe counterfactual outcomes, is hard to quantify since DID is based on parallel trends, not an as-if-randomized assumption. Hence, standard errors and p-values are too small since they reflect only sampling uncertainty due to the inability to observe all units in a population. Recognizing this problem, scholars have recently attempted to develop inferential methods for DID under an as-if-randomized assumption. In this paper, I show that this approach is ill-suited for the most canonical DID designs and also requires conducting inference on an ill-defined estimand. I instead develop an empirical Bayes procedure that is able to accommodate both sampling and counterfactual uncertainty under DID’s core identification assumption. The overall method is straightforward to implement, and I apply it to a study on the effect of terrorist attacks on electoral outcomes.
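For readers unfamiliar with the design, here is a minimal sketch of the canonical two-group, two-period DID point estimate. It is not the empirical Bayes procedure developed in the dissertation, which the abstract does not spell out. The function name `did_estimate`, the optional `trend_gap` argument, and the data are hypothetical; `trend_gap` is included only to show how the estimate depends on the unobservable counterfactual trend.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post,
                 trend_gap=0.0):
    """Two-group, two-period difference-in-differences point estimate.

    trend_gap is the assumed difference between the treated group's
    counterfactual change and the control group's observed change.
    Parallel trends corresponds to trend_gap == 0.0.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    # The control group's change (plus any assumed gap) stands in for the
    # treated group's unobserved counterfactual change.
    return treated_change - (control_change + trend_gap)


# Hypothetical outcome data for two units per group and period.
estimate = did_estimate([10, 12], [15, 17], [8, 10], [9, 11])
shifted = did_estimate([10, 12], [15, 17], [8, 10], [9, 11], trend_gap=1.0)
print(estimate, shifted)
```

Because `trend_gap` can never be observed, a standard error computed from sampling variation alone says nothing about how far the estimate would move if parallel trends failed, which is the counterfactual uncertainty the paragraph above describes.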



More About This Work

Academic Units
Political Science
Thesis Advisors
Humphreys, Macartan N.
Green, Donald P.
Degree
Ph.D., Columbia University
Published Here
October 27, 2021