Academic Commons

Theses Doctoral

Machine Learning Methods for Causal Inference with Observational Biomedical Data

Averitt, Amelia Jean

Causal inference, the process of drawing a conclusion about the impact of an exposure on an outcome, is foundational to biomedicine, where it is used to guide intervention. The current gold-standard approach to causal inference is randomized experimentation, such as the randomized controlled trial (RCT). Yet randomized experiments, including RCTs, often enforce strict eligibility criteria that impede the generalizability of causal knowledge to the real world. Observational data, such as the electronic health record (EHR), is often regarded as a more representative source from which to generate causal knowledge. However, observational data is non-randomized, so causal estimates derived from it are susceptible to bias from confounders (variables that influence both the exposure and the outcome). This weakness complicates two central tasks of causal inference: the replication or evaluation of existing causal knowledge, and the generation of new causal knowledge. In this dissertation I (i) assess the feasibility of using observational data to replicate existing causal knowledge and (ii) present new methods for generating causal knowledge from observational data, focusing on two causal tasks: comparing an outcome between two cohorts and estimating the attributable risks of exposures in a causal system.
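To make the confounding problem concrete, the sketch below compares two cohorts (exposed vs. unexposed) under a small, entirely hypothetical model with one binary confounder Z, a binary exposure T, and a binary outcome Y. It is not a method from the dissertation; the probabilities, the function names (`joint`, `naive_rd`, `standardized_rd`), and the choice of simple standardization (back-door adjustment over Z) are illustrative assumptions. It computes exact probabilities rather than sampling, so the contrast between the naive and adjusted estimates is not blurred by noise.

```python
from itertools import product

# Hypothetical model (illustrative numbers only, not from the dissertation):
# Z = binary confounder (e.g., disease severity), T = binary exposure, Y = binary outcome.
P_Z1 = 0.5                        # P(Z = 1)
P_T1_GIVEN_Z = {0: 0.2, 1: 0.8}   # observational setting: severe cases are exposed more often


def p_y1(t, z):
    """P(Y = 1 | T = t, Z = z); the true risk difference due to T is 0.1 in every stratum."""
    return 0.1 + 0.1 * t + 0.5 * z


def joint(p_t1_given_z):
    """Return the joint distribution {(z, t, y): probability} implied by the model."""
    dist = {}
    for z, t, y in product((0, 1), repeat=3):
        pz = P_Z1 if z == 1 else 1 - P_Z1
        pt = p_t1_given_z[z] if t == 1 else 1 - p_t1_given_z[z]
        py = p_y1(t, z) if y == 1 else 1 - p_y1(t, z)
        dist[(z, t, y)] = pz * pt * py
    return dist


def risk(dist, t, z=None):
    """P(Y = 1 | T = t), or P(Y = 1 | T = t, Z = z) when z is given."""
    num = sum(p for (zz, tt, y), p in dist.items()
              if tt == t and y == 1 and (z is None or zz == z))
    den = sum(p for (zz, tt, _), p in dist.items()
              if tt == t and (z is None or zz == z))
    return num / den


def naive_rd(dist):
    """Unadjusted risk difference: compares the two cohorts exactly as observed."""
    return risk(dist, 1) - risk(dist, 0)


def standardized_rd(dist):
    """Risk difference standardized over the marginal distribution of Z (back-door adjustment)."""
    pz = {z: sum(p for (zz, _, _), p in dist.items() if zz == z) for z in (0, 1)}
    return sum(pz[z] * (risk(dist, 1, z) - risk(dist, 0, z)) for z in (0, 1))


observational = joint(P_T1_GIVEN_Z)
randomized = joint({0: 0.5, 1: 0.5})  # exposure assigned independently of Z, as in an RCT

print(f"observational, naive:        {naive_rd(observational):.3f}")
print(f"observational, standardized: {standardized_rd(observational):.3f}")
print(f"randomized, naive:           {naive_rd(randomized):.3f}")
```

Under these assumed numbers the script prints 0.400, 0.100, and 0.100: the naive observational comparison overstates the effect fourfold because exposed patients are disproportionately severe cases, while both randomization and adjustment for the (here fully known) confounder recover the true risk difference of 0.1. In real observational data the confounders are rarely all measured, which is why such adjustment is harder than this toy case suggests.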


Averitt_columbia_0054D_16037.pdf (application/pdf, 3.01 MB)

More About This Work

Academic Units: Biomedical Informatics
Thesis Advisors: Perotte, Adler J.
Degree: Ph.D., Columbia University
Published Here: July 28, 2020