2025 Theses Doctoral
Scalable Optimization Methods for Causal Inference and Discovery
In this thesis, we develop methods for making data-driven predictions about how a unit would respond to counterfactual interventions. Such predictions are relevant in a variety of data-rich settings. To name a few: What will be the health outcome for a patient if she is administered a prescribed set of treatments, given her sex and age? What will be the future state of a country’s economy if a new tax is introduced, given the country’s current production output? What will be a company’s future profits if a new promotional discount is introduced, given the demand for its products? Questions of this form are known as counterfactual queries. We focus on two fundamental challenges critical to answering such queries: causal inference and causal discovery. We assume access only to historical, observational data from other units. This assumption is reasonable, since such data is readily available in many settings and performing interventions on units is often expensive. The primary focus of our methods throughout this dissertation is scalability to high-dimensional datasets and to systems of variables with high complexity. In particular, we offer significant advances in efficiency and computational cost compared to state-of-the-art methods.
In Chapters 2 and 3, we focus on causal inference, where we use observational data from other units to predict how a new unit would respond to interventions. In these chapters, we assume that the causal mechanisms governing the system of variables that characterize a unit are known. The key challenge in this setting is the presence of unobserved confounders: unmeasured variables that create spurious correlations between measured variables in the dataset and can negatively impact data-driven decision making. Since the effect of such confounders cannot be measured directly, counterfactual queries generally cannot be pinned down to a single value, so we develop efficient algorithms for computing bounds on them that account for these confounders. In numerical experiments, our methods provide significant runtime improvements over benchmarks and allow us to compute bounds for significantly larger causal inference problems than is possible with existing techniques.
In Chapter 4, we focus on causal discovery from observational data, where the causal mechanisms governing a system of variables are assumed to be unknown and the challenge is to learn them. We propose an optimization algorithm for causal model learning that computes high-quality solutions significantly faster than the state of the art on high-dimensional datasets, without graph-size-specific hyperparameter tuning.
Files
- Shridharan_columbia_0054D_19298.pdf (application/pdf, 860 KB)
More About This Work
- Academic Units: Industrial Engineering and Operations Research
- Thesis Advisors: Iyengar, Garud N.
- Degree: Ph.D., Columbia University
- Published Here: August 6, 2025