Academic Commons

Theses Doctoral

Feature Selection for High Dimensional Causal Inference

Lu, Rui

Selecting an appropriate set of covariates for confounding control is essential for causal inference. Strong ignorability is a strong assumption, and with observational data researchers cannot be sure that it holds. To reduce the possibility of bias caused by unmeasured confounders, one solution is to include the widest possible range of pre-treatment covariates, but this approach has been demonstrated to be problematic. Subjective knowledge-based covariate screening is a common and widely applied approach. However, in high dimensional settings, it becomes difficult for domain experts to screen thousands of covariates. Machine learning based automatic causal estimation makes high dimensional causal estimation feasible. While the theoretical properties of these techniques are desirable, they are guaranteed to hold only asymptotically (i.e., with large sample sizes), and their performance in smaller samples is sometimes less clear. Data-based pre-processing approaches may fill this gap. Nevertheless, there is no clear guidance on when and how covariate selection should be involved in high dimensional causal estimation.

In this dissertation, I address the above issues by (a) providing a classification scheme for major causal covariate selection methods, (b) extending the causal covariate selection framework, (c) conducting a comprehensive empirical Monte Carlo simulation study to illustrate the theoretical properties of causal covariate selection and estimation methods, and (d) following up with a case study that compares different covariate selection approaches on real data.

In small sample and/or high dimensional settings, the study results indicate that choosing an appropriate covariate selection method as a pre-processing tool is necessary for causal estimation. In relatively large sample, low dimensional settings, covariate selection is not necessary for machine learning based automatic causal estimation. Careful pre-processing guided by subjective knowledge is essential.



More About This Work

Academic Units: Measurement and Evaluation
Thesis Advisors: Keller, Bryan S.
Degree: Ph.D., Columbia University
Published Here: September 21, 2020