Theses Doctoral

Causal machine learning for reliable real-world evidence generation in healthcare

Zhang, Linying

Real-world evidence (RWE) plays a crucial role in understanding the impact of medical interventions and uncovering disparities in clinical practice. However, confounding bias, especially unmeasured confounding, makes it difficult to infer causal relationships from observational data, for example when estimating treatment effects and treatment responses. Various methods have been developed to reduce confounding bias, including methods designed specifically to detect and adjust for unmeasured confounding. However, these methods typically rely on assumptions that are either untestable or too strong to hold in practice. Some methods also require domain knowledge that is rarely available in medicine. Despite recent advances in method development, the challenge of unmeasured confounding in observational studies persists.

This dissertation provides insights into adjusting for unmeasured confounding by exploiting correlations within electronic health records (EHRs). In Aim 1, we demonstrate a novel use of a probabilistic model for inferring unmeasured confounders from drug co-prescription patterns. In Aim 2, we provide theoretical justification and empirical evidence that adjusting for all pre-treatment covariates without explicitly selecting confounders, as implemented in the large-scale propensity score (LSPS) method, offers a more robust approach to mitigating unmeasured confounding.
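
For readers unfamiliar with propensity score adjustment, the idea behind Aim 2 can be sketched in standard notation. The formulas below are textbook definitions and are not taken from the dissertation. The symbols (treatment indicator t_i, outcome y_i, pre-treatment covariate vector x_i, penalty weight λ) are assumptions introduced here for illustration. The L1-regularized logistic model is one common way to fit a propensity score over a very large covariate set, and inverse probability weighting is one of several ways to use the fitted score. The abstract does not state which estimator the dissertation uses.

```latex
% Propensity score: probability of treatment given all pre-treatment covariates
e(x) = \Pr(T = 1 \mid X = x)

% Assumed fitting procedure: L1-regularized logistic regression over all covariates,
% with \sigma(z) = 1 / (1 + e^{-z}) and penalty weight \lambda (hypothetical notation)
\hat{\beta} = \arg\min_{\beta} \;
  -\sum_{i=1}^{n} \Big[ t_i \log \sigma(x_i^{\top}\beta)
  + (1 - t_i) \log\big(1 - \sigma(x_i^{\top}\beta)\big) \Big]
  + \lambda \lVert \beta \rVert_1 ,
\qquad \hat{e}(x_i) = \sigma(x_i^{\top}\hat{\beta})

% One possible downstream estimator: inverse probability weighted average treatment effect
\hat{\tau}_{\mathrm{IPW}} = \frac{1}{n} \sum_{i=1}^{n}
  \left[ \frac{t_i \, y_i}{\hat{e}(x_i)} - \frac{(1 - t_i) \, y_i}{1 - \hat{e}(x_i)} \right]
```

In this sketch no covariate is chosen by hand as a confounder. Every pre-treatment covariate enters x_i, and the penalty alone determines which coefficients remain nonzero.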

In Aim 3, we shift focus to the problem of evaluating fairness of treatment allocation in clinical practice from a causal perspective. We develop a causal fairness algorithm for assessing treatment allocation. By applying this fairness analysis method to a cohort of patients with coronary artery disease from EHR data, we uncover disparities in treatment allocation based on gender and race, highlighting the importance of addressing fairness concerns in clinical practice. Furthermore, we demonstrate that social determinants of health, variables that are often unavailable in EHR databases and are potential unmeasured confounders, do not significantly impact the estimation of treatment responses when conditioned on clinical features from EHR data, shedding light on the intricate relationship between EHR features and social determinants of health.

Collectively, this dissertation contributes valuable insights into addressing unmeasured confounding in the context of evidence generation from EHRs. These findings have significant implications for improving the reliability of observational studies and promoting equitable healthcare practices.



More About This Work

Academic Units
Biomedical Informatics
Thesis Advisors
Hripcsak, George M.
Ph.D., Columbia University
Published Here
August 9, 2023