Theses Doctoral

Statistical Methods for Learning Patients Heterogeneity and Treatment Effects to Achieve Precision Medicine

Xu, Tianchen

The burgeoning adoption of modern technologies provides a great opportunity to gather comprehensive, personalized data on individuals across multiple modalities. This thesis addresses statistical challenges in analyzing such data, including patient-specific biomarkers, digital phenotypes, and clinical data from electronic health records (EHRs) linked with other data sources, with the goal of achieving precision medicine. The first part of the thesis introduces a dimension reduction method for microbiome data that facilitates subsequent analyses such as regression and clustering. We apply the proposed zero-inflated Poisson factor analysis (ZIPFA) model to the Oral Infections, Glucose Intolerance and Insulin Resistance Study (ORIGINS) and provide valuable insights into the relationship between the subgingival microbiome and periodontal disease.
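As a rough illustration of the kind of likelihood a zero-inflated Poisson factor model works with, the sketch below scores a count matrix (samples by taxa) under a generic low-rank model: log Poisson rates equal U V^T, and each entry is a structural zero with probability p. The function name `zip_factor_loglik` and the treatment of p as a fixed, known input are assumptions made for illustration; this is not the ZIPFA specification or its fitting algorithm, which may parameterize the zero-inflation probability differently.

```python
import math

import numpy as np


def zip_factor_loglik(X, U, V, p):
    """Hypothetical helper: log-likelihood of counts X under a generic
    zero-inflated Poisson factor model (an illustrative assumption, not ZIPFA).

    X: (n, m) nonnegative integer counts (samples x taxa)
    U: (n, K) sample scores; V: (m, K) taxon loadings
    p: scalar or (n, m) array of structural-zero probabilities in [0, 1)
    """
    X = np.asarray(X, dtype=float)
    log_lam = U @ V.T                      # low-rank log Poisson rates
    lam = np.exp(log_lam)
    p = np.broadcast_to(np.asarray(p, dtype=float), X.shape)
    log_fact = np.vectorize(math.lgamma)(X + 1.0)  # log(x!) elementwise
    # A zero is either a structural zero or a Poisson draw that happened to be 0
    ll_zero = np.log(p + (1.0 - p) * np.exp(-lam))
    # A positive count can only come from the Poisson component
    ll_pos = np.log1p(-p) + X * log_lam - lam - log_fact
    return float(np.sum(np.where(X == 0, ll_zero, ll_pos)))


# Usage on simulated data (all values are made up for illustration)
rng = np.random.default_rng(0)
n, m, K = 50, 30, 2
U = rng.normal(scale=0.5, size=(n, K))
V = rng.normal(scale=0.5, size=(m, K))
structural = rng.random((n, m)) < 0.3
X = np.where(structural, 0, rng.poisson(np.exp(U @ V.T)))
print(zip_factor_loglik(X, U, V, 0.3))
```

Setting p to 0 reduces the expression to an ordinary low-rank Poisson log-likelihood, which is one way to see what the zero-inflation term adds for sparse microbiome counts.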

The second part focuses on modeling intensive longitudinal digital phenotypes collected by mobile devices. We develop a method based on a generalized state-space model to estimate the latent process underlying each patient's health status. Applied to data from the Mobile Parkinson's Observatory for Worldwide Evidence-based Research (mPower), the method reveals a low-rank structure in the digital phenotypes and infers the short-term and long-term effects of Levodopa treatment.

The third part proposes a self-matched learning method to learn an individualized treatment rule (ITR) from longitudinal EHR data. The medical history recorded in EHRs makes it possible to alleviate unmeasured time-invariant confounding by matching different treatment periods within the same patient (self-controlled matching). Using EHR data from New York Presbyterian (NYP) hospital, we estimate an ITR for patients with type 2 diabetes that aims to reduce the risk of diabetes-related complications. Furthermore, we include an additional example: a self-controlled case series (SCCS) study of the side effects of stimulants. Significant associations between stimulant use and mortality are found in both the FDA Adverse Event Reporting System and the SCCS study, but the latter uses a much smaller sample size, which suggests that the SCCS design is highly efficient.
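The reason within-patient matching helps with time-invariant confounding can be seen in a toy example. Below, an unmeasured patient-level quantity u both raises the outcome and makes treatment more likely, so a naive treated-versus-untreated comparison across all periods is biased, while averaging within-patient contrasts cancels u. The functions `naive_difference` and `self_matched_difference` and the toy data are hypothetical; this sketch estimates only an average treatment contrast and does not implement the thesis's self-matched learning of an individualized treatment rule.

```python
from collections import defaultdict

import numpy as np


def naive_difference(treated, outcomes):
    """Pooled mean outcome of treated periods minus untreated periods."""
    treated = np.asarray(treated, dtype=bool)
    outcomes = np.asarray(outcomes, dtype=float)
    return float(outcomes[treated].mean() - outcomes[~treated].mean())


def self_matched_difference(patient_ids, treated, outcomes):
    """Average of within-patient contrasts (treated minus untreated periods).
    Any patient-level constant, such as an unmeasured time-invariant
    confounder, cancels inside each contrast. Patients observed under only
    one treatment contribute no contrast and are skipped."""
    groups = defaultdict(lambda: {True: [], False: []})
    for pid, a, y in zip(patient_ids, treated, outcomes):
        groups[pid][bool(a)].append(float(y))
    diffs = [np.mean(g[True]) - np.mean(g[False])
             for g in groups.values() if g[True] and g[False]]
    return float(np.mean(diffs))


# Toy data: outcome = u_i + 2 * treated, and the high-u patient is treated more often
patient_ids = [0, 0, 0, 0, 1, 1, 1, 1]
treated = [1, 0, 0, 0, 1, 1, 1, 0]
u = [0.0] * 4 + [10.0] * 4
outcomes = [ui + 2.0 * a for ui, a in zip(u, treated)]
print(naive_difference(treated, outcomes))                      # 7.0 (confounded)
print(self_matched_difference(patient_ids, treated, outcomes))  # 2.0 (true effect)
```

The naive contrast mixes the treatment effect with differences in u between patients, whereas each within-patient contrast equals the true effect of 2 in this noise-free toy.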

Thesis Advisors
Wang, Yuanjia
Li, Gen
Ph.D., Columbia University
Published Here
August 17, 2022