Theses Doctoral

Statistical Methods for Integrated Cancer Genomic Data Using a Joint Latent Variable Model

Drill, Esther

Inspired by The Cancer Genome Atlas (TCGA), we explore multimodal genomic datasets with integrative methods using a joint latent variable approach. We use iCluster+, an existing clustering method for integrative data, to identify potential subtypes within TCGA sarcoma and mesothelioma tumors, and across a large cohort of 33 different TCGA cancer datasets. For classification, motivated to improve the prediction of platinum resistance in high-grade serous ovarian cancer (HGSOC) treatment, we propose a novel integrative method, iClassify, to perform classification using a joint latent variable model. iClassify provides effective data integration and classification while handling heterogeneous data types, and it offers a natural framework to incorporate covariate risk factors and examine genomic driver-by-covariate risk factor interactions. Feature selection is performed through a thresholding parameter that combines both latent variable and feature coefficients. We demonstrate increased classification accuracy over methods that assume a homogeneous data type, such as linear discriminant analysis and penalized logistic regression, as well as improved feature selection. We apply iClassify to a TCGA cohort of HGSOC patients with three types of genomic data and platinum response data. This methodology has broad applications beyond predicting treatment outcomes and disease progression in cancer, including predicting prognosis and diagnosis in other diseases with major public health implications.



More About This Work

Academic Units
Thesis Advisors
Wang, Yuanjia
Shen, Ronglai
Dr.P.H., Mailman School of Public Health, Columbia University
Published Here
July 28, 2018