2018 Theses Doctoral
Statistical Methods for Integrated Cancer Genomic Data Using a Joint Latent Variable Model
Inspired by The Cancer Genome Atlas (TCGA), we explore multimodal genomic datasets with integrative methods using a joint latent variable approach. We use iCluster+, an existing clustering method for integrative data, to identify potential subtypes within TCGA sarcoma and mesothelioma tumors, and across a large cohort of 33 different TCGA cancer datasets. For classification, motivated by the need to improve the prediction of platinum resistance in high-grade serous ovarian cancer (HGSOC) treatment, we propose novel integrative methods, iClassify, to perform classification using a joint latent variable model. iClassify provides effective data integration and classification while handling heterogeneous data types, and offers a natural framework to incorporate covariate risk factors and to examine genomic driver-by-covariate risk factor interactions. Feature selection is performed through a thresholding parameter that combines both latent variable and feature coefficients. We demonstrate increased classification accuracy over methods that assume a homogeneous data type, such as linear discriminant analysis and penalized logistic regression, as well as improved feature selection. We apply iClassify to a TCGA cohort of HGSOC patients with three types of genomic data and platinum response data. This methodology has broad applications beyond predicting treatment outcomes and disease progression in cancer, including predicting prognosis and diagnosis in other diseases with major public health implications.
Files
- Drill_cumc.columbia_0054E_10044.pdf application/pdf 8.14 MB
More About This Work
- Academic Units
- Biostatistics
- Thesis Advisors
- Wang, Yuanjia
- Shen, Ronglai
- Degree
- Dr.P.H., Mailman School of Public Health, Columbia University
- Published Here
- July 28, 2018