Theses Doctoral

Computational Algorithms for Multi-omics and Electronic Health Records Data

Guo, Jia

Real-world data have enhanced healthcare research, improving our understanding of disease progression, aiding in diagnosis, and enabling the development of personalized and targeted treatments. In recent years, multi-omics data and electronic health record (EHR) data have become increasingly available, providing researchers with a wealth of information to analyze. Applying machine learning methods to EHR and multi-omics data has emerged as a promising approach for extracting valuable insights from these complex data sources. This dissertation focuses on the development of supervised and unsupervised learning methods, as well as their applications to EHR and multi-omics data, with a particular emphasis on early detection of clinical outcomes and identification of novel cancer subtypes.

The first part of the dissertation centers on developing a risk prediction tool using EHR data that enables early disease detection, so that preventive treatments can be given to better manage the disease. To this end, we developed a similarity-based supervised learning method and applied it to predict end-stage kidney disease (ESKD) and aortic stenosis (AS). In the second part of the dissertation, we expanded our goal to a phenome-wide prediction task and developed a deep learning method based on patient representations that can predict phenotypes across the phenome. Through a weighting scheme, this approach performs tailored disease phenotype prediction in a computationally efficient manner with good prediction performance. In the final part of the dissertation, we shifted the focus to identifying clinically meaningful novel disease subtypes with unsupervised learning methods using multi-omics data. We approached this goal by integrating multiple patient graphs, generated from multiple omics data with molecular-level features, to improve disease subtyping.

This dissertation has significantly contributed to the development of data-driven approaches to healthcare and biomedical research using EHR and multi-omics data. The new methodologies, developed with applications to multiple diseases, advance our knowledge of disease diagnosis and the identification of vulnerable groups, and can ultimately improve patient care.



More About This Work

Academic Units
Thesis Advisors
Wang, Shuang
Ph.D., Columbia University
Published Here
July 5, 2023