Theses Doctoral

Systems biology approaches to precision medicine

He, Jing

This dissertation reviews the development and implementation of two systems biology methods: ADVOCATE and hpARACNE. ADVOCATE was designed to deconvolve epithelial and stromal compartment fractions and virtual expression profiles from bulk gene expression profiles of human patients. We used laser capture microdissection and RNA sequencing to disentangle the transcriptional programs active in the malignant epithelium and stroma of pancreatic ductal adenocarcinoma (PDA), an aggressive malignancy with a prominent stromal component. We learned that distinct molecular subtypes are present in both the epithelium and the stroma of pancreatic cancer, and that the subtype identities of these two compartments are independent of one another. Critically, we discovered that specific combinations of epithelial and stromal subtypes are strongly associated with patient survival across multiple external datasets, exhibiting both an effect size and a level of reproducibility that were absent from previous efforts. These analyses were made possible by a new probabilistic algorithm (Adaptive DeconVolution Of CAncer Tissue Expression, ADVOCATE) that can extract compartment-specific gene expression profiles from bulk gene expression data. ADVOCATE accurately predicted the compartment fractions of bulk tumor samples and improved the performance of molecular classifiers by controlling for the diverse cellular compositions of independent datasets. This approach provides a much-needed framework to handle solid tumor tissue heterogeneity, allowing integrated analysis of both epithelial and stromal transcriptional programs from individual bulk samples.
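To make the idea of a compartment fraction concrete, the sketch below estimates the epithelial fraction of a bulk sample under a simple linear mixing assumption: each gene's bulk expression is a weighted average of an epithelial and a stromal reference profile. This is not the ADVOCATE algorithm, which the abstract describes as probabilistic; it is a plain least-squares illustration, and the function name, argument names, and example values are hypothetical.

```python
def estimate_epithelial_fraction(bulk, epithelium, stroma):
    """Least-squares estimate of f in: bulk = f * epithelium + (1 - f) * stroma.

    Assumption (not from the thesis): expression mixes linearly and the two
    reference profiles are known. All inputs are equal-length lists of
    per-gene expression values.
    """
    # Rearranged model: (bulk - stroma) = f * (epithelium - stroma)
    numerator = sum((b - s) * (e - s) for b, e, s in zip(bulk, epithelium, stroma))
    denominator = sum((e - s) ** 2 for e, s in zip(epithelium, stroma))
    if denominator == 0:
        raise ValueError("reference profiles are identical; fraction is undefined")
    # Clip to the valid range for a mixing fraction.
    return min(1.0, max(0.0, numerator / denominator))


# Hypothetical three-gene example: a bulk sample mixed at 30% epithelium.
epithelium_ref = [10.0, 2.0, 5.0]
stroma_ref = [1.0, 8.0, 5.0]
bulk_sample = [0.3 * e + 0.7 * s for e, s in zip(epithelium_ref, stroma_ref)]
fraction = estimate_epithelial_fraction(bulk_sample, epithelium_ref, stroma_ref)
```

Genes with identical expression in both references (the third gene here) contribute nothing to the estimate, which is why compartment-specific marker genes carry most of the signal in any approach of this kind.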
Reverse engineering approaches have been used to systematically dissect regulatory interactions based on gene expression profiles in different contexts and data types, thus improving our mechanistic understanding of molecular programs under perturbations. Proteomics data, on the other hand, provide direct evidence of cell functions. In particular, signaling molecules are the best candidates for drug targets. Previous efforts have shown that targeting signaling proteins could potentially lead to cancer remission. In this work, I introduce the hybrid proteomics Algorithm for the Reconstruction of Accurate Cellular Networks (hpARACNE), a redesign of the gene expression-based ARACNE algorithm. Using Clinical Proteomic Tumor Analysis Consortium (CPTAC) breast cancer proteomics data, hpARACNE reconstructs a network that significantly outperforms ARACNE when compared with curated kinase/phosphatase-substrate interactions from public databases. Compared with substrates of EGFR identified experimentally by Stable Isotope Labeling with Amino acids in Cell culture (SILAC), hpARACNE predicts substrates with high accuracy. Integrative network analysis of the breast cancer transcriptome and phosphoproteome reveals potential drug targets for Triple Negative Breast Cancer (TNBC) treatment. hpARACNE has three innovations that adapt it to proteomics data and signaling processes: 1) refinement of the kinase/phosphatase peptides by integrating matched whole proteomic and whole phosphoproteomic profiles; 2) establishment of associations based on a newly designed Mutual Information (MI) estimator for missing data; 3) network pruning using a directional Data Processing Inequality (dDPI) for signaling processes.
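The two network-building steps named above, MI estimation with missing values and DPI-based pruning, can be illustrated with a minimal sketch. It does not reproduce hpARACNE: the MI estimator here is a naive histogram estimate on pairwise-complete samples, and the pruning is the classic undirected DPI rule (drop the weakest edge of each fully connected triplet) rather than the directional dDPI. All function names, parameters, and values are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def binned_mi(x, y, bins=3):
    """Naive histogram MI (in nats) over samples where both values are present.

    Missing values are encoded as None and dropped pairwise. This is an
    illustrative stand-in, not the estimator designed in the thesis.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return 0.0

    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # guard against constant vectors
        return [min(int((v - lo) / width), bins - 1) for v in values]

    xs = discretize([a for a, _ in pairs])
    ys = discretize([b for _, b in pairs])
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    # MI = sum over cells of p(i, j) * log(p(i, j) / (p(i) * p(j)))
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def dpi_prune(mi, tolerance=0.0):
    """Classic (undirected) DPI pruning.

    mi maps frozenset({a, b}) to an MI value. In every fully connected
    triplet, the weakest edge is marked as indirect when it falls below
    both other edges (scaled by the tolerance). Marked edges are removed
    only after all triplets have been examined.
    """
    nodes = sorted({node for edge in mi for node in edge})
    removed = set()
    for a, b, c in combinations(nodes, 3):
        triangle = [frozenset(p) for p in ((a, b), (a, c), (b, c))]
        if not all(edge in mi for edge in triangle):
            continue
        weakest = min(triangle, key=lambda edge: mi[edge])
        others = [mi[edge] for edge in triangle if edge != weakest]
        if mi[weakest] < min(others) * (1.0 - tolerance):
            removed.add(weakest)
    return {edge: value for edge, value in mi.items() if edge not in removed}


# Hypothetical cascade K -> S1 -> S2: the K-S2 association is indirect.
mi_values = {
    frozenset({"K", "S1"}): 0.9,
    frozenset({"S1", "S2"}): 0.8,
    frozenset({"K", "S2"}): 0.3,
}
pruned = dpi_prune(mi_values)
mi_complete = binned_mi([0, 0, 1, 1, None], [0, 0, 1, 1, 5], bins=2)
```

In the cascade example, the K-S2 edge is removed because its MI is lower than both edges along the path through S1, which is the signature the data processing inequality predicts for an indirect interaction.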



More About This Work

Academic Units
Biomedical Informatics
Thesis Advisors
Califano, Andrea
Degree
Ph.D., Columbia University
Published Here
October 3, 2017