2017 Theses Doctoral
Systems biology approaches to precision medicine
This dissertation reviews the development and implementation of two systems biology methods: ADVOCATE and hpARACNE. ADVOCATE was designed to deconvolve epithelial and stromal compartment fractions and virtual expression profiles from bulk gene expression profiles of human patients. We used laser capture microdissection and RNA sequencing to disentangle the transcriptional programs active in the malignant epithelium and stroma of pancreatic ductal adenocarcinoma (PDA), an aggressive malignancy with a prominent stromal component. We learned that distinct molecular subtypes are present in both the epithelium and the stroma of pancreatic cancer, and that the subtype identities of these two compartments are independent of one another. Critically, we discovered that specific combinations of epithelial and stromal subtypes are strongly associated with patient survival across multiple external datasets, exhibiting both an effect size and a level of reproducibility that were absent from previous efforts. These analyses were made possible by a new probabilistic algorithm, Adaptive DeconVolution Of CAncer Tissue Expression (ADVOCATE), that can extract compartment-specific gene expression profiles from bulk gene expression data. ADVOCATE accurately predicted the compartment fractions of bulk tumor samples and improved the performance of molecular classifiers by controlling for the diverse cellular compositions of independent datasets. This approach provides a much-needed framework to handle solid tumor tissue heterogeneity, allowing integrated analysis of both epithelial and stromal transcriptional programs from individual bulk samples.
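To make the idea of a compartment fraction concrete, the sketch below fits a simple two-compartment linear mixing model, in which each bulk expression value is a weighted average of an epithelial and a stromal reference value. This is not the ADVOCATE algorithm, which the abstract describes as probabilistic; the function names, the least-squares fit, and the toy reference profiles are all assumptions made for illustration.

```python
def mix(epi_ref, stroma_ref, fraction):
    """Simulate a bulk profile as a linear mix of two compartment profiles."""
    return [fraction * e + (1 - fraction) * s for e, s in zip(epi_ref, stroma_ref)]


def estimate_epithelial_fraction(bulk, epi_ref, stroma_ref):
    """Least-squares estimate of f in: bulk = f * epi_ref + (1 - f) * stroma_ref.

    Rearranging gives (bulk - stroma_ref) = f * (epi_ref - stroma_ref), a
    one-parameter regression through the origin with a closed-form solution.
    """
    num = sum((b - s) * (e - s) for b, e, s in zip(bulk, epi_ref, stroma_ref))
    den = sum((e - s) ** 2 for e, s in zip(epi_ref, stroma_ref))
    if den == 0:
        raise ValueError("reference profiles are identical; fraction is not identifiable")
    # A fraction must lie in [0, 1]; clip estimates pushed outside by noise.
    return min(1.0, max(0.0, num / den))


# Hypothetical reference profiles for four genes (illustrative values only).
epi_ref = [10.0, 2.0, 8.0, 1.0]
stroma_ref = [1.0, 9.0, 2.0, 7.0]
bulk = mix(epi_ref, stroma_ref, 0.3)
fraction = estimate_epithelial_fraction(bulk, epi_ref, stroma_ref)
```

With noise-free input the estimate recovers the mixing fraction used to simulate the bulk sample. Real bulk profiles are noisy and lack known pure references, which is part of what motivates a probabilistic treatment.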
Reverse engineering approaches have been used to systematically dissect regulatory interactions based on gene expression profiles in different contexts and data types, thus improving our mechanistic understanding of molecular programs under perturbations. Proteomics data, on the other hand, provide direct evidence of cell functions. In particular, signaling molecules are prime candidates for drug targets. Previous efforts have shown that targeting signaling proteins could potentially lead to cancer remission. In this work, I introduce hybrid proteomics Algorithm for the Reconstruction of Accurate Cellular Network (hpARACNE), a redesign of the gene-expression-based ARACNE algorithm. Using Clinical Proteomic Tumor Analysis Consortium (CPTAC) breast cancer proteomics data, hpARACNE reconstructs a network that significantly outperforms ARACNE when compared with curated kinase/phosphatase-substrate interactions from public databases. When compared against EGFR substrates identified experimentally by Stable Isotope Labeling with Amino Acids in Cell Culture (SILAC), hpARACNE predicts substrates with high accuracy. Integrative network analysis of the breast cancer transcriptome and phosphoproteome reveals potential drug targets for Triple Negative Breast Cancer (TNBC) treatment. hpARACNE has three innovations that adapt it to proteomics data and signaling processes: 1) refinement of the kinase/phosphatase peptides by integrating matched whole proteomic and whole phosphoproteomic profiles; 2) establishment of association based on a newly designed Mutual Information (MI) estimator for missing data; 3) network pruning using a directional Data Processing Inequality (dDPI) for signaling processes.
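The two network-inference ingredients named above, mutual information computed in the presence of missing values and pruning by the Data Processing Inequality, can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the thesis: missing values are encoded as `None` and handled by simply dropping incomplete sample pairs, MI is estimated from equal-frequency bins, and the pruning step is the standard undirected DPI used by the original ARACNE rather than the directional dDPI. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(x, y, bins=3):
    """Histogram MI estimate (in nats) using only samples observed in both x and y."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return 0.0

    def discretize(values):
        # Equal-frequency binning: assign each value a bin label by its rank.
        order = sorted(range(len(values)), key=lambda i: values[i])
        labels = [0] * len(values)
        for rank, i in enumerate(order):
            labels[i] = rank * bins // len(values)
        return labels

    dx = discretize([p[0] for p in pairs])
    dy = discretize([p[1] for p in pairs])
    joint, px, py = Counter(zip(dx, dy)), Counter(dx), Counter(dy)
    # MI = sum over cells of p(a,b) * log(p(a,b) / (p(a) * p(b))).
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def dpi_prune(mi, tolerance=0.0):
    """Standard (undirected) DPI: in every fully connected triplet, drop the
    edge whose MI is lower than both of the other two edges.

    mi maps frozenset({node_a, node_b}) to an MI value; returns the kept edges.
    """
    nodes = sorted({node for edge in mi for node in edge})
    removed = set()
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if not all(e in mi for e in edges):
            continue
        weakest = min(edges, key=lambda e: mi[e])
        if mi[weakest] < min(mi[e] for e in edges if e != weakest) * (1 - tolerance):
            removed.add(weakest)
    return set(mi) - removed


# Toy signaling chain X -> Y -> Z: the indirect X-Z edge has the lowest MI.
toy_mi = {
    frozenset(("X", "Y")): 0.9,
    frozenset(("Y", "Z")): 0.8,
    frozenset(("X", "Z")): 0.3,
}
kept = dpi_prune(toy_mi)
```

In the toy chain, pruning removes the X-Z edge and keeps the two direct interactions. Dropping incomplete pairs is the simplest way to cope with missing values and can bias the estimate when missingness depends on abundance, which is common in proteomics.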
Files
- He_columbia_0054D_14206.pdf application/pdf 15.3 MB
More About This Work
- Academic Units
- Biomedical Informatics
- Thesis Advisors
- Califano, Andrea
- Degree
- Ph.D., Columbia University
- Published Here
- October 3, 2017