Academic Commons

Theses Doctoral

Computational integration of genome-wide observational and functional data in cancer

Sanchez Garcia, Felix

The emergence of high-throughput technologies is enabling the characterization of cancer genomes at unprecedented resolution and scale. However, such data suffer from the typical limitation of observational studies: they often cannot distinguish causation from correlation. Recently, several datasets of genome-wide functional assays performed on tumor cell lines have become available. Because these assays interrogate cancer genomes for the function of each individual gene, these data can provide vital cues to identify causal events and, with them, novel drug targets. Unfortunately, current analytical methods have been unable to overcome the challenges posed by these assays, which include poor signal-to-noise ratio and widespread off-target effects.
Given the largely orthogonal strengths and weaknesses of descriptive analyses of observational genetic and genomic data from cancer genomes, on the one hand, and genome-wide functional screening, on the other, I hypothesized that integrating the two data types into unified computational models would significantly increase the power of the biological analysis. In this dissertation I use integrative approaches to tackle two crucial problems in cancer research: the identification of driver genes and the discovery of tumor lethalities. I use the resulting methods to study breast cancer, the second most common form of this disease.
The first part of the dissertation focuses on the analysis of regions of copy number alteration for the identification of driver genes. I first describe how a simple integrative method enabled the identification of BIN3, a novel driver of metastasis in breast cancer. I then describe Helios, an unsupervised method for the identification of driver genes in regions of somatic copy number alteration (SCNA) that integrates different data sources into a single probabilistic score. Applying Helios to breast cancer data identified a set of candidate drivers highly enriched with known drivers (p-value < 1e-14). In vitro validation of 12 novel candidates predicted by Helios found that 10 conferred enhanced anchorage-independent growth, demonstrating Helios's exquisite sensitivity and specificity. I further provide an extensive characterization of RSF-1, a driver identified by Helios whose amplification correlates with poor prognosis and which displayed increased tumorigenesis and metastasis in mouse models.
The second part of this dissertation addresses the problem of identifying tumor vulnerabilities using genome-wide shRNA screens across tumor cell lines. I approach this endeavor using a novel integrative method that employs different biomarkers of cellular state to facilitate the identification of clusters of hairpins with similar phenotypes. When applied to breast cancer data, the method not only recapitulates the main subtypes and lethalities associated with this malignancy, but also identifies several novel putative lethalities.
Taken together, this research demonstrates the importance of the computational integration of genome-wide functional and observational data in cancer research, providing novel approaches that yield important insights into the biology of the disease.



More About This Work

Academic Units: Computer Science
Thesis Advisors: Pe'er, Dana
Degree: Ph.D., Columbia University
Published Here: February 16, 2015