Genome-wide Predictive Simulation on the Effect of Perturbation and the Cause of Phenotypic variations with Network Biology Approach
Columbia University. Electrical Engineering
Columbia University. Biomedical Informatics
Columbia University. Biochemistry and Molecular Biophysics
Thanks to modern high-throughput technologies such as microarray-based gene expression profiling, a large amount of molecular profile data has been generated in several disease-related contexts. Although these data likely contain systems-level information about disease regulation, revealing the underlying dynamics between genes and the mechanisms of gene regulation on a genome-wide scale remains a major challenge. Understanding these mechanisms genome-wide, together with the resulting dynamical behavior, is a key goal of the nascent field of systems biology. One approach to dissecting the logic of the cell is to use reverse-engineering algorithms that infer regulatory interactions from molecular profile data. In this context, information-theoretic approaches have been very successful: for instance, the ARACNe algorithm has inferred transcriptional interactions between transcription factors and their target genes; similarly, the MINDy algorithm has identified post-translational modulators of transcription factor activity by multivariate analysis of large gene expression profile datasets. Many methods have been proposed to improve ARACNe, both in computational efficiency and in the accuracy of the predicted interactions. Yet the core of ARACNe, the data processing inequality (DPI), has remained virtually unchanged even though modern information theory has extended the DPI theorem to higher-order interactions. First, we introduce an improvement of ARACNe, hARACNe, which recursively applies a higher-order DPI analysis. We show that the new algorithm successfully detects false-positive feed-forward loops involving more than three genes. Second, we extend the MINDy algorithm using co-information as a novel metric, replacing the conditional mutual information and significantly improving the algorithm’s predictions.
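As an illustration of the data processing inequality step mentioned above, the following is a minimal sketch of the standard three-gene DPI pruning commonly described for ARACNe. It is not the higher-order hARACNe procedure of this thesis. The function name `dpi_prune`, the `tolerance` parameter, and the toy mutual information matrix are assumptions made for the example.

```python
import itertools

import numpy as np


def dpi_prune(mi, tolerance=0.0):
    """Zero out the weakest edge of every closed gene triplet (illustrative only).

    mi        : symmetric matrix of pairwise mutual information estimates,
                where a value of 0 means "no edge".
    tolerance : hypothetical slack in [0, 1); larger values prune fewer edges.
    """
    n = mi.shape[0]
    remove = np.zeros_like(mi, dtype=bool)
    for i, j, k in itertools.combinations(range(n), 3):
        edges = [((i, j), mi[i, j]), ((i, k), mi[i, k]), ((j, k), mi[j, k])]
        if any(value <= 0 for _, value in edges):
            continue  # the triplet is not a closed triangle
        weakest_edge, weakest = min(edges, key=lambda e: e[1])
        others = [value for edge, value in edges if edge != weakest_edge]
        # DPI: an indirect interaction cannot carry more information than
        # either of the two interactions along the path that explains it.
        if weakest < min(others) * (1.0 - tolerance):
            a, b = weakest_edge
            remove[a, b] = remove[b, a] = True
    # Removals are applied at the end so the result does not depend on the
    # order in which triplets are visited.
    pruned = mi.copy()
    pruned[remove] = 0.0
    return pruned


# Toy chain A -> B -> C: the A-C edge is the weakest and is treated as indirect.
toy_mi = np.array([[0.0, 0.8, 0.3],
                   [0.8, 0.0, 0.7],
                   [0.3, 0.7, 0.0]])
pruned_mi = dpi_prune(toy_mi)
```

In this toy case only the A-C entry is set to zero. A triplet-based test of this kind looks at three genes at a time, which is the limitation that a higher-order DPI analysis is meant to address.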
Broadly, the two ultimate goals of systems perturbation studies are to reveal how human diseases are connected with genes and to find the regulatory mechanisms that determine disease cell behavior. However, these goals remain daunting: even the most talented researchers still have to rely on laborious genetic screens, and only very simplified hypotheses about the effects of a given perturbation have been experimentally validated, and those have been analyzed only roughly, using very limited regulatory sub-networks such as individual pathways. To overcome these limitations, this thesis explores the use of gene regulatory networks. Specifically, we propose a new algorithm that can accurately predict cell state genome-wide following perturbation of individual genes, such as in silencing or ectopic expression experiments. Experimentally validated methods to predict genome-wide changes in a cellular system following a genetic perturbation are still unavailable. Moreover, even though phenotypic variations are experimentally profiled and gene signatures are selected by statistical testing, finding the exact regulator that systematically causes significant variations in a gene signature is still quite challenging. In this research, I introduce and experimentally validate a probabilistic Bayesian method to simulate the propagation of genetic perturbations on integrated gene regulatory networks inferred by the hARACNe and coMINDy algorithms from human B cell data. With the same predictive framework, we also computationally predict the master driver (regulator) that is most likely to have produced the observed variations in gene expression levels; these studies can serve as a systematized pre-screening process before genetic manipulation. I predict in silico the effect of silencing several genes as well as the cause of phenotypic variations.
Performance analysis, tested by Gene Set Enrichment Analysis (GSEA), shows that the new methods are highly predictive, thus providing an initial step toward building predictive probabilistic regulatory models, which may be applicable as pre-screening steps in perturbation studies.
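To make the GSEA-based evaluation more concrete, here is a minimal sketch of the unweighted running-sum enrichment score that underlies GSEA. It is a simplified stand-in, not the evaluation pipeline used in the thesis: the function name `enrichment_score`, the toy gene labels, and the omission of correlation weighting and permutation-based significance testing are all assumptions of this example.

```python
import numpy as np


def enrichment_score(ranked_genes, gene_set):
    """Unweighted (Kolmogorov-Smirnov-like) enrichment score, illustrative only.

    ranked_genes : genes ordered from most up-regulated to most down-regulated.
    gene_set     : collection of genes whose position in the ranking is tested.
    """
    members = set(gene_set)
    hits = np.array([gene in members for gene in ranked_genes])
    n_hit = int(hits.sum())
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene_set must contain some, but not all, ranked genes")
    # Walk down the ranking: step up at each set member, step down otherwise.
    steps = np.where(hits, 1.0 / n_hit, -1.0 / n_miss)
    running = np.cumsum(steps)
    # The score is the running-sum value with the largest absolute deviation.
    return float(running[np.argmax(np.abs(running))])


# Hypothetical ranking in which the predicted targets sit at the very top.
ranking = ["g1", "g2", "g3", "g4", "g5", "g6"]
es_top = enrichment_score(ranking, {"g1", "g2"})
es_bottom = enrichment_score(ranking, {"g5", "g6"})
```

A score near +1 means the gene set is concentrated at the top of the ranking and a score near -1 means it is concentrated at the bottom, which is how agreement between predicted and observed expression changes can be summarized.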
Ph.D., Columbia University.
2012-04-19 13:44:47 -0400
2012-04-19 14:04:21 -0400