Theses Doctoral

Inferring Transcriptional and Post-Transcriptional Network Structure by Exploiting Natural Sequence Variation

Fazlollahi, Mina

Understanding how cellular processes of an organism translate its genome into its phenotype is one of the grand challenges in biology. Linkage studies seek to identify allelic variants that manifest themselves as phenotypic variation between individuals in a population. The advent of high-throughput genotyping and gene expression profiling technologies has made it possible to use messenger RNA levels as quantitative traits in linkage studies. This has created new opportunities to study genetic variation at the level of gene regulatory networks rather than individual genes.

This thesis consists of four parts, each of which outlines a different strategy for integrating genome-wide expression data and genotype data in order to identify transcriptional and post-transcriptional regulatory mechanisms. The data for these analyses comes from segregating populations of Saccharomyces cerevisiae (baker’s yeast) as well as Caenorhabditis elegans (roundworm).

The first study focused on inferring the in vitro binding specificity of RNA-binding proteins (RBPs). We first analyzed a recent compendium of in vivo mRNA binding data to model the sequence specificity of 45 yeast RBPs in the form of a position-specific affinity matrix (PSAM). We recovered known consensus nucleotide sequences for 12 RBPs and discovered novel binding preferences for 3 other RBPs, namely Scp160p, Sik1p, and Tdh3p.
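To make the PSAM representation concrete, the following is a minimal sketch of how such a matrix is commonly used to score a sequence, assuming each position contributes independently: the relative affinity of a window is the product of its per-position weights, and the affinity of a longer region is the sum over all windows. The matrix values and the names `TOY_PSAM`, `window_affinity`, and `sequence_affinity` are hypothetical and chosen only for illustration; they are not the matrices fitted in the thesis.

```python
# Toy PSAM: one dict per motif position, mapping nucleotide -> relative affinity.
# The preferred nucleotide at each position has weight 1.0.
# All values are invented for illustration (assumption, not thesis data).
TOY_PSAM = [
    {"A": 0.1, "C": 0.05, "G": 0.05, "U": 1.0},
    {"A": 0.2, "C": 0.1, "G": 1.0, "U": 0.1},
    {"A": 0.05, "C": 0.05, "G": 0.1, "U": 1.0},
    {"A": 1.0, "C": 0.1, "G": 0.3, "U": 0.1},
]


def window_affinity(psam, window):
    """Relative affinity of one window: product of per-position weights."""
    affinity = 1.0
    for weights, nucleotide in zip(psam, window):
        affinity *= weights[nucleotide]
    return affinity


def sequence_affinity(psam, sequence):
    """Sum of window affinities over every offset in the sequence."""
    width = len(psam)
    return sum(
        window_affinity(psam, sequence[i:i + width])
        for i in range(len(sequence) - width + 1)
    )
```

Under this toy matrix the optimal site `UGUA` scores 1.0, and any mismatch scales the score down by the weight of the substituted nucleotide. Summing over windows lets weak sites across an mRNA region contribute to its total predicted affinity.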

The second study aimed to identify trans-acting chromosomal loci that regulate the expression of a large number of genes. Traditionally, such loci are discovered by first mapping expression quantitative trait loci (eQTLs) for individual genes, and then looking for so-called “eQTL hotspots”. Our method avoids the first step by integrating information across all genes, leading to a more elegant method that has increased statistical power. For yeast, we recovered 70% of the reported eQTL hotspots from two independent studies, and discovered a new trans-acting locus on chromosome V. For worm, we detected six trans-acting loci, only two of which were previously reported as eQTL hotspots.

The third study focused on post-transcriptional regulatory networks in yeast, by mapping the regulatory activity level of RNA-binding proteins (RBPs) as a quantitative trait in a so-called “aQTL” analysis. We used the collection of 15 sequence motifs, with the associated mRNA region combinations that we obtained in our first study, together with mRNA expression data to estimate RBP activities across yeast segregants. Consistent with a previous study, we recovered the MKT1 locus on chromosome XIV as a genetic modulator of Puf3p activity. We also discovered that Puf3p activity is modulated through distinct loci depending on whether it is binding to the 5′ or 3′ untranslated region (UTR) of its target mRNAs.
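The idea of treating regulator activity as a quantitative trait can be sketched as follows. This is an illustrative simplification under stated assumptions, not necessarily the procedure used in the thesis: the activity of an RBP in each segregant is taken to be the slope of a simple regression of that segregant's gene expression values on the predicted affinities of the genes, and a marker is then tested by comparing activities between the two parental alleles with a Welch t statistic. All function names are hypothetical.

```python
from statistics import mean, variance


def regression_slope(x, y):
    """Least-squares slope of y on x (single predictor, with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def estimate_activities(affinities, expression_by_segregant):
    """One inferred activity per segregant: slope of expression on affinity."""
    return [regression_slope(affinities, expr) for expr in expression_by_segregant]


def allele_t_statistic(activities, genotypes):
    """Welch t statistic comparing activities between alleles coded 0 and 1.

    Assumes each allele group contains at least two segregants.
    """
    group0 = [a for a, g in zip(activities, genotypes) if g == 0]
    group1 = [a for a, g in zip(activities, genotypes) if g == 1]
    standard_error = (
        variance(group0) / len(group0) + variance(group1) / len(group1)
    ) ** 0.5
    return (mean(group1) - mean(group0)) / standard_error
```

Repeating the allele comparison at every marker along the genome would give a linkage profile for the inferred activity, in which peaks are candidate aQTLs. A real analysis would also need to correct for multiple testing across markers.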

Furthermore, we identified a locus on chromosome XV that includes the IRA2 gene as a putative aQTL for Puf4p; this prediction was validated using expression data for an IRA2 allele replacement strain.

Our fourth study focused on the detection of loci whose allelic variation modulates the in vivo regulatory connectivity between a transcription factor and its target genes. We call these loci connectivity QTLs or “cQTLs”. We mapped the DIG2 locus on chromosome IV as a cQTL for the transcription factor Ste12p. Dig2p is indeed a known inhibitor of the yeast mating response activator Ste12p. The coding region of the DIG2 gene contains a single non-synonymous mutation (T83I). We are experimentally testing the functional impact of this mutation in allele replacement strains. We also identified the TAF13 locus as a putative modulator of Gcn4p connectivity.



More About This Work

Academic Units
Thesis Advisors
Bussemaker, Harmen J.
Marka, Szabolcs
Ph.D., Columbia University
Published Here
May 14, 2013