Theses Doctoral

Systematically Mapping the Epigenetic Context Dependence of Transcription Factor Binding

Kribelbauer, Judith Franziska

At the core of gene regulatory networks are transcription factors (TFs) that recognize specific DNA sequences and target distinct gene sets. Characterizing the DNA binding specificity of all TFs is a prerequisite for understanding global gene regulatory logic. In recent years, this need has driven the development of high-throughput methods that probe TF specificity in vitro and are now routinely used to inform or interpret in vivo studies. Despite the broad success of such methods, several challenges remain, two of which are addressed in this thesis.
Genomic DNA can harbor different epigenetic marks that have the potential to alter TF binding, the most prominent being CpG methylation. Given the vast number of modified CpGs in the human genome and an increasing body of literature suggesting a link between epigenetic changes and genome instability, or the onset of diseases such as cancer, methods that can characterize the sensitivity of TFs to DNA methylation are needed to mechanistically interpret its impact on gene expression. We developed a high-throughput in vitro method (EpiSELEX-seq) that probes TF binding to unmodified and modified DNA sequences in competition, resulting in high-resolution maps of TF binding preferences. We found that methylation sensitivity can vary between TFs of the same structural family and is dependent on the position of the 5mCpG within the TF binding site. The importance of our in vitro profiling of methylation sensitivity is demonstrated by the preference of human p53 tetramers for 5mCpGs within their binding site core. This previously unknown, stabilizing effect is also detectable in p53 ChIP-seq data when comparing methylated and unmethylated sites genome-wide.
A second impediment to predicting TF binding is our limited understanding of i) how cooperative participation of a TF in different complexes can alter its binding preference, and ii) how the detailed shape of DNA aids in creating a substrate for adaptive multi-TF binding. To address these questions in detail, we studied the in vitro binding preferences of three D. melanogaster homeodomain TFs: Homothorax (Hth), Extradenticle (Exd) and one of the eight Hox proteins. In vivo, Hth occurs in two splice forms: with (HthFL) and without (HthHM) the DNA binding domain (DBD). HthHM-Exd itself is a Hox cofactor that has been shown to induce latent sequence specificity upon complex formation with Hox proteins. There are three possible complexes that can be formed, all potentially having specific target genes: HthHM-Exd-Hox, HthFL-Exd-Hox, and HthFL-Exd. We characterized the in vitro binding preferences of each of these by developing new computational approaches to analyze high-throughput SELEX-seq data. We found distinct orientation and spacing preferences for HthFL-Exd-Hox, alternative recognition modes that depend on the affinity class a sequence falls into, and a strong preference for a narrow DNA minor groove near Exd's N-terminal DBD. Strikingly, this shape readout is crucial to stabilize the HthHM-Exd-Hox complex in the absence of a Hth DBD and can thus be used to distinguish HthHM from HthFL isoform binding. Mutating the amino acids responsible for the shape readout by Exd and reinserting the engineered protein into the fly genome allowed us to classify in vivo binding sites based on ChIP-seq signal comparison between “shape-mutant” and wild-type Exd.
In summary, the research presented here has investigated TF binding preferences beyond sequence context by combining novel high-throughput experimental and computational methods. This interdisciplinary approach has enabled us to study binding preferences of TF complexes with respect to the epigenetic landscape of their cognate binding sites. Our novel mechanistic insights into DNA shape readout have provided a new avenue for exploiting guided protein engineering to probe how specific TFs interact with their cofactors in a cellular context, and how flanking genomic sequence helps determine which multi-TF complexes will form and which binding mode a complex adopts.



More About This Work

Academic Units
Cellular, Molecular and Biomedical Studies
Thesis Advisors
Bussemaker, Harmen J.
Degree
Ph.D., Columbia University
Published Here
August 4, 2018