2017 Theses Doctoral
Accurate and Sensitive Quantification of Protein-DNA Binding Affinity
Transcription factors control gene expression by binding to genomic DNA in a sequence-specific manner. Mutations in transcription factor binding sites are increasingly found to be associated with human disease, yet we currently lack robust methods to predict these sites. Here we developed a versatile maximum likelihood framework, named No Read Left Behind (NRLB), that fits a biophysical model of protein-DNA recognition to all in vitro selected DNA binding sites across the full affinity range. NRLB predicts human Max homodimer binding in near-perfect agreement with existing low-throughput measurements. The model captures the specificity of p53 tetrameric binding sites and discovers multiple binding modes in a single sample. Additionally, we confirm that newly identified low-affinity enhancer binding sites are functional in vivo, and that their contribution to gene expression matches their predicted affinity. Our results establish a powerful paradigm for identifying protein binding sites and interpreting gene regulatory sequences in eukaryotic genomes.
This item is currently under embargo. It will be available starting 2019-08-22.
- Academic Units: Applied Physics and Applied Mathematics
- Thesis Advisors: Bussemaker, Harmen J.; Bienstock, Daniel
- Degree: Ph.D., Columbia University