Theses Doctoral

Accurate and Sensitive Quantification of Protein-DNA Binding Affinity

Rastogi, Chaitanya

Transcription factors control gene expression by binding to genomic DNA in a sequence-specific manner. Mutations in transcription factor binding sites are increasingly found to be associated with human disease, yet we currently lack robust methods to predict these sites. Here we developed a versatile maximum likelihood framework, named No Read Left Behind (NRLB), that fits a biophysical model of protein-DNA recognition to all in vitro selected DNA binding sites across the full affinity range. NRLB predicts human Max homodimer binding in near-perfect agreement with existing low-throughput measurements. The model captures the specificity of p53 tetrameric binding sites and discovers multiple binding modes in a single sample. Additionally, we confirm that newly identified low-affinity enhancer binding sites are functional in vivo, and that their contribution to gene expression matches their predicted affinity. Our results establish a powerful paradigm for identifying protein binding sites and interpreting gene regulatory sequences in eukaryotic genomes.
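
To make the idea of a biophysical model of protein-DNA recognition more concrete, the following is a minimal sketch of a generic additive-energy model, in which each base in a binding window contributes an independent free-energy penalty and a sequence's relative affinity is summed over every offset on both strands. It is not the NRLB implementation and is not taken from the thesis: the 3-bp site, the penalty values in `DDG`, and the function names are all hypothetical, chosen only for illustration.

```python
import math

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical free-energy penalties (in units of RT) for a 3-bp site.
# These numbers are invented for illustration; the reference site "CAC"
# carries zero penalty at every position.
DDG = [
    {"A": 2.0, "C": 0.0, "G": 2.0, "T": 2.0},
    {"A": 0.0, "C": 1.0, "G": 1.0, "T": 1.0},
    {"A": 2.0, "C": 0.0, "G": 2.0, "T": 2.0},
]


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def window_affinity(window):
    """Relative affinity of one window: exp(-sum of per-position penalties)."""
    total = sum(DDG[i][base] for i, base in enumerate(window))
    return math.exp(-total)


def sequence_affinity(seq):
    """Sum relative affinities over every offset on both strands."""
    k = len(DDG)
    total = 0.0
    for strand in (seq, reverse_complement(seq)):
        for start in range(len(strand) - k + 1):
            total += window_affinity(strand[start:start + k])
    return total
```

In a sketch like this, every window receives a nonzero affinity, so weak sites still contribute to the total rather than being discarded by a score threshold. Fitting the penalty values to sequencing data by maximum likelihood is a separate step that this example does not attempt.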



More About This Work

Academic Units: Applied Physics and Applied Mathematics
Thesis Advisors: Bussemaker, Harmen J.; Bienstock, Daniel
Degree: Ph.D., Columbia University
Published Here: September 12, 2017