Theses Doctoral

Representation Learning for Genome Interpretation

Khan, Raiyan Rashid

The arrival of genome sequencing technologies has triggered a transformative surge in genomic data, revolutionizing the ability to study the genome's role in biological processes. Despite this progress, understanding how genomic elements work together to sustain life remains a critical challenge. In this work, we develop computational frameworks that bridge gaps in current genome modeling approaches by integrating biologically meaningful inductive biases.

First, we reformulate genome-wide association studies to account for nonlinear modes of variant interaction. We supplement this analysis by building polygenic risk scores to explore the interplay between genetics and environmental factors contributing to phenotype variation.

The second part of this thesis shifts focus to deep learning approaches for improved sequence modeling. We present a novel application of hyperbolic convolutional neural networks to exploit the evolutionarily informed structure of sequence data, enabling more expressive DNA sequence representations. We demonstrate how our class of models discerns phylogenetic structure amidst noisy signals, and further motivate our work by constructing an empirical method for interpreting the hyperbolicity of dataset embeddings.

Next, we leverage state space models to study an instance of long-range genome interaction in the form of topologically associating domains. Our framework accurately recapitulates known patterns of chromatin organization, providing insights into epigenetic regulation. Altogether, these methods contribute to the development of advanced computational approaches that align with the challenges of genomic sequence modeling and pave the way for a more comprehensive understanding of genome organization and function.

Files

This item is currently under embargo. It will be available starting 2027-01-15.

More About This Work

Academic Units
Computer Science
Thesis Advisors
Pe'er, Itshack G.
Degree
Ph.D., Columbia University
Published Here
January 15, 2025