2025 Theses Doctoral
Machine learning methods for characterizing the impact of genetic alterations on tumor heterogeneity through multi-modal data integration
Tumor heterogeneity remains one of the most pressing obstacles in the development of effective cancer therapeutics. Often driven by genetic mutations, heterogeneity may present as diverse responses to therapy across patients, as well as the presence of multiple lineages of malignant cells within a tumor. The emergence of state-of-the-art single-cell and spatial genomic technologies in conjunction with traditional computational analyses has allowed for the characterization of diverse cell states and temporal dynamics with unprecedented resolution. However, there remains a distinct lack of computational methods designed for integrative data analyses that examine multiple patients or multiple modalities of data in order to paint a more complete picture of the dysregulated mechanisms occurring as a result of these mutations.
This dissertation focuses on the development of novel machine learning methods whose goal is to dissect the progression of cancer and other diseases through the integration of data across samples and across modalities. We first leverage a novel single-nucleus sequencing method in order to construct large atlases of data for rare cancer subtypes as well as banked clinical specimens, with the inclusion of scRNA-seq, TCR-seq, WGS, and spatial sequencing data. From these data, we identify the putative role of copy number alterations (CNAs) in driving heterogeneous patient response to therapy, emphasizing that immune resistance mechanisms possibly arising from these CNAs may be impacting immune infiltration into the tumor microenvironment (TME).
We then present a novel Bayesian hierarchical model, Echidna, that aims to uncover the mechanistic relationship between CNAs and heterogeneity in tumor phenotype, in settings such as resistance to immunotherapy. Echidna’s framework relies on the integration of scRNA-seq and WGS, leveraging the resolution of the former to aid the deconvolution and inference of the latter. At the same time, Echidna aims to provide a new way of understanding the relationship between transcription and CNAs, opening new perspectives on identifying the CNAs with functional relevance. We apply Echidna to a large cohort of melanoma patients undergoing immune checkpoint blockade therapy and demonstrate that intrinsic drivers of tumor expansion phenotypes are shared across patients, suggesting putative therapeutic and diagnostic targets.
We are also interested in the effect of genetic mutations at the cellular level, specifically how early mutations in driver genes may lead to derailed trajectories of disease progression. Towards this end, we present Decipher: a deep generative model that integrates data across disease contexts in order to align trajectories of cells and identify the specific derailments resulting from mutation. Decipher offers two interpretable latent spaces: a medium-dimensional z-space that captures specific cell-state transitions, and a low-dimensional v-space that may be directly used for visualization with greater faithfulness than common dimensionality reduction tools such as t-SNE and UMAP. We demonstrate that when applied to NPM1-mutated AML and to Kras- and p53-mutated PDAC, Decipher reveals novel insights into disease progression, highlighting key pathways and processes that may be disrupted as a result of the mutation.
The methods described in this dissertation together illustrate the depth of biological insight that may be derived from considering multiple modalities of data. We demonstrate the ability to map phenotypic effects to putative genomic drivers, as well as to characterize the effect of mutations on cell state transitions, with the ultimate goal of better understanding the genomic drivers of cancer.
Files
- Fan_columbia_0054D_19492.pdf (application/pdf, 9.6 MB)
More About This Work
- Academic Units: Biomedical Engineering
- Thesis Advisors: Azizi, Elham
- Degree: Ph.D., Columbia University
- Published Here: October 15, 2025