Theses Doctoral

Computational Tools for Profiling Neural Cells via Molecular Image Data

Chen, Shuonan

A fundamental goal in neuroscience research is to comprehensively characterize cell populations within the brain in order to uncover the mechanisms governing brain states and behavior. This involves profiling specific cells along multiple biological dimensions and discerning their contributions to the brain's circuitry. Characterizing cell populations at large scale and from multiple perspectives is therefore crucial for advancing our understanding of brain function. Recently, experimental technologies for generating large datasets have become more accessible to neuroscience researchers working at different biological scales, including the molecular, cellular, and functional levels.

These new technologies, including high-throughput sequencing, multiplexed spatial transcriptomics, cellular tracing, and multimodal experiments, provide a large number of rich datasets from which we can discern the mechanisms governing biological processes. Despite the power of these technologies and the amount of information the datasets contain, the new data present computational challenges that prevent us from fully exploiting them to address critical biological questions. Specifically, the newly generated data differ substantially from traditional experimental data in both size and the dimensions captured. Traditional analytical approaches are either not applicable at all or need improvements specific to these new types of datasets. This necessitates the development of robust and scalable analysis techniques designed specifically for the new data, as well as the exploration of potential applications for these novel datasets.

This thesis introduces computational tools we developed to analyze three distinct types of complex datasets. These datasets were meticulously collected using cutting-edge experimental techniques that investigate biological phenomena at multiple levels of resolution. Our methods use statistical modeling, image analysis, and computer vision techniques to better analyze such data and to equip researchers with scalable and robust tools for the new data they generate. The first work presents a demixing tool that accurately deciphers high-throughput spatial transcriptomics signals in order to better understand molecular diversity among neurons. In the second work, we propose a blind demixing method and use carefully simulated data to assess the feasibility of using cellular barcoding technology to reconstruct neural morphology. In the final work, we develop a three-dimensional volumetric image registration pipeline and a semi-automatic registration framework in order to map neuronal functional activity to the molecular profiles of these neurons. We extensively validate the proposed methods using both simulations and real datasets generated in experimental laboratories, demonstrating their robustness and highlighting their potential use in future high-throughput experiments.

In summary, this thesis provides three computational tools that facilitate the analysis of advanced datasets at various biological levels. Addressing the computational challenges posed by these new datasets lays the foundation for a comprehensive understanding of cellular and brain function and of their underlying mechanisms. Though our methods were developed and validated using neuroscience data, we envision that these tools can be adapted and applied to other fields, including but not limited to immunology and cancer research.



More About This Work

Academic Units: Cellular, Molecular and Biomedical Studies
Thesis Advisors: Paninski, Liam
Degree: Ph.D., Columbia University
Published Here: November 29, 2023