Theses Doctoral

Machine learning and Bayesian modeling for quantitative biological imaging: from single molecule localization microscopy to mass spectrometry imaging

Hammer, Joseph Lane

The aim of this work is to establish and demonstrate unbiased, reproducible, and robust approaches to analyzing large and complex data from biological images. Advances in imaging techniques now produce more comprehensive and detailed data than ever before. Extracting meaning from this data requires care to avoid overfitting while maximizing the information and insight that can be gained. Herein, we present two forms of complex and challenging biological imaging data and propose approaches for extracting key information from each. By leveraging machine learning and Bayesian modeling, we demonstrate robust analysis pipelines for single molecule localization microscopy (SMLM) and mass spectrometry imaging (MSI) data.

In Chapter 1, we present background on the development of modern imaging techniques and discuss in detail the development of SMLM and MSI along with common approaches used to analyze the data produced by these methods.

In Chapter 2, we address a common problem in SMLM, and in clustering algorithms more broadly, by proposing a Bayesian optimization approach that selects the parameters of density-based clustering algorithms in an unbiased manner by maximizing the density-based cluster validation (DBCV) score. We developed a high-speed implementation of DBCV using a k-dimensional tree and paired it with Bayesian optimization to evaluate a range of clustering parameters and find those that maximize the DBCV score. We demonstrate this method (DBOpt) on simulated and experimental data and show its efficiency and effectiveness.

In Chapter 3, we present an analysis pipeline for extracting key analytes from MSI data. In particular, we analyze the abundance and spatial distribution of lipids in MDA-MB-231 breast cancer cells that have an oncogenic mutant-p53 protein, compared with cells in which the protein is knocked down. We employ supervised learning with a support vector machine to segment the data into regions of interest that correspond to cell invasion. We then use a Bayesian hierarchical model to quantify the probability that each analyte is affected by the knockdown of mutant-p53, and we identify the spatially dependent role of p53 in modulating key phospholipids and metabolites across the sample.

Files

This item is currently under embargo. It will be available starting 2027-06-26.

More About This Work

Academic Units
Chemistry
Thesis Advisors
Kaufman, Laura J.
Degree
Ph.D., Columbia University
Published Here
August 6, 2025