Theses Doctoral

Multimodal Data Analysis using Latent Variables with Applications in Psychiatry and Neuroscience

Gacheru, Margaret

This dissertation focuses on utilizing latent variable techniques to help integrate and extract information from multiple modalities. With the advancement of technology and the ability to collect different types of data on a single individual, developing multimodal data fusion methods is an important area of research. We focus on approaches that establish linear associations between and within modalities, specifically Canonical Correlation Analysis and formative modeling.

First, we propose a unified Bayesian Longitudinal Canonical Correlation Analysis (BLCCA) model that can incorporate data from multiple time points and adjust for covariates. Canonical Correlation Analysis (CCA) is a multivariate technique that can help establish relationships among multiple modalities. The availability of longitudinal data offers an opportunity to uncover dynamic relationships, such as how changes in one set of outcomes correspond to changes in another set over time. Existing probabilistic and Bayesian ideas serve as the foundation for BLCCA. The model first obtains a linear trajectory for each individual from a random intercept and slope model. These linear trajectories are then decomposed into shared and modality-specific components. To select the best model for the observed data, variational approximation is used. Simulation experiments show that BLCCA correctly uncovers the true dimension of the latent space, meaning that it can identify and discriminate between the shared patterns and distinctive features. Furthermore, in comparison to existing longitudinal variants of CCA, the proposed method performs well in small sample size settings and can handle unequal time gaps, temporal misalignment between two modalities, and missing values. Applying BLCCA to the Alzheimer’s Disease Neuroimaging Initiative (ADNI) data reveals a longitudinal relationship between elevated levels of amyloid and brain atrophy.
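
As a point of reference for the shared versus modality-specific decomposition described above, the following minimal Python sketch computes classical (non-Bayesian, cross-sectional) canonical correlations on simulated data. It is not the BLCCA model from the dissertation; the function names, loadings, noise level, and sample size are all illustrative assumptions.

```python
import numpy as np


def simulate_two_modalities(n=500, seed=0):
    """Simulate two modalities driven by one shared latent factor (assumed setup)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, 1))                  # shared latent component
    wx = np.array([[1.0, -0.8, 0.6, 0.4]])       # hypothetical loadings, modality X
    wy = np.array([[0.9, 0.7, -0.5]])            # hypothetical loadings, modality Y
    x = z @ wx + 0.5 * rng.normal(size=(n, 4))   # modality-specific noise
    y = z @ wy + 0.5 * rng.normal(size=(n, 3))
    return x, y


def canonical_correlations(x, y):
    """Classical CCA: singular values of Qx^T Qy after centering and QR."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    qx, _ = np.linalg.qr(x)   # orthonormal basis for the column space of X
    qy, _ = np.linalg.qr(y)   # orthonormal basis for the column space of Y
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


x, y = simulate_two_modalities()
rho = canonical_correlations(x, y)
print(rho)  # one value per canonical pair, in decreasing order
```

Under these assumed settings, with a single shared latent factor, the first canonical correlation should be large and the remaining ones close to zero, which is the kind of pattern a procedure for selecting the latent dimension aims to detect.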

Next, we incorporate binary variables in Bayesian CCA (BCCA) and Bayesian Longitudinal CCA (BLCCA) while adjusting for covariates. While BCCA and its variants have proven effective for examining relationships between sets of variables, they are primarily designed for continuous variables and struggle to accommodate non-Gaussian variables. In psychiatric research, discrete outcomes are common and often used to capture the presence or varying degrees of a certain behavior or condition. Therefore, we present a novel set of CCA methods in the Bayesian framework that can analyze binary-continuous or binary-binary multivariate outcomes in cross-sectional or longitudinal data settings, while incorporating an additive covariate term. The formulation for the binary outcomes is presented in two ways: approximating the Bernoulli likelihood by a Gaussian with spherical covariance, and the underlying variable approach, which assumes that an observed binary variable is created by dichotomizing a latent continuous variable at a specific cut-off. For both binary approaches, simulation results reveal that the BCCA and BLCCA models for binary-continuous data can correctly identify the total number of canonical components as well as recover the shared and non-shared latent components as sample size increases. Moreover, analysis of the ADNI data reveals an inverse relationship between certain psychopathology symptoms and glucose metabolism.
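
The underlying variable approach mentioned above can be illustrated with a small simulation. The sketch below only shows the data-generating idea, in which an observed binary variable arises from thresholding a latent continuous one; it is not the dissertation's estimation procedure. The cut-off of zero, the latent correlation of 0.6, and the function names are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_latent_pair(n=20000, rho=0.6, seed=0):
    """Draw a pair of standard normal latent variables with correlation rho."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return rng.multivariate_normal(np.zeros(2), cov, size=n)


def dichotomize(latent, cutoff=0.0):
    """Observed binary outcome: 1 if the latent value exceeds the cut-off."""
    return (latent > cutoff).astype(int)


latent = simulate_latent_pair()
binary = dichotomize(latent)

latent_corr = np.corrcoef(latent[:, 0], latent[:, 1])[0, 1]
binary_corr = np.corrcoef(binary[:, 0], binary[:, 1])[0, 1]

# For standard bivariate normal latents cut at zero, the correlation between
# the binary variables is (2 / pi) * arcsin(rho) in expectation, so it is
# attenuated relative to the latent correlation.
print(latent_corr, binary_corr, 2 / np.pi * np.arcsin(0.6))
```

The attenuation of the correlation after dichotomizing is one reason to model binary outcomes through the latent continuous scale rather than treating the observed 0/1 values as if they were Gaussian.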

Finally, we operationalize the construct of cognitive reserve using a formative framework in order to leverage information from multiple modalities. Cognitive reserve (CR) is defined as the individual differences in how people process tasks that allow some to cope better than others with brain pathology. Overall, there is no consensus on how to measure cognitive reserve, as there is a wide range of CR proxies used in analyses. At times, reflective models have been used to develop a summary construct of the CR proxies. However, reflective models may not be appropriate since the resulting latent measure captures the commonality between indicators and discards what is unique to each individual proxy.

Therefore, this study aims to determine a construct of CR that integrates information from lifetime exposures and neuroimaging data using a formative model. Due to the nature of the lifetime exposures in the study, a hierarchical reflective-formative model is chosen. From the model, we extract a construct of CR that is a composite of various lifetime exposures and strongly associated with brain-derived measures. Since this unique formative construct is a composite of various lifetime exposures, it can be utilized in research as a more accurate estimate of CR. Using data from the Reference Ability Neural Network (RANN) and Cognitive Reserve (CR) study, we find that the derived CR construct is strongly related to IQ, a commonly used proxy. Furthermore, there is a significant additive effect of CR on cognitive decline in the speed processing domain after accounting for brain morphometry.

Files

This item is currently under embargo. It will be available starting 2027-07-08.

More About This Work

Academic Units
Biostatistics
Thesis Advisors
Lee, Seonjoo
Wall, Melanie M.
Degree
Ph.D., Columbia University
Published Here
August 20, 2025