2025 Theses Doctoral
Statistical Methods for High-dimensional Neuroimaging Data Analysis
Neuroimaging data, often high-dimensional and collected across multiple imaging modalities, are a valuable resource for studying how the structure and function of the human brain shape cognition. This dissertation addresses three challenges in analyzing high-dimensional neuroimaging data: missing data in multimodal fusion; preserving the hierarchical structure between mediators and exposure-by-mediator interactions during model selection with high-dimensional potential mediators; and controlling the false discovery rate when selecting mediators from a high-dimensional candidate set.
The first part of this dissertation addresses the missing data that commonly arise during multimodal fusion. Recent advances in multimodal imaging acquisition techniques have allowed us to measure different aspects of brain structure and function, and multimodal fusion methods such as linked independent component analysis (LICA) are a popular way to integrate this complementary information. However, these methods are severely limited by the frequent occurrence of missing data in brain imaging. In the first chapter, we propose a Full Information LICA algorithm (FI-LICA) to handle missing data during multimodal fusion under the LICA framework. Building on the principle of full information from complete cases, FI-LICA uses all available information to recover the missing latent information. In simulation experiments, FI-LICA performs favorably compared with current practices. When applied to multimodal data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study, FI-LICA also performs better at classifying current diagnosis and at predicting which participants with mild cognitive impairment (MCI) transition to Alzheimer’s disease (AD), highlighting the practical utility of the proposed method.
The second part of this dissertation aims to preserve the underlying hierarchical structure between mediators and exposure-by-mediator interactions during model selection in high-dimensional mediator settings. In mediation analysis, the exposure often modifies the mediating effect; that is, the exposure and the mediator interact in their effect on the outcome. When the mediator is high-dimensional, it is necessary to identify both the mediators (M) and the exposure-by-mediator (X-by-M) interactions with non-zero effects. Although several high-dimensional mediation methods can naturally handle X-by-M interactions, little research has addressed preserving the underlying hierarchical structure between the main effects and the interactions. To fill this gap, in the second chapter we develop the XMInt procedure, which selects M and X-by-M interactions in the high-dimensional mediator setting while preserving the hierarchical structure. The method uses a sequential, regularization-based forward-selection approach to identify the mediators and their hierarchically preserved interactions with the exposure. Our numerical experiments show promising selection results. We further apply the method to ADNI morphological data to examine how cortical thickness and subcortical volumes mediate the effect of amyloid-beta accumulation on cognitive performance, which may help in understanding the brain's compensation mechanisms.
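To make the hierarchy constraint concrete, the following is a minimal Python sketch of greedy forward selection under strong heredity: an interaction x*M_j becomes eligible only after the main effect M_j has already been selected. It is not the XMInt procedure, which is regularization-based. The sketch uses the ordinary least squares residual sum of squares as a stand-in selection criterion, always keeps the exposure x in the outcome model, and stops after a fixed number of steps; the function names (`forward_select_hierarchical`, `_rss`) and the `n_steps` parameter are hypothetical.

```python
import numpy as np


def _rss(y, cols):
    # OLS residual sum of squares of y on an intercept plus the given columns.
    design = np.column_stack([np.ones_like(y)] + cols)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def forward_select_hierarchical(y, x, M, n_steps):
    """Greedy forward selection over mediators M[:, j] and interactions x * M[:, j].

    Hypothetical illustration of strong heredity: ("XM", j) may enter only
    after ("M", j) is in the model. Not the XMInt algorithm itself.
    """

    def column(term):
        kind, j = term
        return M[:, j] if kind == "M" else x * M[:, j]

    selected = []  # terms in order of entry: ("M", j) or ("XM", j)
    cols = [x]     # the exposure is always kept in the outcome model
    for _ in range(n_steps):
        chosen_main = sorted(j for kind, j in selected if kind == "M")
        candidates = [("M", j) for j in range(M.shape[1]) if ("M", j) not in selected]
        # Hierarchy: the interaction x*M_j is eligible only if M_j is already selected.
        candidates += [("XM", j) for j in chosen_main if ("XM", j) not in selected]
        if not candidates:
            break
        # Add the eligible term that most reduces the residual sum of squares.
        best = min(candidates, key=lambda t: _rss(y, cols + [column(t)]))
        selected.append(best)
        cols.append(column(best))
    return selected
```

With simulated data in which only M_0 and x*M_0 affect the outcome, the first step can only pick a main effect, so M_0 enters first and its interaction becomes eligible, and selected, at the next step. A regularized criterion such as the one described above would replace the plain residual sum of squares with a penalized fit.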
The third part of this dissertation aims to control the false discovery rate (FDR) when selecting mediators from a high-dimensional candidate set. Specifically, we formulate mediator selection from a high-dimensional candidate set as a multiple hypothesis testing problem and propose a method that extends recent developments in FDR-controlled variable selection with knockoffs to select mediators with FDR control. We show that the proposed method and algorithm achieve finite-sample FDR control. Extensive simulation results demonstrate the power and finite-sample performance of the proposed method compared with an existing method.
Lastly, we demonstrate the method by analyzing data from the Adolescent Brain Cognitive Development (ABCD) study, in which the proposed method selects several resting-state functional magnetic resonance imaging (fMRI) connectivity markers as mediators of the relationship between adverse childhood events and the crystallized composite score of the NIH Toolbox.
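The final selection step of knockoff-based FDR control can be illustrated with the standard knockoff+ threshold from the variable-selection literature. Given feature statistics W_j, where a large positive value is evidence for a real mediator and null statistics are equally likely to be positive or negative, the threshold is T = min{t > 0 : (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q}, and the candidates with W_j >= T are selected. The Python sketch below assumes the W_j have already been computed. It does not construct knockoffs or the mediator-specific statistics developed in the dissertation, and the function names are hypothetical.

```python
def knockoff_plus_threshold(W, q):
    """Knockoff+ threshold for feature statistics W at target FDR level q."""
    # Candidate thresholds: the distinct nonzero magnitudes of W, in ascending order.
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        # Estimated false discoveries: one plus the count of large negative statistics.
        false_est = 1 + sum(1 for w in W if w <= -t)
        n_selected = max(1, sum(1 for w in W if w >= t))
        if false_est / n_selected <= q:
            return t  # smallest threshold meeting the target
    return float("inf")  # no threshold meets the target, so nothing is selected


def knockoff_select(W, q):
    """Indices of candidates whose statistic reaches the knockoff+ threshold."""
    t = knockoff_plus_threshold(W, q)
    return [j for j, w in enumerate(W) if w >= t]
```

For example, with W = [5, 4, 3, 2, 1, -0.5, 0.2] and q = 0.2, the threshold is 1 and the first five candidates are selected; relaxing q to 0.5 lowers the threshold to 0.2 and also admits the last candidate.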
This item is currently under embargo. It will be available starting 2025-12-05.
More About This Work
- Academic Units: Biostatistics
- Thesis Advisors: Lee, Seonjoo
- Degree: Ph.D., Columbia University
- Published Here: December 11, 2024