2021 Theses Doctoral
Dynamic graphical models and curve registration for high-dimensional time course data
The theme of this dissertation is to improve the exploration of patient subgroups with a precision medicine lens, specifically using repeated measures data to evaluate longitudinal trajectories of clinical, biological, and lifestyle measures. Our proposed methodological contributions fall into two branches of statistical methodology: undirected graphical models and functional data analysis.
In the first part of this dissertation, our goal was to study longitudinal networks of brain imaging biomarkers and clinical symptoms during the time leading up to manifest Huntington's disease diagnosis among patients with known genetic risk of disease. Understanding the interrelationships between measures may improve our ability to identify patients who are nearing disease onset and who therefore might be ideal candidates for clinical trial recruitment. Gaussian graphical models are a powerful approach for network modeling, and several extensions to these models have been developed to estimate time-varying networks. We propose a time-varying Gaussian graphical model specifically for a time scale that is centered on an anchoring event such as disease diagnosis. Our method contains several novel components intended to 1) reduce bias known to stem from 𝑙₁ penalization, and 2) improve temporal smoothness in network edge strength and structure. These novel components include time-varying adaptive lasso weights, as well as a combination of 𝑙₁, 𝑙₂, and 𝑙₀ penalization. We demonstrated via simulation studies that our full proposed approach, as well as more computationally efficient subsets of it, outperforms existing methods. We applied our proposed approach to the PREDICT-HD study and found that network edges changed over time leading up to and beyond diagnosis, with change points occurring at different times for different edges. For clinical symptoms, bradykinesia became well-connected with symptoms from several other domains. For imaging measures, we observed a loss of connection over time among gray matter regions, white matter regions, and the hippocampus.
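The abstract does not specify how the 𝑙₁, 𝑙₂, and 𝑙₀ terms are combined. The Python sketch below is for illustration only and assumes one plausible form for a single edge θ_jk, evaluated on a grid of times measured relative to diagnosis, where t = 0 is the anchoring event. In this form, Gaussian kernel weights localize observations around each grid point. A weighted 𝑙₁ term applies the adaptive lasso weights, and a squared 𝑙₂ term on successive differences smooths edge strength. An 𝑙₀ term counts how often the edge enters or leaves the network. The function names and the exact penalty form are hypothetical. The sketch only evaluates the penalty and does not fit the model.

```python
import math


def gaussian_kernel_weights(times, t0, bandwidth):
    """Normalized Gaussian kernel weights for observations at `times` when
    estimating the network at time t0. Times are on a scale centered on the
    anchoring event, e.g. years relative to diagnosis (t = 0)."""
    raw = [math.exp(-0.5 * ((t - t0) / bandwidth) ** 2) for t in times]
    total = sum(raw)
    return [r / total for r in raw]


def combined_penalty(edge_path, adaptive_w, lam1, lam2, lam0, tol=1e-8):
    """Hypothetical l1 + l2 + l0 penalty for one edge theta_jk on a time grid.

    edge_path  -- edge values theta_jk(t_1), ..., theta_jk(t_G)
    adaptive_w -- time-varying adaptive lasso weights w_jk(t_1), ..., w_jk(t_G)
    """
    # Weighted l1: sparsity, with less shrinkage where the weight is small
    # (i.e. where an initial estimate suggested a strong edge).
    l1 = sum(w * abs(th) for w, th in zip(adaptive_w, edge_path))
    # Squared l2 on successive differences: smooth changes in edge strength.
    l2 = sum((b - a) ** 2 for a, b in zip(edge_path, edge_path[1:]))
    # l0 on the support: count grid steps where the edge appears or disappears,
    # which discourages frequent changes in network structure.
    support = [abs(th) > tol for th in edge_path]
    l0 = sum(1 for a, b in zip(support, support[1:]) if a != b)
    return lam1 * l1 + lam2 * l2 + lam0 * l0
```

Take an edge path [0.0, 0.0, 0.5, 0.5] with weights [1, 1, 2, 2] and all tuning parameters equal to 1. The penalty is 2.0 + 0.25 + 1 = 3.25, reflecting one jump in strength and one structural change.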
In the second part of this dissertation, we consider time-varying network models for settings in which data are not all Gaussian. We sought to compare longitudinal clinical symptom networks between patients with neuropathologically-defined Alzheimer's disease (AD) vs. neuropathologically-defined Lewy body dementia (LBD), two common types of dementia that can often be clinically misdiagnosed. Given that the clinical measures of interest were largely non-Gaussian, we examined the literature for undirected graphical models for mixed data types. We then proposed an extension to the existing time-varying mixed graphical model that adds time-varying adaptive lasso weights, and we modeled time in reverse so that neuropathological diagnoses could be treated as baseline covariates. The proposed adaptive lasso extension serves a two-fold purpose: it alleviates the well-known bias of 𝑙₁ penalization, and it encourages temporal smoothness in edge estimation. We demonstrated the improved performance of our extension in simulation studies. Applying our method to the National Alzheimer's Coordinating Center database, we found that the edge structure surrounding the Wechsler Memory Scale Revised (WMS-R) Logical Memory parts IA (immediate recall) and IIA (delayed recall) may contain important markers for discriminant analysis of AD and LBD populations.
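The following is a minimal sketch of the two ingredients named above. It rests on assumptions the abstract does not state. First, time is re-expressed as time before each participant's final observation, so the neuropathological diagnosis sits at the reversed-time origin. Second, the time-varying adaptive lasso weights are the inverse of kernel-smoothed magnitudes of initial edge estimates. Smoothing the initial estimates over time before inverting them is one way such weights could encourage temporal smoothness, because neighboring time points then receive similar penalties. Both functions and the weight formula are hypothetical illustrations.

```python
import math


def reverse_time(visit_times, final_time):
    """Re-express visit times as time before the final observation, so the
    reversed scale starts (s = 0) where the neuropathological diagnosis is
    assumed to be anchored."""
    return [final_time - t for t in visit_times]


def smoothed_adaptive_weights(grid, init_edges, bandwidth, gamma=1.0, eps=1e-4):
    """Hypothetical time-varying adaptive lasso weights for one edge.

    grid       -- reversed-time grid points s_1, ..., s_G
    init_edges -- initial edge estimates at each s_g (e.g. from a ridge fit)
    Returns w(s_g) = 1 / (kernel-smoothed |initial estimate| + eps) ** gamma.
    """
    weights = []
    for s0 in grid:
        k = [math.exp(-0.5 * ((s - s0) / bandwidth) ** 2) for s in grid]
        smoothed = sum(ki * abs(b) for ki, b in zip(k, init_edges)) / sum(k)
        # Strong, persistent edges get small weights (little shrinkage);
        # weak edges get large weights and are pushed toward zero.
        weights.append(1.0 / (smoothed + eps) ** gamma)
    return weights
```

Each weight borrows information from neighboring grid points. As a result, an edge that is strong at one time point is also penalized less at nearby times, rather than switching abruptly between heavy and light shrinkage.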
In the third part of this dissertation, we moved from graphical models to functional data analysis, an area methodologically distinct from the first two parts. Our goal was to extract meaningful chronotypes, or phenotypes of circadian rhythms, from activity count data collected from accelerometers. Existing approaches for analyzing diurnal patterns in these data, including the cosinor model and functional principal components analysis, have revealed and quantified population-level diurnal patterns. However, considerable subject-level variability in features such as wake/sleep times and activity intensity remained uncaptured. This remaining variability is informative and could provide a better understanding of chronotypes, the behavioral manifestations of one’s underlying 24-hour rhythm. Curve registration, or alignment, is a technique in functional data analysis that separates "vertical" variability in activity intensity from "horizontal" variability in time-dependent markers like wake and sleep times. We developed a parametric registration framework for 24-hour accelerometric rest-activity profiles represented as dichotomized, epoch-level states of activity or rest. Specifically, we estimated subject-specific piecewise linear time-warping functions parametrized by a small set of parameters. We applied this method to data from the Baltimore Longitudinal Study of Aging and illustrated how the estimated parameters can quantify chronotypes more flexibly than traditional approaches.
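The registration step can be sketched under several hypothetical assumptions. Each subject's warping function is piecewise linear, with interior knots that pair template wake/sleep times with the subject's own wake/sleep times. Profiles are binary at one-minute epochs (1 = active, 0 = rest). Parameters are chosen by grid search to minimize the number of epochs whose state disagrees with a template. The thesis's actual parametrization, template, and estimation criterion may differ, and all names here are illustrative.

```python
MINUTES = 1440  # one-minute epochs in a 24-hour day


def binary_profile(wake, sleep):
    """Toy dichotomized rest-activity profile: active between wake and sleep."""
    return [1 if wake <= m < sleep else 0 for m in range(MINUTES)]


def piecewise_linear_warp(t, knots_template, knots_subject):
    """Map template clock time t (minutes) to subject clock time h(t).

    The endpoints are fixed (0 -> 0, 1440 -> 1440); interior knots must be
    strictly increasing, e.g. (template wake, template sleep) paired with
    (subject wake, subject sleep)."""
    xs = [0.0] + list(knots_template) + [float(MINUTES)]
    ys = [0.0] + list(knots_subject) + [float(MINUTES)]
    for i in range(len(xs) - 1):
        if xs[i] <= t <= xs[i + 1]:
            frac = (t - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + frac * (ys[i + 1] - ys[i])
    raise ValueError("t must lie in [0, 1440]")


def warp_profile(profile, knots_template, knots_subject):
    """Registered profile: the subject's state at h(t) for each template minute t."""
    out = []
    for t in range(MINUTES):
        s = piecewise_linear_warp(t, knots_template, knots_subject)
        out.append(profile[min(int(round(s)), MINUTES - 1)])
    return out


def fit_warp(profile, template, knots_template, candidates):
    """Grid search over candidate subject knots, minimizing mismatched epochs."""
    best, best_loss = None, None
    for knots in candidates:
        warped = warp_profile(profile, knots_template, knots)
        loss = sum(a != b for a, b in zip(warped, template))
        if best_loss is None or loss < best_loss:
            best, best_loss = knots, loss
    return best, best_loss
```

As a usage example, suppose the template is active from 07:00 to 23:00 (minutes 420 to 1380). The subject is active from 08:00 to 22:00 (minutes 480 to 1320). Then fitting over candidates [(420, 1380), (480, 1320), (540, 1260)] selects (480, 1320) with zero mismatched epochs, whereas the identity warp leaves 120. The fitted knots carry the "horizontal" timing information, and the registered profile carries the aligned on/off pattern.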
This item is currently under embargo. It will be available starting 2023-07-14.
More About This Work
- Thesis Advisors: Wang, Yuanjia
- Ph.D., Columbia University
- Published Here: July 16, 2021