Theses Doctoral

Flexible models and methods for longitudinal and multilevel functional data

Chen, Huaihou

In the first part of this dissertation, we propose penalized spline (P-spline)-based methods for functional mixed effects models with varying coefficients. This work is motivated by a clinical study of Complicated Grief (Shear et al. 2005). In the Complicated Grief Study, patients receive active treatment during a treatment period and then enter a follow-up period during which they no longer receive active treatment. It is conceivable that the primary outcome, the Inventory of Complicated Grief (ICG) scale, shows different trajectories in the treatment phase and the follow-up phase. The length of the treatment period varies across patients, i.e., some patients stay in treatment longer than others, so a model that can flexibly accommodate the subject-specific curves and predict individual outcomes is desirable. In our proposed model, we decompose the outcome into a sum of several terms: a population mean function, covariates with time-varying coefficients, functional subject-specific random effects, and a residual measurement error process. Using P-splines, we propose nonparametric estimation of the population mean function, the varying coefficients, the random subject-specific curves, the associated covariance function (which represents between-subject variation), and the variance function of the residual measurement errors (which represents within-subject variation). The proposed methods offer flexible estimation of both the population- and subject-level curves. In addition, decomposing the variability of the outcomes into between- and within-subject sources is useful for identifying the dominant variance component, which in turn produces an optimal model for the covariance function. We introduce a likelihood-based method to select the smoothing parameters. Furthermore, we study the asymptotic behavior of the baseline P-spline estimator. We conduct simulation studies to investigate the performance of the proposed methods. The benefit of the between- and within-subject covariance decomposition is illustrated through an analysis of the Berkeley growth data (Tuddenham and Snyder 1954). We identify distinct patterns in the between- and within-subject covariance functions of the children's heights. We also apply the proposed methods to the Framingham Heart Study data.

In the second part of the dissertation, we apply a semiparametric marginal model to analyze the Northern Manhattan Study (NOMAS) data (Sacco et al. 1998). NOMAS is a prospective, population-based study with the goal of characterizing the functional status of stroke survivors following stroke. The functional outcome is a binary indicator of functional independence, defined as a Barthel Index greater than or equal to 95. Based on generalized estimating equation (GEE) models, a previous parametric analysis showed that functional status declines over time and that the trajectories of decline differ by insurance status. The trend in functional status may not be linear, however, which motivates our semiparametric modeling approach. In this work, we consider a partially linear model with a time-varying coefficient to model the trend nonparametrically, and we include an interaction term between the nonparametric trend and the insurance variable. We consider both kernel-weighted local polynomial and regression spline approaches for estimating the components of the semiparametric model, and we propose a test for the presence of the interaction effect. To evaluate the performance of the parametric model in the case of model misspecification, we study the bias and efficiency of the estimators under various misspecified parametric models. We find that when the adjusted covariates are independent of time and the link function is the identity, the estimators for those covariates are asymptotically unbiased, even if the time trend is misspecified. In general, however, under other conditions and with a nonidentity link, the parametric estimators under the misspecified models are biased. We conduct simulation studies and compare power for testing the adjusted covariates when the time trend is modeled parametrically versus nonparametrically. In the simulation studies, we observe a significant gain in power for the estimators obtained from a semiparametric model compared with the parametric model when the time trend is nonlinear.

In the third part of the dissertation, we extend the semiparametric marginal model from the second part to the multilevel functional data case. This work is motivated by a clinical study of subarachnoid hemorrhage (SAH) at Columbia University, where patients undergo multiple 4-hour treatment cycles and, within each treatment cycle, repeated measurements of subjects' vital signs are recorded (Choi et al. 2012). These data have a natural multilevel structure, with treatment cycles nested within subjects and measurements nested within cycles. Most of the literature on nonparametric analysis of such multilevel functional data focuses on conditional approaches using functional mixed effects models. However, parameters obtained from the conditional models do not have direct interpretations as population average effects. When population effects are of interest, we may employ marginal regression models. In this work, we propose marginal approaches to fit multilevel functional data through penalized spline generalized estimating equations (penalized spline GEE). The procedure is effective for modeling multilevel correlated categorical outcomes as well as continuous outcomes without suffering from numerical difficulties. We provide a new variance estimator that is robust to misspecification of the correlation structure. We investigate the large sample properties of the penalized spline GEE with multilevel continuous data and show that the asymptotics fall into two categories. In the small knots scenario, the estimated mean function is asymptotically efficient when the true correlation function is used, and the asymptotic bias does not depend on the working correlation matrix. In the large knots scenario, both the asymptotic bias and variance depend on the working correlation. We propose a new method to select the smoothing parameter for marginal penalized spline regression based on an estimate of the asymptotic mean squared error (MSE). Simulation studies suggest that the new smoothing parameter selector outperforms existing alternatives such as cross-validation in several settings. Finally, we apply the methods to the SAH study to examine a recent debate in the clinical community on discontinuing the use of nimodipine.
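
As a rough illustration of the penalized spline idea that runs through all three parts, the following Python sketch fits a single smooth mean curve to noisy observations. It is not the dissertation's estimator: the truncated power basis, the ridge-type penalty on the knot coefficients, the fixed smoothing parameter `lam`, the simulated data, and the names `pspline_fit`, `num_knots`, and `degree` are all assumptions made for illustration. The abstract describes selecting smoothing parameters by likelihood-based or MSE-based criteria and handling within-subject correlation, neither of which this sketch attempts.

```python
import numpy as np


def pspline_fit(t, y, num_knots=20, degree=2, lam=1e-4):
    """Fit a penalized spline to (t, y) and return a prediction function.

    Illustrative only: truncated power basis with a ridge penalty on the
    knot coefficients and a fixed, user-supplied smoothing parameter.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)

    # Interior knots at equally spaced quantiles of the observed times.
    probs = np.linspace(0.0, 1.0, num_knots + 2)[1:-1]
    knots = np.quantile(t, probs)

    def basis(s):
        s = np.asarray(s, dtype=float)
        # Polynomial part: 1, s, ..., s^degree (left unpenalized).
        poly = np.vander(s, degree + 1, increasing=True)
        # Truncated power part: (s - knot)_+^degree for each knot.
        trunc = np.clip(s[:, None] - knots[None, :], 0.0, None) ** degree
        return np.hstack([poly, trunc])

    X = basis(t)
    # Penalize only the truncated power coefficients.
    penalty = np.diag(np.r_[np.zeros(degree + 1), np.ones(num_knots)])
    coef = np.linalg.solve(X.T @ X + lam * penalty, X.T @ y)

    def predict(s):
        return basis(s) @ coef

    return predict


# Simulated example (hypothetical data, not from any study in the abstract).
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 1.0, size=200))
truth = np.sin(2.0 * np.pi * t)
y = truth + rng.normal(scale=0.2, size=t.size)

fit = pspline_fit(t, y)
mse = float(np.mean((fit(t) - truth) ** 2))
print(f"mean squared error against the true curve: {mse:.4f}")
```

Larger values of `lam` shrink the knot coefficients toward zero and pull the fit toward a global polynomial of the chosen degree; smaller values let the curve follow the data more closely.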




More About This Work

Academic Units
Thesis Advisors
Wang, Yuanjia
Ph.D., Columbia University
Published Here
February 17, 2012