Theses Doctoral

Sparse selection in Cox models with functional predictors

Zhang, Yulei

This thesis investigates sparse selection in Cox regression models with functional predictors. Interest in sparse selection with functional predictors (Lindquist and McKeague, 2009; McKeague and Sen, 2010) can arise in biomedical studies. A functional predictor is a predictor with a trajectory, usually indexed by time, location or another factor. When the trajectory of a covariate is observed for each subject and we need to identify a common "sensitive" point of these trajectories that drives the outcome, the problem can be formulated as sparse selection with functional predictors. For example, we may wish to locate a gene along a chromosome that is associated with cancer risk. Functional linear regression is widely used for the analysis of functional covariates, but it can lack interpretability. The method developed in this thesis has a straightforward interpretation, since it relates the hazard to some sensitive components of the functional covariates. The Cox regression model has been extensively studied in the analysis of time-to-event data; in this thesis, we extend it to allow for sparse selection with functional predictors. Using the partial likelihood as the criterion function, and following the three-step procedure for M-estimators established in van der Vaart and Wellner (1996), we obtain the consistency, rate of convergence and asymptotic distribution of the M-estimators of the sensitive point and the regression coefficients. To study these large-sample properties, the trajectories are assumed to follow fractional Brownian motion for mathematical tractability. Simulations are conducted to evaluate the finite-sample performance of the methods, and a way to construct a confidence interval for the location parameter, i.e., the sensitive point, is proposed.
The proposed method is applied to an adult brain cancer study and a breast cancer study to find the sensitive point, here a locus on a chromosome, that is closely related to cancer mortality. Because the breast cancer data set has missing values, we also investigate how varying proportions of missingness in the data affect the accuracy of our estimator.
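
The estimation idea described above can be illustrated with a small simulation. The sketch below is not the thesis's code. It assumes a single functional covariate X, a hazard of the form exp(beta * X(theta)) with constant baseline hazard, standard Brownian motion trajectories (fractional Brownian motion with Hurst index 1/2), independent exponential censoring, and a grid search over candidate sensitive points. All function names, parameter values and the golden-section optimizer are hypothetical choices made for illustration.

```python
import numpy as np

def partial_loglik(beta, x, event):
    """Cox partial log-likelihood for one scalar covariate.

    Rows must already be sorted by decreasing follow-up time, so the risk
    set of row i is rows 0..i. Assumes no tied event times.
    """
    eta = beta * x
    log_risk = np.logaddexp.accumulate(eta)  # log of sum of exp(eta) over the risk set
    return float(np.sum(event * (eta - log_risk)))

def fit_beta(x, event, lo=-8.0, hi=8.0, tol=1e-6):
    """Maximize the concave partial log-likelihood in beta by golden-section search.

    The search interval [lo, hi] is an illustrative assumption.
    """
    g = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = partial_loglik(c, x, event), partial_loglik(d, x, event)
    while b - a > tol:
        if fc < fd:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = partial_loglik(d, x, event)
        else:        # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = partial_loglik(c, x, event)
    beta = 0.5 * (a + b)
    return beta, partial_loglik(beta, x, event)

def simulate(n=400, m=100, theta0=0.6, beta0=2.0, seed=0):
    """Simulate trajectories on a grid in (0, 1] and right-censored survival times."""
    rng = np.random.default_rng(seed)
    grid = np.arange(1, m + 1) / m
    # Brownian motion paths: cumulative sums of independent N(0, 1/m) increments.
    X = np.cumsum(rng.normal(scale=np.sqrt(1.0 / m), size=(n, m)), axis=1)
    k0 = int(round(theta0 * m)) - 1           # grid index of the true sensitive point
    rate = np.exp(beta0 * X[:, k0])           # hazard with baseline hazard equal to 1
    T = rng.exponential(1.0 / rate)           # event times
    C = rng.exponential(2.0, size=n)          # independent censoring times
    return grid, X, np.minimum(T, C), T <= C

def estimate_sensitive_point(grid, X, time, event):
    """Profile the partial likelihood over candidate sensitive points on the grid."""
    order = np.argsort(-time)                 # decreasing time, as partial_loglik expects
    Xs, ev = X[order], event[order]
    fits = [fit_beta(Xs[:, k], ev) for k in range(len(grid))]
    logliks = np.array([ll for _, ll in fits])
    k_hat = int(np.argmax(logliks))
    return grid[k_hat], fits[k_hat][0], logliks

grid, X, time, event = simulate()
theta_hat, beta_hat, profile = estimate_sensitive_point(grid, X, time, event)
print(f"theta_hat = {theta_hat:.2f}, beta_hat = {beta_hat:.2f}")
```

For each candidate point on the grid, the sketch fits the regression coefficient by maximizing the partial likelihood and then takes the candidate with the largest maximized value as the estimate of the sensitive point. It does not attempt the confidence interval construction or the handling of missing values described in the abstract.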



More About This Work

Academic Units
Thesis Advisors
McKeague, Ian W.
Ph.D., Columbia University
Published Here
June 7, 2012