2017 Theses Doctoral
A unified view of high-dimensional bridge regression
In many application areas, ranging from bioinformatics to imaging, we are interested in recovering a sparse coefficient vector in the high-dimensional linear model when the sample size n is comparable to or smaller than the dimension p. One of the most popular classes of estimators is Lq-regularized least squares (LQLS), also known as bridge regression. There have been extensive studies of the performance of best subset selection (q = 0), LASSO (q = 1) and ridge (q = 2), the three best-known members of the LQLS family. This thesis aims to give a unified view of LQLS for all non-negative values of q. In contrast to most existing works, which obtain order-wise error bounds with loose constants, we derive asymptotically exact error formulas characterized through a series of fixed point equations. A careful analysis of these fixed point equations yields insights into the statistical properties of LQLS across the entire spectrum of Lq-regularization. Our work not only clarifies when the folklore understanding of Lq-minimization is valid, but also provides new insights into high-dimensional statistics as a whole. We present our theoretical findings mainly from the parameter estimation point of view; at the end of the thesis, we briefly discuss bridge regression for variable selection and prediction.
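For orientation, the LQLS estimator and the kind of fixed point characterization referred to above can be written as follows. This is only a sketch of the form such results usually take in the approximate message passing literature. It assumes a design X with i.i.d. N(0, 1/n) entries, noise variance σ², δ = n/p, coefficients drawn i.i.d. from the distribution of a random variable B, and Z ~ N(0, 1) independent of B. For q = 0, read the penalty as the number of nonzero entries of β. The tuning variable χ is an illustrative reparametrization of λ, and the thesis's exact statements and conditions (in particular for q < 1, where the problem is nonconvex) may differ.

```latex
\begin{aligned}
&\hat{\beta}(\lambda, q) \in \arg\min_{\beta \in \mathbb{R}^p}
  \tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda \|\beta\|_q^q,
  \qquad \|\beta\|_q^q = \sum_{j=1}^{p} |\beta_j|^q, \\
&\eta_q(u; \chi) = \arg\min_{z \in \mathbb{R}} \tfrac{1}{2}(u - z)^2 + \chi |z|^q,
  \qquad \eta_1(u; \chi) = \operatorname{sign}(u)\,(|u| - \chi)_+,
  \qquad \eta_2(u; \chi) = \frac{u}{1 + 2\chi}, \\
&\tau^2 = \sigma^2 + \frac{1}{\delta}\,
  \mathbb{E}\big[(\eta_q(B + \tau Z; \chi\tau^{2-q}) - B)^2\big], \\
&\lambda = \chi\tau^{2-q}\Big(1 - \frac{1}{\delta}\,
  \mathbb{E}\big[\eta_q'(B + \tau Z; \chi\tau^{2-q})\big]\Big), \\
&\mathrm{AMSE}(\lambda, q) = \mathbb{E}\big[(\eta_q(B + \tau Z; \chi\tau^{2-q}) - B)^2\big]
  = \delta\,(\tau^2 - \sigma^2).
\end{aligned}
```

The threshold scales as χτ^{2-q} because η_q(τu; χτ^{2-q}) = τ η_q(u; χ), so the proximal map is homogeneous in the effective noise level τ. The last identity follows directly from the first fixed point equation.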
We start with the parameter estimation problem and evaluate the performance of LQLS by characterizing its asymptotic mean square error (AMSE). The AMSE expression we derive has no explicit form. On its own, it is therefore not useful for comparing LQLS across different values of q, or for evaluating the effect of the relative sample size n/p or the sparsity level of the coefficient vector. To simplify the expression, we first perform a phase transition (PT) analysis of LQLS, a widely accepted analysis framework. Our results reveal some limitations and misleading features of the PT framework. To overcome these limitations, we propose a small-error analysis of LQLS. This new framework not only sheds light on the results of the phase transition analysis, but also describes when phase transition analysis is reliable and gives a more accurate comparison among different Lq-regularizations.
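To make the AMSE characterization concrete for one member of the family, the sketch below solves the LASSO (q = 1) case numerically. It assumes the standard AMP-style state evolution for LASSO, τ² = σ² + δ⁻¹ E[(η₁(B + τZ; χτ) − B)²], where η₁ is soft thresholding and AMSE = E[(η₁(B + τZ; χτ) − B)²]. The two-point prior (B = mu with probability eps, otherwise 0), the function names, and the trapezoid-rule integration are hypothetical choices for illustration, not the thesis's implementation.

```python
import math


def soft_threshold(u: float, theta: float) -> float:
    """eta_1(u; theta): proximal operator of theta*|z| (soft thresholding)."""
    if u > theta:
        return u - theta
    if u < -theta:
        return u + theta
    return 0.0


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def gaussian_expectation(f, num_points: int = 2001, z_max: float = 8.0) -> float:
    """Approximate E[f(Z)], Z ~ N(0, 1), with the trapezoid rule on [-z_max, z_max]."""
    h = 2.0 * z_max / (num_points - 1)
    total = 0.0
    for i in range(num_points):
        z = -z_max + i * h
        weight = 0.5 if i in (0, num_points - 1) else 1.0
        total += weight * f(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def lasso_state_evolution(delta, eps, mu, sigma, chi, tol=1e-12, max_iter=10_000):
    """Solve tau^2 = sigma^2 + E[(eta_1(B + tau Z; chi tau) - B)^2] / delta.

    Hypothetical prior: B = mu with probability eps, B = 0 otherwise.
    Returns (tau, amse, lam); lam is the LASSO penalty matched to chi via
    lam = chi * tau * (1 - P(|B + tau Z| > chi * tau) / delta).
    """
    prior = [(0.0, 1.0 - eps), (mu, eps)]

    def mse(tau):
        theta = chi * tau
        return sum(
            prob * gaussian_expectation(
                lambda z, b=b: (soft_threshold(b + tau * z, theta) - b) ** 2)
            for b, prob in prior
        )

    # Start from the error of the all-zero estimate, then iterate the fixed point map.
    tau2 = sigma ** 2 + sum(prob * b * b for b, prob in prior) / delta
    for _ in range(max_iter):
        new_tau2 = sigma ** 2 + mse(math.sqrt(tau2)) / delta
        converged = abs(new_tau2 - tau2) < tol
        tau2 = new_tau2
        if converged:
            break

    tau = math.sqrt(tau2)
    amse = mse(tau)
    theta = chi * tau
    # For soft thresholding, E[eta_1'] = P(|B + tau Z| > theta).
    active = sum(
        prob * (normal_cdf((b - theta) / tau) + normal_cdf((-b - theta) / tau))
        for b, prob in prior
    )
    lam = theta * (1.0 - active / delta)
    return tau, amse, lam


if __name__ == "__main__":
    tau, amse, lam = lasso_state_evolution(delta=0.5, eps=0.1, mu=1.0, sigma=0.2, chi=1.5)
    print(f"tau={tau:.4f}  AMSE={amse:.4f}  lambda={lam:.4f}")
```

Parametrizing by χ rather than λ keeps the iteration a simple one-dimensional fixed point; the matching λ is recovered afterwards from the calibration equation. The iteration converges here because, at these settings, the map τ² ↦ σ² + δ⁻¹ mse(τ) is a contraction; for δ below the LASSO phase transition at a given χ it may not be.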
We then extend our low-noise sensitivity analysis to linear models without sparsity structure. This analysis, a generalization of phase transition analysis, gives a clear picture of bridge regression for estimating generic coefficients. Moreover, through a simple transformation we connect our low-noise sensitivity framework to the classical asymptotic regime in which n/p goes to infinity, and derive implications beyond what the classical asymptotic analysis of bridge regression can offer.
Furthermore, following the same idea as the new analysis framework, we obtain an explicit characterization of AMSE in the form of second-order expansions in the large-noise regime. These expansions reveal some intriguing phenomena. For example, ridge outperforms LASSO in estimating sparse coefficients when the measurement noise is large.
Finally, we present a short analysis of LQLS for variable selection and prediction. For variable selection, we propose a two-stage technique based on LQLS estimators, and describe its advantages and its close connection to parameter estimation. For prediction, we illustrate the intricate relation between tuning parameter selection for optimal in-sample prediction and tuning parameter selection for optimal parameter estimation.
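The abstract does not spell out the two-stage procedure. As an assumption, the sketch below reads "two-stage" as (1) compute an LQLS estimate, here ridge (q = 2) because it has a closed form, and (2) keep the coordinates whose estimated magnitude exceeds a threshold. The functions `ridge_estimate` and `two_stage_select`, and the choice of threshold, are hypothetical and not taken from the thesis.

```python
def ridge_estimate(X, y, lam):
    """Stage 1 (assumed): argmin 0.5*||y - X b||^2 + lam*||b||_2^2.

    Solves the normal equations (X^T X + 2*lam*I) b = X^T y by Gaussian
    elimination with partial pivoting; X is a list of n rows of length p.
    """
    n, p = len(X), len(X[0])
    # Augmented matrix [A | c] with A = X^T X + 2*lam*I and c = X^T y.
    A = [
        [sum(X[i][j] * X[i][k] for i in range(n)) + (2.0 * lam if j == k else 0.0)
         for k in range(p)]
        + [sum(X[i][j] * y[i] for i in range(n))]
        for j in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution on the upper-triangular system.
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        beta[r] = (A[r][p] - sum(A[r][c] * beta[c] for c in range(r + 1, p))) / A[r][r]
    return beta


def two_stage_select(X, y, lam, threshold):
    """Stage 2 (assumed): return indices j with |beta_j| > threshold."""
    beta = ridge_estimate(X, y, lam)
    return [j for j, b in enumerate(beta) if abs(b) > threshold]


if __name__ == "__main__":
    X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    y = [3.0, 0.1, -2.0]
    print(two_stage_select(X, y, lam=0.5, threshold=0.5))
```

With an orthogonal design, ridge simply rescales y by 1/(1 + 2λ), so the second stage reduces to thresholding the rescaled responses; with a general design, the stage-one shrinkage interacts with correlations among columns, which is where the choice of q and λ matters.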
More About This Work
- Thesis Advisors: Feng, Yang
- Ph.D., Columbia University
- Published Here: August 20, 2017