2006 Articles

# Breakdown Point of Model Selection When the Number of Variables Exceeds the Number of Observations

The classical multivariate linear regression problem assumes p variables X1, X2, ..., Xp and a response vector y, each with n observations, and a linear relationship between the two: y = Xβ + z, where z ~ N(0, σ²). We point out that when p > n, there is a breakdown point for standard model selection schemes, such that model selection only works well below a certain critical complexity level depending on n/p. We apply this notion to some standard model selection algorithms (Forward Stepwise, LASSO, LARS) in the case where p > n. We find that 1) the breakdown point is well-defined for random X-models and low noise, 2) increasing noise shifts the breakdown point to lower levels of sparsity, and reduces the model recovery ability of the algorithm in a systematic way, and 3) below breakdown, the size of coefficient errors follows the theoretical error distribution for the classical linear model.
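The breakdown behavior described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' experimental setup: the Gaussian design with unit-norm columns, the ±1 coefficients, the problem sizes, the oracle stopping rule (exactly k steps) and the function names `forward_stepwise` and `relative_error` are all assumptions made for illustration.

```python
import numpy as np


def forward_stepwise(X, y, max_steps):
    """Greedy forward selection: at each step add the column most correlated
    with the current residual, then refit the active set by least squares."""
    n, p = X.shape
    active = []
    residual = y.copy()
    beta_active = np.zeros(0)
    for _ in range(max_steps):
        scores = np.abs(X.T @ residual)
        if active:
            scores[active] = -np.inf  # never pick the same column twice
        active.append(int(np.argmax(scores)))
        beta_active, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
        residual = y - X[:, active] @ beta_active
    coef = np.zeros(p)
    coef[active] = beta_active
    return coef


def relative_error(n, p, k, sigma, rng):
    """Draw one random problem y = X beta + z with k nonzero coefficients
    and return ||beta_hat - beta|| / ||beta|| for forward stepwise."""
    X = rng.standard_normal((n, p))
    X /= np.linalg.norm(X, axis=0)               # unit-norm columns (assumption)
    beta = np.zeros(p)
    beta[:k] = rng.choice([-1.0, 1.0], size=k)   # +/-1 coefficients (assumption)
    y = X @ beta + sigma * rng.standard_normal(n)
    # Oracle stopping after exactly k steps (assumption, for simplicity).
    beta_hat = forward_stepwise(X, y, max_steps=k)
    return np.linalg.norm(beta_hat - beta) / np.linalg.norm(beta)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 100, 400                               # p > n, so n/p = 0.25
    for k in (2, 5, 10, 20, 40, 60):
        err = relative_error(n, p, k, sigma=0.0, rng=rng)
        print(f"k={k:3d}  relative error={err:.3e}")
```

Sweeping k at a fixed n/p and plotting the relative error is one way to look for the sparsity level at which recovery stops working. Raising `sigma` above zero gives a rough view of how noise affects that level.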

## Files

- IJCNN06-1700-dldvcs.pdf application/pdf 1.19 MB

## Also Published In

- Title: 2006 International Joint Conference on Neural Networks, Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada, July 16-21, 2006
- Publisher: IEEE
- DOI: https://doi.org/10.1109/IJCNN.2006.246934

## More About This Work

- Academic Units: Statistics
- Published Here: October 11, 2011