"Preconditioning" for feature selection and regression in high-dimensional problems

We consider regression problems in which the number of predictors greatly exceeds the number of observations. We propose a variable-selection method that first estimates the regression function, yielding a "preconditioned" response variable; the primary method used for this initial regression is supervised principal components. A standard selection procedure, such as forward stepwise selection or the LASSO, is then applied to the preconditioned response. In a number of simulated and real data examples, this two-step procedure outperforms forward stepwise selection and the usual LASSO applied directly to the raw outcome. We also show that, under a certain Gaussian latent variable model, applying the LASSO to the preconditioned response variable is consistent as the numbers of predictors and observations increase. Moreover, when the observational noise is large, the proposed procedure can give a more accurate estimate than the LASSO alone. We illustrate the method on several real problems, including survival analysis with microarray data.
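To make the two-step procedure concrete, here is a minimal Python sketch: step 1 fits supervised principal components (screen features by univariate correlation with the response, take the leading principal component of the screened block, and regress the response on it) to produce a preconditioned response, and step 2 runs scikit-learn's Lasso against that preconditioned response. The correlation threshold, number of components, and alpha value are illustrative assumptions, not the paper's tuned settings.

# A sketch of "preconditioning" via supervised principal components,
# assuming a linear model; threshold/alpha choices are illustrative.
import numpy as np
from sklearn.linear_model import Lasso

def supervised_pc_fit(X, y, threshold=0.25, n_components=1):
    """Step 1: supervised principal components.

    Keep features whose absolute univariate correlation with y exceeds
    `threshold`, take the leading principal component(s) of the reduced
    matrix, and regress y on them; the fitted values are the
    "preconditioned" response.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Univariate correlation of each predictor with y.
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    )
    keep = np.abs(corr) > threshold
    if not np.any(keep):  # fall back to the single best predictor
        keep = np.abs(corr) == np.abs(corr).max()

    # Leading principal component(s) of the screened predictors.
    U, s, _ = np.linalg.svd(Xc[:, keep], full_matrices=False)
    Z = U[:, :n_components] * s[:n_components]

    # Least-squares fit of y on the supervised components.
    coef, *_ = np.linalg.lstsq(Z, yc, rcond=None)
    return Z @ coef + y.mean()

def preconditioned_lasso(X, y, alpha=0.1, **spc_kwargs):
    """Step 2: apply the LASSO to the preconditioned response."""
    y_hat = supervised_pc_fit(X, y, **spc_kwargs)
    return Lasso(alpha=alpha).fit(X, y_hat)

# Usage on simulated data with p >> n and 5 true signal features.
rng = np.random.default_rng(0)
n, p = 50, 1000
X = rng.normal(size=(n, p))
y = X[:, :5] @ np.ones(5) + rng.normal(scale=2.0, size=n)
model = preconditioned_lasso(X, y, alpha=0.1, threshold=0.25)
print(np.flatnonzero(model.coef_)[:10])  # indices of selected features

Because the LASSO in step 2 sees the denoised fitted values rather than the raw outcome, its selection is less driven by observational noise, which is the intuition behind the consistency result mentioned above.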
