Sparse Algorithms Are Not Stable

We consider two desired properties of learning algorithms: sparsity and algorithmic stability. Both properties are believed to lead to good generalization ability. We show that these two properties are fundamentally at odds with each other: A sparse algorithm cannot be stable and vice versa. Thus, one has to trade off sparsity and stability in designing a learning algorithm. In particular, our general result implies that ℓ1-regularized regression (Lasso) cannot be stable, while ℓ2-regularized regression is known to have strong stability properties and is therefore not sparse.

Index Terms: Stability, sparsity, Lasso, regularization.
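
As an informal empirical companion to this result (not the paper's formal construction), the sketch below contrasts ℓ1 and ℓ2 regularization using scikit-learn's Lasso and Ridge estimators. It counts nonzero coefficients as a proxy for sparsity and measures how far the fitted coefficients move under a leave-one-out perturbation of the training set as a rough proxy for stability. The data-generating process and the regularization strength alpha=0.1 are illustrative assumptions.

```python
# Minimal sketch of the sparsity/stability trade-off, assuming
# scikit-learn's Lasso (l1) and Ridge (l2) estimators. The data and
# alpha are illustrative choices, not the paper's construction.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n, d = 50, 20
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]          # only 3 of 20 features are active
y = X @ w_true + 0.1 * rng.standard_normal(n)

def fit_coefs(model_cls, alpha):
    # Fit on the full sample and on a leave-one-out perturbation.
    full = model_cls(alpha=alpha).fit(X, y).coef_
    loo = model_cls(alpha=alpha).fit(X[1:], y[1:]).coef_
    return full, loo

for name, cls in [("Lasso (l1)", Lasso), ("Ridge (l2)", Ridge)]:
    full, loo = fit_coefs(cls, alpha=0.1)
    nonzero = int(np.sum(np.abs(full) > 1e-8))   # sparsity proxy
    drift = np.max(np.abs(full - loo))           # stability proxy
    print(f"{name}: nonzero coefs = {nonzero}/{d}, "
          f"max coefficient change after removing one sample = {drift:.4f}")
```

Note that the paper's notion of stability concerns the change in the learned prediction function's loss, not the coefficients themselves; coefficient movement is used here only as a simple observable stand-in.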
