Sparse Algorithms Are Not Stable

We consider two desired properties of learning algorithms: sparsity and algorithmic stability. Both properties are believed to lead to good generalization ability. We show that these two properties are fundamentally at odds with each other: A sparse algorithm cannot be stable and vice versa. Thus, one has to trade off sparsity and stability in designing a learning algorithm. In particular, our general result implies that ℓ1-regularized regression (Lasso) cannot be stable, while ℓ2-regularized regression is known to have strong stability properties and is therefore not sparse.

Index Terms: Stability, sparsity, Lasso, regularization.
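
As an informal empirical companion to this result (not the paper's formal construction), the sketch below contrasts ℓ1 and ℓ2 regularization using scikit-learn's Lasso and Ridge estimators. It counts nonzero coefficients as a proxy for sparsity and measures how far the fitted coefficients move under a leave-one-out perturbation of the training set as a rough proxy for stability. The data-generating process and the regularization strength alpha=0.1 are illustrative assumptions.

```python
# Minimal sketch of the sparsity/stability trade-off, assuming
# scikit-learn's Lasso (l1) and Ridge (l2) estimators. The data and
# alpha are illustrative choices, not the paper's construction.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n, d = 50, 20
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]          # only 3 of 20 features are active
y = X @ w_true + 0.1 * rng.standard_normal(n)

def fit_coefs(model_cls, alpha):
    # Fit on the full sample and on a leave-one-out perturbation.
    full = model_cls(alpha=alpha).fit(X, y).coef_
    loo = model_cls(alpha=alpha).fit(X[1:], y[1:]).coef_
    return full, loo

for name, cls in [("Lasso (l1)", Lasso), ("Ridge (l2)", Ridge)]:
    full, loo = fit_coefs(cls, alpha=0.1)
    nonzero = int(np.sum(np.abs(full) > 1e-8))   # sparsity proxy
    drift = np.max(np.abs(full - loo))           # stability proxy
    print(f"{name}: nonzero coefs = {nonzero}/{d}, "
          f"max coefficient change after removing one sample = {drift:.4f}")
```

Note that the paper's notion of stability concerns the change in the learned prediction function's loss, not the coefficients themselves; coefficient movement is used here only as a simple observable stand-in.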
