Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization

Abstract Solutions of learning problems by Empirical Risk Minimization (ERM) – and by almost-ERM when the minimizer does not exist – need to be consistent, so that they may be predictive. They also need to be well-posed in the sense of being stable, so that they can be used robustly. We propose a statistical form of stability, defined as leave-one-out (LOO) stability. We prove that, for bounded loss classes, LOO stability is (a) sufficient for generalization, that is, convergence in probability of the empirical error to the expected error, for any algorithm satisfying it, and (b) necessary and sufficient for consistency of ERM. Thus LOO stability is a weak form of stability that provides a sufficient condition for generalization for symmetric learning algorithms while subsuming the classical conditions for consistency of ERM. In particular, we conclude that a certain form of well-posedness and consistency are equivalent for ERM.
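The two notions in the abstract can be sketched as follows. This is an illustrative sketch only: the notation (S, f_S, V, I, I_S, S^i) and the exact form of the stability condition are assumptions chosen for exposition, not quoted from the paper.

% Illustrative sketch, not the paper's exact definitions (requires amsmath).
% S = (z_1,\dots,z_n) drawn i.i.d. from an unknown measure \mu; the algorithm maps S to f_S;
% V is a bounded loss; S^i denotes S with the i-th example removed.
\begin{align*}
  I[f]   &= \mathbb{E}_{z \sim \mu}\, V(f, z)               &&\text{(expected error)}\\
  I_S[f] &= \frac{1}{n} \sum_{i=1}^{n} V(f, z_i)            &&\text{(empirical error)}\\
  \text{generalization:}\quad & \bigl| I_S[f_S] - I[f_S] \bigr| \xrightarrow{\;P\;} 0
                               &&\text{as } n \to \infty,\\
  \text{LOO (cross-validation) stability:}\quad & \bigl| V(f_{S^i}, z_i) - V(f_S, z_i) \bigr| \xrightarrow{\;P\;} 0
                               &&\text{for each } i, \text{ as } n \to \infty.
\end{align*}

In words, the stability condition asks that removing a single training point changes the loss at that point only negligibly, in probability; the abstract's claim is that a statistical condition of this flavor implies generalization and characterizes consistency of ERM.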
