Statistical Learning: Stability is Sufficient for Generalization and Necessary and Sufficient for Consistency of Empirical Risk Minimization

Abstract: Solutions of learning problems by Empirical Risk Minimization (ERM) -- and almost-ERM when the minimizer does not exist -- need to be consistent, so that they may be predictive. They also need to be well-posed in the sense of being stable, so that they can be used robustly. We propose a statistical form of leave-one-out stability, called CVEEE_loo stability. Our main new results are twofold. We prove that for bounded loss classes CVEEE_loo stability is (a) sufficient for generalization, that is, convergence in probability of the empirical error to the expected error, for any algorithm satisfying it, and (b) necessary and sufficient for generalization and consistency of ERM. Thus CVEEE_loo stability is a weak form of stability that represents a sufficient condition for generalization for general learning algorithms while subsuming the classical conditions for consistency of ERM. We discuss alternative forms of stability. In particular, we conclude that for ERM a certain form of well-posedness is equivalent to consistency.
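
A rough sketch of the two notions in play, in assumed notation (the symbols below are illustrative, not quoted from the paper): let S = {z_1, ..., z_n} be the training set, f_S the hypothesis the algorithm returns on S, f_{S^i} the hypothesis returned when z_i is deleted, V(f, z) the loss of f at a point z, I[f] the expected error of f, and I_S[f] the empirical error of f on S.

% Generalization: the empirical error converges in probability to the expected error
\lim_{n\to\infty} \Pr\!\bigl( \bigl| I_S[f_S] - I[f_S] \bigr| > \varepsilon \bigr) = 0 \quad \text{for every } \varepsilon > 0.

% Leave-one-out (CV_loo-style) stability: deleting one training point changes
% the loss at that point only by a vanishing amount, in probability
\lim_{n\to\infty} \Pr\!\bigl( \bigl| V(f_{S^i}, z_i) - V(f_S, z_i) \bigr| > \varepsilon \bigr) = 0, \qquad S^i = S \setminus \{z_i\}.

The paper's CVEEE_loo notion combines a cross-validation condition of this kind with analogous leave-one-out conditions on the expected and empirical errors; the sketch above only conveys the flavor of the definitions.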
