Fat-shattering and the learnability of real-valued functions

We consider the problem of learning real-valued functions from random examples when the function values are corrupted with noise. Under mild conditions on independent observation noise, we characterize the learnability of a real-valued function class in terms of the fat-shattering function, a generalization of the Vapnik-Chervonenkis dimension introduced by Kearns and Schapire. We show that, given some restrictions on the noise, a function class is learnable in our model if and only if its fat-shattering function is finite. Under different (also quite mild) restrictions, satisfied for example by Gaussian noise, we show that a function class is learnable from polynomially many examples if and only if its fat-shattering function grows polynomially. We prove analogous results in an agnostic setting, where no underlying function class is assumed.
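
For orientation, the fat-shattering function is commonly defined as below. The abstract does not state the definition, so the notation here ($F$, $X$, $\gamma$, $r_i$, $f_b$, $\mathrm{fat}_F$) is an assumption based on the standard formulation, not the paper's own text.

```latex
Let $F$ be a class of functions $f : X \to \mathbb{R}$ and let $\gamma > 0$.
A set $\{x_1, \dots, x_d\} \subseteq X$ is \emph{$\gamma$-shattered} by $F$ if
there exist witnesses $r_1, \dots, r_d \in \mathbb{R}$ such that for every
$b \in \{0,1\}^d$ some $f_b \in F$ satisfies, for all $i = 1, \dots, d$,
\[
  f_b(x_i) \ge r_i + \gamma \quad \text{if } b_i = 1,
  \qquad
  f_b(x_i) \le r_i - \gamma \quad \text{if } b_i = 0 .
\]
The fat-shattering function of $F$ is
\[
  \mathrm{fat}_F(\gamma)
  = \sup \bigl\{\, d : \text{some set of size } d
      \text{ is } \gamma\text{-shattered by } F \,\bigr\},
\]
which may be infinite.
```

As a sanity check on the "generalization" claim: if $F$ is $\{0,1\}$-valued and $0 < \gamma \le 1/2$, choosing $r_i = 1/2$ reduces $\gamma$-shattering to ordinary shattering, so $\mathrm{fat}_F(\gamma)$ coincides with the Vapnik-Chervonenkis dimension of $F$.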
