More theorems about scale-sensitive dimensions and learning

We present a new general-purpose algorithm for learning classes of [0; 1]-valued functions in a generalization of the prediction model, and prove a general upper bound on the expected absolute error of this algorithm in terms of a scale-sensitive generalization of the Vapnik dimension proposed by Alon, Ben-David, Cesa-Bianchi and Haussler. We give lower bounds implying that our upper bounds cannot be improved by more than a constant in general. We apply this result, together with techniques due to Haussler, to obtain new upper bounds on packing numbers in terms of this scale-sensitive notion of dimension. Using a different technique, we obtain new bounds on packing numbers in terms of Kearns and Schapire’s fat-shattering function. We show how to apply both packing bounds to obtain improved general bounds on the sample complexity of agnostic learning. For each > 0, we establish weaker sufficient and stronger necessary conditions for a class of [0; 1]-valued functions to be agnostically learnable to within , to be an uniform Glivenko-Cantelli class, and to be agnostically learnable to within by an algorithm using only hypotheses from the class.

[1]  J. Lamperti ON CONVERGENCE OF STOCHASTIC PROCESSES , 1962 .

[2]  Leslie G. Valiant,et al.  A general lower bound on the number of examples needed for learning , 1988, COLT '88.

[3]  David Haussler,et al.  Predicting {0,1}-functions on randomly drawn points , 1988, COLT '88.

[4]  David Haussler,et al.  Equivalence of models for polynomial learnability , 1988, COLT '88.

[5]  Vladimir Vapnik,et al.  Inductive principles of the search for empirical dependences (methods based on weak convergence of probability measures) , 1989, COLT '89.

[6]  David Haussler,et al.  Learnability and the Vapnik-Chervonenkis dimension , 1989, JACM.

[7]  John Shawe-Taylor,et al.  The learnability of formal concepts , 1990, COLT '90.

[8]  Alon Itai,et al.  Learnability with Respect to Fixed Distributions , 1991, Theor. Comput. Sci..

[9]  H. Balsters,et al.  Learnability with respect to fixed distributions , 1991 .

[10]  R. Schapire Toward Eecient Agnostic Learning , 1992 .

[11]  David Haussler,et al.  Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications , 1992, Inf. Comput..

[12]  N. Alon,et al.  Scale-sensitive Dimensions, Uniform Convergence, , 1993 .

[13]  Hans Ulrich Simon,et al.  Bounds on the Number of Examples Needed for Learning Functions , 1994, SIAM J. Comput..

[14]  Robert E. Schapire,et al.  Efficient distribution-free learning of probabilistic concepts , 1990, Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science.

[15]  Philip M. Long,et al.  Fat-shattering and the learnability of real-valued functions , 1994, COLT '94.

[16]  David Haussler,et al.  Sphere Packing Numbers for Subsets of the Boolean n-Cube with Bounded Vapnik-Chervonenkis Dimension , 1995, J. Comb. Theory, Ser. A.

[17]  Leonid Gurvits,et al.  Approximation and Learning of Convex Superpositions , 1997, J. Comput. Syst. Sci..

[18]  Noga Alon,et al.  Scale-sensitive dimensions, uniform convergence, and learnability , 1997, JACM.