On efficient agnostic learning of linear combinations of basis functions

We consider efficient agnostic learning of linear combinations of basis functions when the sum of the absolute values of the weights is bounded. With the quadratic loss function, we show that the class of linear combinations of a set of basis functions is efficiently agnostically learnable if and only if the class of basis functions itself is efficiently agnostically learnable. We also show that the sample complexity for learning the linear combinations grows polynomially if and only if a combinatorial property of the class of basis functions, called the fat-shattering function, grows at most polynomially. Finally, we relate the problem to agnostic learning of {0,1}-valued function classes: if a class of {0,1}-valued functions is efficiently agnostically learnable (using the same class as the hypothesis class) with the discrete loss function, then the class of linear combinations of functions from that class is efficiently agnostically learnable with the quadratic loss function.
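To make the setting concrete, the following is a minimal sketch in standard notation; the symbols F, B, and fat_F are choices made here for illustration, since the abstract does not fix notation. It spells out the class of bounded linear combinations, the quadratic-loss agnostic learning criterion, and the fat-shattering function that the sample-complexity result refers to.

% Sketch of the setting (notation assumed, not fixed by the abstract).
% F is a class of real-valued basis functions on a domain X, and B > 0
% bounds the sum of the absolute values of the weights.
\[
  \mathrm{lin}_B(F) \;=\;
  \Bigl\{\, x \mapsto \sum_{i=1}^{k} w_i f_i(x)
    \;:\; k \in \mathbb{N},\ f_i \in F,\ \sum_{i=1}^{k} |w_i| \le B \,\Bigr\}.
\]
% Agnostic learning with the quadratic loss: from i.i.d. samples drawn
% from an arbitrary joint distribution P on X x R (with y bounded), the
% learner must, with probability at least 1 - \delta, output a hypothesis
% h satisfying
\[
  \mathbf{E}_P\bigl[(h(x) - y)^2\bigr]
  \;\le\;
  \inf_{g \in \mathrm{lin}_B(F)} \mathbf{E}_P\bigl[(g(x) - y)^2\bigr]
  \;+\; \epsilon .
\]
% Fat-shattering function: a set {x_1, ..., x_d} is \gamma-shattered by F
% if there exist reals r_1, ..., r_d such that for every b in {0,1}^d some
% f in F has f(x_i) >= r_i + \gamma when b_i = 1 and f(x_i) <= r_i - \gamma
% when b_i = 0; fat_F(\gamma) is the largest such d.

Under this reading, the sample-complexity result says that learning lin_B(F) to accuracy epsilon requires a number of samples polynomial in 1/epsilon exactly when fat_F(gamma) grows at most polynomially in 1/gamma.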
