The Importance of Convexity in Learning with Squared Loss

We show that if the closure of a function class F under the metric induced by some probability distribution is not convex, then the sample complexity for agnostically learning F with squared loss (using only hypotheses in F) is Ω(ln(1/δ)/ε²), where 1−δ is the probability of success and ε is the required accuracy. In comparison, if the class F is convex and has finite pseudodimension, then the sample complexity is O((1/ε)(ln(1/ε) + ln(1/δ))). If a nonconvex class F has finite pseudodimension, then the sample complexity for agnostically learning the closure of the convex hull of F is O((1/ε²)(ln(1/ε) + ln(1/δ))). Hence, for agnostic learning, learning the convex hull provides better approximation capabilities with little sample complexity penalty.
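
The three sample-complexity claims above can be stated compactly in display form; the names m_nonconvex, m_convex, and m_co(F) are introduced here only for readability and do not appear in the original abstract:

```latex
% Sample complexity bounds stated in the abstract:
% m = number of examples, accuracy \epsilon, confidence 1-\delta.

% Agnostic learning of a class F whose closure is nonconvex (lower bound):
\[
  m_{\mathrm{nonconvex}}(\epsilon,\delta)
    = \Omega\!\left(\frac{\ln(1/\delta)}{\epsilon^{2}}\right)
\]

% Agnostic learning of a convex F with finite pseudodimension (upper bound):
\[
  m_{\mathrm{convex}}(\epsilon,\delta)
    = O\!\left(\frac{1}{\epsilon}\left(\ln\frac{1}{\epsilon}
        + \ln\frac{1}{\delta}\right)\right)
\]

% Agnostic learning of the closure of the convex hull of a nonconvex F
% with finite pseudodimension (upper bound):
\[
  m_{\mathrm{co}(F)}(\epsilon,\delta)
    = O\!\left(\frac{1}{\epsilon^{2}}\left(\ln\frac{1}{\epsilon}
        + \ln\frac{1}{\delta}\right)\right)
\]
```

The comparison between the first and third bounds is the point of the final sentence: moving from F to its convex hull replaces an unavoidable Ω(ln(1/δ)/ε²) lower bound with an achievable upper bound of the same 1/ε² order, while gaining the hull's richer approximation capability.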
