Neural Networks with Quadratic VC Dimension

This paper shows that neural networks which use continuous activation functions have VC dimension at least as large as the square of the number of weights w. This result settles a long-standing open question, namely whether the well-known O(w log w) bound, known for hard-threshold nets, also holds for more general sigmoidal nets. Implications for the number of samples needed for valid generalization are discussed.
