Almost Linear VC-Dimension Bounds for Piecewise Polynomial Networks

We compute upper and lower bounds on the VC dimension and pseudodimension of feedforward neural networks composed of piecewise polynomial activation functions. We show that if the number of layers is fixed, then the VC dimension and pseudo-dimension grow as W log W, where W is the number of parameters in the network. This result stands in opposition to the case where the number of layers is unbounded, in which case the VC dimension and pseudo-dimension grow as W2. We combine our results with recently established approximation error rates and determine error bounds for the problem of regression estimation by piecewise polynomial networks with unbounded weights.

[1]  L. Kantorovich,et al.  Functional analysis in normed spaces , 1952 .

[2]  H. Warren Lower bounds for approximation by nonlinear manifolds , 1968 .

[3]  S. Geman,et al.  Nonparametric Maximum Likelihood Estimation by the Method of Sieves , 1982 .

[4]  David Haussler,et al.  Learnability and the Vapnik-Chervonenkis dimension , 1989, JACM.

[5]  A. Barron Approximation and Estimation Bounds for Artificial Neural Networks , 1991, COLT '91.

[6]  David Haussler,et al.  Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications , 1992, Inf. Comput..

[7]  Paul W. Goldberg,et al.  Bounding the Vapnik-Chervonenkis Dimension of Concept Classes Parameterized by Real Numbers , 1993, COLT '93.

[8]  A. Sakurai,et al.  Tighter bounds of the VC-dimension of three layer networks , 1993 .

[9]  Allan Pinkus,et al.  Multilayer Feedforward Networks with a Non-Polynomial Activation Function Can Approximate Any Function , 1991, Neural Networks.

[10]  Eduardo D. Sontag,et al.  Finiteness results for sigmoidal “neural” networks , 1993, STOC.

[11]  Wolfgang Maass,et al.  Neural Nets with Superlinear VC-Dimension , 1994, Neural Computation.

[12]  Philip M. Long,et al.  Fat-shattering and the learnability of real-valued functions , 1994, COLT '94.

[13]  Gábor Lugosi,et al.  Nonparametric estimation via empirical risk minimization , 1995, IEEE Trans. Inf. Theory.

[14]  David Haussler,et al.  Sphere Packing Numbers for Subsets of the Boolean n-Cube with Bounded Vapnik-Chervonenkis Dimension , 1995, J. Comb. Theory, Ser. A.

[15]  H. N. Mhaskar,et al.  Neural Networks for Optimal Approximation of Smooth and Analytic Functions , 1996, Neural Computation.

[16]  Peter L. Bartlett,et al.  Efficient agnostic learning of neural networks with bounded fan-in , 1996, IEEE Trans. Inf. Theory.

[17]  László Györfi,et al.  A Probabilistic Theory of Pattern Recognition , 1996, Stochastic Modelling and Applied Probability.

[18]  Mathukumalli Vidyasagar,et al.  A Theory of Learning and Generalization: With Applications to Neural Networks and Control Systems , 1997 .

[19]  Eduardo D. Sontag,et al.  Neural Networks with Quadratic VC Dimension , 1995, J. Comput. Syst. Sci..

[20]  Mathukumalli Vidyasagar,et al.  A Theory of Learning and Generalization , 1997 .

[21]  Marek Karpinski,et al.  Polynomial Bounds for VC Dimension of Sigmoidal and General Pfaffian Neural Networks , 1997, J. Comput. Syst. Sci..

[22]  Peter L. Bartlett,et al.  The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network , 1998, IEEE Trans. Inf. Theory.

[23]  Akito Sakurai Tight Bounds for the VC-Dimension of Piecewise Polynomial Networks , 1998, NIPS.

[24]  Sanjeev R. Kulkarni,et al.  Learning Pattern Classification - A Survey , 1998, IEEE Trans. Inf. Theory.

[25]  Peter L. Bartlett,et al.  Learning in Neural Networks: Theoretical Foundations , 1999 .

[26]  Peter L. Bartlett,et al.  Neural Network Learning - Theoretical Foundations , 1999 .

[27]  Allan Pinkus,et al.  Approximation theory of the MLP model in neural networks , 1999, Acta Numerica.

[28]  Ron Meir,et al.  On the near optimality of the stochastic approximation of smooth functions by neural networks , 2000, Adv. Comput. Math..