Nearly-Tight VC-Dimension and Pseudodimension Bounds for Piecewise Linear Neural Networks

We prove new upper and lower bounds on the VC-dimension of deep neural networks with the ReLU activation function. These bounds are tight for almost the entire range of parameters. Letting $W$ be the number of weights and $L$ be the number of layers, we prove that the VC-dimension is $O(WL\log W)$, and provide examples with VC-dimension $\Omega(WL\log(W/L))$. This improves on both the previously known upper and lower bounds. In terms of the number $U$ of non-linear units, we prove a tight bound of $\Theta(WU)$ on the VC-dimension. All of these bounds generalize to arbitrary piecewise linear activation functions, and also hold for the pseudodimensions of these function classes. Combined with previous results, this gives an intriguing range of dependencies of the VC-dimension on depth for networks with different non-linearities: there is no dependence for piecewise-constant activations, linear dependence for piecewise-linear, and no more than quadratic dependence for general piecewise-polynomial.
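To make the quantities $W$, $U$, and $L$ concrete, here is a minimal Python sketch that counts them for a fully connected ReLU network and evaluates the growth terms appearing in the bounds. The helper name `relu_mlp_counts` and the counting conventions (biases included in $W$, the output layer included in $L$) are our own illustrative assumptions, not fixed by the abstract, and the constant factors hidden by the $O/\Omega/\Theta$ notation are of course omitted.

```python
import math

def relu_mlp_counts(layer_widths):
    """Parameter counts for a fully connected ReLU network.

    layer_widths = [d_in, h_1, ..., h_k, d_out], where every layer
    except the last applies a ReLU non-linearity.
    (Conventions assumed here: biases count toward W, the output
    layer counts toward L.)
    """
    # W: weights plus biases across all layers.
    W = sum((fan_in + 1) * fan_out
            for fan_in, fan_out in zip(layer_widths, layer_widths[1:]))
    # U: non-linear (ReLU) units, i.e. all hidden units.
    U = sum(layer_widths[1:-1])
    # L: number of layers of computation units.
    L = len(layer_widths) - 1
    return W, U, L

W, U, L = relu_mlp_counts([10, 32, 32, 32, 1])
print(f"W={W}, U={U}, L={L}")
# Growth terms from the bounds (hidden constants omitted):
print(f"upper bound term  W*L*log(W)   = {W * L * math.log2(W):.0f}")
print(f"lower bound term  W*L*log(W/L) = {W * L * math.log2(W / L):.0f}")
print(f"unit-based term   W*U          = {W * U}")
```

For fixed $W$ and $L$, the upper and lower bound terms differ only in the argument of the logarithm ($W$ versus $W/L$), which is precisely the remaining gap that makes the bounds "nearly" rather than fully tight.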
