Vapnik-Chervonenkis Dimension of Recurrent Neural Networks

Most of the work on the Vapnik-Chervonenkis (VC) dimension of neural networks has focused on feedforward networks. However, recurrent networks are also widely used in learning applications, in particular when time is a relevant parameter. This paper provides lower and upper bounds for the VC dimension of such networks. Several types of activation functions are discussed, including threshold, polynomial, piecewise-polynomial, and sigmoidal functions. The bounds depend on two independent parameters: the number w of weights in the network, and the length k of the input sequence. In contrast, for feedforward networks, VC dimension bounds can be expressed as a function of w only. An important difference between recurrent and feedforward nets is that a fixed recurrent net can receive inputs of arbitrary length; we are therefore particularly interested in the case k ≫ w. Ignoring multiplicative constants, the main results say roughly the following. For architectures whose activation σ is any fixed nonlinear polynomial, the VC dimension is ≈ wk. For architectures whose activation σ is any fixed piecewise polynomial, the VC dimension is between wk and w^2 k. For architectures with activation σ = H (threshold nets), the VC dimension is between w log(k/w) and min{wk log(wk), w^2 + w log(wk)}. For the standard sigmoid σ(x) = 1/(1 + e^{-x}), the VC dimension is between wk and w^4 k^2.
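The four regimes stated above can be collected into one display. The following LaTeX block is only a restatement of the abstract's claims, with "≈" and "between ... and ..." rendered, as an assumption on my part, in Θ/Ω/O notation since the abstract explicitly ignores multiplicative constants; the shorthand VC(σ) for the VC dimension of a recurrent architecture with activation σ, w weights, and input length k is introduced here purely for compactness.

% Summary of the stated bounds, up to multiplicative constants.
% w = number of weights, k = length of the input sequence,
% VC(\sigma) = VC dimension of the recurrent architecture with activation \sigma
% (shorthand introduced here, not taken from the original text).
\begin{align*}
\sigma \text{ a fixed nonlinear polynomial:} \quad
  & \mathrm{VC}(\sigma) = \Theta(wk) \\
\sigma \text{ a fixed piecewise polynomial:} \quad
  & \Omega(wk) \le \mathrm{VC}(\sigma) \le O\!\left(w^{2}k\right) \\
\sigma = H \text{ (threshold):} \quad
  & \Omega\!\left(w \log(k/w)\right) \le \mathrm{VC}(\sigma)
    \le O\!\left(\min\{\, wk\log(wk),\; w^{2} + w\log(wk) \,\}\right) \\
\sigma(x) = \tfrac{1}{1+e^{-x}} \text{ (standard sigmoid):} \quad
  & \Omega(wk) \le \mathrm{VC}(\sigma) \le O\!\left(w^{4}k^{2}\right)
\end{align*}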
