The Capacity of a Bump

Recently, several researchers have reported encouraging experimental results when using Gaussian or bump-like activation functions in multilayer perceptrons. Networks of this type usually require fewer hidden layers and hidden units, and often learn much faster than typical sigmoidal networks. To explain these results, we consider a hyper-ridge network, which is a simple perceptron with no hidden units and a ridge activation function. If we are interested in partitioning p points in d dimensions into two classes, then in the limit as d approaches infinity the capacities of a hyper-ridge and a perceptron are identical. However, we show that for p ≫ d, which is the usual case in practice, the ratio of hyper-ridge to perceptron dichotomies approaches p/(2(d + 1)).
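
As a numerical illustration of the quantities above, the minimal Python sketch below computes the perceptron's dichotomy count via Cover's function-counting formula and prints the asymptotic hyper-ridge multiplier p/(2(d + 1)) claimed in the abstract. The function names are ours, and using Cover's formula with a bias term (points in general position) is our assumption about the counting convention; the exact finite-p hyper-ridge count is not reproduced here.

    from math import comb

    def perceptron_dichotomies(p: int, d: int) -> int:
        # Cover's function-counting formula: the number of dichotomies of p
        # points in general position in R^d realizable by a perceptron with a
        # bias term (equivalently, a homogeneous perceptron in R^(d+1)).
        return 2 * sum(comb(p - 1, k) for k in range(d + 1))

    def claimed_ridge_multiplier(p: int, d: int) -> float:
        # The abstract's asymptotic ratio of hyper-ridge to perceptron
        # dichotomies for p >> d.
        return p / (2 * (d + 1))

    if __name__ == "__main__":
        d = 10
        for p in (50, 100, 500, 1000):
            n = perceptron_dichotomies(p, d)
            print(f"p={p:4d}, d={d}: perceptron dichotomies = {n}, "
                  f"claimed hyper-ridge/perceptron ratio ~ "
                  f"{claimed_ridge_multiplier(p, d):.1f}")

As a sanity check on the formula, setting p = 2(d + 1) makes perceptron_dichotomies return exactly 2^(p-1), half of all 2^p possible labelings, which recovers the classical result that the capacity of a perceptron with a bias term is 2(d + 1) points.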