The Capacity of a Bump

Recently, several researchers have reported encouraging experimental results when using Gaussian or bump-like activation functions in multilayer perceptrons. Networks of this type usually require fewer hidden layers and hidden units, and often learn much faster than typical sigmoidal networks. To explain these results, we consider a hyper-ridge network, which is a simple perceptron with no hidden units and a ridge activation function. If we are interested in partitioning p points in d dimensions into two classes, then in the limit as d approaches infinity the capacities of a hyper-ridge and a perceptron are identical. However, we show that for p ≫ d, which is the usual case in practice, the ratio of hyper-ridge to perceptron dichotomies approaches p/(2(d + 1)).
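
As a numerical illustration of the quantities above, the minimal Python sketch below computes the perceptron's dichotomy count via Cover's function-counting formula and prints the asymptotic hyper-ridge multiplier p/(2(d + 1)) claimed in the abstract. The function names are ours, and using Cover's formula with a bias term (points in general position) is our assumption about the counting convention; the exact finite-p hyper-ridge count is not reproduced here.

    from math import comb

    def perceptron_dichotomies(p: int, d: int) -> int:
        # Cover's function-counting formula: the number of dichotomies of p
        # points in general position in R^d realizable by a perceptron with a
        # bias term (equivalently, a homogeneous perceptron in R^(d+1)).
        return 2 * sum(comb(p - 1, k) for k in range(d + 1))

    def claimed_ridge_multiplier(p: int, d: int) -> float:
        # The abstract's asymptotic ratio of hyper-ridge to perceptron
        # dichotomies for p >> d.
        return p / (2 * (d + 1))

    if __name__ == "__main__":
        d = 10
        for p in (50, 100, 500, 1000):
            n = perceptron_dichotomies(p, d)
            print(f"p={p:4d}, d={d}: perceptron dichotomies = {n}, "
                  f"claimed hyper-ridge/perceptron ratio ~ "
                  f"{claimed_ridge_multiplier(p, d):.1f}")

As a sanity check on the formula, setting p = 2(d + 1) makes perceptron_dichotomies return exactly 2^(p-1), half of all 2^p possible labelings, which recovers the classical result that the capacity of a perceptron with a bias term is 2(d + 1) points.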