Predicting the Future: Advantages of Semilocal Units

In investigating gaussian radial basis function (RBF) networks for their ability to model nonlinear time series, we have found that while RBF networks are much faster than standard sigmoid unit backpropagation for low-dimensional problems, their advantages diminish in high-dimensional input spaces. This is particularly troublesome if the input space contains irrelevant variables. We suggest that this limitation is due to the localized nature of RBFs. To gain the advantages of the highly nonlocal sigmoids and the speed advantages of RBFs, we propose a particular class of semilocal activation functions that is a natural interpolation between these two families. We present evidence that networks using these gaussian bar units avoid the slow learning problem of sigmoid unit networks, and, very importantly, are more accurate than RBF networks in the presence of irrelevant inputs. On the Mackey-Glass and Coupled Lattice Map problems, the speedup over sigmoid networks is so dramatic that the difference in training time between RBF and gaussian bar networks is minor. Gaussian bar architectures that superpose composed gaussians (gaussians-of-gaussians) to approximate the unknown function have the best performance. We postulate that an interesting behavior displayed by gaussian bar functions under gradient descent dynamics, which we call automatic connection pruning, is an important factor in the success of this representation.
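To make the local/semilocal distinction concrete, here is a minimal sketch in NumPy (all function and variable names are hypothetical, and the paper's exact parameterization may differ). An RBF unit multiplies one-dimensional gaussians across input dimensions, so its response is localized around a full center vector; a gaussian bar unit instead sums one-dimensional gaussians with per-dimension amplitudes, so each term is local along its own axis but extends as a "bar" through the rest of the input space:

    import numpy as np

    def rbf(x, mu, sigma):
        # Fully local unit: a product of 1-D gaussians, i.e.
        # exp(-sum_i (x_i - mu_i)^2 / (2 sigma_i^2)).
        # The response collapses if ANY coordinate is far from its center.
        return np.exp(-np.sum((x - mu) ** 2 / (2 * sigma ** 2)))

    def gaussian_bar(x, mu, sigma, w):
        # Semilocal unit: a weighted SUM of 1-D gaussians, i.e.
        # sum_i w_i * exp(-(x_i - mu_i)^2 / (2 sigma_i^2)).
        # If gradient descent drives w[i] toward zero, input i is
        # effectively disconnected -- the "automatic connection pruning"
        # described above.
        return np.sum(w * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)))

    # An irrelevant third input far from its center kills the RBF's
    # response but barely perturbs the gaussian bar unit once its
    # amplitude w[2] has decayed:
    x, mu, sigma = np.array([0.1, -0.2, 5.0]), np.zeros(3), np.ones(3)
    w = np.array([1.0, 1.0, 1e-3])
    print(rbf(x, mu, sigma))              # ~3.6e-6
    print(gaussian_bar(x, mu, sigma, w))  # ~1.97

Under this reading, the gaussians-of-gaussians architecture stacks a second layer of such units on the outputs of the first, composing the one-dimensional gaussian responses before superposing them.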
