Square Unit Augmented, Radially Extended, Multilayer Perceptrons

Consider a multilayer perceptron (MLP) with d inputs, a single hidden sigmoidal layer, and a linear output. Augmenting the network with d additional inputs, each set to the square of one of the original d inputs, gives the architecture properties reminiscent of higher-order neural networks and radial basis function networks (RBFNs) at little extra cost in weights. Of particular interest, a single hidden node in this architecture can form a localized feature in the d-dimensional input space, yet the network can also span large volumes of that space; it therefore retains the localized properties of an RBFN without suffering as badly from the curse of dimensionality. I refer to a network of this type as a SQuare Unit Augmented, Radially Extended, MultiLayer Perceptron (SQUARE-MLP or SMLP).
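To make the construction concrete, here is a minimal NumPy sketch of an SMLP forward pass under the assumptions stated above (d original inputs, their element-wise squares appended, one sigmoidal hidden layer, a linear output). The function name smlp_forward and the weight shapes are illustrative, not taken from the paper. The intuition for the localized behavior is that a hidden unit acting on [x, x^2] computes a quadratic function of x before the sigmoid; with suitable weights this approximates a negated squared distance, so the unit responds strongly only in a bounded region.

    # Minimal sketch of an SMLP forward pass (illustrative, not the paper's reference code).
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def smlp_forward(x, W_hidden, b_hidden, W_out, b_out):
        """Forward pass of a square-unit-augmented MLP.

        x        : (n, d)  batch of inputs
        W_hidden : (2d, h) hidden-layer weights acting on [x, x**2]
        b_hidden : (h,)    hidden biases
        W_out    : (h, m)  linear output weights
        b_out    : (m,)    output biases
        """
        # Augment the d original inputs with their element-wise squares,
        # yielding the 2d-dimensional input described in the abstract.
        x_aug = np.concatenate([x, x ** 2], axis=1)
        hidden = sigmoid(x_aug @ W_hidden + b_hidden)
        return hidden @ W_out + b_out

    # Example with assumed sizes: d = 3 inputs, h = 5 hidden units, m = 1 output.
    rng = np.random.default_rng(0)
    d, h, m = 3, 5, 1
    x = rng.normal(size=(4, d))
    y = smlp_forward(x,
                     rng.normal(size=(2 * d, h)), np.zeros(h),
                     rng.normal(size=(h, m)), np.zeros(m))
    print(y.shape)  # (4, 1)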
