Networks and the best approximation property

Networks can be regarded as approximation schemes. Multilayer networks of the perceptron type can approximate continuous functions arbitrarily well (Cybenko 1988, 1989; Funahashi 1989; Stinchcombe and White 1989). We prove that networks derived from regularization theory, including Radial Basis Function networks (Poggio and Girosi 1989), have a similar property. From the point of view of approximation theory, however, the ability to approximate continuous functions arbitrarily well is not sufficient to characterize a good approximation scheme. More critical is the property of best approximation. The main result of this paper is that multilayer perceptron networks, of the type used in backpropagation, do not have the best approximation property. For regularization networks (in particular Radial Basis Function networks) we prove existence and uniqueness of best approximation.
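For orientation, the two properties contrasted in the abstract can be written out as follows. This is the standard approximation-theory formulation, offered as a sketch; the symbols are assumptions introduced here and do not appear in the abstract: X denotes a normed space of functions, A the subset of X that a given class of networks can represent, and ||.|| the norm of X.

```latex
% Assumed notation (not from the abstract): X is a normed function space,
% A \subset X is the set of functions representable by the network class.

% Approximating arbitrarily well (density of A in X):
\forall f \in X,\ \forall \varepsilon > 0,\ \exists a \in A :\quad \|f - a\| < \varepsilon

% Best approximation property (the infimum is attained in A):
\forall f \in X,\ \exists a_0 \in A :\quad \|f - a_0\| = \inf_{a \in A} \|f - a\|
```

The first statement says only that the distance from f to A can be made as small as desired. The second requires that some element of A actually attains the smallest distance, which is the existence question the abstract refers to; uniqueness asks in addition that this element be the only one.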

[1]  M. Stone. Applications of the theory of Boolean rings to general topology, 1937.

[2]  M. Stone. The Generalized Weierstrass Approximation Theorem, 1948.

[3]  A. Tikhonov, et al.  Solution of Incorrectly Formulated Problems and the Regularization Method, 1963.

[4]  J. Rice. The approximation of functions, 1964.

[5]  E. Cheney. Introduction to approximation theory, 1966.

[6]  J. Rice, et al.  Approximation from a curve of functions, 1967.

[7]  Z. Rubinstein. On the approximation by $C$-polynomials, 1968.

[8]  A. N. Tikhonov, et al.  Solutions of ill-posed problems, 1977.

[9]  V. A. Morozov, et al.  Methods for Solving Incorrectly Posed Problems, 1984.

[10]  Geoffrey E. Hinton, et al.  Learning representations by back-propagating errors, 1986, Nature.

[11]  Geoffrey E. Hinton, et al.  Learning internal representations by error propagation, 1986.

[12]  D. Braess. Nonlinear Approximation Theory, 1986.

[13]  C. Micchelli. Interpolation of scattered data: Distance matrices and conditionally positive definite functions, 1986.

[14]  M. Bertero. Regularization methods for linear inverse problems, 1986.

[15]  James L. McClelland, et al.  Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations, 1986.

[16]  Terrence J. Sejnowski, et al.  Parallel Networks that Learn to Pronounce English Text, 1987, Complex Syst.

[17]  M. Bertero, et al.  Ill-posed problems in early vision, 1988, Proc. IEEE.

[18]  David S. Broomhead, et al.  Multivariable Functional Interpolation and Adaptive Networks, 1988, Complex Syst.

[19]  Tomaso A. Poggio, et al.  Representation properties of multilayer feedforward networks, 1988, Neural Networks.

[20]  D. Broomhead, et al.  Radial Basis Functions, Multi-Variable Functional Interpolation and Adaptive Networks, 1988.

[21]  John Moody, et al.  Fast Learning in Networks of Locally-Tuned Processing Units, 1989, Neural Computation.

[22]  Ken-ichi Funahashi, et al.  On the approximate realization of continuous mappings by neural networks, 1989, Neural Networks.

[23]  George Cybenko, et al.  Approximation by superpositions of a sigmoidal function, 1989, Math. Control. Signals Syst.

[24]  S. M. Carroll, et al.  Construction of neural nets using the radon transform, 1989, International 1989 Joint Conference on Neural Networks.

[25]  H. White, et al.  Universal approximation using feedforward networks with non-sigmoid hidden layer activation functions, 1989, International 1989 Joint Conference on Neural Networks.

[26]  James D. Keeler, et al.  Layered Neural Networks with Gaussian Hidden Units as Universal Approximations, 1990, Neural Computation.

[27]  Tomaso A. Poggio, et al.  Extensions of a Theory of Networks for Approximation and Learning, 1990, NIPS.

[28]  Ronald, et al.  Learning representations by backpropagating errors, 2004.