The New ERA in Supervised Learning

Conventional methods of supervised learning inevitably face the problem of local minima; evidence is presented that second-order methods such as the conjugate gradient and quasi-Newton techniques are particularly susceptible to becoming trapped in sub-optimal solutions. A new technique, expanded range approximation (ERA), is presented which, by using a homotopy on the range of the target outputs, allows supervised learning methods to find a global minimum of the error function in almost every case. Copyright 1997 Elsevier Science Ltd. All Rights Reserved.
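The homotopy on the target range can be illustrated with a short sketch. This is not the authors' implementation. It assumes the homotopy interpolates each target linearly between the mean target and its true value, t(λ) = ⟨t⟩ + λ(t − ⟨t⟩), with λ raised from 0 to 1 in equal steps. It also uses plain batch gradient descent on a 2-2-1 sigmoid network for XOR as the underlying supervised learner. At λ = 0 every target equals the mean, so the network only has to output a constant. Each later stage starts from the weights found at the previous one. The function names (`expanded_targets`, `train_era`) and the schedule parameters (`stages`, `epochs_per_stage`, `lr`) are hypothetical.

```python
import math
import random


def expanded_targets(targets, lam):
    """Assumed ERA homotopy: lam=0 collapses every target to the mean
    target; lam=1 restores the original targets."""
    mean = sum(targets) / len(targets)
    return [mean + lam * (t - mean) for t in targets]


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(w, x):
    # 2-2-1 network: w[0:3] and w[3:6] are the hidden units' (in1, in2, bias)
    # weights; w[6:9] are the output unit's (h1, h2, bias) weights.
    h1 = sigmoid(w[0] * x[0] + w[1] * x[1] + w[2])
    h2 = sigmoid(w[3] * x[0] + w[4] * x[1] + w[5])
    y = sigmoid(w[6] * h1 + w[7] * h2 + w[8])
    return h1, h2, y


def sse(w, inputs, targets):
    # Sum-of-squares error E = 1/2 * sum_p (y_p - t_p)^2.
    return 0.5 * sum((forward(w, x)[2] - t) ** 2 for x, t in zip(inputs, targets))


def gradient(w, inputs, targets):
    # Batch gradient of sse() with respect to all nine weights (backpropagation).
    g = [0.0] * 9
    for x, t in zip(inputs, targets):
        h1, h2, y = forward(w, x)
        dy = (y - t) * y * (1.0 - y)          # dE/d(output net input)
        d1 = dy * w[6] * h1 * (1.0 - h1)      # dE/d(hidden unit 1 net input)
        d2 = dy * w[7] * h2 * (1.0 - h2)      # dE/d(hidden unit 2 net input)
        for i, delta in ((0, d1), (3, d2)):
            g[i] += delta * x[0]
            g[i + 1] += delta * x[1]
            g[i + 2] += delta
        g[6] += dy * h1
        g[7] += dy * h2
        g[8] += dy
    return g


def train_era(inputs, targets, stages=10, epochs_per_stage=500, lr=0.5, seed=0):
    """Hypothetical ERA driver: gradient descent on a sequence of expanded
    target sets, warm-starting each stage from the previous weights."""
    if stages < 1:
        raise ValueError("stages must be at least 1")
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(9)]
    for k in range(stages + 1):
        lam = k / stages                      # lam runs 0, 1/stages, ..., 1
        t_lam = expanded_targets(targets, lam)
        for _ in range(epochs_per_stage):
            g = gradient(w, inputs, t_lam)
            w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


XOR_IN = [(0, 0), (0, 1), (1, 0), (1, 1)]
XOR_T = [0.0, 1.0, 1.0, 0.0]

if __name__ == "__main__":
    w = train_era(XOR_IN, XOR_T)
    print("final error:", sse(w, XOR_IN, XOR_T))
    print("outputs:", [round(forward(w, x)[2], 3) for x in XOR_IN])
```

Warm-starting each stage from the previous solution makes this a continuation method. The minimum of the trivial λ = 0 problem is tracked as the targets expand toward their true range. The sketch makes no convergence guarantee for any particular seed or schedule.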
