Learning processes in neural networks

We study the learning dynamics of neural networks from a general point of view. The environment from which the network learns is defined as a set of input stimuli. At discrete points in time, one of these stimuli is presented and an incremental learning step takes place. If the learning steps occur as a Poisson process (i.e., the times between successive steps are exponentially distributed), the dynamics of an ensemble of learning processes is described by a continuous-time master equation. A learning algorithm that enables a neural network to adapt to a changing environment must have a nonzero learning parameter. This constant adaptability, however, comes at the cost of fluctuations in the plasticities, such as synaptic weights and thresholds. The ensemble description allows us to study the asymptotic behavior of the plasticities for a large class of neural networks. For small learning parameters, we derive an expression for the size of these fluctuations in an unchanging environment. In a changing environment, there is a trade-off between adaptability and accuracy (i.e., the size of the fluctuations). We use the networks of Grossberg [J. Stat. Phys. 48, 105 (1969)] and Oja [J. Math. Biol. 15, 267 (1982)] as simple examples to analyze and simulate the performance of neural networks in a changing environment. In some cases an optimal learning parameter can be calculated.
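For readers who want the structure behind this summary, here is a minimal sketch of the ensemble description under assumed notation (weight vector w, stimulus density rho(x), learning rule f, learning parameter eta, mean time tau between learning steps); it paraphrases the generic on-line learning setup rather than quoting the paper's equations.

```latex
% Assumed generic on-line learning step (not quoted from the paper):
% w -> w + \eta f(w, x), with stimuli x drawn from \rho(x) and learning steps
% forming a Poisson process with mean interval \tau.  The ensemble density
% P(w,t) then obeys a continuous-time master equation with transition kernel T:
\[
  \frac{\partial P(w,t)}{\partial t}
  = \frac{1}{\tau} \int dw' \left[ T(w \mid w')\, P(w',t) - T(w' \mid w)\, P(w,t) \right],
\]
\[
  T(w \mid w') = \int dx\, \rho(x)\, \delta\!\left( w - w' - \eta f(w',x) \right).
\]
% For small \eta, a small-fluctuation expansion of this equation gives drift along
% the averaged learning rule plus asymptotic weight fluctuations whose variance
% grows with \eta; this is the adaptability-accuracy trade-off described above.
```

The trade-off can also be illustrated numerically. The sketch below is hypothetical illustration code (not taken from the paper): it runs Oja's rule with a constant learning parameter on two-dimensional inputs whose principal direction drifts slowly, so a very small eta lags behind the drift while a large eta tracks it quickly but fluctuates; an intermediate value does best. The function names and parameter values are invented for the example.

```python
# Hypothetical illustration: Oja's rule with a constant learning parameter eta,
# learning steps at exponentially distributed intervals (a Poisson process),
# and an input distribution whose principal axis rotates slowly.
import numpy as np

rng = np.random.default_rng(0)

def principal_direction(t, period=2000.0):
    """Unit vector giving the slowly rotating principal axis of the inputs."""
    phi = 2.0 * np.pi * t / period
    return np.array([np.cos(phi), np.sin(phi)])

def mean_misalignment(eta, t_max=4000.0, tau=1.0):
    """Average squared sine of the angle between the weight vector and the axis."""
    w = np.array([1.0, 0.0])
    t, errors = 0.0, []
    while t < t_max:
        t += rng.exponential(tau)              # Poisson-timed learning step
        u = principal_direction(t)
        # Stimulus: large variance along u, small variance orthogonal to it.
        x = rng.normal(0.0, 2.0) * u + rng.normal(0.0, 0.5) * np.array([-u[1], u[0]])
        y = w @ x
        w = w + eta * y * (x - y * w)          # Oja's learning rule
        errors.append(1.0 - (w @ u) ** 2 / (w @ w))
    return float(np.mean(errors[len(errors) // 2:]))

# Too small an eta lags behind the drifting environment; too large an eta
# tracks it but fluctuates strongly around the principal direction.
for eta in (0.001, 0.01, 0.1):
    print(f"eta = {eta:<6} mean misalignment = {mean_misalignment(eta):.4f}")
```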

[1] N. Wiener, I Am a Mathematician, 1956.

[2] Shun-ichi Amari, A Theory of Adaptive Pattern Classifiers, IEEE Trans. Electron. Comput., 1967.

[3] S. Grossberg, On learning and energy-entropy dependence in recurrent and nonrecurrent signed networks, 1969.

[4] D. Bedeaux et al., On the Relation between Master Equations and Random Walks and Their Solutions, 1971.

[5] Harold J. Kushner et al., Stochastic approximation methods for constrained and unconstrained systems, 1978.

[6] E. Oja, Simplified neuron model as a principal component analyzer, Journal of Mathematical Biology, 1982.

[7] J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proceedings of the National Academy of Sciences of the United States of America, 1982.

[8] C. Gardiner, Handbook of Stochastic Methods, 1983.

[9] C. D. Gelatt et al., Optimization by Simulated Annealing, Science, 1983.

[10] H. Kushner, Robustness and Approximation of Escape Times and Large Deviations Estimates for Systems with Small Noise Effects, 1984.

[11] Geoffrey E. Hinton et al., A Learning Algorithm for Boltzmann Machines, Cogn. Sci., 1985.

[12] Geoffrey E. Hinton et al., Learning representations by back-propagating errors, Nature, 1986.

[13] Basilis Gidas, The Langevin Equation as a Global Minimization Algorithm, 1986.

[14] Emile H. L. Aarts et al., A pedestrian review of the theory and application of the simulated annealing algorithm, 1987.

[15] A. V. Lukashin et al., [Physical models of neural networks], Biofizika, 1987.

[16] H. Kushner, Asymptotic global behavior for stochastic approximation and diffusions with slowly decreasing noise effects: Global minimization via Monte Carlo, 1987.

[17] Teuvo Kohonen, Self-Organization and Associative Memory, 1988.

[18] Bruce E. Hajek, Cooling Schedules for Optimal Annealing, Math. Oper. Res., 1988.

[19] J. Tsitsiklis, A survey of large time asymptotics of simulated annealing algorithms, 1988.

[20] John N. Tsitsiklis, Markov Chains with Rare Transitions and Simulated Annealing, Math. Oper. Res., 1989.

[21] Halbert White, Learning in Artificial Neural Networks: A Statistical Perspective, Neural Computation, 1989.

[22] Geoffrey E. Hinton, Connectionist Learning Procedures, Artif. Intell., 1989.

[23] S. P. Luttrell, Self-organisation: a derivation from first principles of a class of learning algorithms, International 1989 Joint Conference on Neural Networks, 1989.

[24] David M. Clark et al., A convergence theorem for Grossberg learning, Neural Networks, 1990.

[25] H. G. Schuster et al., Fokker-Planck Description of Learning in Backpropagation Networks, 1990.

[26] Kurt Hornik et al., Convergence of learning algorithms with constant learning rates, IEEE Trans. Neural Networks, 1991.

[27] Suzanna Becker et al., Unsupervised Learning Procedures for Neural Networks, Int. J. Neural Syst., 1991.

[28] Tom Heskes et al., Neural networks learning in a changing environment, IJCNN-91-Seattle International Joint Conference on Neural Networks, 1991.

[29] N. E. Cotter et al., A diffusion process for global optimization in neural networks, IJCNN-91-Seattle International Joint Conference on Neural Networks, 1991.

[30] Klaus Schulten et al., Self-organizing maps and adaptive filters, 1991.

[31] Anders Krogh et al., Introduction to the theory of neural computation, The Advanced Book Program, 1994.

[32] Y. Kabashima et al., Finite time scaling of energy in simulated annealing, 1991.

[33] Bruno Apolloni et al., Simulated annealing approach in backpropagation, Neurocomputing, 1991.

[34] John E. Moody et al., Towards Faster Stochastic Gradient Search, NIPS, 1991.

[35] Tony Savage, Are artificial neural nets as smart as a rat?, 1992.

[36] Tom Heskes et al., Retrieval of pattern sequences at variable speeds in a neural network with delays, Neural Networks, 1992.

[37] John Moody et al., Learning rate schedules for faster stochastic gradient search, Neural Networks for Signal Processing II: Proceedings of the 1992 IEEE Workshop, 1992.

[38] G. Orr et al., Weight-space probability densities and convergence times for stochastic learning, Proceedings of the 1992 IJCNN International Joint Conference on Neural Networks, 1992.

[39] Hilbert J. Kappen et al., Global performance of learning rules, 1992.

[40] Heskes et al., Learning in neural networks with local minima, Physical Review A: Atomic, Molecular, and Optical Physics, 1992.

[41] O. Catoni, Rough Large Deviation Estimates for Simulated Annealing: Application to Exponential Schedules, 1992.

[42] Kurt Hornik et al., Convergence analysis of local feature extraction algorithms, Neural Networks, 1992.

[43] Roberto Battiti, First- and Second-Order Methods for Learning: Between Steepest Descent and Newton's Method, Neural Computation, 1992.

[44] Hilbert J. Kappen et al., Learning rules, stochastic processes, and local minima, 1992.

[45] Heskes et al., Learning-parameter adjustment in neural networks, Physical Review A: Atomic, Molecular, and Optical Physics, 1992.

[46] Hilbert J. Kappen et al., On-line learning processes in artificial neural networks, 1993.

[47] G. Radons, On stochastic dynamics of supervised learning, 1993.

[48] Hilbert J. Kappen et al., Error potentials for self-organization, IEEE International Conference on Neural Networks, 1993.

[49] Heskes et al., Cooling schedules for learning in neural networks, Physical Review E: Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics, 1993.