How to Train Neural Networks

The purpose of this paper is to give guidance in neural network modeling. Starting with the preprocessing of the data, we discuss different types of network architecture and show how they can be combined effectively. We analyze several cost functions that avoid unstable learning caused by outliers and heteroscedasticity. The Observer-Observation Dilemma is resolved by forcing the network to construct smooth approximation functions. Furthermore, we propose several pruning algorithms to optimize the network architecture. All of these features and techniques are linked into a complete and consistent training procedure (see figure 17.25 for an overview), so that the synergy of the methods is maximized.
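The paper's own robust cost functions are not reproduced here; as an illustration only, a Huber-style loss (an assumption, not necessarily the formulation used in the paper) shows the general idea of damping the influence of outliers: residuals are penalized quadratically near zero but only linearly in the tails, so a few extreme targets cannot dominate the gradient.

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber cost: quadratic for |residual| <= delta,
    linear beyond, which bounds each sample's gradient."""
    r = y_true - y_pred
    quadratic = 0.5 * r ** 2
    linear = delta * (np.abs(r) - 0.5 * delta)
    return np.where(np.abs(r) <= delta, quadratic, linear).mean()

# Small residuals behave like squared error; large ones grow only linearly.
small = huber_loss(np.array([0.5]), np.array([0.0]))   # 0.5 * 0.5**2 = 0.125
large = huber_loss(np.array([10.0]), np.array([0.0]))  # 1.0 * (10 - 0.5) = 9.5
```

With a plain squared-error cost the residual of 10 would contribute 50 to the mean loss; the linear tail caps that contribution at 9.5, which is the stabilizing effect against outliers that the paper's analysis of cost functions is concerned with.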
