A statistical approach to learning and generalization in layered neural networks

A general statistical description of the problem of learning from examples is presented. Learning in layered networks is posed as a search in the network parameter space for a network that minimizes an additive error function over a set of statistically independent examples. By imposing the equivalence of the minimum-error and maximum-likelihood criteria for training the network, the Gibbs distribution on the ensemble of networks with a fixed architecture is derived. The probability of correct prediction of a novel example can be expressed through this ensemble and serves as a measure of the network's generalization ability. The entropy of the prediction distribution is shown to be a consistent measure of the network's performance. The proposed formalism is applied to the problems of selecting an optimal architecture and predicting learning curves.
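As a rough sketch of the formalism described above (the notation below is illustrative, not taken verbatim from the paper): let the training error over m statistically independent examples be additive, E(\mathbf{w}) = \sum_{i=1}^{m} \epsilon(\mathbf{w}; x_i, y_i). Equating the minimum-error and maximum-likelihood criteria then leads to a Gibbs distribution over the network weights,

\[
  P(\mathbf{w} \mid D_m) \;=\; \frac{1}{Z_m}\, \rho(\mathbf{w})\, e^{-\beta E(\mathbf{w})},
  \qquad
  Z_m \;=\; \int d\mathbf{w}\, \rho(\mathbf{w})\, e^{-\beta E(\mathbf{w})},
\]

and the probability assigned to a novel example (x_{m+1}, y_{m+1}) is the ensemble average of its single-example likelihood,

\[
  P(y_{m+1} \mid x_{m+1}, D_m)
  \;=\; \int d\mathbf{w}\, P(\mathbf{w} \mid D_m)\, e^{-\beta\, \epsilon(\mathbf{w};\, x_{m+1},\, y_{m+1})}
  \;=\; \frac{Z_{m+1}}{Z_m},
\]

where \rho(\mathbf{w}) denotes a prior measure on the parameter space of the fixed architecture and \beta is an inverse-temperature parameter set by the assumed noise model.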
