A statistical approach to learning and generalization in layered neural networks

A general statistical description of the problem of learning from examples is presented. Learning in layered networks is posed as a search in the network parameter space for a network that minimizes an additive error function over a set of statistically independent examples. By imposing the equivalence of the minimum-error and maximum-likelihood criteria for training the network, the Gibbs distribution on the ensemble of networks with a fixed architecture is derived. The probability of correct prediction of a novel example can be expressed through this ensemble and serves as a measure of the network's generalization ability. The entropy of the prediction distribution is shown to be a consistent measure of the network's performance. The proposed formalism is applied to the problems of selecting an optimal architecture and predicting learning curves.
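As a rough sketch of the formalism described above (the notation below is illustrative, not taken verbatim from the paper): let the training error over m statistically independent examples be additive, E(\mathbf{w}) = \sum_{i=1}^{m} \epsilon(\mathbf{w}; x_i, y_i). Equating the minimum-error and maximum-likelihood criteria then leads to a Gibbs distribution over the network weights,

\[
  P(\mathbf{w} \mid D_m) \;=\; \frac{1}{Z_m}\, \rho(\mathbf{w})\, e^{-\beta E(\mathbf{w})},
  \qquad
  Z_m \;=\; \int d\mathbf{w}\, \rho(\mathbf{w})\, e^{-\beta E(\mathbf{w})},
\]

and the probability assigned to a novel example (x_{m+1}, y_{m+1}) is the ensemble average of its single-example likelihood,

\[
  P(y_{m+1} \mid x_{m+1}, D_m)
  \;=\; \int d\mathbf{w}\, P(\mathbf{w} \mid D_m)\, e^{-\beta\, \epsilon(\mathbf{w};\, x_{m+1},\, y_{m+1})}
  \;=\; \frac{Z_{m+1}}{Z_m},
\]

where \rho(\mathbf{w}) denotes a prior measure on the parameter space of the fixed architecture and \beta is an inverse-temperature parameter set by the assumed noise model.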
