Universal property of learning curves under entropy loss

A learning curve shows how quickly a learning machine improves its behaviour as the number of training examples increases. This paper studies the universal asymptotic behaviour of learning curves for general dichotomy machines. It is proved rigorously that the average predictive entropy converges to zero approximately as d/t as the number t of training examples increases, where d is the number of modifiable parameters of the machine, irrespective of the machine's architecture.
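Restated compactly (the notation ⟨H(t)⟩ is introduced here only for illustration; the abstract gives the result in words): if ⟨H(t)⟩ denotes the average predictive entropy after t training examples and d the number of modifiable parameters, the claimed universal asymptotics is

    ⟨H(t)⟩ = d/t + o(1/t)   as t → ∞,

with the same leading term for every architecture of the dichotomy machine.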
