Discovering Neural Nets with Low Kolmogorov Complexity and High Generalization Capability
