Discovering Neural Nets with Low Kolmogorov Complexity and High Generalization Capability
