### Reinforcement Learning with Self-Modifying Policies

[1] Andrew McCallum, et al. Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State, 1995, ICML.

[2] Michael I. Jordan, et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, 1994, NIPS.

[3] Jürgen Schmidhuber. Discovering Problem Solutions with Low Kolmogorov Complexity and High Generalization Capability, 1994.

[4] Mark S. Boddy, et al. Deliberation Scheduling for Problem Solving in Time-Constrained Environments, 1994, Artif. Intell.

[5] Juergen Schmidhuber, et al. Incremental self-improvement for life-time multi-agent reinforcement learning, 1996.

[6] Long-Ji Lin, et al. Reinforcement learning for robots using neural networks, 1992.

[7] A. Kolmogorov. Three approaches to the quantitative definition of information, 1968.

[8] Gregory J. Chaitin, et al. On the Length of Programs for Computing Finite Binary Sequences: statistical considerations, 1969, JACM.

[9] Paul E. Utgoff, et al. Shift of bias for inductive concept learning, 1984.

[10] Jürgen Schmidhuber. A ‘Self-Referential’ Weight Matrix, 1993.

[11] Mark B. Ring. Continual learning in reinforcement environments, 1995, GMD-Bericht.

[12] Ray J. Solomonoff, et al. A Formal Theory of Inductive Inference. Part I, 1964, Inf. Control.

[13] Dave Cliff, et al. Adding Temporary Memory to ZCS, 1994, Adapt. Behav.

[14] Thomas G. Dietterich. Machine learning, 1996, CSUR.

[15] Jürgen Schmidhuber, et al. Solving POMDPs with Levin Search and EIRA, 1996, ICML.

[16] J. Bather, et al. Multi-Armed Bandit Allocation Indices, 1990.

[17] Osamu Watanabe, et al. Kolmogorov Complexity and Computational Complexity, 2012, EATCS Monographs on Theoretical Computer Science.

[18] Marco Wiering, et al. HQ-Learning: Discovering Markovian Subgoals for Non-Markovian Reinforcement Learning, 1996.

[19] Christian M. Ernst, et al. Multi-armed Bandit Allocation Indices, 1989.

[20] Dana H. Ballard, et al. Active Perception and Reinforcement Learning, 1990, Neural Computation.

[21] Russell Greiner. PALO: A Probabilistic Hill-Climbing Algorithm, 1996, Artif. Intell.

[22] Leonid A. Levin, et al. Randomness Conservation Inequalities; Information and Independence in Mathematical Theories, 1984, Inf. Control.

[23] Ray J. Solomonoff. The Application of Algorithmic Probability to Problems in Artificial Intelligence, 1985, UAI.

[24] Ming Li, et al. An Introduction to Kolmogorov Complexity and Its Applications, 1993, Texts and Monographs in Computer Science.

[25] Juergen Schmidhuber. On learning how to learn learning strategies, 1994.

[26] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.

[27] P. W. Jones, et al. Bandit Problems, Sequential Allocation of Experiments, 1987.

[28] J. Schmidhuber. A neural network that embeds its own meta-levels, 1993, IEEE International Conference on Neural Networks.

[29] David H. Wolpert. The Lack of A Priori Distinctions Between Learning Algorithms, 1996.

[30] Stuart J. Russell, et al. Principles of Metareasoning, 1989, Artif. Intell.

[31] Jürgen Schmidhuber. Discovering Solutions with Low Kolmogorov Complexity and High Generalization Capability, 1995.

[32] Pravin Varaiya, et al. Stochastic Systems: Estimation, Identification, and Adaptive Control, 1986.

[33] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.

[34] Lonnie Chrisman, et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach, 1992, AAAI.

[35] Ray J. Solomonoff, et al. A Formal Theory of Inductive Inference. Part II, 1964, Inf. Control.

[36] Jürgen Schmidhuber, et al. Discovering Neural Nets with Low Kolmogorov Complexity and High Generalization Capability, 1997, Neural Networks.

[37] Jürgen Schmidhuber, et al. Reinforcement Learning in Markovian and Non-Markovian Environments, 1990, NIPS.

[38] Pattie Maes, et al. Incremental Self-Improvement for Life-Time Multi-Agent Reinforcement Learning, 1996.

[39] Douglas B. Lenat, et al. Theory Formation by Heuristic Search, 1983, Artif. Intell.

[40] Michael L. Littman, et al. Memoryless policies: theoretical limitations and practical results, 1994.