HQ-Learning

HQ-learning is a hierarchical extension of Q(λ)-learning designed to solve certain types of partially observable Markov decision problems (POMDPs). HQ automatically decomposes POMDPs into sequences of simpler subtasks that can be solved by memoryless policies learnable by reactive subagents. HQ can solve partially observable mazes with more states than those used in most previous POMDP work.
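To make the decomposition idea concrete, below is a minimal Python sketch of the control flow described above: a fixed sequence of reactive subagents, each holding a Q-table (observation-to-action values) and an HQ-table (observation-to-subgoal values); the active agent acts with its memoryless policy until it observes its chosen subgoal, then hands control to the next agent in the chain. Everything specific here is an illustrative assumption rather than the paper's exact algorithm: the ToyCorridor environment, the table sizes, the plain one-step Q-update (the paper uses Q(λ) with eligibility traces), and the simplified HQ-table update.

```python
import random


class Subagent:
    """One reactive subagent: a Q-table for memoryless action selection and
    an HQ-table scoring candidate subgoal observations."""

    def __init__(self, n_obs, n_actions, epsilon=0.1):
        self.q = [[0.0] * n_actions for _ in range(n_obs)]  # observation -> action values
        self.hq = [0.0] * n_obs                              # observation -> subgoal values
        self.epsilon = epsilon

    def pick_subgoal(self):
        # Epsilon-greedy choice of the observation this agent tries to reach.
        if random.random() < self.epsilon:
            return random.randrange(len(self.hq))
        return max(range(len(self.hq)), key=lambda o: self.hq[o])

    def pick_action(self, obs):
        # Purely reactive: depends on the current observation only, no memory.
        if random.random() < self.epsilon:
            return random.randrange(len(self.q[obs]))
        return max(range(len(self.q[obs])), key=lambda a: self.q[obs][a])


class ToyCorridor:
    """Hypothetical aliased corridor, included only so the sketch runs:
    six states 0..5, but the agent observes state % 3, so distinct states
    share observations (the partial observability HQ-learning targets)."""

    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state % 3

    def step(self, action):
        # Action 1 moves right, any other action moves left.
        self.state = min(self.state + 1, 5) if action == 1 else max(self.state - 1, 0)
        reward = 1.0 if self.state == 5 else 0.0
        return self.state % 3, reward, self.state == 5


def run_episode(env, agents, alpha=0.1, gamma=0.95, max_steps=100):
    """One episode of the decomposition scheme: the active agent acts with its
    reactive policy; when it observes its chosen subgoal, control passes to the
    next agent in the fixed sequence."""
    obs = env.reset()
    idx = 0
    subgoal = agents[idx].pick_subgoal()
    for _ in range(max_steps):
        agent = agents[idx]
        action = agent.pick_action(obs)
        next_obs, reward, done = env.step(action)
        # One-step Q-update for the active agent only.
        best_next = 0.0 if done else max(agent.q[next_obs])
        agent.q[obs][action] += alpha * (reward + gamma * best_next - agent.q[obs][action])
        obs = next_obs
        if done:
            return True
        if obs == subgoal and idx + 1 < len(agents):
            # Subgoal reached: credit the subgoal choice and hand over control.
            # (Simplified update; the paper's HQ-table update differs in detail.)
            agent.hq[subgoal] += alpha * (reward + gamma * max(agents[idx + 1].hq) - agent.hq[subgoal])
            idx += 1
            subgoal = agents[idx].pick_subgoal()
    return False


if __name__ == "__main__":
    env = ToyCorridor()
    agents = [Subagent(n_obs=3, n_actions=2) for _ in range(2)]
    successes = sum(run_episode(env, agents) for _ in range(500))
    print("episodes reaching the goal:", successes)
```

The point of the sketch is the division of labor: each subagent only ever maps its current observation to an action, so it can be learned with ordinary (memoryless) Q-learning, while the sequence of subgoal choices supplies the minimal memory needed to disambiguate an otherwise partially observable task.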
