Reinforcement learning on explicitly specified time scales

In recent years, hierarchical concepts of temporal abstraction have been integrated into the reinforcement learning framework to improve scalability. However, existing approaches are limited to domains where a decomposition into subtasks is known a priori. In this article we propose the concept of explicitly selecting time-scale-related abstract actions when no subgoal-related abstract actions are available. This concept is realised with multi-step actions on different time scales that are combined in a single action set. We exploit the special structure of this action set in the MSA-Q-learning algorithm. The approach is suited for learning optimal policies in “unstructured” domains, where a decomposition into subtasks is not known in advance or does not exist at all. By learning on different explicitly specified time scales simultaneously, we achieve a considerable improvement in learning speed, which we demonstrate on several benchmark problems.

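The abstract leaves the algorithm itself implicit. The sketch below illustrates one plausible reading of a multi-step-action Q-update: a primitive action repeated for n steps also traverses, as prefixes, the trajectories of every shorter repetition of that same action, so a single n-step execution can update all shorter multi-step actions at once. This is a minimal sketch, not the paper's implementation; the environment interface `env.step(a) -> (next_state, reward, done)`, the scale set `SCALES`, and all constants are illustrative assumptions.

```python
import numpy as np

GAMMA = 0.95           # discount factor (illustrative)
ALPHA = 0.1            # learning rate (illustrative)
SCALES = (1, 2, 4, 8)  # explicitly specified time scales (illustrative)

def msa_q_update(Q, env, s, a, scale_idx):
    """Execute primitive action `a` for SCALES[scale_idx] steps, then
    update Q[s, a, j] for every scale j whose trajectory is a prefix of
    the executed one -- the structural shortcut of a multi-step action set.

    Q has shape (num_states, num_actions, len(SCALES)); `env.step` is an
    assumed tabular interface returning (next_state, reward, done)."""
    n = SCALES[scale_idx]
    rewards, states, done = [], [s], False
    for _ in range(n):
        s2, r, done = env.step(a)
        rewards.append(r)
        states.append(s2)
        if done:
            break
    # Every scale m that fits inside the executed trajectory gets a
    # standard n-step Q-learning target built from the same experience.
    for j, m in enumerate(SCALES):
        if m > len(rewards):
            break
        ret = sum(GAMMA**k * rewards[k] for k in range(m))
        terminal = done and m == len(rewards)
        bootstrap = 0.0 if terminal else GAMMA**m * Q[states[m]].max()
        Q[s, a, j] += ALPHA * (ret + bootstrap - Q[s, a, j])
    return states[-1], done
```

In use, a table such as `Q = np.zeros((n_states, n_actions, len(SCALES)))` would be initialised once, and an epsilon-greedy policy would select the pair `(a, scale_idx)` jointly over the combined action set, so that all time scales compete and are learned simultaneously.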