A comparison of direct and model-based reinforcement learning

This paper compares direct reinforcement learning (no explicit model) and model-based reinforcement learning on a simple task: pendulum swing-up. We find that, in this task, model-based approaches learn from smaller amounts of training data and handle changing goals efficiently.
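To make the distinction concrete, here is a minimal sketch of the two families on a coarsely discretized pendulum. It is not the paper's method. The physical constants, grid resolution, torque set, discount factor, tabular representation, and every function name are assumptions chosen for illustration. The direct learner applies a Q-learning update to each rewarded transition and stores no model. The model-based learner first estimates transition frequencies and then plans by value iteration, so a changed goal only requires planning again with a different reward function.

```python
import math
import random

# All constants are illustrative assumptions, not values from the paper.
G, LENGTH, MASS, DT = 9.81, 1.0, 1.0, 0.1
MAX_VEL, N_ANGLE, N_VEL = 8.0, 16, 16
TORQUES = (-2.0, 0.0, 2.0)
GAMMA = 0.95


def step(theta, omega, torque):
    """One Euler step of a frictionless pendulum; theta = 0 is upright."""
    accel = (G / LENGTH) * math.sin(theta) + torque / (MASS * LENGTH ** 2)
    omega = max(-MAX_VEL, min(MAX_VEL, omega + accel * DT))
    theta = (theta + omega * DT + math.pi) % (2 * math.pi) - math.pi
    return theta, omega


def cell(theta, omega):
    """Map a continuous state to a coarse grid cell (angle index, velocity index)."""
    i = min(N_ANGLE - 1, int((theta + math.pi) / (2 * math.pi) * N_ANGLE))
    j = min(N_VEL - 1, int((omega + MAX_VEL) / (2 * MAX_VEL) * N_VEL))
    return i, j


def collect(n, seed=0):
    """Sample n random one-step transitions: (cell, action index, next cell)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        theta = rng.uniform(-math.pi, math.pi)
        omega = rng.uniform(-MAX_VEL, MAX_VEL)
        a = rng.randrange(len(TORQUES))
        data.append((cell(theta, omega), a, cell(*step(theta, omega, TORQUES[a]))))
    return data


def q_update(q, s, a, r, s2, alpha=0.5):
    """Direct RL: one Q-learning update from a rewarded transition; no model kept."""
    best_next = max(q.get((s2, b), 0.0) for b in range(len(TORQUES)))
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + GAMMA * best_next - old)


def fit_model(data):
    """Model-based RL, step 1: count observed next cells per (cell, action)."""
    model = {}
    for s, a, s2 in data:
        counts = model.setdefault((s, a), {})
        counts[s2] = counts.get(s2, 0) + 1
    return model


def plan(model, reward, sweeps=50):
    """Model-based RL, step 2: synchronous value iteration on the learned model."""
    v = {}
    for _ in range(sweeps):
        new_v = {}
        for (s, a), counts in model.items():
            total = sum(counts.values())
            backup = sum(n / total * (reward(s2) + GAMMA * v.get(s2, 0.0))
                         for s2, n in counts.items())
            new_v[s] = max(new_v.get(s, -math.inf), backup)
        v = new_v
    return v


def upright_reward(s):
    """Original goal: reward the two angle cells adjacent to theta = 0."""
    return 1.0 if s[0] in (N_ANGLE // 2 - 1, N_ANGLE // 2) else 0.0


def hanging_reward(s):
    """Changed goal: reward the two angle cells adjacent to theta = +/- pi."""
    return 1.0 if s[0] in (0, N_ANGLE - 1) else 0.0


data = collect(5000)

# Model-based: fit once, then plan for each goal from the same data.
model = fit_model(data)
v_up = plan(model, upright_reward)
v_down = plan(model, hanging_reward)

# Direct: the reward enters every update, so these values belong to one goal.
q = {}
for s, a, s2 in data:
    q_update(q, s, a, upright_reward(s2), s2)
```

In this sketch the second call to `plan` reuses the fitted model without collecting new transitions, which is one way to read the claim about changing goals. The grid is too coarse to be a serious controller; the sketch only contrasts where the reward enters each method.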
