Lyapunov-Constrained Action Sets for Reinforcement Learning

Lyapunov analysis is a standard approach to studying the stability of dynamical systems and to designing controllers. We propose constraining the actions of a reinforcement learning (RL) agent so that every action descends a Lyapunov function. For minimum cost-to-target problems, this has the theoretical benefit of guaranteeing that the agent reaches a goal state on every trial, regardless of the RL algorithm it uses. In practice, Lyapunov-descent constraints can significantly shorten learning trials, improve initial and worst-case performance, and accelerate learning. Although constraining the action set in this way may limit how far an RL agent can reduce cost, it allows one to construct robust RL systems for problems in which Lyapunov domain knowledge is available. This includes many important individual problems as well as general classes of problems, such as the control of feedback linearizable systems (e.g., industrial robots) and continuous-state path-planning problems. We demonstrate the general approach on two simulated control problems: pendulum swing-up and robot arm control.
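
To illustrate the idea, here is a minimal sketch (not the authors' construction) of Lyapunov-constrained action selection. It assumes a simple discrete-torque pendulum regulated to its downward rest state rather than the swing-up task studied in the paper, a candidate Lyapunov function V given by the mechanical energy above rest, and hypothetical helper names (`simulate`, `descending_actions`); the agent may only choose torques whose one-step simulated successor strictly decreases V.

```python
import math
import random

G, M, L, DT = 9.81, 1.0, 1.0, 0.02            # assumed pendulum parameters
TORQUES = [-2.0, -1.0, 0.0, 1.0, 2.0]          # assumed discrete action set

def simulate(theta, omega, torque):
    """One semi-implicit Euler step; theta = 0 is the downward rest state."""
    alpha = (torque - M * G * L * math.sin(theta)) / (M * L ** 2)
    omega_next = omega + DT * alpha
    theta_next = theta + DT * omega_next
    return theta_next, omega_next

def V(theta, omega):
    """Candidate Lyapunov function: mechanical energy above the rest state."""
    return 0.5 * M * L ** 2 * omega ** 2 + M * G * L * (1.0 - math.cos(theta))

def descending_actions(theta, omega):
    """Keep only torques whose one-step successor strictly decreases V."""
    v_now = V(theta, omega)
    allowed = [u for u in TORQUES if V(*simulate(theta, omega, u)) < v_now]
    # Sketch-only fallback: the paper constructs action sets so that a
    # descending action is always available; here we simply relax the filter.
    return allowed or TORQUES

# Even an untrained "agent" choosing uniformly among the permitted actions
# dissipates energy and settles toward the rest state.
theta, omega = 2.0, 0.0
for _ in range(500):
    u = random.choice(descending_actions(theta, omega))
    theta, omega = simulate(theta, omega, u)
print(f"final V = {V(theta, omega):.4f}")
```

Any RL algorithm can be layered on top by letting it choose among the actions returned by `descending_actions`, which is what yields the trial-completion guarantee described above.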
