Lyapunov Design for Safe Reinforcement Learning
[1] R. Kalman, et al. Control system analysis and design via the second method of Lyapunov: (I) continuous-time systems (II) discrete-time systems, 1959.
[2] R. E. Kalman, et al. Control System Analysis and Design Via the “Second Method” of Lyapunov: I—Continuous-Time Systems, 1960.
[3] R. E. Kalman, et al. Control System Analysis and Design Via the “Second Method” of Lyapunov: II—Discrete-Time Systems, 1960.
[4] D. Sworder, et al. Introduction to stochastic control, 1972.
[5] Stephen Grossberg, et al. Absolute stability of global pattern formation and parallel memory storage by competitive neural networks, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[6] Francis L. Merat, et al. Introduction to robotics: Mechanics and control, 1987, IEEE J. Robotics Autom.
[7] Eduardo D. Sontag, et al. Mathematical Control Theory: Deterministic Finite Dimensional Systems, 1990.
[8] Daniel E. Koditschek, et al. Exact robot navigation using artificial potential functions, 1992, IEEE Trans. Robotics Autom.
[9] W. Fleming, et al. Controlled Markov processes and viscosity solutions, 1992.
[10] Vijaykumar Gullapalli, et al. Reinforcement learning and its application to control, 1992.
[11] Roderic A. Grupen, et al. Robust Reinforcement Learning in Motion Planning, 1993, NIPS.
[12] Richard L. Tweedie, et al. Markov Chains and Stochastic Stability, 1993, Communications and Control Engineering Series.
[13] Roderic A. Grupen, et al. The applications of harmonic functions to robotics, 1993, J. Field Robotics.
[14] Gerald Tesauro, et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.
[15] Oren Etzioni, et al. The First Law of Robotics (A Call to Arms), 1994, AAAI.
[16] Gerald Tesauro, et al. Temporal Difference Learning and TD-Gammon, 1995, J. Int. Comput. Games Assoc.
[17] Mark W. Spong, et al. The swing up control problem for the Acrobot, 1995.
[18] Gerald Tesauro, et al. Temporal difference learning and TD-Gammon, 1995, CACM.
[19] Miroslav Krstic, et al. Nonlinear and adaptive control design, 1995.
[20] Thomas G. Dietterich, et al. High-Performance Job-Shop Scheduling With A Time-Delay TD(λ) Network, 1995, NIPS.
[21] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[22] R. Freeman, et al. Robust Nonlinear Control Design: State-Space and Lyapunov Techniques, 1996.
[23] Jeff G. Schneider, et al. Exploiting Model Uncertainty Estimates for Safe Dynamic Control Learning, 1996, NIPS.
[24] Gary Boone, et al. Efficient reinforcement learning: model-based Acrobot control, 1997, Proceedings of International Conference on Robotics and Automation.
[25] Gary Boone, et al. Minimum-time control of the Acrobot, 1997, Proceedings of International Conference on Robotics and Automation.
[26] Roderic A. Grupen, et al. A feedback control structure for on-line learning tasks, 1997, Robotics Auton. Syst.
[27] R. Grupen. Learning Robot Control - Using Control Policies as Abstract Actions, 1998.
[28] Roderic A. Grupen, et al. A Control Structure For Learning Locomotion Gaits, 1998.
[29] Eduardo D. Sontag, et al. Mathematical control theory: deterministic finite dimensional systems (2nd ed.), 1998.
[30] John N. Tsitsiklis, et al. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives, 1999, IEEE Trans. Autom. Control.
[31] Geoffrey J. Gordon, et al. Approximate solutions to Markov decision processes, 1999.
[32] John N. Tsitsiklis, et al. A survey of computational complexity results in systems and control, 2000, Autom.
[33] Diana F. Gordon, et al. Asimovian Adaptive Agents, 2000, J. Artif. Intell. Res.
[34] Steven Seidman, et al. A synthesis of reinforcement learning and robust control theory, 2000.
[35] H. Kushner. Numerical Methods for Stochastic Control Problems in Continuous Time, 2000.
[36] Emilio Frazzoli, et al. Online techniques for behavioral programming, 2000, Proceedings of the 39th IEEE Conference on Decision and Control (Cat. No.00CH37187).
[37] Gerald DeJong, et al. Hidden Strengths and Limitations: An Empirical Investigation of Reinforcement Learning, 2000, ICML.
[38] Andrew G. Barto, et al. Heuristic Search in Infinite State Spaces Guided by Lyapunov Analysis, 2001, IJCAI.
[39] Andrew G. Barto, et al. Lyapunov-Constrained Action Sets for Reinforcement Learning, 2001, ICML.
[40] Theodore J. Perkins, et al. Lyapunov methods for safe intelligent agent design, 2002.
[41] J. J. Hopfield, et al. “Neural” computation of decisions in optimization problems, 1985, Biological Cybernetics.
[42] Andrew W. Moore, et al. Variable Resolution Discretization in Optimal Control, 2002, Machine Learning.
[43] Andrew G. Barto, et al. Elevator Group Control Using Multiple Reinforcement Learning Agents, 1998, Machine Learning.