Reactive Reinforcement Learning in Asynchronous Environments

The relationship between a reinforcement learning (RL) agent and an asynchronous environment is often ignored. Commonly used models of the interaction between an agent and its environment, such as Markov Decision Processes (MDPs) and Semi-Markov Decision Processes (SMDPs), do not capture the fact that, in an asynchronous environment, the state of the environment may change while the agent is computing. In an asynchronous environment, minimizing reaction time, the time it takes an agent to act on a new observation, also minimizes the window during which the state of the environment can change after being observed. In many environments, reaction time directly affects task performance: a slow agent allows the environment to transition into an undesirable terminal state, or into a state where the chosen action is no longer appropriate. We propose a class of reactive reinforcement learning algorithms that address this problem of asynchronous environments by acting immediately after observing new state information. We compare a reactive SARSA learning algorithm with the conventional SARSA learning algorithm on two asynchronous robotic tasks (emergency stopping and impact prevention), and show that the reactive RL algorithm reduces the agent's reaction time by approximately the duration of the algorithm's learning update. This new class of reactive algorithms may facilitate safer control and faster decision making without any change to standard learning guarantees.
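The core change is a reordering of the agent's per-step loop: conventional SARSA observes, selects an action, performs its learning update, and only then acts, whereas a reactive variant acts as soon as the next action is selected and performs the update afterward, while waiting for the next observation. The sketch below illustrates this ordering in Python; it is a minimal illustration, not the paper's implementation, and the tabular Q array, epsilon-greedy helper, and `env.observe()`/`env.act()` interface are assumptions made for the example.

```python
import numpy as np

# Illustrative sketch contrasting conventional and reactive SARSA step orderings.
# The environment interface (env.observe(), env.act()) is a hypothetical stand-in
# for an asynchronous robot interface; it is not from the original paper.

def epsilon_greedy(Q, s, n_actions, eps=0.1):
    """Pick a random action with probability eps, otherwise the greedy action."""
    if np.random.rand() < eps:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))

def conventional_sarsa_step(Q, env, s, a, alpha=0.1, gamma=0.99):
    """Observe, choose, learn, then act: the learning update delays the action."""
    r, s_next = env.observe()                       # new state information arrives
    a_next = epsilon_greedy(Q, s_next, Q.shape[1])
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])  # update first
    env.act(a_next)                                 # action is sent after the update
    return s_next, a_next

def reactive_sarsa_step(Q, env, s, a, alpha=0.1, gamma=0.99):
    """Observe, choose, act, then learn: reaction time excludes the update."""
    r, s_next = env.observe()
    a_next = epsilon_greedy(Q, s_next, Q.shape[1])
    env.act(a_next)                                 # act immediately on the new state
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])  # learn afterward
    return s_next, a_next
```

Because both variants apply the same update to the same transition, only the timing of the action changes, which is why the reaction time shrinks by roughly the duration of the update itself.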
