Prediction Driven Behavior: Learning Predictions that Drive Fixed Responses

We introduce a new method for robot control that combines prediction learning with a fixed, hand-crafted response: the robot learns to make a temporally extended prediction during its normal operation, and that prediction is used to select actions as part of a fixed behavioral response. Our method is inspired by Pavlovian conditioning experiments in which an animal's behavior adapts as it learns to predict an event. Surprisingly, the animal's behavior changes even in the absence of any benefit to the animal; that is, the animal is not modifying its behavior to maximize reward. By combining a fixed response with online prediction learning, our method produces an adaptive behavior that differs both from standard non-adaptive control methods and from adaptive reward-maximizing control methods. We show that this method improves upon the performance of two reactive controllers, with visible benefits within 2.5 minutes of real-time learning on the robot. In the first experiment, the robot turns off its motors when it predicts a future over-current condition, which reduces the time spent in unsafe over-current conditions and improves efficiency. In the second experiment, the robot starts to move when it predicts a human-issued request, which reduces the apparent latency of the human-robot interface.
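
The control loop described in the abstract can be illustrated with a short sketch: an online temporal-difference learner forms a discounted prediction of a signal of interest (e.g., motor current), and a fixed, hand-crafted rule maps that prediction to an action. This is a minimal sketch under assumed details, not the authors' implementation; the TD(lambda) learner with linear function approximation, the threshold rule, and all names (TDLambdaPredictor, fixed_response) are assumptions made for exposition.

```python
import numpy as np


class TDLambdaPredictor:
    """Sketch of an online TD(lambda) learner with linear function
    approximation that predicts the discounted sum of a scalar signal.
    All parameter values here are illustrative, not from the paper."""

    def __init__(self, n_features, gamma=0.9, alpha=0.1, lam=0.7):
        self.w = np.zeros(n_features)   # learned weight vector
        self.e = np.zeros(n_features)   # eligibility traces
        self.gamma = gamma              # discount rate (prediction horizon)
        self.alpha = alpha              # learning step size
        self.lam = lam                  # trace-decay parameter

    def predict(self, x):
        # Linear prediction: weighted sum of the current features.
        return float(self.w @ x)

    def update(self, x, signal, x_next):
        # TD error: observed signal plus discounted next prediction,
        # minus the current prediction.
        delta = signal + self.gamma * self.predict(x_next) - self.predict(x)
        # Accumulating eligibility traces, then the weight update.
        self.e = self.gamma * self.lam * self.e + x
        self.w += self.alpha * delta * self.e
        return delta


def fixed_response(prediction, threshold=1.0):
    """Fixed, hand-crafted response: act protectively (e.g., cut the
    motors) whenever the learned prediction exceeds a threshold."""
    return "stop_motors" if prediction > threshold else "continue"


# Illustrative single step: x and x_next stand in for feature vectors
# computed from the robot's sensors; signal could be the motor current.
predictor = TDLambdaPredictor(n_features=8)
x = np.zeros(8); x[2] = 1.0
x_next = np.zeros(8); x_next[5] = 1.0
predictor.update(x, signal=0.0, x_next=x_next)
action = fixed_response(predictor.predict(x_next), threshold=1.0)
```

On each time step the robot would observe its features and the signal, update the predictor, and pass the new prediction through the fixed response rule. Only the prediction adapts; the response stays fixed, mirroring the Pavlovian structure described above.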
