Autonomous shaping: knowledge transfer in reinforcement learning

We introduce learned shaping rewards for reinforcement learning: an agent uses prior experience on a sequence of tasks to learn a portable predictor that estimates intermediate rewards, which accelerates learning in later tasks that are related but distinct. Such agents can be trained on a sequence of relatively easy tasks to develop a more informative measure of reward, which can then be transferred to improve performance on more difficult tasks without requiring a hand-coded shaping function. We use a rod positioning task to show that this significantly improves performance even after a very brief training period.
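
The idea of a learned, portable reward predictor can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify the predictor or how its output enters the learning update. The sketch assumes a linear least-squares predictor over task-independent features and assumes the prediction is used as a potential in potential-based shaping. The names `fit_potential`, `potential`, and `shaped_reward` are hypothetical.

```python
import numpy as np

def fit_potential(features, returns):
    """Fit a linear predictor (with bias) from features to observed returns.

    Assumption: features are task-independent sensations gathered on earlier
    tasks, and returns are the rewards-to-go observed from those states.
    """
    features = np.asarray(features, dtype=float)
    returns = np.asarray(returns, dtype=float)
    # Append a column of ones so the last weight acts as a bias term.
    X = np.column_stack([features, np.ones(len(features))])
    w, *_ = np.linalg.lstsq(X, returns, rcond=None)
    return w

def potential(w, feature_vec):
    """Predicted value of a state described by feature_vec."""
    return float(np.dot(w[:-1], feature_vec) + w[-1])

def shaped_reward(w, reward, feat, next_feat, gamma=0.99):
    """Environment reward plus a potential-based shaping term.

    Assumption: the shaping term is gamma * Phi(s') - Phi(s), with the
    learned predictor playing the role of the potential Phi.
    """
    return reward + gamma * potential(w, next_feat) - potential(w, feat)

# Toy data standing in for experience from earlier, easier tasks.
w = fit_potential([[0.0], [1.0], [2.0]], [1.0, 3.0, 5.0])
bonus = shaped_reward(w, 0.0, [1.0], [2.0], gamma=1.0)
```

In a later task, `shaped_reward` would replace the raw environment reward in the agent's update, so transitions toward states the predictor rates highly are rewarded before the true goal reward is ever observed.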
