Autonomous shaping: knowledge transfer in reinforcement learning

We introduce the use of learned shaping rewards in reinforcement learning tasks: an agent uses prior experience on a sequence of tasks to learn a portable predictor that estimates intermediate rewards, which accelerates learning in later tasks that are related but distinct. Such agents can be trained on a sequence of relatively easy tasks to develop a more informative measure of reward, which can then be transferred to improve performance on more difficult tasks without requiring a hand-coded shaping function. We use a rod-positioning task to show that this significantly improves performance even after a very brief training period.
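As a rough illustration of the idea in the abstract, the sketch below fits a small linear predictor on experience from earlier tasks and then uses it to add an intermediate reward in a later task. Everything here is an assumption made for illustration and not necessarily the authors' formulation: the names `fit_linear_predictor` and `shaped_reward`, the feature vector `[bias, distance_to_goal]`, the least-mean-squares update, and the choice of a potential-based shaping term `gamma * phi(s') - phi(s)`.

```python
from typing import List, Sequence, Tuple


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def fit_linear_predictor(
    samples: Sequence[Tuple[Sequence[float], float]],
    lr: float = 0.1,
    epochs: int = 500,
) -> List[float]:
    """Fit weights w so that dot(w, features) approximates the target.

    `samples` holds (features, target) pairs gathered on earlier tasks.
    The features are assumed to be portable, i.e. they mean the same
    thing in every task. This is a plain least-mean-squares update.
    """
    w = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for feats, target in samples:
            err = target - dot(w, feats)
            w = [wi + lr * err * fi for wi, fi in zip(w, feats)]
    return w


def shaped_reward(
    reward: float,
    feats: Sequence[float],
    next_feats: Sequence[float],
    w: Sequence[float],
    gamma: float = 1.0,
) -> float:
    """Environment reward plus a potential-based shaping term.

    The learned predictor plays the role of the potential phi
    (an assumption of this sketch).
    """
    return reward + gamma * dot(w, next_feats) - dot(w, feats)


# Hypothetical experience from earlier tasks: features are
# [bias, distance_to_goal]; the target is a value estimate that
# gets worse as the distance grows.
earlier_experience = [
    ([1.0, 0.0], 0.0),
    ([1.0, 0.5], -0.5),
    ([1.0, 1.0], -1.0),
]
weights = fit_linear_predictor(earlier_experience)

# In a later task, a step that moves closer to the goal earns a
# positive shaped reward even though the environment reward is zero.
closer = shaped_reward(0.0, [1.0, 1.0], [1.0, 0.5], weights)
farther = shaped_reward(0.0, [1.0, 0.5], [1.0, 1.0], weights)
```

In this toy setting the learned weights approach `[0, -1]`, so `closer` is positive and `farther` is negative: the agent gets a useful signal before it ever reaches the goal, which is the kind of acceleration the abstract describes.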

Andrew G. Barto | George Konidaris
