Budgeted Knowledge Transfer for State-Wise Heterogeneous RL Agents

In this paper we introduce a budgeted knowledge-transfer algorithm for heterogeneous reinforcement learning agents, where the source and target agents are identical except in their state representations. The algorithm uses the functional (Q-value) space as the transfer medium: the target agent's Q-values are estimated in an automatically selected lower-dimensional subspace in order to accelerate knowledge transfer. During the transfer period, the target agent searches this subspace with an exploration policy and selects actions accordingly, which helps it obtain a good estimate of its Q-table. We show both analytically and empirically that this method reduces the learning budget the target agent requires.
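As a rough illustration of the idea (not the paper's actual procedure), the sketch below initializes a target agent's Q-table from a source Q-table by averaging the Q-values of all source states that project onto the same lower-dimensional target state, then refines the result with ordinary epsilon-greedy Q-learning under a fixed interaction budget. All names are hypothetical, and the paper's automatic subspace selection is replaced by a hand-supplied `state_map`.

```python
import numpy as np

def transfer_q_values(q_source, state_map, n_target_states):
    """Project the source agent's Q-table into the target agent's
    lower-dimensional state space by averaging the Q-values of all
    source states that map to the same target state."""
    n_actions = q_source.shape[1]
    q_target = np.zeros((n_target_states, n_actions))
    counts = np.zeros(n_target_states)
    for s_src, q_row in enumerate(q_source):
        q_target[state_map[s_src]] += q_row
        counts[state_map[s_src]] += 1
    nonzero = counts > 0
    q_target[nonzero] /= counts[nonzero, None]
    return q_target

def budgeted_q_learning(q, env_reset, env_step, budget,
                        eps=0.2, alpha=0.1, gamma=0.95, seed=0):
    """Refine a transferred Q-table with epsilon-greedy Q-learning,
    stopping after `budget` environment interactions."""
    rng = np.random.default_rng(seed)
    s = env_reset()
    for _ in range(budget):
        a = rng.integers(q.shape[1]) if rng.random() < eps else int(np.argmax(q[s]))
        s2, r, done = env_step(s, a)
        target = r + (0.0 if done else gamma * q[s2].max())
        q[s, a] += alpha * (target - q[s, a])
        s = env_reset() if done else s2
    return q

# Toy demo: six source states collapse onto three target states.
q_source = np.arange(12, dtype=float).reshape(6, 2)
state_map = [0, 0, 1, 1, 2, 2]   # stand-in for the automatically selected mapping
q0 = transfer_q_values(q_source, state_map, 3)

# Tiny 3-state chain: action 1 moves right; reaching state 2 pays 1 and ends.
def env_reset():
    return 0

def env_step(s, a):
    s2 = min(s + a, 2)
    done = (s2 == 2)
    return s2, float(done), done

q = budgeted_q_learning(q0.copy(), env_reset, env_step, budget=5000)
```

The transferred table `q0` gives the target agent a warm start, so the budgeted refinement phase only has to correct the averaging error rather than learn from scratch.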
