Context Transfer in Reinforcement Learning Using Action-Value Functions

This paper discusses the notion of context transfer in reinforcement learning tasks. Context transfer, as defined here, refers to knowledge transfer between source and target tasks that share the same environment dynamics and reward function but have different state or action spaces. In other words, the agents learn the same task while using different sensors and actuators. This setting requires the existence of an underlying common Markov decision process (MDP) onto which all the agents' MDPs can be mapped, a requirement formalized through the notion of MDP homomorphism. The learning framework is Q-learning. To transfer knowledge between tasks, the feature space is used as a translator, expressed as a partial mapping between the state-action spaces of the different tasks. The Q-values learned in the source tasks are mapped to sets of Q-values for the target task; these transferred Q-values are then merged and used to initialize the learning process of the target task. An interval-based approach is used to represent and merge the knowledge of the source tasks. Empirical results show that this transferred initialization can benefit learning in the target task.
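As a rough illustration of the transfer mechanism described above, the Python sketch below initializes a target task's Q-table from several source tasks through a partial state-action mapping. All names here (transfer_q_values, the mapping dictionaries, the midpoint initialization) are illustrative assumptions rather than the paper's actual implementation; in particular, the interval-based merge is approximated by keeping the min/max of the mapped Q-values and initializing with the interval midpoint.

```python
from collections import defaultdict


def transfer_q_values(source_qs, mappings):
    """Initialize a target Q-table from several source Q-tables.

    source_qs : list of dicts, one per source task, each mapping a
                (state, action) pair to its learned Q-value.
    mappings  : list of partial mappings, one per source task, each mapping
                a target (state, action) pair to the corresponding source
                pair; pairs with no correspondence are simply absent.
    Returns a dict mapping target (state, action) pairs to initial Q-values.
    Hypothetical sketch: transferred values are merged as an interval
    [lo, hi] per target pair and initialized with the interval midpoint.
    """
    intervals = defaultdict(lambda: [float("inf"), float("-inf")])

    for q_src, phi in zip(source_qs, mappings):
        for target_sa, source_sa in phi.items():  # only mapped pairs transfer
            if source_sa in q_src:
                v = q_src[source_sa]
                lo, hi = intervals[target_sa]
                intervals[target_sa] = [min(lo, v), max(hi, v)]

    # Collapse each merged interval to a single initial Q-value (midpoint).
    return {sa: 0.5 * (lo + hi) for sa, (lo, hi) in intervals.items()}


# Toy usage (purely illustrative state/action labels):
q_source_1 = {("s0", "left"): 1.0, ("s1", "right"): 0.2}
q_source_2 = {("x0", "a"): 0.8}
mapping_1 = {("t0", "up"): ("s0", "left")}   # target pair -> source pair
mapping_2 = {("t0", "up"): ("x0", "a")}
q_init = transfer_q_values([q_source_1, q_source_2], [mapping_1, mapping_2])
```

Under these assumptions, the target agent would then run ordinary Q-learning on its own MDP starting from q_init, with any state-action pairs not covered by the partial mapping falling back to the default initialization (for example, zero).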
