On The Transferability of Deep-Q Networks

Transfer Learning (TL) is an efficient machine learning paradigm that helps overcome several of the hurdles that make training deep neural networks difficult, ranging from long training times to the need for large datasets. While TL is a well-established and successful training practice in Supervised Learning (SL), it is used far less often in Deep Reinforcement Learning (DRL). In this paper, we study how transferable three different variants of Deep-Q Networks are, both on popular DRL benchmarks and on a set of novel, carefully designed control tasks. Our results show that transferring neural networks in a DRL context can be particularly challenging: in most cases, the process results in negative transfer. In attempting to understand why Deep-Q Networks transfer so poorly, we gain novel insights into the training dynamics that characterize this family of algorithms.
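"Negative transfer" means that an agent initialized from a pre-trained network learns worse than one trained from scratch. One common way to quantify this in the RL transfer literature is to compare the two learning curves, for example with the jumpstart (difference in initial performance) and a transfer ratio based on the area under each curve, as surveyed by Taylor and Stone. The sketch below illustrates that idea and is not necessarily the metric used in this paper. The function names are hypothetical. It assumes each curve is a list of evaluation returns recorded at the same, equally spaced checkpoints. It also divides by the absolute area of the from-scratch curve, so that a negative ratio still signals negative transfer when returns are negative.

```python
from typing import Sequence


def area_under_curve(returns: Sequence[float]) -> float:
    """Trapezoidal area under a learning curve sampled at unit-spaced checkpoints."""
    if len(returns) < 2:
        raise ValueError("need at least two evaluation checkpoints")
    return sum((a + b) / 2.0 for a, b in zip(returns[:-1], returns[1:]))


def jumpstart(transfer: Sequence[float], scratch: Sequence[float]) -> float:
    """Initial-performance gain of the transferred agent over the from-scratch agent."""
    return transfer[0] - scratch[0]


def transfer_ratio(transfer: Sequence[float], scratch: Sequence[float]) -> float:
    """(AUC_transfer - AUC_scratch) / |AUC_scratch|; a negative value indicates negative transfer.

    Assumption: the absolute value in the denominator keeps the sign meaningful
    for environments whose returns are negative (e.g. step-penalty tasks).
    """
    if len(transfer) != len(scratch):
        raise ValueError("curves must be evaluated at the same checkpoints")
    base = area_under_curve(scratch)
    if base == 0:
        raise ValueError("from-scratch curve has zero area; ratio is undefined")
    return (area_under_curve(transfer) - base) / abs(base)


if __name__ == "__main__":
    # Hypothetical evaluation returns, not results from the paper.
    scratch = [0.0, 10.0, 20.0, 30.0]
    transfer = [5.0, 8.0, 15.0, 25.0]
    print("jumpstart:", jumpstart(transfer, scratch))
    ratio = transfer_ratio(transfer, scratch)
    print("transfer ratio:", ratio, "negative transfer:", ratio < 0)
```

In this toy example the transferred agent starts better (positive jumpstart) but accumulates less reward over training (negative ratio). This shows why a single metric can be misleading, and why whole learning curves are usually compared.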
