On Inductive Biases in Deep Reinforcement Learning

Many deep reinforcement learning algorithms contain inductive biases that sculpt the agent's objective and its interface to the environment. These inductive biases can take many forms, including domain knowledge and pretuned hyper-parameters. In general, there is a trade-off between generality and performance when algorithms use such biases: stronger biases can lead to faster learning, but weaker biases can potentially lead to more general algorithms. This trade-off is important because inductive biases are not free; substantial effort may be required to obtain relevant domain knowledge or to tune hyper-parameters effectively. In this paper, we re-examine several domain-specific components that bias the objective and the environmental interface of common deep reinforcement learning agents. We investigate whether performance deteriorates when these components are replaced with adaptive solutions from the literature. In our experiments, performance sometimes decreased with the adaptive components, as one might expect when comparing against components crafted for the domain, but sometimes the adaptive components performed better. We then investigate the main benefit of having fewer domain-specific components by comparing the learning performance of the two systems on a different set of continuous control problems, without additional tuning of either system. As hypothesized, the system with adaptive components performed better on many of the new tasks.
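To make the distinction between a hand-crafted bias and an adaptive replacement concrete, the sketch below contrasts fixed Atari-style reward clipping (a domain-specific heuristic) with a simplified PopArt-style running normalisation of value targets (an adaptive alternative from the literature). This is only a rough illustration under our own assumptions: the constant names, step sizes, and the simplified update rule are ours, not the authors' implementation, and the network-rescaling step of full PopArt is omitted.

```python
# Minimal sketch: a fixed domain-specific heuristic vs. an adaptive stand-in.
import numpy as np


def clip_reward(reward, low=-1.0, high=1.0):
    """Domain-specific bias: clip rewards to a fixed, hand-chosen range."""
    return float(np.clip(reward, low, high))


class PopArtNormalizer:
    """Adaptive alternative: track running statistics of the targets and
    normalise them, instead of assuming a fixed reward scale.

    Note: full PopArt also rescales the value network's output layer so that
    its unnormalised predictions are preserved; that step is omitted here.
    """

    def __init__(self, step_size=1e-3, epsilon=1e-6):
        self.step_size = step_size
        self.epsilon = epsilon
        self.mean = 0.0      # running estimate of E[target]
        self.mean_sq = 1.0   # running estimate of E[target^2]

    def update(self, target):
        # Exponential moving averages of the first and second moments.
        self.mean += self.step_size * (target - self.mean)
        self.mean_sq += self.step_size * (target ** 2 - self.mean_sq)

    def normalize(self, target):
        var = max(self.mean_sq - self.mean ** 2, self.epsilon)
        return (target - self.mean) / np.sqrt(var)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    normalizer = PopArtNormalizer(step_size=1e-2)
    for _ in range(10_000):
        raw_return = rng.normal(loc=100.0, scale=30.0)  # large, unclipped returns
        normalizer.update(raw_return)
    # Clipping discards the magnitude of the target; the adaptive normaliser
    # keeps it, on a scale that is learned from the data rather than tuned.
    print(clip_reward(250.0), round(normalizer.normalize(250.0), 2))
```

The point of the sketch is only the qualitative difference: the clipping heuristic bakes in knowledge about the reward scale of a particular domain, whereas the adaptive component estimates that scale from experience and therefore transfers to domains with different reward magnitudes without retuning.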
