A Comparative Analysis of Expected and Distributional Reinforcement Learning

Since their introduction a year ago, distributional approaches to reinforcement learning (distributional RL) have produced strong results relative to the standard approach, which models expected values (expected RL). However, aside from convergence guarantees, there have been few theoretical results investigating the reasons behind the improvements distributional RL provides. In this paper we begin the investigation into this fundamental question by analyzing the differences in the tabular, linear approximation, and non-linear approximation settings. We prove that in many realizations of the tabular and linear approximation settings, distributional RL behaves exactly the same as expected RL. In cases where the two methods do behave differently, distributional RL can in fact hurt performance. We then continue with an empirical analysis comparing distributional and expected RL methods in control settings with non-linear approximators, to tease apart where the improvements from distributional RL methods come from.
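
To make the tabular claim concrete, below is a minimal sketch (not code from the paper) comparing a tabular expected TD(0) update with a categorical, C51-style distributional update on a hypothetical two-state Markov reward process; the support, step size, and the project helper are illustrative assumptions. Because the categorical projection preserves the mean whenever target values fall inside the support, the mean of the learned distribution tracks the expected-RL value estimate, which is the kind of behavioural equivalence the tabular analysis refers to.

    # Minimal sketch (illustrative, not the paper's code): tabular policy
    # evaluation on a two-state Markov reward process, comparing expected
    # TD(0) with a categorical (C51-style) distributional TD update.
    import numpy as np

    # Hypothetical chain: s0 -> s1 -> s0, rewards r(s0)=1, r(s1)=0.
    rewards = np.array([1.0, 0.0])
    next_state = np.array([1, 0])
    gamma, alpha, sweeps = 0.9, 0.1, 2000

    # Expected RL: a table of value estimates.
    V = np.zeros(2)

    # Distributional RL: a categorical distribution per state over fixed atoms.
    support = np.linspace(0.0, 10.0, 51)
    dz = support[1] - support[0]
    probs = np.zeros((2, len(support)))
    probs[:, 0] = 1.0  # all mass at value 0, so the means start equal to V

    def project(target_values, target_probs):
        # C51-style projection of a target distribution onto the fixed support.
        # Splitting mass linearly between neighbouring atoms preserves the mean
        # as long as no target value is clipped.
        proj = np.zeros(len(support))
        clipped = np.clip(target_values, support[0], support[-1])
        b = (clipped - support[0]) / dz
        lo = np.floor(b).astype(int)
        hi = np.minimum(lo + 1, len(support) - 1)
        w = b - lo
        np.add.at(proj, lo, target_probs * (1.0 - w))
        np.add.at(proj, hi, target_probs * w)
        return proj

    for _ in range(sweeps):
        for s in range(2):
            s2 = next_state[s]
            # Expected TD(0) update on the value table.
            V[s] += alpha * (rewards[s] + gamma * V[s2] - V[s])
            # Distributional update: shift/scale the next-state distribution,
            # project it back onto the support, then take a mixture step.
            target = project(rewards[s] + gamma * support, probs[s2])
            probs[s] = (1.0 - alpha) * probs[s] + alpha * target

    print("expected RL values:    ", V)
    print("means of learned dists:", probs @ support)  # match V up to float error

In this deterministic setting both updates move the (mean) value estimate by exactly the same amount each sweep, so the two methods converge to the same fixed point; differences can only appear once the distributional parametrization or the function approximator breaks this mean-preserving structure.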
