Visualizing Dynamics: from t-SNE to SEMI-MDPs

Deep Reinforcement Learning (DRL) is an active field of research that shows great promise on many challenging problems, such as playing Atari games, playing Go, and controlling robots. While DRL agents perform well in practice, we still lack the tools to analyze their performance and to visualize the temporal abstractions that they learn. In this paper, we present a novel method that automatically discovers an internal Semi-Markov Decision Process (SMDP) model in the learned representation of a Deep Q Network (DQN). We propose a novel visualization method that represents the SMDP model as a directed graph and draws it on top of a t-SNE map. We show how the agent's policy can be interpreted and give evidence for the hierarchical state aggregation that DQNs learn automatically. Our algorithm is fully automatic, does not require any domain-specific knowledge, and is evaluated with a novel likelihood-based evaluation criterion.
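To make the idea of an SMDP model over aggregated states concrete, the following is a minimal sketch, not the paper's algorithm. It assumes that each time step of a recorded trajectory has already been assigned a cluster label (for example by clustering the DQN's activations), and it estimates the transition probabilities between clusters together with the mean time spent in each cluster. The function name `estimate_smdp` and the toy label sequence are hypothetical and chosen only for illustration.

```python
import numpy as np

def estimate_smdp(labels, n_clusters):
    """Estimate cluster-to-cluster transition probabilities and mean dwell
    times from one trajectory of per-step cluster labels (illustrative only)."""
    labels = np.asarray(labels)
    counts = np.zeros((n_clusters, n_clusters))
    dwell_total = np.zeros(n_clusters)
    visits = np.zeros(n_clusters)

    start = 0
    for t in range(1, len(labels) + 1):
        # A segment ends when the label changes or the trajectory ends.
        if t == len(labels) or labels[t] != labels[start]:
            c = labels[start]
            dwell_total[c] += t - start
            visits[c] += 1
            if t < len(labels):
                # Count only transitions between different clusters.
                counts[c, labels[t]] += 1
            start = t

    # Normalize rows; clusters with no outgoing transition keep a zero row.
    row_sums = counts.sum(axis=1, keepdims=True)
    P = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    mean_dwell = np.divide(dwell_total, visits,
                           out=np.zeros_like(dwell_total), where=visits > 0)
    return P, mean_dwell

# Hypothetical label sequence for a single trajectory.
labels = [0, 0, 0, 1, 1, 2, 0, 0, 1, 1, 1, 2, 2]
P, mean_dwell = estimate_smdp(labels, n_clusters=3)
```

The nonzero entries of `P` could then serve as weighted edges of a directed graph whose nodes are placed at the cluster centers in a 2-D embedding such as a t-SNE map, which is the kind of picture the abstract describes.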
