GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning

Offline reinforcement learning approaches can generally be divided into proximal and uncertainty-aware methods. In this work, we demonstrate the benefit of combining the two within a latent variational model. We construct a latent representation of states and actions and leverage its intrinsic Riemannian geometry to measure the distance of latent samples from the data. Our proposed metrics capture both the quality of out-of-distribution samples and the discrepancy of examples within the data. We integrate these metrics into a model-based offline optimization framework in which proximity and uncertainty can be carefully controlled. We illustrate the resulting geodesics on a simple grid-like environment, revealing its inherent topology. Finally, we analyze our approach and demonstrate improved performance on contemporary offline RL benchmarks.
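
The geodesic machinery referenced in the abstract builds on a standard construction from the latent-geometry literature: a decoder g pulls back a Riemannian metric M(z) = J_g(z)^T J_g(z) onto the latent space, where J_g is the decoder's Jacobian, and the distance between two latent samples is the length of the shortest curve (geodesic) under this metric. The sketch below illustrates one common way such geodesics are approximated, by minimizing a discretized curve energy; it is not the paper's implementation, and the decoder and all names in it are hypothetical stand-ins.

    import torch

    # Hypothetical stand-in for a trained generative model's decoder: any
    # differentiable map from latent space R^d to observation space R^D works.
    decoder = torch.nn.Sequential(
        torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 10)
    )

    def discrete_energy(curve):
        # Discrete curve energy: sum_i ||g(z_{i+1}) - g(z_i)||^2.
        # Its minimizers approximate geodesics of the pullback metric.
        x = decoder(curve)
        return ((x[1:] - x[:-1]) ** 2).sum()

    def approximate_geodesic(z0, z1, n_points=16, steps=500, lr=1e-2):
        # Start from the straight line in latent space and optimize only
        # the interior points; the two endpoints stay fixed.
        ts = torch.linspace(0.0, 1.0, n_points).unsqueeze(1)
        interior = (z0 + ts * (z1 - z0))[1:-1].clone().requires_grad_(True)
        opt = torch.optim.Adam([interior], lr=lr)
        for _ in range(steps):
            curve = torch.cat([z0.unsqueeze(0), interior, z1.unsqueeze(0)])
            loss = discrete_energy(curve)
            opt.zero_grad()
            loss.backward()
            opt.step()
        return torch.cat([z0.unsqueeze(0), interior.detach(), z1.unsqueeze(0)])

    z0, z1 = torch.randn(2), torch.randn(2)
    geodesic = approximate_geodesic(z0, z1)
    with torch.no_grad():
        # Riemannian length, approximated by summed decoded segment lengths.
        length = (decoder(geodesic)[1:] - decoder(geodesic)[:-1]).norm(dim=1).sum()

The resulting length measures distance as traveled through observation space rather than straight-line latent distance, which is what lets such a metric penalize curves that cross regions the decoder represents poorly, i.e., regions far from the data.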
