Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning

Reinforcement learning methods trained on only a few environments rarely learn policies that generalize to unseen environments. To improve generalization, we incorporate the inherent sequential structure of reinforcement learning into the representation learning process. This approach is orthogonal to recent approaches, which rarely exploit this structure explicitly. Specifically, we introduce a theoretically motivated policy similarity metric (PSM) for measuring behavioral similarity between states. PSM assigns high similarity to states for which the optimal policies in those states, as well as in future states, are similar. We also present a contrastive representation learning procedure to embed any state similarity metric, which we instantiate with PSM to obtain policy similarity embeddings (PSEs). We demonstrate that PSEs improve generalization on diverse benchmarks, including LQR with spurious correlations, a jumping task from pixels, and the Distracting DM Control Suite. Source code is available at agarwl.github.io/pse.
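The core idea, a contrastive objective that embeds a given state similarity metric, can be sketched as follows. This is a minimal NumPy illustration, not the paper's actual implementation: the function name, the argmax choice of positive pairs, and the softmax cross-entropy form are assumptions standing in for the full PSE procedure, which the paper defines over paired trajectories with a learned encoder.

```python
import numpy as np

def contrastive_metric_loss(z_x, z_y, sim_matrix, temperature=0.1):
    """Contrastive loss aligning embeddings z_x (states from one environment)
    with z_y (states from another), guided by a precomputed behavioral
    similarity matrix sim_matrix[i, j] (e.g. a policy similarity metric).

    For each anchor state i, the "positive" is its most behaviorally
    similar counterpart argmax_j sim_matrix[i, j]; all other pairs in the
    batch act as negatives.
    """
    # L2-normalize so inner products are cosine similarities.
    z_x = z_x / np.linalg.norm(z_x, axis=1, keepdims=True)
    z_y = z_y / np.linalg.norm(z_y, axis=1, keepdims=True)
    logits = z_x @ z_y.T / temperature        # (n, m) pairwise similarities
    positives = sim_matrix.argmax(axis=1)     # most behaviorally similar state
    # Softmax cross-entropy against each anchor's positive index.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(z_x)), positives].mean()
```

Minimizing this loss pulls together embeddings of behaviorally similar states across environments while pushing apart dissimilar ones, which is the mechanism the abstract credits for improved generalization.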