Light-weight probing of unsupervised representations for Reinforcement Learning

Unsupervised visual representation learning offers the opportunity to leverage large corpora of unlabeled trajectories to form useful representations that can benefit the training of reinforcement learning (RL) algorithms. However, evaluating the fitness of such representations requires training RL algorithms, which is computationally intensive and yields high-variance outcomes. To alleviate this issue, we design an evaluation protocol for unsupervised RL representations with lower variance and up to 600x lower computational cost. Inspired by the vision community, we propose two linear probing tasks: predicting the reward observed in a given state, and predicting the action of an expert in a given state. These two tasks are generally applicable to many RL domains, and we show through rigorous experimentation that they correlate strongly with actual downstream control performance on the Atari100k Benchmark. This provides a better method for exploring the space of pretraining algorithms without the need to run RL evaluations for every setting. Leveraging this framework, we further improve existing self-supervised learning (SSL) recipes for RL, highlighting the importance of the forward model, the size of the visual backbone, and the precise formulation of the unsupervised objective.
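
To make the protocol concrete, the following is a minimal sketch of the two linear probing tasks on top of a frozen encoder, assuming a dataset of (observation, reward, expert action) tuples. The function and variable names (`encode_fn`, `observations`, `rewards`, `expert_actions`) are illustrative, and the use of scikit-learn linear models is a simplification for exposition rather than the paper's exact setup.

```python
# A minimal sketch of reward probing and expert-action probing on frozen features.
# All names here are illustrative assumptions, not the paper's actual API.
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.model_selection import train_test_split


def probe_scores(encode_fn, observations, rewards, expert_actions, seed=0):
    """Fit linear probes on frozen features and return held-out probe scores."""
    # Frozen representations: the pretrained encoder is only used for inference.
    feats = np.stack([encode_fn(obs) for obs in observations])

    idx_train, idx_test = train_test_split(
        np.arange(len(feats)), test_size=0.2, random_state=seed
    )

    # Reward probing: predict the reward observed in each state
    # (regression here; a classification variant over discretized
    # rewards is an equally plausible formulation).
    reward_probe = Ridge(alpha=1.0).fit(feats[idx_train], rewards[idx_train])
    reward_score = reward_probe.score(feats[idx_test], rewards[idx_test])  # R^2

    # Action probing: predict the expert's action in each state (classification).
    action_probe = LogisticRegression(max_iter=1000).fit(
        feats[idx_train], expert_actions[idx_train]
    )
    action_score = action_probe.score(
        feats[idx_test], expert_actions[idx_test]
    )  # accuracy

    return reward_score, action_score
```

Because the encoder is never updated and the probes are linear, both scores can be computed in minutes on a single machine, which is what makes sweeping over many pretraining settings cheap relative to full RL training.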
