论文信息 - Simultaneously Learning Vision and Feature-based Control Policies for Real-world Ball-in-a-Cup - 字舞流文

Simultaneously Learning Vision and Feature-based Control Policies for Real-world Ball-in-a-Cup

We present a method for fast training of vision based control policies on real robots. The key idea behind our method is to perform multi-task Reinforcement Learning with auxiliary tasks that differ not only in the reward to be optimized but also in the state-space in which they operate. In particular, we allow auxiliary task policies to utilize task features that are available only at training-time. This allows for fast learning of auxiliary policies, which subsequently generate good data for training the main, vision-based control policies. This method can be seen as an extension of the Scheduled Auxiliary Control (SAC-X) framework. We demonstrate the efficacy of our method by using both a simulated and real-world Ball-in-a-Cup game controlled by a robot arm. In simulation, our approach leads to significant learning speed-ups when compared to standard SAC-X. On the real robot we show that the task can be learned from-scratch, i.e., with no transfer from simulation and no imitation learning. Videos of our learned policies running on the real robot can be found at this https URL.

Murilo F. Martins | Martin A. Riedmiller | Thomas Lampe | Francesco Nori | Abbas Abdolmaleki | Jost Tobias Springenberg | Roland Hafner | Michael Neunert | Tim Hertweck | Devin Schwab

[1] Marc G. Bellemare,et al. Safe and Efficient Off-Policy Reinforcement Learning , 2016, NIPS.

[2] Ruslan Salakhutdinov,et al. Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning , 2015, ICLR.

[3] Marcin Andrychowicz,et al. Asymmetric Actor Critic for Image-Based Robot Learning , 2017, Robotics: Science and Systems.

[4] Sergey Levine,et al. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection , 2016, Int. J. Robotics Res..

[5] Li Fei-Fei,et al. MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels , 2017, ICML.

[6] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.

[7] Razvan Pascanu,et al. Sim-to-Real Robot Learning from Pixels with Progressive Nets , 2016, CoRL.

[8] Jason Weston,et al. Curriculum learning , 2009, ICML '09.

[9] Pierre-Yves Oudeyer,et al. Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning , 2017, J. Mach. Learn. Res..

[10] Yuval Tassa,et al. Learning Continuous Control Policies by Stochastic Value Gradients , 2015, NIPS.

[11] Jan Peters,et al. Learning motor primitives for robotics , 2009, 2009 IEEE International Conference on Robotics and Automation.

[12] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.

[13] Pieter Abbeel,et al. Apprenticeship learning via inverse reinforcement learning , 2004, ICML.

[14] Darwin G. Caldwell,et al. Robot motor skill coordination with EM-based Reinforcement Learning , 2010, 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[15] Jürgen Schmidhuber,et al. PowerPlay: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem , 2011, Front. Psychol..

[16] Henry Zhu,et al. Soft Actor-Critic Algorithms and Applications , 2018, ArXiv.

[17] Pieter Abbeel,et al. An Application of Reinforcement Learning to Aerobatic Helicopter Flight , 2006, NIPS.

[18] Wojciech Czarnecki,et al. Multi-task Deep Reinforcement Learning with PopArt , 2018, AAAI.

[19] Patrick M. Pilarski,et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction , 2011, AAMAS.

[20] Tom Schaul,et al. Universal Value Function Approximators , 2015, ICML.

[21] Marcin Andrychowicz,et al. Hindsight Experience Replay , 2017, NIPS.

[22] Yuval Tassa,et al. Emergence of Locomotion Behaviours in Rich Environments , 2017, ArXiv.

[23] Jan Peters,et al. Using Bayesian Dynamical Systems for Motion Template Libraries , 2008, NIPS.

[24] Jakub W. Pachocki,et al. Learning dexterous in-hand manipulation , 2018, Int. J. Robotics Res..

[25] Yuval Tassa,et al. Maximum a Posteriori Policy Optimisation , 2018, ICLR.

[26] Martin A. Riedmiller,et al. Learning by Playing - Solving Sparse Reward Tasks from Scratch , 2018, ICML.

[27] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.

[28] Shane Legg,et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.