High Acceleration Reinforcement Learning for Real-World Juggling with Binary Rewards

Robots that can learn in the physical world will be important to enable robots to escape their stiff and pre-programmed movements. For dynamic high-acceleration tasks, such as juggling, learning in the real world is particularly challenging, as one must push the limits of the robot and its actuation without harming the system, amplifying the necessity of sample efficiency and safety for robot learning algorithms. In contrast to prior work, which mainly focuses on the learning algorithm, we propose a learning system that directly incorporates these requirements in the design of the policy representation, initialization, and optimization. We demonstrate that this system enables the high-speed Barrett WAM manipulator to learn juggling two balls from 56 minutes of experience with a binary reward signal. The final policy juggles continuously for up to 33 minutes, or about 4500 repeated catches. The videos documenting the learning process and the evaluation can be found at this https URL.
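To make the episodic, binary-reward setting described above concrete, the sketch below shows one common way such a system can be trained: a Gaussian search distribution over movement-primitive parameters, initialized around a safe hand-tuned motion, and updated from catch counts with a REPS-style reward weighting. This is a minimal illustration under stated assumptions, not the paper's implementation; the parameter dimensionality, the simulated reward, and all function names are hypothetical.

```python
"""Minimal sketch of episodic policy search with a binary (catch-count) reward.
All names, dimensions, and the simulated rollout are hypothetical; the paper's
actual policy representation, initialization, and optimizer are not specified
in the abstract."""

import numpy as np
from scipy.optimize import minimize_scalar


def reps_weights(returns, epsilon=0.5):
    """REPS-style weights: solve the dual for the temperature eta under a
    KL bound epsilon, then weight each rollout by exp(return / eta)."""
    R = returns - returns.max()          # shift for numerical stability
    def dual(eta):
        return eta * epsilon + eta * np.log(np.mean(np.exp(R / eta)))
    eta = minimize_scalar(dual, bounds=(1e-6, 1e3), method="bounded").x
    w = np.exp(R / eta)
    return w / w.sum()


def update_gaussian(mean, cov, samples, weights):
    """Weighted maximum-likelihood update of the search distribution."""
    new_mean = weights @ samples
    diff = samples - new_mean
    new_cov = diff.T @ (diff * weights[:, None]) + 1e-6 * np.eye(len(mean))
    return new_mean, new_cov


def rollout(theta, rng):
    """Stand-in for a real episode: returns the number of successful catches.
    On the robot this would execute the primitive and count catches from the
    binary per-catch reward signal."""
    return float(rng.poisson(5.0 * np.exp(-np.linalg.norm(theta - 1.0))))


# Hypothetical setup: the policy is a vector of movement-primitive parameters
# (e.g. stroke via-points), initialized near a safe hand-tuned motion so that
# early exploration stays within actuator limits.
dim = 8
mean = np.zeros(dim)                     # stand-in for the hand-tuned motion
cov = 0.05 * np.eye(dim)                 # small exploration around it

rng = np.random.default_rng(0)
for it in range(20):
    thetas = rng.multivariate_normal(mean, cov, size=25)
    returns = np.array([rollout(t, rng) for t in thetas])
    mean, cov = update_gaussian(mean, cov, thetas, reps_weights(returns))
    print(f"iter {it:02d}  mean return {returns.mean():.2f}")
```

The small initial covariance and the KL-bounded update are what keep exploration conservative, which is one way to reflect the safety and sample-efficiency constraints the abstract emphasizes.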
