Hierarchical Policy Design for Sample-Efficient Learning of Robot Table Tennis Through Self-Play

Training robots with physical bodies requires developing new methods and action representations that allow learning agents to explore the space of policies efficiently. This work studies sample-efficient learning of complex policies in the context of robot table tennis. It incorporates learning into a hierarchical control framework using a model-free strategy layer (which requires complex reasoning about opponents that is difficult to capture in a model-based way), model-based prediction of external objects (which are difficult to control directly with analytic methods, but are governed by learnable and relatively simple laws of physics), and analytic controllers for the robot itself. Human demonstrations are used to train dynamics models, which together with the analytic controller allow any physically capable robot to play table tennis without training episodes. Using only about 7,000 demonstrated trajectories, a striking policy can hit ball targets with about 20 cm error. Self-play is used to train cooperative and adversarial strategies on top of the model-based striking skills trained from human demonstrations. After only about 24,000 strikes in self-play, the agent learns to best exploit the human dynamics models, sustaining longer cooperative games. Further experiments demonstrate that more flexible variants of the policy can discover new strikes not demonstrated by humans and achieve higher performance, at the expense of lower sample efficiency. Experiments are carried out in a virtual reality environment using sensory observations that are obtainable in the real world. The high sample efficiency demonstrated in the evaluations shows that the proposed method is suitable for learning directly on physical robots, without transferring models or policies from simulation. Supplementary material available at this https URL
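The three-level hierarchy described above lends itself to a simple structural sketch: a learned strategy layer selects a high-level target, a learned ball-dynamics model predicts when and where the ball can be struck, and an analytic controller turns that decision into robot motion. The Python sketch below is purely illustrative of that layering; all class and method names (StrategyLayer, BallDynamicsModel, AnalyticArmController, time_to_plane) are hypothetical, and a constant-gravity ballistic step stands in for the learned dynamics model. It is not the paper's implementation.

```python
# Illustrative sketch of the hierarchical policy layering described in the
# abstract. All names are hypothetical; the physics is a simple stand-in.
from dataclasses import dataclass
import numpy as np


@dataclass
class BallState:
    position: np.ndarray   # (x, y, z) in metres
    velocity: np.ndarray   # (vx, vy, vz) in m/s


class BallDynamicsModel:
    """Model-based layer: predicts ball motion with simple, learnable physics.

    Here a constant-gravity ballistic step stands in for a learned model."""

    GRAVITY = np.array([0.0, 0.0, -9.81])

    def predict(self, state: BallState, dt: float) -> BallState:
        pos = state.position + state.velocity * dt + 0.5 * self.GRAVITY * dt ** 2
        vel = state.velocity + self.GRAVITY * dt
        return BallState(pos, vel)

    def time_to_plane(self, state: BallState, x_plane: float, dt: float = 0.01,
                      horizon: float = 2.0) -> tuple[float, BallState]:
        """Roll the model forward until the ball crosses the hitting plane."""
        t, s = 0.0, state
        while s.position[0] < x_plane and t < horizon:
            s = self.predict(s, dt)
            t += dt
        return t, s


class StrategyLayer:
    """Model-free layer: chooses a landing target on the opponent's side.

    In the paper this layer is trained with self-play; here it just samples
    a fixed target region as a placeholder."""

    def select_target(self, predicted_contact: BallState) -> np.ndarray:
        return np.array([2.0, np.random.uniform(-0.5, 0.5), 0.76])


class AnalyticArmController:
    """Analytic layer: converts the desired paddle pose at contact time into
    a robot command (stubbed out here)."""

    def plan_strike(self, contact: BallState, target: np.ndarray, t_contact: float):
        # A real controller would solve inverse kinematics and time the swing;
        # this stub only reports the commanded contact point, timing, and aim.
        return {"paddle_position": contact.position,
                "hit_time": t_contact,
                "aim_point": target}


if __name__ == "__main__":
    ball = BallState(np.array([-1.5, 0.1, 1.0]), np.array([3.0, 0.0, 1.0]))
    dynamics = BallDynamicsModel()
    t_hit, contact = dynamics.time_to_plane(ball, x_plane=0.0)
    target = StrategyLayer().select_target(contact)
    plan = AnalyticArmController().plan_strike(contact, target, t_hit)
    print(plan)
```

Placing the learned components (strategy selection and ball prediction) above an analytic arm controller, rather than learning joint commands end to end, is what the abstract credits for the small number of training episodes required.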
