Paper information - Supervision via competition: Robot adversaries for learning tasks

Supervision via competition: Robot adversaries for learning tasks

There has been a recent paradigm shift in robotics toward data-driven learning for planning and control. Because of the large number of experiences required for training, most of these approaches use a self-supervised paradigm, using sensors to measure success or failure. However, in most cases, these sensors provide weak supervision at best. In this work, we propose an adversarial learning framework that pits an adversary against the robot learning the task. In an effort to defeat the adversary, the original robot learns to perform the task with more robustness, leading to overall improved performance. We show that this adversarial framework forces the robot to learn a better grasping model in order to overcome the adversary. By grasping 82% of presented novel objects compared to 68% without an adversary, we demonstrate the utility of creating adversaries. We also demonstrate via experiments that placing robots in an adversarial setting might be a better learning strategy than having multiple collaborative robots. For the supplementary video, see: youtu.be/QfK3Bqhc6Sk
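The idea in the abstract, that an adversary turns a weak success signal into a more informative one, can be illustrated with a toy two-player bandit. This is only a sketch under stated assumptions, not the paper's method: the firmness values, disturbance strengths, lift threshold, and the names `grasp_holds` and `train` are all hypothetical, and simple epsilon-greedy value estimates stand in for the learned grasping and adversary models.

```python
import random

# Hypothetical toy setup: each candidate grasp has a hidden firmness in [0, 1].
GRASP_FIRMNESS = [0.3, 0.5, 0.9]
# Hypothetical adversary actions: disturbance strengths applied after the lift.
SHAKE_STRENGTHS = [0.0, 0.4, 0.7]
LIFT_THRESHOLD = 0.2  # any grasp firmer than this passes the plain lift check


def grasp_holds(grasp, shake):
    """Return True if the grasp survives both the lift and the disturbance."""
    firmness = GRASP_FIRMNESS[grasp]
    return firmness > LIFT_THRESHOLD and firmness > SHAKE_STRENGTHS[shake]


def train(use_adversary, rounds=3000, eps=0.2, lr=0.1, seed=0):
    """Epsilon-greedy value learning for the grasper and (optionally) the adversary."""
    rng = random.Random(seed)
    grasp_value = [0.0] * len(GRASP_FIRMNESS)
    shake_value = [0.0] * len(SHAKE_STRENGTHS)

    def pick(values):
        # Explore with probability eps, otherwise take the best-valued action.
        if rng.random() < eps:
            return rng.randrange(len(values))
        return max(range(len(values)), key=values.__getitem__)

    for _ in range(rounds):
        grasp = pick(grasp_value)
        shake = pick(shake_value) if use_adversary else 0
        held = grasp_holds(grasp, shake)
        # Zero-sum rewards: the grasper scores when the object stays in hand,
        # the adversary scores when it is dislodged.
        grasp_value[grasp] += lr * ((1.0 if held else 0.0) - grasp_value[grasp])
        if use_adversary:
            shake_value[shake] += lr * ((0.0 if held else 1.0) - shake_value[shake])
    return grasp_value, shake_value


if __name__ == "__main__":
    print("no adversary:  ", train(use_adversary=False)[0])
    print("with adversary:", train(use_adversary=True)[0])
```

In this toy, every grasp passes the plain lift check, so the success signal alone cannot separate a marginal grasp from a firm one. Once the adversary learns to apply stronger disturbances, only the firmest grasp keeps a high value estimate.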
