Learning objective functions for manipulation

We present an approach to learning objective functions for robotic manipulation based on inverse reinforcement learning. Our path integral inverse reinforcement learning algorithm handles high-dimensional continuous state-action spaces and requires only local optimality of the demonstrated trajectories. We use L1 regularization to achieve feature selection, and we propose an efficient algorithm for minimizing the resulting convex objective function. We demonstrate the approach on two core problems in robotic manipulation. First, we learn a cost function for redundancy resolution in inverse kinematics. Second, we learn a cost function over trajectories, which is then used in optimization-based motion planning for grasping and manipulation tasks. Experimental results show that our method outperforms previous algorithms in high-dimensional settings.
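To make the combination of local optimality, L1 regularization, and a convex objective concrete, the following is a minimal sketch rather than the paper's algorithm. It rests on several assumptions not stated in the abstract: the trajectory cost is linear in a feature vector, the demonstration is compared only against trajectories sampled near it through a softmax likelihood, and the L1-regularized objective is minimized with plain proximal gradient descent (ISTA), which stands in for the efficient algorithm the abstract mentions. All function names, parameters, and the toy data are hypothetical.

```python
import numpy as np

def nll(w, demo_feats, sample_feats):
    """Negative log-likelihood of the demonstration among local samples.

    Assumed model (not from the abstract): cost = w . features, and the
    demonstration has probability exp(-cost_demo) / sum_k exp(-cost_k),
    where the sum runs over the demonstration and the nearby samples.
    """
    feats = np.vstack([demo_feats, sample_feats])
    neg_costs = -(feats @ w)
    m = neg_costs.max()  # shift for a numerically stable log-sum-exp
    log_z = m + np.log(np.exp(neg_costs - m).sum())
    return -neg_costs[0] + log_z

def nll_grad(w, demo_feats, sample_feats):
    """Gradient of nll: demo features minus expected features under the softmax."""
    feats = np.vstack([demo_feats, sample_feats])
    neg_costs = -(feats @ w)
    p = np.exp(neg_costs - neg_costs.max())
    p /= p.sum()
    return demo_feats - p @ feats

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1; sets small entries exactly to zero."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fit_weights(demo_feats, sample_feats, lam=0.1, step=0.1, iters=500):
    """Minimise nll(w) + lam * ||w||_1 by proximal gradient descent (ISTA)."""
    w = np.zeros(demo_feats.shape[0])
    for _ in range(iters):
        g = nll_grad(w, demo_feats, sample_feats)
        w = soft_threshold(w - step * g, step * lam)
    return w

# Toy data (hypothetical): feature 0 is lower for the demonstration than for
# every sample, feature 1 alternates in sign and carries no preference.
demo = np.array([0.0, 0.0])
samples = np.column_stack([np.linspace(0.5, 1.5, 20), np.tile([1.0, -1.0], 10)])
w = fit_weights(demo, samples)
print("learned weights:", w)
```

Because the loss is a log-sum-exp of linear functions plus a linear term, it is convex in w, and adding the L1 penalty keeps it convex. The soft-thresholding step is what produces exact zeros for features that do not help explain the demonstration, which is the feature-selection effect described above.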

[28]  Stefan Schaal,et al.  Towards Associative Skill Memories , 2012, 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012).