Generalizing Regrasping with Supervised Policy Learning

We present a method for learning a general regrasping behavior by using supervised policy learning. First, we use reinforcement learning to learn linear regrasping policies, with a small number of parameters, for individual objects. Next, a general high-dimensional regrasping policy is learned in a supervised manner from the outputs of these individual policies. In experiments with multiple objects, we show that learning low-dimensional policies makes reinforcement learning feasible with a small amount of data. Our experiments further indicate that the general high-dimensional policy learned with our method outperforms each linear policy on the object that policy was trained on. Moreover, the general policy generalizes to a novel object that was not present during training.
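To make the two-stage structure concrete, the sketch below illustrates the idea of distilling per-object linear policies into one high-dimensional policy via supervised learning. It is a minimal illustration under assumed interfaces, not the authors' implementation: the feature and action dimensions, the random stand-ins for the RL-trained linear policies, and the use of scikit-learn's MLPRegressor as the high-dimensional function approximator are all hypothetical placeholders chosen for brevity.

```python
# Minimal sketch: per-object linear policies (assumed already learned by RL)
# supervise a single general, high-dimensional regrasp policy.
# All names, dimensions, and data below are illustrative placeholders.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

N_OBJECTS = 3   # number of training objects (hypothetical)
FEAT_DIM = 64   # dimensionality of the tactile/state feature vector (hypothetical)
ACT_DIM = 6     # regrasp adjustment, e.g. a 6-DoF pose correction (hypothetical)

# Stand-ins for the low-dimensional linear policies learned per object:
# each maps features to a regrasp adjustment, a = W x + b.
linear_policies = [
    (rng.normal(scale=0.1, size=(ACT_DIM, FEAT_DIM)),  # W for object i
     rng.normal(scale=0.01, size=ACT_DIM))             # b for object i
    for _ in range(N_OBJECTS)
]

def linear_policy_action(obj_idx, features):
    """Action suggested by the linear policy of one object."""
    W, b = linear_policies[obj_idx]
    return W @ features + b

# Build a supervised dataset: states visited for each object, labeled with
# the action the corresponding per-object policy would take.
states, actions = [], []
for obj_idx in range(N_OBJECTS):
    for _ in range(500):                      # samples per object (hypothetical)
        x = rng.normal(size=FEAT_DIM)         # placeholder for a real tactile feature vector
        states.append(x)
        actions.append(linear_policy_action(obj_idx, x))
X = np.asarray(states)
Y = np.asarray(actions)

# Fit one high-dimensional policy on the pooled (state, action) pairs.
general_policy = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=500, random_state=0)
general_policy.fit(X, Y)

# At test time the general policy is queried directly, including on objects
# (and hence feature distributions) not seen during training.
new_features = rng.normal(size=FEAT_DIM)
regrasp_adjustment = general_policy.predict(new_features.reshape(1, -1))[0]
print(regrasp_adjustment.shape)  # (ACT_DIM,)
```

The point mirrored from the abstract is that reinforcement learning only has to search a low-dimensional parameter space per object, while generalization across objects is delegated to the supervised stage that fits the high-dimensional policy.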
