CaTGrasp: Learning Category-Level Task-Relevant Grasping in Clutter from Simulation

Task-relevant grasping is critical for industrial assembly, where downstream manipulation tasks constrain the set of valid grasps. Learning how to perform this task, however, is challenging, since task-relevant grasp labels are hard to define and annotate. There is also no consensus yet on proper representations for modeling task-relevant grasps or on off-the-shelf tools for performing them. This work proposes a framework to learn task-relevant grasping for industrial objects without the need for time-consuming real-world data collection or manual annotation. To achieve this, the entire framework is trained solely in simulation, including supervised training with synthetic label generation and self-supervised hand-object interaction. In the context of this framework, this paper proposes a novel, object-centric canonical representation at the category level, which allows establishing dense correspondence across object instances and transferring task-relevant grasps to novel instances. Extensive experiments on task-relevant grasping of densely cluttered industrial objects are conducted in both simulation and real-world setups, demonstrating the effectiveness of the proposed framework. Code and data will be released upon acceptance at https://sites.google.com/view/catgrasp.
