GRAB: A Dataset of Whole-Body Human Grasping of Objects

Training computers to understand, model, and synthesize human grasping requires a rich dataset containing complex 3D object shapes, detailed contact information, hand pose and shape, and the 3D body motion over time. While "grasping" is commonly thought of as a single hand stably lifting an object, we capture the motion of the entire body and adopt the generalized notion of "whole-body grasps". Thus, we collect a new dataset, called GRAB (GRasping Actions with Bodies), of whole-body grasps, containing full 3D shape and pose sequences of 10 subjects interacting with 51 everyday objects of varying shape and size. Given MoCap markers, we fit the full 3D body shape and pose, including the articulated face and hands, as well as the 3D object pose. This gives detailed 3D meshes over time, from which we compute contact between the body and the object. This unique dataset goes well beyond existing ones for modeling and understanding how humans grasp and manipulate objects, how their full body is involved, and how interaction varies with the task. We illustrate the practical value of GRAB with an example application: we train GrabNet, a conditional generative network, to predict 3D hand grasps for unseen 3D object shapes. The dataset and code are available for research purposes at https://grab.is.tue.mpg.de.
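To make the "compute contact from meshes" step concrete, here is a minimal Python sketch of proximity-based contact detection between a fitted body mesh and an object mesh. The PLY filenames and the 5 mm threshold are hypothetical illustration choices, not taken from the paper; GRAB's released code may use a different contact criterion.

```python
import numpy as np
import trimesh

def contact_vertices(body_mesh, object_mesh, thresh=0.005):
    """Indices of body vertices within `thresh` meters of the object surface.

    `thresh=0.005` (5 mm) is an assumed tolerance, not the paper's value.
    """
    # Closest point on the object surface for every body vertex,
    # with the corresponding unsigned distance.
    _, dist, _ = trimesh.proximity.closest_point(object_mesh, body_mesh.vertices)
    return np.where(dist < thresh)[0]

# Hypothetical per-frame meshes exported from the fitted sequences.
body = trimesh.load("body_frame_0042.ply")
obj = trimesh.load("object_frame_0042.ply")
touching = contact_vertices(body, obj)
print(f"{len(touching)} body vertices in contact")
```

Thresholding vertex-to-surface distance is a common proxy for contact on scan-quality meshes; repeating it per frame yields binary contact maps over the whole sequence.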
