On Self-Contact and Human Pose

People touch their face 23 times an hour, they cross their arms and legs, put their hands on their hips, etc. While many images of people contain some form of self-contact, current 3D human pose and shape (HPS) regression methods typically fail to estimate this contact. To address this, we develop new datasets and methods that significantly improve human pose estimation with self-contact. First, we create a dataset of 3D Contact Poses (3DCP) containing SMPL-X bodies fit to 3D scans as well as poses from AMASS, which we refine to ensure good contact. Second, we leverage this to create the Mimic-The-Pose (MTP) dataset of images, collected via Amazon Mechanical Turk, containing people mimicking the 3DCP poses with self-contact. Third, we develop a novel HPS optimization method, SMPLify-XMC, that includes contact constraints and uses the known 3DCP body pose during fitting to create near ground-truth poses for MTP images. Fourth, for more image variety, we label a dataset of in-the-wild images with Discrete Self-Contact (DSC) information and use another new optimization method, SMPLify-DC, that exploits discrete contacts during pose optimization. Finally, we use our datasets during SPIN training to learn a new 3D human pose regressor, called TUCH (Towards Understanding Contact in Humans). We show that the new self-contact training data significantly improves 3D human pose estimates on withheld test data and existing datasets like 3DPW. Not only does our method improve results for self-contact poses, but it also improves accuracy for non-contact poses. The code and data are available for research purposes at https://tuch.is.tue.mpg.de.

[1]  Michael J. Black,et al.  FAUST: Dataset and Evaluation for 3D Mesh Registration , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Nicolas Mansard,et al.  Estimating 3D Motion and Forces of Person-Object Interactions From Monocular Video , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Sebastian Thrun,et al.  Real-Time Human Pose Tracking from Range Data , 2012, ECCV.

[4]  Nassir Navab,et al.  3D Pictorial Structures Revisited: Multiple Human Pose Estimation , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Xiaowei Zhou,et al.  Ordinal Depth Supervision for 3D Human Pose Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Odest Chadwicke Jenkins,et al.  Dynamical Simulation Priors for Human Motion Tracking , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Ioannis A. Kakadiaris,et al.  3D Human pose estimation: A review of the literature and analysis of covariates , 2016, Comput. Vis. Image Underst..

[8]  Taku Komura,et al.  A Sampling Approach to Generating Closely Interacting 3D Pose-Pairs from 2D Annotations , 2019, IEEE Transactions on Visualization and Computer Graphics.

[9]  Olga Sorkine-Hornung,et al.  Robust inside-outside segmentation using generalized winding numbers , 2013, ACM Trans. Graph..

[10]  Kathleen M. Robinette,et al.  The CAESAR project: a 3-D surface anthropometry survey , 1999, Second International Conference on 3-D Digital Imaging and Modeling (Cat. No.PR00062).

[11]  Cristian Sminchisescu,et al.  Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose? , 2013, 2013 IEEE International Conference on Computer Vision.

[12]  Danica Kragic,et al.  Tracking people interacting with objects , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[13]  Jitendra Malik,et al.  End-to-End Recovery of Human Shape and Pose , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[14]  Yaser Sheikh,et al.  Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Dragomir Anguelov,et al.  SCAPE: shape completion and animation of people , 2005, ACM Trans. Graph..

[16]  Joachim Tesch,et al.  AGORA: Avatars in Geography Optimized for Regression Analysis , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Cristian Sminchisescu,et al.  Three-Dimensional Reconstruction of Human Interactions , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Bodo Rosenhahn,et al.  Posebits for Monocular Human Pose Estimation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Cristian Sminchisescu,et al.  Monocular 3D Pose and Shape Estimation of Multiple People in Natural Scenes: The Importance of Multiple Scene Constraints , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Bodo Rosenhahn,et al.  Supplementary Material to: Recovering Accurate 3D Human Pose in The Wild Using IMUs and a Moving Camera , 2018 .

[21]  Chenglei Wu,et al.  MonoClothCap: Towards Temporally Coherent Clothing Capture from Monocular RGB Video , 2020, 2020 International Conference on 3D Vision (3DV).

[22]  Pascal Fua,et al.  Monocular 3D Human Pose Estimation in the Wild Using Improved CNN Supervision , 2016, 2017 International Conference on 3D Vision (3DV).

[23]  Pascal Fua,et al.  XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera , 2020, SIGGRAPH 2020.

[24]  Hans-Peter Seidel,et al.  A Statistical Model of Human Pose and Body Shape , 2009, Comput. Graph. Forum.

[25]  Mark Everingham,et al.  Learning effective human pose estimation from inaccurate annotation , 2011, CVPR 2011.

[26]  Dimitrios Tzionas,et al.  Embodied Hands: Modeling and Capturing Hands and Bodies Together , 2022, ArXiv.

[27]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[28]  Minh Hoai,et al.  Detecting Hands and Recognizing Physical Contact in the Wild , 2020, NeurIPS.

[29]  Peter V. Gehler,et al.  Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[30]  Dimitrios Tzionas,et al.  Expressive Body Capture: 3D Hands, Face, and Body From a Single Image , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Michael J. Black,et al.  Dynamic FAUST: Registering Human Bodies in Motion , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Alexander M. Bronstein,et al.  Numerical Geometry of Non-Rigid Shapes , 2009, Monographs in Computer Science.

[33]  Cristian Sminchisescu,et al.  Learning Complex 3D Human Self-Contact , 2020, AAAI.

[34]  Jorge Nocedal,et al.  On the limited memory BFGS method for large scale optimization , 1989, Math. Program..

[35]  Xiaogang Wang,et al.  DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Masanobu Yamamoto,et al.  Scene constraints-aided tracking of human body , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[37]  Yaser Sheikh,et al.  Monocular Total Capture: Posing Face, Body, and Hands in the Wild , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Hsi-Jian Lee,et al.  Determination of 3D human body postures from a single view , 1985, Comput. Vis. Graph. Image Process..

[39]  Michael J. Black,et al.  Dyna: a model of dynamic human shape in motion , 2015, ACM Trans. Graph..

[40]  Dimitrios Tzionas,et al.  Resolving 3D Human Pose Ambiguities With 3D Scene Constraints , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[41]  Michael J. Black,et al.  Resolving 3 D Human Pose Ambiguities with 3 D Scene Constraints , 2019 .

[42]  Andrea Vedaldi,et al.  Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation , 2020, 2021 International Conference on 3D Vision (3DV).

[43]  Mark Everingham,et al.  Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation , 2010, BMVC.

[44]  Michael J. Black,et al.  Coregistration: Simultaneous Alignment and Modeling of Articulated 3D Shape , 2012, ECCV.

[45]  Abd El Rahman Shabayek,et al.  A survey on Deep Learning Advances on Different 3D Data Representations , 2018, 1808.01462.

[46]  Mary-Louise McLaws,et al.  Face touching: A frequent habit that has implications for hand hygiene , 2015, American Journal of Infection Control.

[47]  Christoph Bregler,et al.  Learning invariance through imitation , 2011, CVPR 2011.

[48]  山田 祐,et al.  Open Dynamics Engine を用いたスノーボードロボットシミュレータの開発 , 2007 .

[49]  Larry S. Davis,et al.  Context and observation driven latent variable model for human pose estimation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[50]  Michael J. Black,et al.  Learning to Reconstruct 3D Human Pose and Shape via Model-Fitting in the Loop , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[51]  Nikolaus F. Troje,et al.  AMASS: Archive of Motion Capture As Surface Shapes , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).