STAR: Sparse Trained Articulated Human Body Regressor

The SMPL body model is widely used for the estimation, synthesis, and analysis of 3D human pose and shape. While popular, we show that SMPL has several limitations and introduce STAR, which is quantitatively and qualitatively superior to SMPL. First, SMPL has a huge number of parameters resulting from its use of global blend shapes. These dense pose-corrective offsets relate every vertex on the mesh to all the joints in the kinematic tree, capturing spurious long-range correlations. To address this, we define per-joint pose correctives and learn the subset of mesh vertices that are influenced by each joint movement. This sparse formulation results in more realistic deformations and significantly reduces the number of model parameters to 20% of SMPL. When trained on the same data as SMPL, STAR generalizes better despite having many fewer parameters. Second, SMPL factors pose-dependent deformations from body shape while, in reality, people with different shapes deform differently. Consequently, we learn shape-dependent pose-corrective blend shapes that depend on both body pose and BMI. Third, we show that the shape space of SMPL is not rich enough to capture the variation in the human population. We address this by training STAR with an additional 10,000 scans of male and female subjects, and show that this results in better model generalization. STAR is compact, generalizes better to new bodies and is a drop-in replacement for SMPL. STAR is publicly available for research purposes at this http URL.

[1]  Ignas Budvytis,et al.  Indirect deep structured learning for 3D human body shape and pose prediction , 2017, BMVC.

[2]  Zoran Popovic,et al.  The space of human body shapes: reconstruction and parameterization from range scans , 2003, ACM Trans. Graph..

[3]  Nikolaus F. Troje,et al.  AMASS: Archive of Motion Capture As Surface Shapes , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[4]  Michael J. Black,et al.  Dyna: a model of dynamic human shape in motion , 2015, ACM Trans. Graph..

[5]  Michael J. Black,et al.  MoSh: motion and shape capture from sparse markers , 2014, ACM Trans. Graph..

[6]  John P. Lewis,et al.  Pose space deformation: a unified approach to shape interpolation and skeleton-driven deformation , 2000, SIGGRAPH.

[7]  Abd El Rahman Shabayek,et al.  3DBodyTex: Textured 3D Body Dataset , 2018, 2018 International Conference on 3D Vision (3DV).

[8]  Nadia Magnenat-Thalmann,et al.  Human body deformations using joint-dependent local operators and finite-element theory , 1991 .

[9]  Hans-Peter Seidel,et al.  A Statistical Model of Human Pose and Body Shape , 2009, Comput. Graph. Forum.

[10]  Xiaowei Zhou,et al.  Learning to Estimate 3D Human Pose and Shape from a Single Color Image , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Michael J. Black,et al.  VIBE: Video Inference for Human Body Pose and Shape Estimation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Marcus A. Magnor,et al.  Sparse localized deformation components , 2013, ACM Trans. Graph..

[13]  Kun Zhou,et al.  NeuroSkinning: automatic skin binding for production characters with deep graph networks , 2019, ACM Trans. Graph..

[14]  Jitendra Malik,et al.  End-to-End Recovery of Human Shape and Pose , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  John P. Lewis,et al.  Real‐Time Weighted Pose‐Space Deformation on the GPU , 2006, Comput. Graph. Forum.

[16]  Michael J. Black,et al.  Lie Bodies: A Manifold Representation of 3D Human Shape , 2012, ECCV.

[17]  Michael J. Black,et al.  Deep Inertial Poser: Learning to Reconstruct Human Pose from Sparse Inertial Measurements in Real Time , 2018 .

[18]  Zoran Popovic,et al.  Articulated body deformation from range scan data , 2002, SIGGRAPH.

[19]  Cristian Sminchisescu,et al.  Monocular 3D Pose and Shape Estimation of Multiple People in Natural Scenes: The Importance of Multiple Scene Constraints , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Bodo Rosenhahn,et al.  Sparse Inertial Poser: Automatic 3D Human Pose Estimation from Sparse IMUs , 2017, Comput. Graph. Forum.

[21]  Michael J. Black,et al.  SMPL: A Skinned Multi-Person Linear Model , 2015, ACM Trans. Graph..

[22]  Bernt Schiele,et al.  Building statistical shape spaces for 3D human modeling , 2015, Pattern Recognit..

[23]  Sebastian Thrun,et al.  SCAPE: shape completion and animation of people , 2005, SIGGRAPH '05.

[24]  Kathleen M. Robinette,et al.  Civilian American and European Surface Anthropometry Resource (CAESAR), Final Report. Volume 1. Summary , 2002 .

[25]  Nadia Magnenat-Thalmann,et al.  Synthesizing animatable body models with parameterized shape modifications , 2003, SCA '03.

[26]  Marcus A. Magnor,et al.  Detailed Human Avatars from Monocular Video , 2018, 2018 International Conference on 3D Vision (3DV).

[27]  Dinesh K. Pai,et al.  EigenSkin: real time large deformation character skinning in hardware , 2002, SCA '02.

[28]  CurlessBrian,et al.  The space of human body shapes , 2003 .

[29]  Michael J. Black,et al.  Learning to Reconstruct 3D Human Pose and Shape via Model-Fitting in the Loop , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[30]  Cristian Sminchisescu,et al.  GHUM & GHUML: Generative 3D Human Shape and Articulated Pose Models , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  CurlessBrian,et al.  Articulated body deformation from range scan data , 2002 .

[32]  Yinghao Huang,et al.  Towards Accurate Marker-Less Human Shape and Pose Estimation over Time , 2017, 2017 International Conference on 3D Vision (3DV).

[33]  Konrad Schindler,et al.  Chained Representation Cycling: Learning to Estimate 3D Human Pose and Shape by Cycling Between Representations , 2020, AAAI.

[34]  Olga Sorkine-Hornung,et al.  Fast automatic skinning transformations , 2012, ACM Trans. Graph..

[35]  Tsuneya Kurihara,et al.  Modeling deformable human hands from medical images , 2004, SCA '04.

[36]  Michael J. Black,et al.  Coregistration: Simultaneous Alignment and Modeling of Articulated 3D Shape , 2012, ECCV.

[37]  Daniel Thalmann,et al.  Joint-dependent local deformations for hand animation and object grasping , 1989 .

[38]  Zicheng Liu,et al.  Tensor-Based Human Body Modeling , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.