Learning and Tracking the 3D Body Shape of Freely Moving Infants from RGB-D Sequences

Statistical models of the human body surface are generally learned from thousands of high-quality 3D scans in predefined poses to cover the wide variety of human body shapes and articulations. Acquiring such data requires expensive equipment and calibration procedures, and is limited to cooperative subjects who can understand and follow instructions, such as adults. We present a method for learning a statistical 3D Skinned Multi-Infant Linear body model (SMIL) from incomplete, low-quality RGB-D sequences of freely moving infants. Quantitative experiments show that SMIL faithfully represents the RGB-D data and properly factorizes the shape and pose of the infants. To demonstrate the applicability of SMIL, we fit the model to RGB-D sequences of freely moving infants and show, with a case study, that our method captures enough motion detail for General Movements Assessment (GMA), a method used in clinical practice for early detection of neurodevelopmental disorders in infants. SMIL provides a new tool for analyzing infant shape and movement and is a step towards an automated system for GMA.
