Detailed Full-Body Reconstructions of Moving People from Monocular RGB-D Sequences

We accurately estimate the 3D geometry and appearance of the human body from a monocular RGB-D sequence of a user moving freely in front of the sensor. Range data in each frame is first brought into alignment with a multi-resolution 3D body model in a coarse-to-fine process. The method then uses geometry and image texture over time to obtain accurate shape, pose, and appearance information despite unconstrained motion, partial views, varying resolution, occlusion, and soft tissue deformation. Our novel body model has variable shape detail, allowing it to capture faces with a high-resolution deformable head model and body shape at lower resolution. Finally, we combine range data from an entire sequence to estimate a high-resolution displacement map that captures fine shape details. We compare our recovered models with high-resolution scans from a professional system and with avatars created by a commercial product. We extract accurate 3D avatars from challenging motion sequences and even capture soft tissue dynamics.
