I M Avatar: Implicit Morphable Head Avatars from Videos

Traditional morphable face models provide fine-grained control over expression but cannot easily capture geometric and appearance details. Neural volumetric representations approach photo-realism but are hard to animate and do not generalize well to unseen expressions. To tackle this problem, we propose IMavatar (Implicit Morphable avatar), a novel method for learning implicit head avatars from monocular videos. Inspired by the fine-grained control mechanisms afforded by conventional 3DMMs, we represent the expressionand pose-related deformations via learned blendshapes and skinning fields. These attributes are pose-independent and can be used to morph the canonical geometry and texture fields given novel expression and pose parameters. We employ ray tracing and iterative rootfinding to locate the canonical surface intersection for each pixel. A key contribution is our novel analytical gradient formulation that enables end-to-end training of IMavatars from videos. We show quantitatively and qualitatively that our method improves geometry and covers a more complete expression space compared to state-of-the-art methods.

[1]  Michael J. Black,et al.  Learning to Regress 3D Face Shape and Expression From an Image Without 3D Supervision , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Michael J. Black,et al.  Capture, Learning, and Synthesis of 3D Speaking Styles , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Pratul P. Srinivasan,et al.  NeRF , 2020, ECCV.

[4]  Francesc Moreno-Noguer,et al.  D-NeRF: Neural Radiance Fields for Dynamic Scenes , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Long Quan,et al.  Self-Supervised Monocular 3D Face Reconstruction by Occlusion-Aware Multi-view Geometry Consistency , 2020, ECCV.

[6]  Nick Pears,et al.  Statistical Modeling of Craniofacial Shape and Texture , 2019, International Journal of Computer Vision.

[7]  Andrea Tagliasacchi,et al.  NASA: Neural Articulated Shape Approximation , 2020, ECCV.

[8]  Ronen Basri,et al.  Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance , 2020, NeurIPS.

[9]  Justus Thies,et al.  Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  M. Zollhöfer,et al.  Self-Supervised Multi-level Face Model Learning for Monocular Reconstruction at Over 250 Hz , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Stefanos Zafeiriou,et al.  GANFIT: Generative Adversarial Network Fitting for High Fidelity 3D Face Reconstruction , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Michael J. Black,et al.  LEAP: Learning Articulated Occupancy of People , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Sami Romdhani,et al.  A 3D Face Model for Pose and Illumination Invariant Face Recognition , 2009, 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance.

[14]  Ruigang Yang,et al.  FaceScape: A Large-Scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Michael Zollhöfer,et al.  Learning Compositional Radiance Fields of Dynamic Human Heads , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Justus Thies,et al.  Face2Face: Real-Time Face Capture and Reenactment of RGB Videos , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Alexei A. Efros,et al.  The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18]  Michael J. Black,et al.  Generating 3D faces using Convolutional Mesh Autoencoders , 2018, ECCV.

[19]  C. G. Broyden A Class of Methods for Solving Nonlinear Simultaneous Equations , 1965 .

[20]  Ronald Fedkiw,et al.  High-Quality Face Capture Using Anatomical Muscles , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Patrick Pérez,et al.  MoFA: Model-Based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[22]  Gordon Wetzstein,et al.  Neural Lumigraph Rendering , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Hao Zhang,et al.  Learning Implicit Fields for Generative Shape Modeling , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Bernhard Egger,et al.  Markov Chain Monte Carlo for Automated Face Image Analysis , 2017, International Journal of Computer Vision.

[25]  Michael J. Black,et al.  Learning an animatable detailed 3D face model from in-the-wild images , 2020, ArXiv.

[26]  Tony Tung,et al.  Neural-GIF: Neural Generalized Implicit Functions for Animating People in Clothing , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[27]  Kun Zhou,et al.  Real-time facial animation with image-based dynamic avatars , 2016, ACM Trans. Graph..

[28]  Baoyuan Wang,et al.  Personalized Face Modeling for Improved Face Reconstruction and Motion Retargeting , 2020, ECCV.

[29]  Stefanos Zafeiriou,et al.  AvatarMe: Realistically Renderable 3D Facial Reconstruction “In-the-Wild” , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Jonathan T. Barron,et al.  Nerfies: Deformable Neural Radiance Fields , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[31]  Christian Theobalt,et al.  Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synthesis of a Dynamic Scene From Monocular Video , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[32]  Richard A. Newcombe,et al.  DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Mark Pauly,et al.  Dynamic 3D avatar creation from hand-held video input , 2015, ACM Trans. Graph..

[34]  Changil Kim,et al.  Space-time Neural Irradiance Fields for Free-Viewpoint Video , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Michael J. Black,et al.  SNARF: Differentiable Forward Skinning for Animating Non-Rigid Neural Implicit Shapes , 2021, IEEE International Conference on Computer Vision.

[36]  Stefanos Zafeiriou,et al.  Towards a Complete 3D Morphable Model of the Human Head , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Baoyuan Wang,et al.  JNR: Joint-based Neural Rig Representation for Compact 3D Face Modeling , 2020, ECCV.

[38]  Christian Theobalt,et al.  Reconstruction of Personalized 3D Face Rigs from Monocular Video , 2016, ACM Trans. Graph..

[39]  Sebastian Nowozin,et al.  Occupancy Networks: Learning 3D Reconstruction in Function Space , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Hao Li,et al.  Avatar digitization from a single image for real-time rendering , 2017, ACM Trans. Graph..

[41]  Louis Chevallier,et al.  Towards High Fidelity Monocular Face Reconstruction with Rich Reflectance using Self-supervised Learning and Ray Tracing , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[42]  Xiaoming Liu,et al.  Nonlinear 3D Face Morphable Model , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[43]  Jonathan T. Barron,et al.  HyperNeRF , 2021, ACM Trans. Graph..

[44]  Jiaolong Yang,et al.  Accurate 3D Face Reconstruction With Weakly-Supervised Learning: From Single Image to Image Set , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[45]  Andreas Geiger,et al.  Occupancy Flow: 4D Reconstruction by Learning Particle Dynamics , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[46]  Michael J. Black,et al.  Learning a model of facial shape and expression from 4D scans , 2017, ACM Trans. Graph..

[47]  Rynson W. H. Lau,et al.  Is a Green Screen Really Necessary for Real-Time Portrait Matting? , 2020, ArXiv.

[48]  Ping Tan,et al.  Riggable 3D Face Reconstruction via In-Network Optimization , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Francesc Moreno-Noguer,et al.  H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[50]  Georgios Tzimiropoulos,et al.  How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks) , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[51]  Daniel Cremers,et al.  i3DMM: Deep Implicit 3D Morphable Model of Human Heads , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Alec Jacobson,et al.  NiLBS: Neural Inverse Linear Blend Skinning , 2020, ArXiv.

[53]  Gordon Wetzstein,et al.  pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Sami Romdhani,et al.  Estimating 3D shape and texture using pixel intensity, edges, specular highlights, texture constraints and a prior , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[55]  Matthew Turk,et al.  A Morphable Model For The Synthesis Of 3D Faces , 1999, SIGGRAPH.

[56]  Yiying Tong,et al.  FaceWarehouse: A 3D Facial Expression Database for Visual Computing , 2014, IEEE Transactions on Visualization and Computer Graphics.

[57]  Michael J. Black,et al.  SCANimate: Weakly Supervised Learning of Skinned Clothed Avatar Networks , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  Jovan Popovic,et al.  Deformation transfer for triangle meshes , 2004, ACM Trans. Graph..

[59]  Hans-Peter Seidel,et al.  FML: Face Model Learning From Videos , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[60]  Hanspeter Pfister,et al.  Face transfer with multilinear models , 2005, ACM Trans. Graph..