Generating 3D faces using Convolutional Mesh Autoencoders

Learned 3D representations of human faces are useful for computer vision problems such as 3D face tracking and reconstruction from images, as well as graphics applications such as character generation and animation. Traditional models learn a latent representation of a face using linear subspaces or higher-order tensor generalizations. Because of this linearity, they cannot capture extreme deformations and non-linear expressions. To address this, we introduce a versatile model that learns a non-linear representation of a face using spectral convolutions on a mesh surface. We introduce mesh sampling operations that enable a hierarchical mesh representation, which captures non-linear variations in shape and expression at multiple scales within the model. In a variational setting, our model samples diverse, realistic 3D faces from a multivariate Gaussian distribution. Our training data consists of 20,466 meshes of extreme expressions captured from 12 different subjects. Despite limited training data, our trained model outperforms state-of-the-art face models with 50% lower reconstruction error, while using 75% fewer parameters. We also show that replacing the expression space of an existing state-of-the-art face model with our model achieves a lower reconstruction error. Our data, model and code are available at http://coma.is.tue.mpg.de/.
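
To make the phrase "spectral convolutions on a mesh surface" concrete, the following is a minimal NumPy sketch of one common formulation, a Chebyshev-polynomial filter applied to the graph Laplacian of a mesh. The abstract does not specify the filter parameterisation, so the Chebyshev form, the function names `scaled_laplacian` and `cheb_conv`, the toy tetrahedron mesh, and the random weights are all illustrative assumptions rather than the released implementation.

```python
import numpy as np

def scaled_laplacian(adj):
    """Normalized graph Laplacian, rescaled so its eigenvalues lie in [-1, 1].

    Assumes every vertex has at least one edge (no zero degrees).
    """
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(lap).max()
    return 2.0 * lap / lam_max - np.eye(len(adj))

def cheb_conv(x, lap_scaled, weights):
    """Apply a Chebyshev spectral filter to per-vertex features.

    x:       (n_vertices, f_in) features, e.g. 3D vertex positions
    weights: (K, f_in, f_out), one weight matrix per polynomial order
    """
    t_prev = x                       # T_0(L) x = x
    out = t_prev @ weights[0]
    if len(weights) > 1:
        t_curr = lap_scaled @ x      # T_1(L) x = L x
        out = out + t_curr @ weights[1]
        for k in range(2, len(weights)):
            # Recurrence: T_k = 2 L T_{k-1} - T_{k-2}
            t_prev, t_curr = t_curr, 2.0 * lap_scaled @ t_curr - t_prev
            out = out + t_curr @ weights[k]
    return out

# Toy mesh (hypothetical): a tetrahedron, where every vertex pair shares an edge.
adj = np.ones((4, 4)) - np.eye(4)
verts = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

lap = scaled_laplacian(adj)
rng = np.random.default_rng(0)
weights = rng.standard_normal((3, 3, 3))   # K = 3 orders, 3 -> 3 features
out = cheb_conv(verts, lap, weights)       # filtered per-vertex features
```

A filter of order K mixes information only among vertices within K - 1 edges of each other, which is why a hierarchy of down-sampled meshes is useful for capturing variation at larger scales.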
