Generating Facial Expressions with Deep Belief Nets

Realistic facial expression animation requires a powerful “animator” (or graphics program) that can represent the kinds of variations in facial appearance that are both possible and likely to occur in a given context. If the goal is fully determined, as in character animation for film, knowledge can be provided in the form of higher-level descriptions from human animators. However, for generating facial expressions in interactive interfaces, such as animated avatars, the correct expressions for a given context must be generated on the fly. A simple solution is to rely on a set of prototypical expressions, or basis shapes, that are linearly combined to create every facial expression in an animated sequence (Kleiser, 1989; Parke, 1972). An innovative algorithm for fitting basis shapes to images was proposed by Blanz and Vetter (1999).

The main problem with the basis shape approach is that the full range of appearance variation required for convincing expressive behavior is far beyond what a small set of basis shapes can accommodate. Moreover, even if many expression components are used to create a repertoire of basis shapes (Joshi et al., 2007; Lewis et al., 2000), the interface may need to render different identities or mixtures of facial expressions not captured by the learned basis shapes. A representation of facial appearance for animation must be powerful enough to capture the right constraints for realistic expression generation, yet flexible enough to accommodate different identities and behaviors.

Besides the obvious utility of such a representation to animated facial interfaces, a good model of facial expression generation would be useful for computer vision tasks, because the model’s representation would likely be much richer and more informative than the original pixel data.
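The linear basis-shape combination described above can be sketched in a few lines. This is an illustrative sketch only, not code from the chapter: the function name, the flat vertex layout, and the toy prototype shapes are all hypothetical choices.

```python
# Hypothetical sketch of the basis-shape (blendshape) approach: every
# expression is a linear combination of offsets from a neutral face.

def blend(neutral, basis_shapes, weights):
    """Combine basis shapes as weighted offsets from the neutral face.

    neutral      -- flat list of vertex coordinates [x0, y0, z0, x1, ...]
    basis_shapes -- list of same-length vertex lists, one per prototype
    weights      -- one blend weight per basis shape
    """
    result = list(neutral)
    for shape, w in zip(basis_shapes, weights):
        for i, v in enumerate(shape):
            result[i] += w * (v - neutral[i])
    return result

# Toy example: a one-vertex "face" with two hypothetical prototypes.
neutral = [0.0, 0.0, 0.0]
smile   = [1.0, 0.0, 0.0]
frown   = [0.0, -1.0, 0.0]

# A half-smile: 50% of the smile offset, none of the frown.
print(blend(neutral, [smile, frown], [0.5, 0.0]))  # [0.5, 0.0, 0.0]
```

The limitation the text points out falls directly out of this formulation: every reachable face lies in the low-dimensional linear span of the prototypes, so variation outside that span cannot be rendered no matter how the weights are set.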
For example, inferring the model’s representation corresponding to a given image might even allow transferring an expression extracted from an image of a face onto a different character, as illustrated by the method of expression cloning (Noh & Neumann, 2001). In this chapter we introduce a novel approach to learning to generate facial expressions that uses a deep belief net (Hinton et al., 2006). The model can easily accommodate different constraints on generation. We demonstrate this by restricting it to generate expressions with a given identity and with elementary facial expressions such as “raised eyebrows.” The deep
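A deep belief net of the kind cited above (Hinton et al., 2006) is assembled from restricted Boltzmann machines trained greedily, one layer at a time, with contrastive divergence. The following is a minimal sketch of a single CD-1 weight update for a binary RBM; the unit counts, zero initialisation, and learning rate are illustrative assumptions, not values from the chapter.

```python
import random
from math import exp

# Sketch of one contrastive-divergence (CD-1) update for a binary
# restricted Boltzmann machine (RBM), the layer-wise building block of a
# deep belief net. All sizes and hyperparameters here are illustrative.

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def hidden_probs(v, W, b_h):
    """P(h_j = 1 | v) for each hidden unit j; W is visible x hidden."""
    return [sigmoid(b_h[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b_h))]

def visible_probs(h, W, b_v):
    """P(v_i = 1 | h) for each visible unit i."""
    return [sigmoid(b_v[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b_v))]

def sample(probs):
    """Draw a binary sample from each unit's activation probability."""
    return [1 if random.random() < p else 0 for p in probs]

def cd1_update(v0, W, b_v, b_h, lr=0.1):
    """One CD-1 step: move weights toward the data statistics and away
    from the one-step reconstruction statistics."""
    h0 = hidden_probs(v0, W, b_h)           # positive phase
    v1 = visible_probs(sample(h0), W, b_v)  # reconstruction
    h1 = hidden_probs(v1, W, b_h)           # negative phase
    for i in range(len(v0)):
        for j in range(len(b_h)):
            W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
    return W

# Toy usage: 3 visible units, 2 hidden units, zero-initialised parameters.
W = [[0.0, 0.0] for _ in range(3)]
cd1_update([1, 0, 1], W, [0.0] * 3, [0.0] * 2)
```

Stacking such layers, with each RBM trained on the hidden activities of the one below, yields the deep generative model that the chapter applies to faces; conditioning its top-level units is what allows constraints such as identity to steer generation.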

[1] Frederick I. Parke, et al. Computer generated animation of faces. ACM Annual Conference, 1972.

[2] P. Ekman. Pictures of Facial Affect. 1976.

[3] P. Ekman, et al. Handbook of methods in nonverbal behavior research. 1982.

[4] E. Ziegel, et al. Artificial intelligence and statistics. 1986.

[5] M. Turk, et al. Eigenfaces for Recognition. Journal of Cognitive Neuroscience, 1991.

[6] P. Ekman, et al. The ability to detect deceit generalizes across different types of high-stake lies. Journal of Personality and Social Psychology, 1997.

[7] Timothy F. Cootes, et al. Active Appearance Models. ECCV, 1998.

[8] T. Sejnowski, et al. Measuring facial expressions by computer image analysis. Psychophysiology, 1999.

[9] J. Cohn, et al. Automated face analysis by feature point tracking has high concurrent validity with manual FACS coding. Psychophysiology, 1999.

[10] Thomas Vetter, et al. A morphable model for the synthesis of 3D faces. SIGGRAPH, 1999.

[11] Maja Pantic, et al. Automatic Analysis of Facial Expressions: The State of the Art. IEEE Trans. Pattern Anal. Mach. Intell., 2000.

[12] John P. Lewis, et al. Pose space deformation: a unified approach to shape interpolation and skeleton-driven deformation. SIGGRAPH, 2000.

[13] Takeo Kanade, et al. Comprehensive database for facial expression analysis. Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition, 2000.

[14] Andrew J. Calder, et al. PII: S0042-6989(01)00002-5. Vision Research, 2001.

[15] Paul A. Viola, et al. Robust Real-time Object Detection. 2001.

[16] Jun-yong Noh, et al. Expression cloning. SIGGRAPH, 2001.

[17] G. Cottrell, et al. EMPATH: A Neural Network that Categorizes Facial Expressions. Journal of Cognitive Neuroscience, 2002.

[18] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation, 2002.

[19] Javier R. Movellan, et al. Towards Automatic Recognition of Spontaneous Facial Actions. 2003.

[20] Frédéric H. Pighin, et al. Learning controls for blend shape based realistic facial animation. SIGGRAPH '03, 2003.

[21] Paul A. Viola, et al. Robust Real-Time Face Detection. International Journal of Computer Vision, 2001.

[22] B. Abboud, et al. Bilinear factorisation for facial expression analysis and synthesis. 2005.

[23] James H. Elder, et al. Creating invariance to "nuisance parameters" in face recognition. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 2005.

[24] L. J. M. Rothkrantz, et al. Parametric Generation of Facial Expressions Based on FACS. Computer Graphics Forum, 2005.

[25] Ian R. Fasel, et al. A generative framework for real time object detection and classification. Computer Vision and Image Understanding, 2005.

[26] P. Ekman, et al. What the face reveals: basic and applied studies of spontaneous expression using the Facial Action Coding System (FACS). 2005.

[27] Miguel Á. Carreira-Perpiñán, et al. On Contrastive Divergence Learning. AISTATS, 2005.

[28] Geoffrey E. Hinton, et al. Reducing the Dimensionality of Data with Neural Networks. Science, 2006.

[29] Parag Havaldar. Sony Pictures Imageworks. SIGGRAPH Courses, 2006.

[30] Yee Whye Teh, et al. A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 2006.

[31] J. Cohn, et al. Movement Differences between Deliberate and Spontaneous Facial Expressions: Zygomaticus Major Action in Smiling. Journal of Nonverbal Behavior, 2006.

[32] Gwen Littlewort, et al. Automatic Recognition of Facial Actions in Spontaneous Expressions. Journal of Multimedia, 2006.

[33] J. Movellan, et al. Human and computer recognition of facial expressions of emotion. Neuropsychologia, 2007.

[34] Geoffrey E. Hinton. Learning multiple layers of representation. Trends in Cognitive Sciences, 2007.

[35] Christopher J. Fox, et al. What is adapted in face adaptation? The neural representations of expression in the human visual system. Brain Research, 2007.

[36] Thomas Hofmann, et al. Greedy Layer-Wise Training of Deep Networks. 2007.

[37] Geoffrey E. Hinton, et al. To recognize shapes, first learn to generate images. Progress in Brain Research, 2007.

[38] John F. Kalaska, et al. Computational neuroscience: theoretical insights into brain function. 2007.

[39] Geoffrey E. Hinton, et al. Modeling image patches with a directed hierarchy of Markov random fields. NIPS, 2007.

[40] Geoffrey E. Hinton, et al. Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes. NIPS, 2007.

[41] A. Anderson, et al. Examinations of identity invariance in facial expression adaptation. Cognitive, Affective & Behavioral Neuroscience, 2008.

[42] Antonio Torralba, et al. Small codes and large image databases for recognition. IEEE Conference on Computer Vision and Pattern Recognition, 2008.

[43] J. Cohn, et al. Measuring Facial Action. 2008.