Learning-based approach to real time tracking and analysis of faces

This paper describes a trainable system capable of tracking faces and facial features like eyes and nostrils and estimating basic mouth features such as degrees of openness and smile in real time. In developing this system, we have addressed the twin issues of image representation and algorithms for learning. We have used the invariance properties of image representations based on Haar wavelets to robustly capture various facial features. Similarly, unlike previous approaches this system is entirely trained using examples and does not rely on a priori (hand-crafted) models of facial features based on an optical flow or facial musculature. The system works in several stages that begin with face detection, followed by localization of facial features and estimation of mouth parameters. Each of these stages is formulated as a problem in supervised learning from examples. We apply the new and robust technique of support vector machines (SVM) for classification in the stage of skin segmentation, face detection and eye detection. Estimation of mouth parameters is modeled as a regression from a sparse subset of coefficients (basis functions) of an overcomplete dictionary of Haar wavelets.

[1]  Stéphane Mallat,et al.  A Theory for Multiresolution Signal Decomposition: The Wavelet Representation , 1989, IEEE Trans. Pattern Anal. Mach. Intell..

[2]  Tomaso Poggio,et al.  Example Based Image Analysis and Synthesis , 1993 .

[3]  Demetri Terzopoulos,et al.  Analysis and Synthesis of Facial Image Sequences Using Physical and Anatomical Models , 1993, IEEE Trans. Pattern Anal. Mach. Intell..

[4]  Alex Pentland,et al.  A vision system for observing and extracting facial action parameters , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[5]  David Salesin,et al.  Fast multiresolution image querying , 1995, SIGGRAPH.

[6]  Alexander J. Smola,et al.  Support Vector Method for Function Approximation, Regression Estimation and Signal Processing , 1996, NIPS.

[7]  Tony Ezzat,et al.  Example-based analysis and synthesis for images of human faces , 1996 .

[8]  Tomaso Poggio,et al.  Image Representations for Visual Learning , 1996, Science.

[9]  Fumio Kishino,et al.  Real-time facial expression detection based on frequency domain transform , 1996, Other Conferences.

[10]  Alex Pentland,et al.  LAFTER: lips and face real time tracker , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[11]  Federico Girosi,et al.  Training support vector machines: an application to face detection , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[12]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[13]  Tomaso A. Poggio,et al.  A general framework for object detection , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[14]  John Platt,et al.  Probabilistic Outputs for Support vector Machines and Comparisons to Regularized Likelihood Methods , 1999 .

[15]  Hsien-Che Lee Colors as Seen by Humans and Machines , 2000 .

[16]  Timothy F. Cootes,et al.  Active Appearance Models , 2001, IEEE Trans. Pattern Anal. Mach. Intell..