Fast Multi-view Face Detection

Abstract

This paper extends the face detection framework proposed by Viola and Jones 2001 to handle profile views and rotated faces. As in the work of Rowley et al. 1998 and Schneiderman et al. 2000, we build different detectors for different views of the face. A decision tree is then trained to determine the viewpoint class (such as right profile or rotated 60 degrees) for a given window of the image being examined. This is similar to the approach of Rowley et al. 1998. The appropriate detector for that viewpoint can then be run instead of running all detectors on all windows. This technique yields good results and maintains the speed advantage of the Viola-Jones detector.

1. Introduction

There are a number of techniques that can successfully detect frontal upright faces in a wide variety of images [11, 7, 10, 12, 3, 6]. While the definition of “frontal” and “upright” may vary from system to system, the reality is that many natural images contain rotated or profile faces that are not reliably detected. There are a small number of systems which explicitly address non-frontal, or non-upright face detection [8, 10, 2]. This paper describes progress toward a system which can detect faces regardless of pose reliably and in real-time.

This paper extends the framework proposed by Viola and Jones [12]. This approach is selected because of its computational efficiency and simplicity.

One observation which is shared among all previous related work is that a multi-view detector must be carefully constructed by combining a collection of detectors each trained for a single viewpoint. It appears that a monolithic approach, where a single classifier is trained to detect all poses of a face, is unlearnable with existing classifiers.
Our informal experiments lend support to this conclusion, since a classifier trained on all poses appears to be hopelessly inaccurate.

This paper addresses two types of pose variation: non-frontal faces, which are rotated out of the image plane, and non-upright faces, which are rotated in the image plane. In both cases the multi-view detector presented in this paper is a combination of Viola-Jones detectors, each detector trained on face data taken from a single viewpoint.

Reliable non-upright face detection was first presented in a paper by Rowley, Baluja and Kanade [8]. They train two neural network classifiers. The first estimates the pose of a face in the detection window. The second is a conventional face detector. Faces are detected in three steps: for each image window the pose of the “face” is first estimated; the pose estimate is then used to de-rotate the image window; the window is then classified by the second detector. For non-face windows, the pose estimate must be considered random. Nevertheless, a rotated non-face should be rejected by the conventional detector. One potential flaw of such a system is that the final detection rate is roughly the product of the correct classification rates of the two classifiers (since the errors of the two classifiers are somewhat independent).

One could adopt the Rowley et al. three-step approach while replacing the classifiers with those of Viola and Jones. The final system would be more efficient, but not significantly. Classification by the Viola-Jones system is so efficient that derotation would dominate the computational expense. In principle derotation is not strictly necessary since it should be possible to construct a detector for rotated faces directly. Detection becomes a two-stage process. First the pose of the window is estimated and then one of the rotation-specific detectors is called upon to classify the window.

In this paper detection of non-upright faces is handled using the two-stage approach.
In the first stage the pose of each window is estimated using a decision tree constructed using features like those described by Viola and Jones. In the second stage one of the pose-specific Viola-Jones detectors is used to classify the window.

Once the pose-specific detectors are trained and available, an alternative detection process can be tested as well. In this case all detectors are evaluated and the union of their de-

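The two detection processes described above can be contrasted with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation used in this paper: the names `two_stage_detect`, `union_detect`, `toy_estimate_pose` and `toy_detectors` are hypothetical, and the toy pose estimator and detectors are simple brightness comparisons standing in for the decision tree and the trained Viola-Jones detectors. Each function also returns how many detectors were evaluated, which is the cost the two-stage process is meant to reduce.

```python
from typing import Callable, Dict, List, Tuple

Window = List[List[float]]            # grayscale patch as rows of pixel values
PoseEstimator = Callable[[Window], str]
Detector = Callable[[Window], bool]


def two_stage_detect(window: Window,
                     estimate_pose: PoseEstimator,
                     detectors: Dict[str, Detector]) -> Tuple[bool, int]:
    """Estimate the pose class, then run only the detector for that pose.

    Returns (is_face, number_of_detectors_evaluated).
    """
    pose = estimate_pose(window)
    return detectors[pose](window), 1


def union_detect(window: Window,
                 detectors: Dict[str, Detector]) -> Tuple[bool, int]:
    """Run every pose-specific detector and accept if any of them fires.

    Returns (is_face, number_of_detectors_evaluated).
    """
    results = [detect(window) for detect in detectors.values()]
    return any(results), len(results)


# --- Hypothetical stand-ins, for illustration only -------------------------

def _left_column_sum(window: Window) -> float:
    return sum(row[0] for row in window)


def _right_column_sum(window: Window) -> float:
    return sum(row[-1] for row in window)


def toy_estimate_pose(window: Window) -> str:
    # Stand-in for the pose-estimating decision tree: compares the
    # brightness of the top row with that of the left column.
    if sum(window[0]) >= _left_column_sum(window):
        return "upright"
    return "rotated_90"


toy_detectors: Dict[str, Detector] = {
    # Stand-ins for trained pose-specific detectors.
    "upright": lambda w: sum(w[0]) > sum(w[-1]),
    "rotated_90": lambda w: _left_column_sum(w) > _right_column_sum(w),
}

face_like = [[9.0, 9.0], [1.0, 1.0]]   # bright top row, dark bottom row
flat = [[5.0, 5.0], [5.0, 5.0]]        # uniform patch, no structure

print(two_stage_detect(face_like, toy_estimate_pose, toy_detectors))
print(union_detect(face_like, toy_detectors))
print(two_stage_detect(flat, toy_estimate_pose, toy_detectors))
print(union_detect(flat, toy_detectors))
```

In this sketch the two-stage process evaluates one detector per window regardless of how many pose classes exist, while the union process evaluates all of them. The two-stage result depends on the pose estimate being correct, which is the trade-off noted earlier for the Rowley et al. system.
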
[1] J. Ross Quinlan et al. C4.5: Programs for Machine Learning, 1992.

[2] Yoav Freund et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.

[3] Federico Girosi et al. Training support vector machines: an application to face detection, 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4] Yoram Singer et al. Improved Boosting Algorithms Using Confidence-rated Predictions, 1998, COLT '98.

[5] Takeo Kanade et al. Rotation invariant neural network-based face detection, 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[6] Takeo Kanade et al. Neural Network-Based Face Detection, 1998, IEEE Trans. Pattern Anal. Mach. Intell.

[7] Tomaso A. Poggio et al. A general framework for object detection, 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[8] Yoram Singer et al. Improved Boosting Algorithms Using Confidence-rated Predictions, 1998, COLT '98.

[9] Takeo Kanade et al. A statistical method for 3D object detection applied to faces and cars, 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[10] Paul A. Viola et al. Rapid object detection using a boosted cascade of simple features, 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[11] Harry Shum et al. Statistical Learning of Multi-view Face Detection, 2002, ECCV.