Paper Information - Fast Multi-view Face Detection

Fast Multi-view Face Detection

Abstract

This paper extends the face detection framework proposed by Viola and Jones 2001 to handle profile views and rotated faces. As in the work of Rowley et al. 1998 and Schneiderman et al. 2000, we build different detectors for different views of the face. A decision tree is then trained to determine the viewpoint class (such as right profile or rotated 60 degrees) for a given window of the image being examined. This is similar to the approach of Rowley et al. 1998. The appropriate detector for that viewpoint can then be run instead of running all detectors on all windows. This technique yields good results and maintains the speed advantage of the Viola-Jones detector.

1. Introduction

There are a number of techniques that can successfully detect frontal upright faces in a wide variety of images [11, 7, 10, 12, 3, 6]. While the definition of “frontal” and “upright” may vary from system to system, the reality is that many natural images contain rotated or profile faces that are not reliably detected. There are a small number of systems which explicitly address non-frontal, or non-upright face detection [8, 10, 2]. This paper describes progress toward a system which can detect faces regardless of pose reliably and in real-time.

This paper extends the framework proposed by Viola and Jones [12]. This approach is selected because of its computational efficiency and simplicity.

One observation which is shared among all previous related work is that a multi-view detector must be carefully constructed by combining a collection of detectors each trained for a single viewpoint. It appears that a monolithic approach, where a single classifier is trained to detect all poses of a face, is unlearnable with existing classifiers.
Our informal experiments lend support to this conclusion, since a classifier trained on all poses appears to be hopelessly inaccurate.

This paper addresses two types of pose variation: non-frontal faces, which are rotated out of the image plane, and non-upright faces, which are rotated in the image plane. In both cases the multi-view detector presented in this paper is a combination of Viola-Jones detectors, each detector trained on face data taken from a single viewpoint.

Reliable non-upright face detection was first presented in a paper by Rowley, Baluja and Kanade [8]. They train two neural network classifiers. The first estimates the pose of a face in the detection window. The second is a conventional face detector. Faces are detected in three steps: for each image window the pose of “face” is first estimated; the pose estimate is then used to de-rotate the image window; the window is then classified by the second detector. For non-face windows, the pose estimate must be considered random. Nevertheless, a rotated non-face should be rejected by the conventional detector. One potential flaw of such a system is that the final detection rate is roughly the product of the correct classification rates of the two classifiers (since the errors of the two classifiers are somewhat independent).

One could adopt the Rowley et al. three step approach while replacing the classifiers with those of Viola and Jones. The final system would be more efficient, but not significantly. Classification by the Viola-Jones system is so efficient that derotation would dominate the computational expense. In principle derotation is not strictly necessary since it should be possible to construct a detector for rotated faces directly. Detection becomes a two stage process. First the pose of the window is estimated and then one of the rotation-specific detectors is called upon to classify the window.

In this paper detection of non-upright faces is handled using the two stage approach.
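
The two stage process described above (estimate the pose of a window, then run only the detector for that pose) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Window` tuple, the function names `two_stage_detect` and `chained_detection_rate`, the pose labels, and the toy classifiers are all hypothetical stand-ins. The second function simply restates the observation that, when two classifiers must both be correct and their errors are roughly independent, the overall detection rate is about the product of their individual rates.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# Hypothetical window representation: (x, y, width, height).
Window = Tuple[int, int, int, int]
PoseEstimator = Callable[[Window], Hashable]
Detector = Callable[[Window], bool]


def two_stage_detect(
    windows: Sequence[Window],
    estimate_pose: PoseEstimator,
    detectors: Dict[Hashable, Detector],
) -> List[Tuple[Window, Hashable]]:
    """Run exactly one pose-specific detector per window."""
    detections = []
    for window in windows:
        pose = estimate_pose(window)    # stage 1: choose a pose class
        detector = detectors.get(pose)  # stage 2: only the matching detector
        if detector is not None and detector(window):
            detections.append((window, pose))
    return detections


def chained_detection_rate(pose_rate: float, detector_rate: float) -> float:
    """Approximate overall rate when both classifiers must be correct
    and their errors are assumed independent."""
    return pose_rate * detector_rate


if __name__ == "__main__":
    # Toy stand-ins for illustration only; these are not trained classifiers.
    windows = [(0, 0, 24, 24), (1, 0, 24, 24), (2, 0, 24, 24)]
    toy_pose = lambda w: "upright" if w[0] % 2 == 0 else "rotated_60"
    toy_detectors = {
        "upright": lambda w: w[0] == 0,
        "rotated_60": lambda w: True,
    }
    print(two_stage_detect(windows, toy_pose, toy_detectors))
    # Made-up rates, chosen only to show the compounding effect.
    print(chained_detection_rate(0.9, 0.9))
```

Looking the detector up by pose label keeps the per-window cost at one pose estimate plus one detector evaluation, however many pose classes exist. The rates passed to `chained_detection_rate` are invented for illustration and are not results from the paper.
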
In the first stage the pose of each window is estimated using a decision tree constructed using features like those described by Viola and Jones. In the second stage one of the pose-specific Viola-Jones detectors is used to classify the window.

Once the pose-specific detectors are trained and available, an alternative detection process can be tested as well. In this case all detectors are evaluated and the union of their de-

Paul A. Viola | Michael J. Jones

[1] J. Ross Quinlan, et al. C4.5: Programs for Machine Learning, 1992.

[2] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.

[3] Federico Girosi, et al. Training support vector machines: an application to face detection, 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4] Yoram Singer, et al. Improved Boosting Algorithms Using Confidence-rated Predictions, 1998, COLT '98.

[5] Takeo Kanade, et al. Rotation invariant neural network-based face detection, 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[6] Takeo Kanade, et al. Neural Network-Based Face Detection, 1998, IEEE Trans. Pattern Anal. Mach. Intell.

[7] Tomaso A. Poggio, et al. A general framework for object detection, 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[8] Yoram Singer, et al. Improved Boosting Algorithms Using Confidence-rated Predictions, 1998, COLT '98.

[9] Takeo Kanade, et al. A statistical method for 3D object detection applied to faces and cars, 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[10] Paul A. Viola, et al. Rapid object detection using a boosted cascade of simple features, 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[11] Harry Shum, et al. Statistical Learning of Multi-view Face Detection, 2002, ECCV.