A discriminatively trained, multiscale, deformable part model

This paper describes a discriminatively trained, multiscale, deformable part model for object detection. Our system achieves a two-fold improvement in average precision over the best performance in the 2006 PASCAL person detection challenge. It also outperforms the best results in the 2007 challenge in ten out of twenty categories. The system relies heavily on deformable parts. While deformable part models have become quite popular, their value had not been demonstrated on difficult benchmarks such as the PASCAL challenge. Our system also relies heavily on new methods for discriminative training. We combine a margin-sensitive approach for data mining hard negative examples with a formalism we call latent SVM. A latent SVM, like a hidden CRF, leads to a non-convex training problem. However, a latent SVM is semi-convex and the training problem becomes convex once latent information is specified for the positive examples. We believe that our training methods will eventually make possible the effective use of more latent information such as hierarchical (grammar) models and models involving latent three dimensional pose.

[1]  Martin A. Fischler,et al.  The Representation and Matching of Pictorial Structures , 1973, IEEE Transactions on Computers.

[2]  Pietro Perona,et al.  A Probabilistic Approach to Object Recognition Using Local Photometry and Global Geometry , 1998, ECCV.

[3]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[4]  B. Schölkopf,et al.  Advances in kernel methods: support vector learning , 1999 .

[5]  Peter Russer,et al.  Electromagnetic field representations and computations in complex structures I: complexity architecture and generalized network formulation , 2002 .

[6]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[7]  David A. Forsyth,et al.  Probabilistic Methods for Finding People , 2001, International Journal of Computer Vision.

[8]  Takeo Kanade,et al.  Object Detection Using the Statistics of Parts , 2004, International Journal of Computer Vision.

[9]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[10]  Pietro Perona,et al.  A discriminative framework for modelling object classes , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[11]  Daniel P. Huttenlocher,et al.  Spatial priors for part-based recognition using statistical models , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[12]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[13]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[14]  Stuart Geman,et al.  Context and Hierarchy in a Probabilistic Image Model , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[15]  Cristian Sminchisescu,et al.  Training Deformable Models for Localization , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[16]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[17]  Fu Jie Huang,et al.  A Tutorial on Energy-Based Learning , 2006 .

[18]  Daniel P. Huttenlocher,et al.  Weakly Supervised Learning of Part-Based Spatial Models for Visual Object Recognition , 2006, ECCV.

[19]  Cordelia Schmid,et al.  Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[20]  Yali Amit,et al.  POP: Patchwork of Parts Models for Object Recognition , 2007, International Journal of Computer Vision.

[21]  Deva Ramanan,et al.  Using Segmentation to Verify Object Hypotheses , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Gökhan BakIr,et al.  Predicting Structured Data , 2008 .

[23]  David A. McAllester,et al.  The Generalized A* Architecture , 2007, J. Artif. Intell. Res..

[24]  Shimon Ullman,et al.  Semantic Hierarchies for Recognizing Objects and Parts , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Trevor Darrell,et al.  Hidden Conditional Random Fields , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  E. Savaş On generalized A , 2010 .