Active Structured Learning for High-Speed Object Detection

High-speed, smooth, and accurate visual tracking of objects in arbitrary, unstructured environments is essential for robotics and human motion analysis. However, building a system that can adapt to arbitrary objects and a wide range of lighting conditions is a challenging problem, especially when hard real-time constraints apply, as in robotics scenarios. In this work, we introduce a method for learning a discriminative object tracking system based on the recent structured regression framework for object localization. Using a kernel function that allows fast evaluation on the GPU, the resulting system can process video streams at 100 frames per second or more. Consecutive frames in high-speed video sequences are typically highly redundant, so for training an object detection system it is sufficient to have training labels for only a subset of all images. We propose an active learning method that selects training examples in a data-driven way, thereby minimizing the number of training labels required. Experiments on realistic data show that active learning is superior to previously used methods for dataset subsampling for this task.
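To illustrate what data-driven selection of training frames can look like, the following is a minimal sketch of margin-based uncertainty sampling, a common active learning heuristic. It is not the selection criterion of this work, which the abstract does not specify. The function name `select_frames_to_label`, the `scores` layout (one row per frame, one column per candidate box), and the `budget` parameter are all hypothetical and chosen only for illustration.

```python
import numpy as np


def select_frames_to_label(scores, budget):
    """Pick the frames whose current detections are most ambiguous.

    Assumption (illustrative only): scores[i, j] is the current detector's
    score for candidate box j in frame i, with at least two candidates per
    frame. A frame is treated as ambiguous when the best and second-best
    candidates score almost the same.
    """
    scores = np.asarray(scores, dtype=float)
    # Sort each row ascending and keep the two largest scores per frame.
    top2 = np.sort(scores, axis=1)[:, -2:]
    # Margin between the best and the second-best candidate box.
    margins = top2[:, 1] - top2[:, 0]
    # Smallest margin first: these frames are requested for labeling.
    order = np.argsort(margins, kind="stable")
    return order[:budget].tolist()
```

With three frames and a labeling budget of two, a call such as `select_frames_to_label([[0.9, 0.1, 0.0], [0.5, 0.45, 0.1], [0.3, 0.8, 0.79]], budget=2)` returns the indices of the two frames with the smallest margins. Frames where the detector is already confident are skipped, which reflects the idea that redundant frames add little training signal.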
