Learning to Recognize Objects with Little Supervision

Abstract: This paper shows (i) improvements over state-of-the-art local feature recognition systems, (ii) how to formulate principled models for automatic local feature selection in object class recognition when there is little supervised data, and (iii) how to formulate sensible spatial image context models using a conditional random field for integrating local features and segmentation cues (superpixels). By adopting sparse kernel methods, Bayesian learning techniques and data association with constraints, the proposed model identifies the most relevant sets of local features for recognizing object classes, achieves performance comparable to the fully supervised setting, and obtains excellent results for image classification.
