A Semantic Distance Based Nearest Neighbor Method for Image Annotation

Most of the Nearest Neighbor (NN) based image annotation (or classification) methods cannot achieve satisfactory performance, due to the fact that information loss is inevitable when extracting visual features from images, such as constructing bag of visual words based features. In this paper, we propose a novel NN method based on semantic distance, which improves classification performance by compensating for the information loss and minimizing the semantic gap between intra-class variations and inter-class similarities . We first deal with distance metric using image semantic information. Then we construct NN-based classifier which utilizes the distance metric to compute the similarity between any two images. E xperimental results based on image annotation task of ImageCLEF2012 show that the proposed method outperforms the traditional classifiers. More importantly, our method is extremely simple, efficient, and competitive in comparison with the state of the art learning-based image classifiers .

[1]  Paul Over,et al.  Evaluation campaigns and TRECVid , 2006, MIR '06.

[2]  R. Manmatha,et al.  Multiple Bernoulli relevance models for image and video annotation , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[3]  Frédéric Jurie,et al.  Creating efficient codebooks for visual recognition , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[4]  Andrew Zisserman,et al.  Image Classification using Random Forests and Ferns , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[5]  Elad Hazan,et al.  Logarithmic regret algorithms for online convex optimization , 2006, Machine Learning.

[6]  Eli Shechtman,et al.  In defense of Nearest-Neighbor based image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Bart Thomee,et al.  Overview of the ImageCLEF 2012 Flickr Photo Annotation and Retrieval Task , 2012, CLEF.

[8]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Michael I. Jordan,et al.  Distance Metric Learning with Application to Clustering with Side-Information , 2002, NIPS.

[10]  Zhihong Zeng,et al.  A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Changhu Wang,et al.  Learning to reduce the semantic gap in web image retrieval and annotation , 2008, SIGIR '08.

[12]  Jitendra Malik,et al.  SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[13]  Bart Thomee,et al.  New trends and ideas in visual concept detection: the MIR flickr retrieval evaluation initiative , 2010, MIR '10.

[14]  James M. Rehg,et al.  Beyond the Euclidean distance: Creating effective visual codebooks using the Histogram Intersection Kernel , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[15]  Hermann Ney,et al.  Features for image retrieval: an experimental comparison , 2008, Information Retrieval.

[16]  Gang Wang,et al.  Using Dependent Regions for Object Categorization in a Generative Framework , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[17]  Thomas Deselaers,et al.  Features for Image Retrieval , 2003 .

[18]  Rong Jin,et al.  Correlated Label Propagation with Application to Multi-label Learning , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[19]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[20]  Yiannis S. Boutalis,et al.  FCTH: Fuzzy Color and Texture Histogram - A Low Level Feature for Accurate Image Retrieval , 2008, 2008 Ninth International Workshop on Image Analysis for Multimedia Interactive Services.

[21]  Marcus Jerome Pickering,et al.  A comparative study of evidence combination strategies , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[22]  R. Manmatha,et al.  Automatic image annotation and retrieval using cross-media relevance models , 2003, SIGIR.

[23]  Cordelia Schmid,et al.  Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[24]  Trevor Darrell,et al.  Beyond spatial pyramids: Receptive field learning for pooled image features , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Manik Varma,et al.  Learning The Discriminative Power-Invariance Trade-Off , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[26]  Cor J. Veenman,et al.  Visual Word Ambiguity , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[28]  Gustavo Carneiro,et al.  Supervised Learning of Semantic Classes for Image Annotation and Retrieval , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Xiongfei Li,et al.  Image Annotation Refinement Using Dynamic Weighted Voting Based on Mutual Information , 2011, J. Softw..