Decoupling sparse coding of SIFT descriptors for large-scale visual recognition

In recent years, sparse coding has drawn considerable research attention in developing feature representations for visual recognition problems. In this paper, we devise sparse coding algorithms to learn a dictionary of basis functions from Scale- Invariant Feature Transform (SIFT) descriptors extracted from images. The learned dictionary is used to code SIFT-based inputs for the feature representation that is further pooled via spatial pyramid matching kernels and fed into a Support Vector Machine (SVM) for object classification on the large-scale ImageNet dataset. We investigate the advantage of SIFT-based sparse coding approach by combining different dictionary learning and sparse representation algorithms. Our results also include favorable performance on different subsets of the ImageNet database.

[1]  Y. C. Pati,et al.  Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition , 1993, Proceedings of 27th Asilomar Conference on Signals, Systems and Computers.

[2]  Rajat Raina,et al.  Efficient sparse coding algorithms , 2006, NIPS.

[3]  Andrew Y. Ng,et al.  The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization , 2011, ICML.

[4]  Michael A. Saunders,et al.  Atomic Decomposition by Basis Pursuit , 1998, SIAM J. Sci. Comput..

[5]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[6]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Jean Ponce,et al.  Learning mid-level features for recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[8]  M. R. Osborne,et al.  A new approach to variable selection in least squares problems , 2000 .

[9]  Terrence J. Sejnowski,et al.  Learning Overcomplete Representations , 2000, Neural Computation.

[10]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[11]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[12]  Wentao Huang,et al.  Learning sparse representation via a nonlinear shrinkage encoder and a linear sparse decoder , 2012, The 2012 International Joint Conference on Neural Networks (IJCNN).

[13]  Bhaskar D. Rao,et al.  Sparse signal reconstruction from limited data using FOCUSS: a re-weighted minimum norm algorithm , 1997, IEEE Trans. Signal Process..

[14]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[15]  Stéphane Mallat,et al.  Matching pursuits with time-frequency dictionaries , 1993, IEEE Trans. Signal Process..

[16]  Chih-Jen Lin,et al.  Probability Estimates for Multi-class Classification by Pairwise Coupling , 2003, J. Mach. Learn. Res..

[17]  David J. Field,et al.  Sparse coding with an overcomplete basis set: A strategy employed by V1? , 1997, Vision Research.

[18]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[19]  A. Bruckstein,et al.  K-SVD : An Algorithm for Designing of Overcomplete Dictionaries for Sparse Representation , 2005 .

[20]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[21]  Guillermo Sapiro,et al.  Online Learning for Matrix Factorization and Sparse Coding , 2009, J. Mach. Learn. Res..

[22]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[23]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[24]  M. Elad,et al.  $rm K$-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation , 2006, IEEE Transactions on Signal Processing.

[25]  Wentao Huang,et al.  Hierarchical discriminative sparse coding via bidirectional connections , 2011, The 2011 International Joint Conference on Neural Networks.

[26]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[27]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[28]  Kjersti Engan,et al.  Method of optimal directions for frame design , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[29]  Neill W Campbell,et al.  IEEE International Conference on Computer Vision and Pattern Recognition , 2008 .

[30]  Y. LeCun,et al.  Learning methods for generic object recognition with invariance to pose and lighting , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[31]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.