Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks

Deep convolutional networks have proven to be very successful in learning task-specific features that allow for unprecedented performance on various computer vision tasks. Training of such networks follows mostly the supervised learning paradigm, where sufficiently many input-output pairs are required for training. Acquiring large training sets is one of the key challenges when approaching a new task. In this paper, we aim for generic feature learning and present an approach for training a convolutional network using only unlabeled data. To this end, we train the network to discriminate between a set of surrogate classes. Each surrogate class is formed by applying a variety of transformations to a randomly sampled 'seed' image patch. In contrast to supervised network training, the resulting feature representation is not class-specific. Rather, it provides robustness to the transformations that have been applied during training. This generic feature representation allows for classification results that outperform the state of the art for unsupervised learning on several popular datasets (STL-10, CIFAR-10, Caltech-101, Caltech-256). While features learned with our approach cannot compete with class-specific features from supervised training on a classification task, we show that they are advantageous on geometric matching problems, where they also outperform the SIFT descriptor.
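
To make the surrogate-class construction concrete, the sketch below builds such a training set with torchvision-style transforms. It is a minimal illustration under assumptions, not the authors' implementation: the SurrogateDataset helper, the 32x32 patch size, and the exact transformation parameters are hypothetical, standing in for the paper's "variety of transformations" applied to each seed patch.

```python
# Minimal sketch: each randomly sampled seed patch defines one surrogate class,
# and every sample drawn from that class is a randomly transformed copy of the seed.
# Assumptions: seed patches are PIL images; the transformation set and its
# parameters are illustrative, not the paper's exact configuration.
import torch
from torch.utils.data import Dataset
from torchvision import transforms


class SurrogateDataset(Dataset):
    """Yields (transformed patch, seed index) pairs for surrogate-class training."""

    def __init__(self, seed_patches, samples_per_class=100):
        self.seed_patches = seed_patches              # list of PIL image patches (the seeds)
        self.samples_per_class = samples_per_class
        self.augment = transforms.Compose([
            transforms.RandomResizedCrop(32, scale=(0.7, 1.0)),   # translation + scaling
            transforms.RandomAffine(degrees=20),                  # rotation
            transforms.ColorJitter(brightness=0.4, contrast=0.4,
                                   saturation=0.4, hue=0.1),      # color / contrast changes
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.seed_patches) * self.samples_per_class

    def __getitem__(self, idx):
        label = idx % len(self.seed_patches)          # surrogate class = index of the seed
        patch = self.seed_patches[label]
        return self.augment(patch), label
```

Training then reduces to ordinary supervised classification over the len(seed_patches) surrogate classes (e.g. with a cross-entropy loss); only the learned convolutional features, not the surrogate classifier, are kept for downstream tasks.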
