Online Unsupervised Learning of Visual Representations and Categories

Real-world learning scenarios involve a nonstationary distribution of classes with sequential dependencies among the samples, in contrast to the standard machine learning formulation of drawing samples independently from a fixed, typically uniform distribution. Furthermore, real-world interactions demand learning on the fly from few or no class labels. In this work, we propose an unsupervised model that simultaneously performs online visual representation learning and few-shot learning of new categories, without relying on any class labels. Our model is a prototype-based memory network with a control component that determines when to form a new class prototype. We formulate it as an online Gaussian mixture model in which components are created online from only a single new example and assignments need not be balanced, which better approximates the naturally imbalanced distributions of uncurated raw data. Learning also includes a contrastive loss that encourages different views of the same image to be assigned to the same prototype. The result is a mechanism that forms categorical representations of objects in nonstationary environments. Experiments show that our method can learn from an online stream of visual input data and is significantly better at category recognition than state-of-the-art self-supervised learning methods.
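To make the prototype-creation mechanism concrete, here is a minimal sketch of such an online Gaussian mixture prototype memory, written in NumPy. This is an illustration under stated assumptions, not the paper's implementation: the class name `PrototypeMemory` and the parameters `tau` (distance threshold) and `decay` (count decay) are hypothetical, the mixture uses a fixed isotropic variance with hard assignments, and the contrastive representation-learning loss is omitted.

```python
# Minimal sketch of an online Gaussian mixture prototype memory
# (illustrative only; not the authors' implementation).
import numpy as np

class PrototypeMemory:
    def __init__(self, dim, tau=1.0, decay=0.99):
        self.tau = tau        # squared-distance threshold for spawning a new prototype
        self.decay = decay    # exponential decay of per-prototype usage counts
        self.means = np.zeros((0, dim))  # one row per prototype (component mean)
        self.counts = np.zeros(0)        # soft usage count per prototype

    def step(self, x):
        """Assign embedding x to a prototype, creating one if needed.

        Returns the index of the chosen prototype. With a fixed isotropic
        variance, the nearest mean is the maximum-likelihood component,
        so this is a hard-assignment approximation to an online GMM.
        """
        if len(self.counts) == 0:
            return self._spawn(x)
        d = np.sum((self.means - x) ** 2, axis=1)  # distance to every prototype
        k = int(np.argmin(d))
        if d[k] > self.tau:
            # No existing component explains x well: the control decision
            # is to create a new class prototype from this single example.
            return self._spawn(x)
        # Online mean update: move the winning prototype toward x with a
        # step size that shrinks as the (decayed) count grows. No balancing
        # constraint is placed on assignments, so clusters may stay imbalanced.
        self.counts *= self.decay
        self.counts[k] += 1.0
        self.means[k] += (x - self.means[k]) / self.counts[k]
        return k

    def _spawn(self, x):
        self.means = np.vstack([self.means, x[None, :]])
        self.counts = np.append(self.counts, 1.0)
        return len(self.counts) - 1

# Toy nonstationary stream: one "class" appears first, then another.
rng = np.random.default_rng(0)
memory = PrototypeMemory(dim=2, tau=0.5)
stream = np.concatenate([rng.normal(0.0, 0.1, size=(50, 2)),
                         rng.normal(3.0, 0.1, size=(50, 2))])
assignments = [memory.step(x) for x in stream]
print(len(memory.counts), "prototypes formed")  # expect 2 for this spread
```

In this reading, the threshold test plays the role of the control component that decides when to form a new prototype, and the decayed counts keep the per-prototype learning rate bounded away from zero so prototypes can continue to track a drifting stream.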
