On the duality between contrastive and non-contrastive self-supervised learning

Recent approaches to self-supervised learning of image representations can be grouped into different families of methods and, in particular, divided into contrastive and non-contrastive approaches. While differences between the two families have been thoroughly discussed to motivate new approaches, we focus instead on their theoretical similarities. By designing contrastive and covariance-based non-contrastive criteria that can be related algebraically and shown to be equivalent under limited assumptions, we show how close the two families can be. We further study popular methods and introduce variations of them, allowing us to relate this theoretical result to current practice and to show the influence (or lack thereof) of design choices on downstream performance. Motivated by our equivalence result, we investigate the low reported performance of SimCLR and show that, with careful hyperparameter tuning, it can match VICReg's performance, improving significantly over known baselines. We also challenge the popular assumptions that contrastive methods require large batch sizes and that non-contrastive methods require large output dimensions. Our theoretical and quantitative results suggest that the numerical gaps between contrastive and non-contrastive methods in certain regimes can be closed given better network design choices and hyperparameter tuning. They also suggest that unifying different state-of-the-art methods is an important direction for building a better understanding of self-supervised learning.
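
To make the two families concrete, the sketch below places a simplified SimCLR-style contrastive (InfoNCE) criterion next to a VICReg-style variance-invariance-covariance criterion, i.e. the kind of contrastive and covariance-based losses whose algebraic relationship the abstract refers to. This is a minimal PyTorch sketch under assumed defaults: the function names, coefficient values (25/25/1 for the VICReg-style terms, temperature 0.1 for InfoNCE), and the single-positive, in-batch-negatives form of the contrastive loss are illustrative choices, not the exact criteria analyzed in the paper.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    """Simplified SimCLR-style contrastive loss: each embedding in z1 must
    identify its augmented counterpart in z2 among the batch (negatives)."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature                      # (N, N) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)  # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

def vicreg_loss(z1, z2, sim_coeff=25.0, std_coeff=25.0, cov_coeff=1.0):
    """VICReg-style covariance-based loss: invariance between the two views,
    plus variance and covariance regularization of each embedding batch."""
    n, d = z1.shape
    invariance = F.mse_loss(z1, z2)
    z1 = z1 - z1.mean(dim=0)
    z2 = z2 - z2.mean(dim=0)
    # Hinge on the per-dimension standard deviation to prevent collapse.
    std1 = torch.sqrt(z1.var(dim=0) + 1e-4)
    std2 = torch.sqrt(z2.var(dim=0) + 1e-4)
    variance = F.relu(1.0 - std1).mean() + F.relu(1.0 - std2).mean()
    # Penalize off-diagonal entries of the d x d covariance matrices.
    cov1 = (z1.T @ z1) / (n - 1)
    cov2 = (z2.T @ z2) / (n - 1)
    off_diag = lambda m: m - torch.diag(torch.diag(m))
    covariance = off_diag(cov1).pow(2).sum() / d + off_diag(cov2).pow(2).sum() / d
    return sim_coeff * invariance + std_coeff * variance + cov_coeff * covariance

# Illustrative usage with random embeddings standing in for two augmented views.
z1, z2 = torch.randn(256, 128), torch.randn(256, 128)
print(info_nce_loss(z1, z2).item(), vicreg_loss(z1, z2).item())
```

The sketch also makes visible the design contrast the abstract challenges: the contrastive term uses the rest of the batch as negatives (the source of the intuition that contrastive methods need large batch sizes), while the covariance term regularizes a d x d embedding covariance matrix (the source of the intuition that non-contrastive methods need large output dimensions).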
