Can contrastive learning avoid shortcut solutions?

The generalization of representations learned via contrastive learning depends crucially on which features of the data are extracted. However, we observe that the contrastive loss does not always sufficiently guide which features are extracted, a behavior that can negatively impact performance on downstream tasks via "shortcuts", i.e., by inadvertently suppressing important predictive features. We find that feature extraction is influenced by the difficulty of the so-called instance discrimination task (i.e., the task of discriminating pairs of similar points from pairs of dissimilar ones). Although harder pairs improve the representation of some features, the improvement comes at the cost of suppressing previously well-represented features. In response, we propose implicit feature modification (IFM), a method for altering positive and negative samples in order to guide contrastive models towards capturing a wider variety of predictive features. Empirically, we observe that IFM reduces feature suppression and, as a result, improves performance on vision and medical imaging tasks. The code is available at: https://github.com/joshr17/IFM.
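To make the idea concrete, below is a minimal NumPy sketch of how an IFM-style perturbation can be folded into the InfoNCE loss. It assumes the perturbation admits a closed form on the similarity logits (the positive logit is decreased by a budget epsilon and each negative logit is increased by epsilon, making instance discrimination harder without touching the inputs); the function name and signature are illustrative, not the paper's reference implementation.

```python
import numpy as np

def info_nce_ifm(anchor, positive, negatives, temperature=0.5, epsilon=0.1):
    """InfoNCE loss with an IFM-style adversarial logit perturbation.

    Sketch under the assumption that implicit feature modification reduces
    the positive similarity and increases each negative similarity by a
    budget `epsilon` before the softmax. Setting epsilon=0 recovers the
    standard InfoNCE loss.
    """
    # l2-normalize embeddings so dot products are cosine similarities
    norm = lambda v: v / np.linalg.norm(v, axis=-1, keepdims=True)
    a, p, n = norm(np.asarray(anchor)), norm(np.asarray(positive)), norm(np.asarray(negatives))

    # perturb logits adversarially: harder positive, harder negatives
    pos_logit = (a @ p - epsilon) / temperature
    neg_logits = (n @ a + epsilon) / temperature

    # cross-entropy with the positive treated as the correct "class"
    logits = np.concatenate([[pos_logit], neg_logits])
    return -pos_logit + np.log(np.sum(np.exp(logits)))
```

Because the perturbation strictly lowers the positive logit and raises the negative logits, the resulting loss is always at least as large as plain InfoNCE; in practice the paper trains on a combination of the perturbed and unperturbed objectives, controlled by epsilon.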
