What Do We Maximize in Self-Supervised Learning?

In this paper, we examine self-supervised learning methods, particularly VICReg, to provide an information-theoretic understanding of their construction. As a first step, we demonstrate how information-theoretic quantities can be obtained for a deterministic network, offering a possible alternative to prior work that relies on stochastic models. This enables us to demonstrate how VICReg can be (re)discovered from first principles, together with the assumptions it makes about the data distribution. Furthermore, we empirically validate these assumptions, confirming our new understanding of VICReg. Finally, we believe that the derivation and insights we obtain can be generalized to many other SSL methods, opening new avenues for theoretical and practical understanding of SSL and transfer learning.
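For orientation, the sketch below gives a minimal NumPy implementation of the standard VICReg objective (invariance, variance, and covariance terms) that the paper analyzes; the coefficient values, helper names, and exact normalization are illustrative assumptions rather than the paper's own formulation.

```python
# Minimal sketch of the standard VICReg objective (invariance + variance + covariance),
# for illustration only. Coefficients lam, mu, nu and the defaults below are assumptions.
import numpy as np

def vicreg_loss(z_a, z_b, lam=25.0, mu=25.0, nu=1.0, gamma=1.0, eps=1e-4):
    """z_a, z_b: (batch, dim) embeddings of two augmented views of the same inputs."""
    n, d = z_a.shape

    # Invariance: mean squared difference between the two views' embeddings.
    inv = np.mean((z_a - z_b) ** 2)

    # Variance: hinge keeping the std of every embedding dimension above gamma.
    def variance_term(z):
        std = np.sqrt(z.var(axis=0) + eps)
        return np.mean(np.maximum(0.0, gamma - std))
    var = variance_term(z_a) + variance_term(z_b)

    # Covariance: penalize off-diagonal entries of each view's covariance matrix,
    # discouraging redundancy between embedding dimensions.
    def covariance_term(z):
        zc = z - z.mean(axis=0)
        cov = (zc.T @ zc) / (n - 1)
        off_diag = cov - np.diag(np.diag(cov))
        return np.sum(off_diag ** 2) / d
    cov = covariance_term(z_a) + covariance_term(z_b)

    return lam * inv + mu * var + nu * cov

# Usage example with random embeddings, just to show the call signature.
rng = np.random.default_rng(0)
z_a, z_b = rng.normal(size=(256, 128)), rng.normal(size=(256, 128))
print(vicreg_loss(z_a, z_b))
```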
