Disentanglement and Generalization Under Correlation Shifts

Correlations between factors of variation are prevalent in real-world data. Machine learning algorithms may benefit from exploiting such correlations, as they can increase predictive performance on noisy data. However, such correlations are often not robust (e.g., they may change between domains, datasets, or applications), and we wish to avoid exploiting them. Disentanglement methods aim to learn representations that capture different factors of variation in separate latent subspaces. A common approach involves minimizing the mutual information between latent subspaces, such that each encodes a single underlying attribute. However, this fails when the attributes are correlated. We solve this problem by enforcing independence between subspaces conditioned on the available attributes, which allows us to remove only those dependencies that are not due to the correlation structure present in the training data. We achieve this via an adversarial approach that minimizes the conditional mutual information (CMI) between subspaces with respect to categorical variables. We first show theoretically that CMI minimization is a good objective for robust disentanglement on linear problems with Gaussian data. We then apply our method to real-world datasets based on MNIST and CelebA, and show that it yields models that are disentangled and robust under correlation shift, including in weakly supervised settings.
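As a worked illustration of the objective (notation ours; the abstract leaves the formalism implicit): writing $z_1, z_2$ for the latent subspaces and $y$ for the observed categorical attribute, the quantity being minimized is

$$
I(z_1; z_2 \mid y) \;=\; \mathbb{E}_{p(z_1, z_2, y)}\!\left[\log \frac{p(z_1, z_2 \mid y)}{p(z_1 \mid y)\, p(z_2 \mid y)}\right],
$$

which vanishes exactly when the subspaces are independent given $y$, so dependencies explained by the attribute correlations in the training data are left intact. Below is a minimal sketch of how such an objective could be minimized adversarially, assuming a PyTorch setup in which an encoder splits its output into subspaces `z1` and `z2`. All names here (`permute_within_class`, `Discriminator`, `cmi_losses`) are illustrative, not the paper's implementation.

```python
# Sketch: adversarial CMI minimization via a conditional density-ratio
# discriminator. Illustrative only; not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def permute_within_class(z2, y):
    """Shuffle z2 among examples that share the same attribute value y,
    turning samples from p(z1, z2 | y) into samples from p(z1|y) p(z2|y)."""
    perm = torch.arange(len(y), device=y.device)
    for c in y.unique():
        idx = (y == c).nonzero(as_tuple=True)[0]
        perm[idx] = idx[torch.randperm(len(idx), device=y.device)]
    return z2[perm]

class Discriminator(nn.Module):
    """Logistic classifier telling joint (z1, z2) pairs apart from
    conditionally independent ones, given a one-hot encoding of y."""
    def __init__(self, dim_z1, dim_z2, num_classes, hidden=128):
        super().__init__()
        self.num_classes = num_classes
        self.net = nn.Sequential(
            nn.Linear(dim_z1 + dim_z2 + num_classes, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, z1, z2, y):
        y_onehot = F.one_hot(y, self.num_classes).float()
        return self.net(torch.cat([z1, z2, y_onehot], dim=-1)).squeeze(-1)

def cmi_losses(disc, z1, z2, y):
    """Binary cross-entropy loss for the discriminator, plus a penalty for
    the encoder: at the discriminator's optimum, its logit on joint samples
    approximates log p(z1,z2|y) - log p(z1|y)p(z2|y), whose mean is the CMI."""
    z2_indep = permute_within_class(z2.detach(), y)
    logit_joint = disc(z1.detach(), z2.detach(), y)
    logit_indep = disc(z1.detach(), z2_indep, y)
    disc_loss = (
        F.binary_cross_entropy_with_logits(logit_joint, torch.ones_like(logit_joint))
        + F.binary_cross_entropy_with_logits(logit_indep, torch.zeros_like(logit_indep))
    )
    cmi_estimate = disc(z1, z2, y).mean()  # backprop through the encoder only
    return disc_loss, cmi_estimate
```

In a full training loop one would alternate updates: train the discriminator on `disc_loss`, then add a weighted `cmi_estimate` term to the task loss when updating the encoder, with the discriminator's parameters excluded from that step. The within-class permutation is what makes the estimate conditional: only dependencies beyond those induced by $y$ are penalized.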
