Information Constraints on Auto-Encoding Variational Bayes

Parameterizing the approximate posterior of a generative model with neural networks has become a common theme in recent machine learning research. While this approach offers appealing flexibility, it makes it difficult to impose or assess structural constraints such as conditional independence. We propose a framework for learning representations that relies on Auto-Encoding Variational Bayes and constrains the search space via kernel-based measures of independence. In particular, our method employs the $d$-variable Hilbert-Schmidt Independence Criterion (dHSIC) to enforce independence between the latent representations and arbitrary nuisance factors. We show how to apply this method to a range of problems, including learning invariant representations and learning interpretable representations. We also present a full-fledged application to single-cell RNA sequencing (scRNA-seq), a setting in which the biological signal is mixed in complex ways with sequencing errors and sampling effects. We show that our method outperforms the state of the art in this domain.
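To make the dHSIC quantity concrete, the following is a minimal sketch of the standard biased (V-statistic) empirical estimator for $d$ jointly observed variables, written with NumPy. It is an illustration under stated assumptions, not the implementation used in this work: the Gaussian kernel, the single shared `bandwidth`, and the function names `gaussian_gram` and `dhsic` are all choices made here for the example. In a training loop, a term of this kind would presumably be computed on a minibatch of latent codes and nuisance factors and added, with some weight, to the variational objective.

```python
import numpy as np


def gaussian_gram(x, bandwidth=1.0):
    """Gaussian (RBF) Gram matrix for n samples; x has shape (n,) or (n, p).

    The kernel family and the bandwidth are assumptions of this sketch.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    sq_norms = np.sum(x ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def dhsic(variables, bandwidth=1.0):
    """Biased empirical dHSIC for a list of d arrays sharing the same n rows.

    Computes
        (1/n^2) sum_{i,j} prod_k K^k_ij
        + prod_k (1/n^2) sum_{i,j} K^k_ij
        - (2/n) sum_i prod_k (1/n) sum_j K^k_ij,
    which is zero in the population limit iff the variables are jointly
    independent, provided the kernels are characteristic.
    """
    grams = [gaussian_gram(v, bandwidth) for v in variables]
    n = grams[0].shape[0]
    joint = np.ones((n, n))   # elementwise product of Gram matrices
    marginal = 1.0            # product of Gram-matrix means
    cross = np.ones(n)        # product of row means
    for gram in grams:
        joint *= gram
        marginal *= gram.mean()
        cross *= gram.mean(axis=1)
    return joint.mean() + marginal - 2.0 * cross.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(200, 2))                          # stand-in latent codes
    s_indep = rng.normal(size=(200, 1))                    # nuisance unrelated to z
    s_dep = z[:, :1] + 0.1 * rng.normal(size=(200, 1))     # nuisance tied to z
    print("independent:", dhsic([z, s_indep]))
    print("dependent:  ", dhsic([z, s_dep]))
```

The estimate should be noticeably larger for the dependent pair than for the independent one, which is the property that lets it act as a penalty on dependence between latent codes and nuisance factors. Forming full Gram matrices costs $O(dn^2)$ time and memory per evaluation, so in practice such a term is usually evaluated per minibatch.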
