Invertible Residual Networks

We show that standard ResNet architectures can be made invertible, allowing the same model to be used for classification, density estimation, and generation. Typically, enforcing invertibility requires partitioning dimensions or restricting network architectures. In contrast, our approach only requires adding a simple normalization step during training, already available in standard frameworks. Invertible ResNets define a generative model which can be trained by maximum likelihood on unlabeled data. To compute likelihoods, we introduce a tractable approximation to the Jacobian log-determinant of a residual block. Our empirical evaluation shows that invertible ResNets perform competitively with both state-of-the-art image classifiers and flow-based generative models, something that has not been previously achieved with a single architecture.
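
To make the abstract's claims concrete, the sketch below illustrates the three ingredients on a toy fully-connected residual block in NumPy: spectral normalization to bound the Lipschitz constant of the residual branch g below one, inversion of F(x) = x + g(x) by fixed-point iteration, and a Hutchinson-style stochastic estimate of the truncated power series log det(I + J_g) = sum_{k>=1} (-1)^{k+1} tr(J_g^k) / k. This is a minimal sketch, not the paper's released implementation: the function names are ours, and we substitute an exact SVD and a finite-difference Jacobian where a real implementation would use power iteration and automatic differentiation.

```python
# Hypothetical sketch of an invertible residual block; all names are ours.
import numpy as np

rng = np.random.default_rng(0)
d, c = 4, 0.9                        # data dimension, Lipschitz bound c < 1

def spectral_normalize(W, coeff):
    """Rescale W so its spectral norm is at most `coeff`.
    (Exact SVD here; training frameworks use power iteration.)"""
    sigma = np.linalg.norm(W, 2)     # largest singular value
    return W * min(1.0, coeff / sigma)

# Two normalized layers: ||W2|| * ||W1|| <= sqrt(c) * sqrt(c) = c.
W1 = spectral_normalize(rng.standard_normal((d, d)), np.sqrt(c))
W2 = spectral_normalize(rng.standard_normal((d, d)), np.sqrt(c))

def g(x):
    """Residual branch with Lip(g) <= c, since ReLU is 1-Lipschitz."""
    return W2 @ np.maximum(W1 @ x, 0.0)

def f(x):
    """Invertible residual block F(x) = x + g(x)."""
    return x + g(x)

def f_inverse(y, n_iter=50):
    """Banach fixed-point iteration x <- y - g(x); converges because
    Lip(g) < 1 makes the iteration map a contraction."""
    x = y.copy()
    for _ in range(n_iter):
        x = y - g(x)
    return x

def jacobian_g(x, eps=1e-6):
    """Central-difference Jacobian of g (small d only; autodiff in practice)."""
    J = np.zeros((d, d))
    for i in range(d):
        e = np.zeros(d); e[i] = eps
        J[:, i] = (g(x + e) - g(x - e)) / (2 * eps)
    return J

def logdet_estimate(x, n_terms=20, n_samples=64):
    """Truncated series log det(I + J_g) = sum_k (-1)^{k+1} tr(J_g^k) / k,
    each trace estimated by Hutchinson's estimator E[v^T J^k v]."""
    J = jacobian_g(x)
    total = 0.0
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=d)   # Rademacher probe vector
        w = v.copy()
        for k in range(1, n_terms + 1):
            w = J @ w                          # w = J_g^k v
            total += (-1) ** (k + 1) * (v @ w) / k
    return total / n_samples

x = rng.standard_normal(d)
y = f(x)
print("inversion error:        ", np.linalg.norm(f_inverse(y) - x))
print("estimated log det J_F:  ", logdet_estimate(x))
print("exact     log det J_F:  ", np.linalg.slogdet(np.eye(d) + jacobian_g(x))[1])
```

Because Lip(g) <= c < 1, the map x -> y - g(x) is a contraction, so the fixed-point inverse converges geometrically at rate c; the same bound on the spectral norm of J_g is what makes the log-determinant power series converge.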
