Backpropagation for Implicit Spectral Densities

Most successful machine intelligence systems rely on gradient-based learning, which is made possible by backpropagation. Some systems are designed to aid us in interpreting data when explicit goals cannot be provided. These unsupervised systems are commonly trained by backpropagating through a likelihood function. We introduce a tool that allows us to do this even when the likelihood is not explicitly specified, by instead using the implicit likelihood of the model. Explicitly defining the likelihood often entails making heavy-handed assumptions that impede our ability to solve challenging tasks. The implicit likelihood of the model, on the other hand, is accessible without such assumptions. Our tool, which we call spectral backpropagation, allows us to optimize the implicit likelihood in much greater generality than has been attempted before. GANs can also be viewed as a technique for optimizing implicit likelihoods. We study them using spectral backpropagation in order to demonstrate robustness on high-dimensional problems, and identify two novel properties of the generator G: (1) there exist aberrant, nonsensical outputs to which G assigns very high likelihood, and (2) the eigenvectors of the metric induced by G over latent space correspond to quasi-disentangled explanatory factors.
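
To make the abstract's notion of an implicit likelihood concrete: for a smooth, injective generator G mapping a latent z to x = G(z), the change-of-variables (coarea) formula gives log p(x) = log p_z(z) - (1/2) log det(J(z)^T J(z)), where J(z) is the Jacobian of G at z and J^T J is the metric that G induces over latent space. The sketch below, in PyTorch, evaluates this exactly for a toy generator by forming the Jacobian and eigendecomposing the induced metric. It is a minimal illustration under stated assumptions: the architecture and dimensions are invented for the sketch, not taken from the paper, and the exact eigendecomposition stands in for the stochastic spectral estimates that make the approach practical at scale.

    import math
    import torch

    # Illustrative toy generator G: R^k -> R^n (architecture and sizes are
    # assumptions for this sketch, not the paper's models).
    latent_dim, data_dim = 8, 64
    G = torch.nn.Sequential(
        torch.nn.Linear(latent_dim, 32),
        torch.nn.Tanh(),
        torch.nn.Linear(32, data_dim),
    )

    z = torch.randn(latent_dim)

    # Jacobian J of G at z, with shape (data_dim, latent_dim).
    J = torch.autograd.functional.jacobian(G, z)

    # Metric induced by G over latent space: M = J^T J (latent_dim x latent_dim).
    M = J.T @ J

    # Implicit log-likelihood of x = G(z) under a standard-normal prior, via the
    # injective change-of-variables formula:
    #   log p(x) = log p_z(z) - (1/2) * log det(J^T J).
    log_prior = -0.5 * z.dot(z) - 0.5 * latent_dim * math.log(2 * math.pi)
    eigvals, eigvecs = torch.linalg.eigh(M)
    log_likelihood = log_prior - 0.5 * torch.log(eigvals.clamp_min(1e-12)).sum()

    # Columns of `eigvecs` are the eigenvectors of the induced metric; property
    # (2) in the abstract concerns latent-space directions of exactly this kind.
    print(float(log_likelihood))

Forming J explicitly costs one autodiff pass per output dimension and the eigendecomposition is cubic in the latent dimension, so this exact route only works for small models. In high dimensions, log det(J^T J) = tr(log(J^T J)) can instead be estimated stochastically from Jacobian-vector products alone, in the spirit of Hutchinson-style trace estimators, which is the regime that spectral backpropagation targets.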
