Learning the Stein Discrepancy for Training and Evaluating Energy-Based Models without Sampling

We present a new method for evaluating and training unnormalized density models. Our approach requires access only to the gradient of the unnormalized model's log-density. We estimate the Stein discrepancy between the data density $p(x)$ and the model density $q(x)$, a discrepancy defined via a vector-valued critic function of the data. We parameterize this critic with a neural network and fit its parameters to maximize the discrepancy. This yields a novel goodness-of-fit test which outperforms existing methods on high-dimensional data. Furthermore, optimizing $q(x)$ to minimize this discrepancy produces a novel method for training unnormalized models which scales more gracefully than existing approaches. The ability to both learn and compare models is a unique feature of the proposed method.
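Concretely, the quantity being optimized is the Stein discrepancy with a learned critic $f$, $\mathbb{E}_{x \sim p}\left[f(x)^\top \nabla_x \log q(x) + \nabla_x \cdot f(x)\right]$, which by Stein's identity is zero for every suitably regular $f$ when $p = q$. The sketch below is a minimal illustration of this objective, not the authors' implementation: the names `log_q` and `critic`, the regularization weight, and the toy Gaussian data are all hypothetical stand-ins, and the divergence term is computed exactly rather than with the stochastic trace estimator a high-dimensional setting would call for.

```python
import torch
import torch.nn as nn

def stein_discrepancy(x, log_q, critic):
    """Monte Carlo estimate of E_{x~p}[ f(x)^T grad_x log q(x) + div_x f(x) ].

    By Stein's identity this is zero for every (suitably regular) f when
    p = q, so its maximum over f measures model fit. The divergence is
    exact here; high dimensions would use a stochastic trace estimator.
    """
    x = x.clone().requires_grad_(True)
    # Score of the unnormalized model: grad_x log q(x).
    score = torch.autograd.grad(log_q(x).sum(), x, create_graph=True)[0]
    f = critic(x)  # vector-valued critic f(x), same shape as x
    # Exact divergence: sum_i d f_i(x) / d x_i.
    div = sum(
        torch.autograd.grad(f[:, i].sum(), x, create_graph=True)[0][:, i]
        for i in range(x.shape[1])
    )
    return (f * score).sum(dim=1).mean() + div.mean()

# Toy setup: a standard-Gaussian "model" tested against shifted-Gaussian data.
dim = 2
log_q = lambda x: -0.5 * (x ** 2).sum(dim=1)  # unnormalized log-density
critic = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))
opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
data = torch.randn(256, dim) + 1.0

for _ in range(200):
    sd = stein_discrepancy(data, log_q, critic)
    reg = critic(data).pow(2).sum(dim=1).mean()  # L2 penalty keeps the critic bounded
    loss = -(sd - 0.1 * reg)  # ascend the regularized discrepancy
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The maximized discrepancy then serves directly as a goodness-of-fit statistic; for training, one would instead alternate this critic update with a gradient step that minimizes the same quantity with respect to the parameters of $q(x)$.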
