Directly Training Joint Energy-Based Models for Conditional Synthesis and Calibrated Prediction of Multi-Attribute Data

Multi-attribute classification generalizes standard classification, presenting new challenges for making accurate predictions and quantifying uncertainty. We build on recent work and show that architectures for multi-attribute prediction can be reinterpreted as energy-based models (EBMs). While existing EBM approaches achieve strong discriminative performance, they cannot generate samples conditioned on novel attribute combinations. We propose a simple extension that equips these EBMs with accurate conditional sampling. Our approach, combined with newly developed techniques for energy-based model training, allows us to directly maximize the likelihood of data and labels under the unnormalized joint distribution. We evaluate the proposed approach on high-dimensional image data with high-dimensional binary attribute labels, and find that our models deliver both accurate, calibrated predictions and high-quality conditional synthesis of novel attribute combinations.
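
To make the reinterpretation concrete, here is a minimal sketch; the encoder, the one-logit-per-attribute head, and the factorized energy parameterization are our assumptions, not details fixed by the abstract. For k binary attributes with per-attribute logits f(x), reading the sum of y_i * f_i(x) as an unnormalized joint log-density recovers the usual independent-sigmoid classifier as the conditional p(y | x), while also defining an unnormalized marginal over images:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEBM(nn.Module):
    """Sketch: a multi-attribute classifier read as a joint EBM.

    With per-attribute logits f(x) in R^k and binary labels y in {0,1}^k:
      log p~(x, y)   = sum_i y_i * f_i(x)        (unnormalized joint)
      p(y_i = 1 | x) = sigmoid(f_i(x))           (the usual classifier)
      log p~(x)      = sum_i softplus(f_i(x))    (y marginalized out)
    """

    def __init__(self, backbone: nn.Module, feat_dim: int, n_attrs: int):
        super().__init__()
        self.backbone = backbone             # any image encoder, e.g. a ResNet
        self.head = nn.Linear(feat_dim, n_attrs)

    def logits(self, x):
        return self.head(self.backbone(x))   # shape (batch, n_attrs)

    def joint_log_unnorm(self, x, y):
        # log p~(x, y): the unnormalized joint likelihood of data and labels
        return (y * self.logits(x)).sum(dim=-1)

    def data_log_unnorm(self, x):
        # log p~(x) = log sum_y exp(log p~(x, y)) = sum_i softplus(f_i(x))
        return F.softplus(self.logits(x)).sum(dim=-1)
```

Under this parameterization, discriminative training reduces to standard per-attribute logistic regression; it is the generative terms log p~(x, y) and log p~(x) that require MCMC-based maximum-likelihood estimation.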

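Conditional synthesis for a novel attribute combination then amounts to MCMC over x with the attribute vector clamped. Below is a sketch using stochastic gradient Langevin dynamics; the step size, noise scale, step count, and initialization are illustrative assumptions rather than values from the paper:

```python
def sample_conditional(model, y, shape, n_steps=100, step_size=1.0, noise_std=0.01):
    """SGLD in image space with the attribute vector y held fixed.

    Ascends log p~(x, y) with injected Gaussian noise. As is common in
    EBM practice, step size and noise scale are decoupled here rather
    than tied together as in textbook SGLD.
    """
    x = torch.rand(shape, requires_grad=True)   # init uniformly in [0, 1]
    for _ in range(n_steps):
        grad = torch.autograd.grad(model.joint_log_unnorm(x, y).sum(), x)[0]
        with torch.no_grad():
            x = x + 0.5 * step_size * grad + noise_std * torch.randn_like(x)
        x.requires_grad_(True)
    return x.detach()

# Usage: synthesize 16 images for attribute vectors never seen jointly
# during training (n_attrs and model refer to the sketch above).
y_novel = torch.bernoulli(0.5 * torch.ones(16, n_attrs))
samples = sample_conditional(model, y_novel, shape=(16, 3, 64, 64))
```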