Fast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine

We introduce a type of Deep Boltzmann Machine (DBM) that is suitable for extracting distributed semantic representations from a large unstructured collection of documents. We propose an approximate inference method that interacts with learning in a way that makes it possible to train the DBM more efficiently than previously proposed methods. Even though the model has two hidden layers, it can be trained just as efficiently as a standard Restricted Boltzmann Machine. Our experiments show that the model assigns higher log probability to unseen data than the Replicated Softmax model. Features extracted from our model outperform LDA, Replicated Softmax, and DocNADE models on document retrieval and document classification tasks.
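The abstract measures the DBM against the Replicated Softmax model, a single-hidden-layer RBM over word counts, and claims the two-layer model trains as cheaply as such an RBM. For reference, the following is a minimal NumPy sketch of that baseline: one contrastive-divergence (CD-1) update and the free energy, following the usual Replicated Softmax formulation in which hidden biases are scaled by the document length D. It is not the DBM or the approximate inference method proposed here. The function names, array shapes, toy data, and learning rate are illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax_rows(x):
    # Row-wise softmax, shifted by the row max for numerical stability.
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def hidden_probs(V, W, a):
    # p(h_j = 1 | v) = sigmoid(D * a_j + sum_k v_k W_kj), D = document length.
    D = V.sum(axis=1, keepdims=True)
    return sigmoid(V @ W + D * a)


def sample_visible(h, W, b, D, rng):
    # Each document is reconstructed as D independent draws from one softmax.
    p = softmax_rows(h @ W.T + b)
    return np.stack([rng.multinomial(int(n), row) for n, row in zip(D, p)]).astype(float)


def cd1_update(V, W, a, b, lr, rng):
    """One CD-1 step. V: (N, K) counts, W: (K, F), a: (F,), b: (K,)."""
    N = V.shape[0]
    D = V.sum(axis=1)                                  # document lengths, (N,)
    ph = hidden_probs(V, W, a)                         # positive phase
    h = (rng.random(ph.shape) < ph).astype(float)      # binary hidden sample
    V_neg = sample_visible(h, W, b, D, rng)            # negative phase
    ph_neg = hidden_probs(V_neg, W, a)
    W = W + lr * (V.T @ ph - V_neg.T @ ph_neg) / N
    # Hidden-bias gradient is weighted by D because the energy has a -D * a^T h term.
    a = a + lr * (D @ ph - D @ ph_neg) / N
    b = b + lr * (V.sum(axis=0) - V_neg.sum(axis=0)) / N
    return W, a, b


def free_energy(V, W, a, b):
    # F(v) = -v.b - sum_j softplus(D * a_j + (v W)_j); exp(-F) is the unnormalized p(v).
    D = V.sum(axis=1, keepdims=True)
    return -(V @ b) - np.logaddexp(0.0, V @ W + D * a).sum(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, F = 20, 5                                       # vocabulary size, hidden units
    V = rng.integers(0, 4, size=(8, K)).astype(float)  # toy word counts
    W, a, b = 0.01 * rng.standard_normal((K, F)), np.zeros(F), np.zeros(K)
    for _ in range(10):
        W, a, b = cd1_update(V, W, a, b, lr=0.05, rng=rng)
    print(free_energy(V, W, a, b))
```

Because each reconstruction keeps its document's length D, one set of weights serves documents of any length. The free energy gives only an unnormalized log probability; comparing held-out log probabilities as the abstract does also requires an estimate of the partition function, for example by annealed importance sampling.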