An Introduction to Variational Methods for Graphical Models

This paper presents a tutorial introduction to the use of variational methods for inference and learning in graphical models (Bayesian networks and Markov random fields). We present a number of examples of graphical models, including the QMR-DT database, the sigmoid belief network, the Boltzmann machine, and several variants of hidden Markov models, in which it is infeasible to run exact inference algorithms. We then introduce variational methods, which exploit laws of large numbers to transform the original graphical model into a simplified graphical model in which inference is efficient. Inference in the simplified model provides bounds on probabilities of interest in the original model. We describe a general framework for generating variational transformations based on convex duality. Finally, we return to the examples and demonstrate how variational algorithms can be formulated in each case.
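
As a small illustration of the convex-duality idea mentioned above, the sketch below uses the standard dual representation of the concave logarithm, log(x) = min over λ > 0 of (λx − log λ − 1). Any fixed λ gives a linear upper bound on log(x), and the bound is tight at λ = 1/x. The function names and the test values are assumptions chosen for illustration; this is not code from the paper.

```python
import math


def log_upper_bound(x: float, lam: float) -> float:
    """Variational upper bound on log(x), linear in x for a fixed lam > 0.

    Follows from the convex-duality identity
        log(x) = min_{lam > 0} (lam * x - log(lam) - 1).
    """
    if x <= 0 or lam <= 0:
        raise ValueError("x and lam must be positive")
    return lam * x - math.log(lam) - 1.0


def tightest_lambda(x: float) -> float:
    """Value of the variational parameter that makes the bound exact."""
    if x <= 0:
        raise ValueError("x must be positive")
    return 1.0 / x


if __name__ == "__main__":
    x = 2.5  # illustrative input, not taken from the paper
    for lam in (0.2, tightest_lambda(x), 1.0):
        bound = log_upper_bound(x, lam)
        print(f"lam={lam:.3f}  bound={bound:.6f}  log(x)={math.log(x):.6f}")
```

The point of the transformation is that a nonlinear function is replaced by a family of simpler (here linear) functions indexed by a variational parameter; choosing the parameter well recovers the original value, while any other choice still yields a valid bound.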
