Linear and Parallel Learning of Markov Random Fields

We introduce a new, embarrassingly parallel parameter learning algorithm for Markov random fields with untied parameters that is efficient for a large class of practical models. The algorithm parallelizes naturally over cliques and, for graphs of bounded degree, its complexity is linear in the number of cliques. Unlike its competitors, our algorithm is fully parallel, and for log-linear models it is also data-efficient, requiring only the local sufficient statistics of the data to estimate parameters.
