Agreement-Based Learning

Learning probabilistic models with many hidden variables and non-decomposable dependencies is an important and challenging problem. Rather than performing approximate inference in a single intractable model, as traditional approaches do, we train a set of tractable submodels and encourage them to agree on the hidden variables. This allows us to capture non-decomposable aspects of the data while still maintaining tractability. We propose an objective function for our approach, derive EM-style algorithms for parameter estimation, and demonstrate their effectiveness on three challenging real-world learning tasks.
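To make the idea concrete, the following is a minimal NumPy sketch of an EM-style procedure for two hypothetical categorical submodels that each explain one "view" of the data while sharing a hidden state. The two-view setup, the multiplicative combination of the submodels in the E-step, and all names (pz1, px1_hat, etc.) are illustrative assumptions for this sketch, not the paper's exact objective or algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a hidden state z generates two observed "views" x1 and x2.
K, V, N = 3, 5, 2000                      # hidden states, vocab size, data points
true_pz  = rng.dirichlet(np.ones(K))
true_px1 = rng.dirichlet(0.3 * np.ones(V), size=K)   # p(x1 | z)
true_px2 = rng.dirichlet(0.3 * np.ones(V), size=K)   # p(x2 | z)
z  = rng.choice(K, size=N, p=true_pz)
x1 = np.array([rng.choice(V, p=true_px1[k]) for k in z])
x2 = np.array([rng.choice(V, p=true_px2[k]) for k in z])

# Two tractable submodels, each explaining one view: p_m(z) * p_m(x_m | z).
pz1, px1_hat = rng.dirichlet(np.ones(K)), rng.dirichlet(np.ones(V), size=K)
pz2, px2_hat = rng.dirichlet(np.ones(K)), rng.dirichlet(np.ones(V), size=K)

for _ in range(100):
    # E-step (assumed form): combine the submodels multiplicatively, so the
    # shared posterior over z concentrates where both submodels agree.
    joint1 = pz1[None, :] * px1_hat[:, x1].T     # N x K array of p_1(z, x1)
    joint2 = pz2[None, :] * px2_hat[:, x2].T     # N x K array of p_2(z, x2)
    q = joint1 * joint2
    q /= q.sum(axis=1, keepdims=True)

    # M-step: re-estimate each submodel from expected counts under q,
    # exactly as in ordinary EM for that submodel in isolation.
    pz1 = q.sum(axis=0) / N
    pz2 = pz1.copy()                             # both priors receive the same update
    for k in range(K):
        px1_hat[k] = np.bincount(x1, weights=q[:, k], minlength=V)
        px2_hat[k] = np.bincount(x2, weights=q[:, k], minlength=V)
    px1_hat /= px1_hat.sum(axis=1, keepdims=True)
    px2_hat /= px2_hat.sum(axis=1, keepdims=True)

# Hidden states are only identified up to permutation, so compare sorted priors.
print("learned p(z) under agreement:", np.round(np.sort(pz1), 3))
print("true    p(z):                ", np.round(np.sort(true_pz), 3))
```

Each submodel remains individually tractable (its E- and M-steps are ordinary categorical-mixture updates); the only coupling is the shared posterior q, which is where the agreement enters in this sketch.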
