Accelerated training of conditional random fields with stochastic gradient methods

We apply Stochastic Meta-Descent (SMD), a stochastic gradient optimization method with gain vector adaptation, to the training of Conditional Random Fields (CRFs). On several large data sets, the resulting optimizer converges to the same quality of solution over an order of magnitude faster than limited-memory BFGS, the leading method reported to date. We report results for both exact and inexact inference techniques.

[1]  R. Tarjan,et al.  A Separator Theorem for Planar Graphs , 1977 .

[2]  J. F. C. Kingman,et al.  Information and Exponential Families in Statistical Theory , 1980 .

[3]  T. Stephenson Image analysis , 1984, Nature.

[4]  J. Besag On the Statistical Analysis of Dirty Pictures , 1986 .

[5]  Barak A. Pearlmutter Fast Exact Multiplication by the Hessian , 1994, Neural Computation.

[6]  Gerhard Winkler,et al.  Image analysis, random fields and dynamic Monte Carlo methods: a mathematical introduction , 1995, Applications of mathematics.

[7]  Olga Veksler,et al.  Fast approximate energy minimization via graph cuts , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[8]  Nicol N. Schraudolph Local Gain Adaptation in Stochastic Gradient Descent , 1999 .

[9]  Sabine Buchholz,et al.  Introduction to the CoNLL-2000 Shared Task Chunking , 2000, CoNLL/LLL.

[10]  Andreas Griewank,et al.  Evaluating derivatives - principles and techniques of algorithmic differentiation, Second Edition , 2000, Frontiers in applied mathematics.

[11]  M. Opper,et al.  Comparing the Mean Field Method and Belief Propagation for Approximate Inference in MRFs , 2001 .

[12]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[13]  Michael Collins,et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.

[14]  Nicol N. Schraudolph,et al.  Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent , 2002, Neural Computation.

[15]  Martial Hebert,et al.  Discriminative Fields for Modeling Spatial Dependencies in Natural Images , 2003, NIPS.

[16]  William T. Freeman,et al.  Understanding belief propagation and its generalizations , 2003 .

[17]  Nicol N. Schraudolph,et al.  Combining Conjugate Direction Methods with Stochastic Approximation of Gradients , 2003, AISTATS.

[18]  Martial Hebert,et al.  Man-made structure detection in natural images using a causal multiscale random field , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[19]  Fernando Pereira,et al.  Shallow Parsing with Conditional Random Fields , 2003, NAACL.

[20]  Nigel Collier,et al.  Introduction to the Bio-entity Recognition Task at JNLPBA , 2004, NLPBA/BioNLP.

[21]  Burr Settles,et al.  Biomedical Named Entity Recognition using Conditional Random Fields and Rich Feature Sets , 2004, NLPBA/BioNLP.

[22]  Patrick Pérez,et al.  Interactive Image Segmentation Using an Adaptive GMMRF Model , 2004, ECCV.

[23]  Alfonso Valencia,et al.  Overview of BioCreAtIvE: critical assessment of information extraction for biology , 2005, BMC Bioinformatics.

[24]  Max Welling,et al.  Learning in Markov Random Fields An Empirical Study , 2005 .

[25]  Vladimir Kolmogorov,et al.  Convergent Tree-Reweighted Message Passing for Energy Minimization , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Alexander J. Smola,et al.  Step Size Adaptation in Reproducing Kernel Hilbert Space , 2006, J. Mach. Learn. Res..