Structured Prediction via the Extragradient Method

We present a simple and scalable algorithm for large-margin estimation of structured models, including an important class of Markov networks and combinatorial models. We formulate the estimation problem as a convex-concave saddle-point problem and apply the extragradient method, yielding an algorithm with linear convergence using simple gradient and projection calculations. The projection step can be solved using combinatorial algorithms for min-cost quadratic flow. This makes the approach an efficient alternative to formulations based on reductions to a quadratic program (QP). We present experiments on two very different structured prediction tasks: 3D image segmentation and word alignment, illustrating the favorable scaling properties of our algorithm.
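To make the gradient-and-projection structure concrete, the following is a minimal sketch of the generic extragradient iteration on a toy convex-concave saddle-point problem. It is not the structured estimation problem itself. The bilinear objective L(w, z) = w * z, the box constraint [-1, 1], the step size, and the names `project_box` and `extragradient` are all assumptions chosen for illustration. In the structured setting the simple box projection would be replaced by the min-cost quadratic flow computation mentioned above.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def project_box(v: Vector, lo: float = -1.0, hi: float = 1.0) -> Vector:
    """Euclidean projection onto the box [lo, hi]^n (illustrative feasible set)."""
    return [min(max(c, lo), hi) for c in v]


def extragradient(
    grad_w: Callable[[Vector, Vector], Vector],
    grad_z: Callable[[Vector, Vector], Vector],
    w: Vector,
    z: Vector,
    step: float = 0.5,
    iters: int = 300,
) -> Tuple[Vector, Vector]:
    """Generic extragradient iteration for min over w, max over z of L(w, z)."""
    for _ in range(iters):
        # Prediction step: projected gradient step from the current point.
        gw, gz = grad_w(w, z), grad_z(w, z)
        w_pred = project_box([wi - step * g for wi, g in zip(w, gw)])
        z_pred = project_box([zi + step * g for zi, g in zip(z, gz)])
        # Correction step: start again from the current point, but use the
        # gradients evaluated at the predicted point.
        gw, gz = grad_w(w_pred, z_pred), grad_z(w_pred, z_pred)
        w = project_box([wi - step * g for wi, g in zip(w, gw)])
        z = project_box([zi + step * g for zi, g in zip(z, gz)])
    return w, z


# Toy problem (an assumption for illustration): L(w, z) = w * z,
# whose unique saddle point on the box is w = z = 0.
def toy_grad_w(w: Vector, z: Vector) -> Vector:
    return [z[0]]  # dL/dw


def toy_grad_z(w: Vector, z: Vector) -> Vector:
    return [w[0]]  # dL/dz


if __name__ == "__main__":
    w_star, z_star = extragradient(toy_grad_w, toy_grad_z, [1.0], [1.0])
    print(w_star, z_star)
```

The step size 0.5 is below the reciprocal of the Lipschitz constant of the toy gradient map (which is 1), the usual condition under which the extragradient iteration converges. On an unconstrained bilinear problem of this kind, plain simultaneous gradient descent-ascent does not converge, which is why the extra prediction step matters.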
