Structured Output Learning with High Order Loss Functions

When modeling structured domains, it is often desirable to leverage information that is not naturally expressed as a simple label. Examples include knowledge about the evaluation measure that will be used at test time and partial (weak) label information. When the additional information has structure that factorizes according to small subsets of variables (i.e., is low order, or decomposable), several approaches can be used to incorporate it into a learning procedure. Our focus in this work is the more challenging case, where the additional information does not factorize according to low order graphical model structure; we call this the high order case. We propose to formalize various forms of this additional information as high order loss functions, which may have complex interactions over large subsets of variables. We then address the computational challenges inherent in learning according to such loss functions, focusing in particular on the loss-augmented inference problem that arises in large margin learning. We show that learning with high order loss functions is often practical, and we report strong empirical results in several settings with one popular and several novel high order loss functions.
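To make the loss-augmented inference problem concrete, the following is a minimal sketch and not the method proposed in this work. It assumes a toy model with only unary scores over binary labels, and it uses an intersection-over-union style loss purely as a familiar example of a loss that does not decompose over individual variables. All names (`iou_loss`, `score`, `loss_augmented_inference`, `margin_rescaled_hinge`) are hypothetical, and the brute-force enumeration is exponential in the number of variables, which is exactly the cost that efficient high order inference aims to avoid.

```python
from itertools import product


def iou_loss(y_true, y_pred):
    """1 - |intersection| / |union| of the foreground (label 1) sets.

    The value depends on all variables jointly, so it does not
    factorize into per-variable terms (a high order loss).
    """
    inter = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    union = sum(1 for t, p in zip(y_true, y_pred) if t == 1 or p == 1)
    return 0.0 if union == 0 else 1.0 - inter / union


def score(unary, y):
    """Model score of labeling y; unary[i][k] scores variable i taking label k."""
    return sum(unary[i][label] for i, label in enumerate(y))


def loss_augmented_inference(unary, y_true, loss=iou_loss):
    """Return argmax_y [score(y) + loss(y_true, y)] by exhaustive search.

    Assumption: the problem is tiny; all 2^n labelings are enumerated.
    """
    candidates = product((0, 1), repeat=len(y_true))
    return max(candidates, key=lambda y: score(unary, y) + loss(y_true, y))


def margin_rescaled_hinge(unary, y_true, loss=iou_loss):
    """Margin-rescaled structured hinge loss for a single example."""
    y_hat = loss_augmented_inference(unary, y_true, loss)
    return score(unary, y_hat) + loss(y_true, y_hat) - score(unary, y_true)


# Toy example: three variables, ground truth marks the first two as foreground.
unary = [[0.0, 0.2], [0.0, 0.2], [0.3, 0.0]]
y_true = (1, 1, 0)
most_violating = loss_augmented_inference(unary, y_true)
hinge = margin_rescaled_hinge(unary, y_true)
```

In this toy example the highest-scoring labeling under the model alone is the ground truth `(1, 1, 0)`, yet the loss-augmented maximizer is `(0, 0, 0)`, because the loss term rewards labelings that the evaluation measure penalizes heavily. Large margin learning uses this maximizer as the most violated constraint.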
