Loss-sensitive Training of Probabilistic Conditional Random Fields

We consider the problem of training probabilistic conditional random fields (CRFs) in the context of a task where performance is measured by a specific loss function. While maximum likelihood is the most common approach to training CRFs, it ignores the inherent structure of the task's loss function. We describe alternatives to maximum likelihood that take this loss into account. These include a novel adaptation of a loss upper bound from the structured SVM literature to the CRF context, as well as a new loss-inspired KL divergence objective that relies on the probabilistic nature of CRFs. These loss-sensitive objectives are compared to maximum likelihood on ranking as a benchmark task. The comparison confirms the importance of incorporating loss information in the probabilistic training of CRFs, with the loss-inspired KL objective outperforming all others.
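To make the contrast between the objectives concrete, the sketch below compares maximum likelihood with two standard loss-sensitive surrogates on a toy structured problem. This is a minimal illustration, not the paper's exact models or objectives: it assumes a length-3 binary chain with a random linear score, Hamming loss, exhaustive enumeration of the output space, the margin-rescaled structured hinge from the structured SVM literature, and its smooth "softmax-margin" relaxation as one example of a CRF-style loss upper bound.

```python
import numpy as np
from itertools import product

# Toy structured problem (hypothetical example): all binary label
# sequences of length 3, scored by random unary and pairwise weights.
SEQS = list(product([0, 1], repeat=3))
rng = np.random.default_rng(0)
unary = rng.normal(size=(3, 2))   # per-position label scores
pair = rng.normal(size=(2, 2))    # transition scores

def score(y):
    """Linear chain score: sum of unary and transition terms."""
    s = sum(unary[i, y[i]] for i in range(3))
    s += sum(pair[y[i], y[i + 1]] for i in range(2))
    return s

y_true = (0, 1, 0)

def hamming(y):
    """Task loss: number of positions where y disagrees with y_true."""
    return sum(a != b for a, b in zip(y, y_true))

scores = np.array([score(y) for y in SEQS])
losses = np.array([hamming(y) for y in SEQS], dtype=float)
s_true = score(y_true)

# Maximum likelihood: negative log-likelihood; the task loss never appears.
nll = np.logaddexp.reduce(scores) - s_true

# Structured hinge (margin rescaling), as in max-margin Markov networks / SSVMs:
# the highest-scoring output is penalized in proportion to its loss.
hinge = np.max(losses + scores) - s_true

# Softmax-margin: replacing max with log-sum-exp gives a smooth,
# CRF-style upper bound on the structured hinge.
softmax_margin = np.logaddexp.reduce(losses + scores) - s_true

print(f"NLL            = {nll:.4f}")
print(f"struct. hinge  = {hinge:.4f}")
print(f"softmax-margin = {softmax_margin:.4f}")
```

Because log-sum-exp dominates max and the loss terms are non-negative, the softmax-margin value upper-bounds both the structured hinge and the negative log-likelihood, which is what makes it usable as a loss-aware drop-in for maximum likelihood training.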
