High Order Regularization for Semi-Supervised Learning of Structured Output Problems

Semi-supervised learning, which uses unlabeled data to help learn a discriminative model, is especially important for structured output problems, because labeling their multi-dimensional outputs takes considerably more effort than labeling standard single-output problems. We propose a new max-margin framework for semi-supervised structured output learning that allows the use of powerful discrete optimization algorithms and high order regularizers defined directly on model predictions for the unlabeled examples. We show that our framework is closely related to Posterior Regularization and that the two frameworks optimize special cases of the same objective. The new framework is instantiated on two image segmentation tasks, using both a graph regularizer and a cardinality regularizer. Experiments also demonstrate that this framework can utilize unlabeled data from a different source than the labeled data to significantly improve performance while saving labeling effort.
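To make the idea of regularizers "defined directly on model predictions" concrete, here is a minimal Python sketch of a graph regularizer and a cardinality regularizer evaluated on a discrete labeling. The abstract does not give the functional forms used in the paper, so both functions, their names, and the toy data are illustrative assumptions rather than the authors' definitions.

```python
from typing import List, Sequence, Tuple

# Hypothetical edge type: (index_i, index_j, similarity_weight).
Edge = Tuple[int, int, float]


def graph_regularizer(labels: Sequence[int], edges: List[Edge]) -> float:
    """Assumed form: total weight of edges whose endpoints disagree.

    Low values mean that similar (connected) units share a label.
    """
    return sum(w for i, j, w in edges if labels[i] != labels[j])


def cardinality_regularizer(labels: Sequence[int], target_fraction: float) -> float:
    """Assumed form: gap between the predicted foreground fraction and a target.

    The penalty depends only on how many units are labeled 1, not on which
    ones, which is what makes it a high order (cardinality) term.
    """
    fraction = sum(1 for y in labels if y == 1) / len(labels)
    return abs(fraction - target_fraction)


# Toy prediction over four pixels (1 = foreground, 0 = background).
labels = [1, 1, 0, 0]
edges = [(0, 1, 1.0), (1, 2, 0.5), (2, 3, 1.0)]

print(graph_regularizer(labels, edges))
print(cardinality_regularizer(labels, 0.5))
```

Both penalties are functions of the predicted labeling itself rather than of a distribution over labelings, so in principle they can be handed to a discrete optimizer together with the model's own score.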
