Stochastic Segmentation Trees for Multiple Ground Truths

In tasks with inherent ambiguity, a strong machine learning system should aim to produce the full range of valid interpretations rather than a single mode. This is particularly true for image segmentation, where there are many sensible ways to partition an image into regions. We formulate a tree-structured probabilistic model, the stochastic segmentation tree, that represents a distribution over segmentations of a given image. We train this model by optimizing a novel objective that quantifies how well statistics of the model match the ground truth segmentations. Our method allows learning both the parameters of the tree and its structure. We demonstrate on two datasets, including the challenging Berkeley Segmentation Dataset, that our model captures the range of ground truths and produces novel plausible segmentations beyond those found in the data.
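To make the idea of a tree that represents a distribution over segmentations concrete, here is a minimal sketch under stated assumptions. It is not the construction used in this work, whose details the abstract does not give. It assumes a hypothetical region hierarchy in which every internal node carries a split probability (`p_split`): sampling walks down from the root, and each node either stays whole as one segment or hands its region to its children. The names `Node`, `p_split`, and `sample_segmentation` are illustrative only.

```python
import random
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class Node:
    """One region in a hypothetical segmentation hierarchy."""
    pixels: FrozenSet[int]            # pixel ids covered by this region
    p_split: float = 0.0              # assumed probability of splitting further
    children: List["Node"] = field(default_factory=list)


def sample_segmentation(node: Node, rng: random.Random) -> List[FrozenSet[int]]:
    """Sample one tree cut; each returned set of pixels is one segment."""
    if node.children and rng.random() < node.p_split:
        segments: List[FrozenSet[int]] = []
        for child in node.children:
            segments.extend(sample_segmentation(child, rng))
        return segments
    # Stop here: the whole region becomes a single segment.
    return [node.pixels]


# Toy hierarchy over four pixels (assumed values, for illustration only).
leaf_a = Node(frozenset({0, 1}))
leaf_b = Node(frozenset({2}))
leaf_c = Node(frozenset({3}))
inner = Node(frozenset({2, 3}), p_split=0.5, children=[leaf_b, leaf_c])
root = Node(frozenset({0, 1, 2, 3}), p_split=0.8, children=[leaf_a, inner])

rng = random.Random(0)
samples = [sample_segmentation(root, rng) for _ in range(5)]
```

Every sample is a valid partition of the image, and different draws can return coarser or finer partitions. In this toy setting, that is what lets a single tree cover several ground truth segmentations of the same image.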
