Occlusion Boundary Detection via Deep Exploration of Context

Occlusion boundaries contain rich perceptual information about the underlying scene structure. They also provide important cues in many visual perception tasks such as scene understanding, object recognition, and segmentation. In this paper, we improve occlusion boundary detection via enhanced exploration of contextual information (e.g., local structural boundary patterns, observations from surrounding regions, and temporal context), and in doing so develop a novel approach based on convolutional neural networks (CNNs) and conditional random fields (CRFs). Experimental results demonstrate that our detector significantly outperforms the state-of-the-art (e.g., improving the F-measure from 0.62 to 0.71 on the commonly used CMU benchmark). Last but not least, we empirically assess the roles of several important components of the proposed detector, so as to validate the rationale behind this approach.

[1]  Hanumant Singh,et al.  Camouflaging an Object from Many Viewpoints , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Xiaogang Wang,et al.  Deep Learning Face Representation from Predicting 10,000 Classes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Andrew W. Fitzgibbon,et al.  Learning spatiotemporal T-junctions for occlusion detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[4]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[5]  Jean Ponce,et al.  Learning a convolutional neural network for non-uniform motion blur removal , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Irving Biederman,et al.  On the Semantics of a Glance at a Scene , 2017 .

[7]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[8]  Shaogang Gong,et al.  Model Selection for Unsupervised Learning of Visual Context , 2006, International Journal of Computer Vision.

[9]  Cordelia Schmid,et al.  Learning to detect Motion Boundaries , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  A. Torralba,et al.  The role of context in object recognition , 2007, Trends in Cognitive Sciences.

[11]  Shiguang Shan,et al.  Shape driven kernel adaptation in Convolutional Neural Network for robust facial trait recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Peter Kontschieder,et al.  Structured class-labels in random forests for semantic image labelling , 2011, 2011 International Conference on Computer Vision.

[13]  Martial Hebert,et al.  Occlusion Boundaries from Motion: Low-Level Detection and Mid-Level Reasoning , 2009, International Journal of Computer Vision.

[14]  Victor S. Lempitsky,et al.  N^4 -Fields: Neural Network Nearest Neighbor Fields for Image Transforms , 2014, ACCV.

[15]  Cristian Sminchisescu,et al.  Efficient Closed-Form Solution to Generalized Boundary Detection , 2012, ECCV.

[16]  Jean Ponce,et al.  Learning mid-level features for recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[17]  Pushmeet Kohli,et al.  Markov Random Fields for Vision and Image Processing , 2011 .

[18]  Geoffrey E. Hinton,et al.  Deep Learning , 2015, Nature.

[19]  Alexei A. Efros,et al.  Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships , 2009, NIPS.

[20]  Zhuowen Tu,et al.  Auto-Context and Its Application to High-Level Vision Tasks and 3D Brain Image Segmentation , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  David V. Anderson,et al.  Finding Temporally Consistent Occlusion Boundaries in Videos Using Geometric Context , 2015, 2015 IEEE Winter Conference on Applications of Computer Vision.

[22]  Xi Wang,et al.  High-Resolution Stereo Datasets with Subpixel-Accurate Ground Truth , 2014, GCPR.

[23]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[24]  Abhinav Gupta,et al.  Designing deep networks for surface normal estimation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Brian Taylor,et al.  Causal video object segmentation from persistence of occlusions , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Michael J. Black,et al.  A Quantitative Analysis of Current Practices in Optical Flow Estimation and the Principles Behind Them , 2013, International Journal of Computer Vision.

[27]  Antonio Torralba,et al.  Context-based vision system for place and object recognition , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[28]  Nitish Srivastava,et al.  Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[29]  Yan Wang,et al.  DeepContour: A deep convolutional feature learned by positive-sharing loss for contour detection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Thomas Brox,et al.  A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis , 2013, 2013 IEEE International Conference on Computer Vision.

[31]  Gabriel J. Brostow,et al.  Learning to find occlusion regions , 2011, CVPR 2011.

[32]  Alexei A. Efros,et al.  Recovering Occlusion Boundaries from a Single Image , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[33]  Jitendra Malik,et al.  Learning to detect natural image boundaries using local brightness, color, and texture cues , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Kang Zheng,et al.  Combining local appearance and holistic view: Dual-Source Deep Neural Networks for human pose estimation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[36]  Nikos Komodakis,et al.  Markov Random Field modeling, inference & learning in computer vision & image understanding: A survey , 2013, Comput. Vis. Image Underst..

[37]  Nando de Freitas,et al.  A Statistical Model for General Contextual Object Recognition , 2004, ECCV.

[38]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[39]  C. Lawrence Zitnick,et al.  Structured Forests for Fast Edge Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[40]  Jitendra Malik,et al.  Large Displacement Optical Flow: Descriptor Matching in Variational Motion Estimation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  B. S. Manjunath,et al.  Probabilistic occlusion boundary detection on spatio-temporal lattices , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[42]  David J. Fleet,et al.  Probabilistic Detection and Tracking of Motion Boundaries , 2000, International Journal of Computer Vision.

[43]  Tai-Pang Wu,et al.  Video repairing under variable illumination using cyclic motions , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Cordelia Schmid,et al.  DeepFlow: Large Displacement Optical Flow with Deep Matching , 2013, 2013 IEEE International Conference on Computer Vision.

[45]  Alan L. Yuille,et al.  Occlusion Boundary Detection Using Pseudo-depth , 2010, ECCV.

[46]  Charless C. Fowlkes,et al.  Contour Detection and Hierarchical Image Segmentation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47]  Stefano Soatto,et al.  Detachable Object Detection: Segmentation and Depth Ordering from Short-Baseline Video , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Jitendra Malik,et al.  Occlusion boundary detection and figure/ground assignment from optical flow , 2011, CVPR 2011.

[49]  Michael J. Black Combining Intensity and Motion for Incremental Segmentation and Tracking Over Long Image Sequences , 1992, ECCV.

[50]  M. Opper,et al.  Comparing the Mean Field Method and Belief Propagation for Approximate Inference in MRFs , 2001 .

[51]  Daphna Weinshall,et al.  Motion Segmentation and Depth Ordering Using an Occlusion Detector , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[52]  Ashutosh Saxena,et al.  Learning Depth from Single Monocular Images , 2005, NIPS.

[53]  Heesoo Myeong,et al.  Learning object relationships via graph-based context model , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[54]  Fei-Fei Li,et al.  Modeling mutual context of object and human pose in human-object interaction activities , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.