Optical Flow with Semantic Segmentation and Localized Layers

Existing optical flow methods make generic, spatially homogeneous, assumptions about the spatial structure of the flow. In reality, optical flow varies across an image depending on object class. Simply put, different objects move differently. Here we exploit recent advances in static semantic scene segmentation to segment the image into objects of different types. We define different models of image motion in these regions depending on the type of object. For example, we model the motion on roads with homographies, vegetation with spatially smooth flow, and independently moving objects like cars and planes with affine motion plus deviations. We then pose the flow estimation problem using a novel formulation of localized layers, which addresses limitations of traditional layered models for dealing with complex scene motion. Our semantic flow method achieves the lowest error of any published monocular method in the KITTI-2015 flow benchmark and produces qualitatively better flow and segmentation than recent top methods on a wide range of natural videos.

[1]  Michael J. Black,et al.  Skin and bones: multi-layer, locally affine, optical flow and regularization with transparency , 1996, Proceedings CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[2]  Xuming He,et al.  Multiclass semantic video segmentation with object-level active inference , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  William B. Thompson,et al.  Exploiting Discontinuities in Optical Flow , 1998, International Journal of Computer Vision.

[4]  Tony X. Han,et al.  Ensemble Video Object Cut in Highly Dynamic Scenes , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Andrew Zisserman,et al.  Learning Layered Motion Segmentations of Video , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[6]  Luc Van Gool,et al.  Object Flow: Learning Object Displacement , 2010, ACCV Workshops.

[7]  Vladlen Koltun,et al.  Full Flow: Optical Flow Estimation By Global Optimization over Regular Grids , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[9]  Andreas Geiger,et al.  Displets: Resolving stereo ambiguities using object knowledge , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Jitendra Malik,et al.  Occlusion boundary detection and figure/ground assignment from optical flow , 2011, CVPR 2011.

[11]  Sanja Fidler,et al.  The Role of Context for Object Detection and Semantic Segmentation in the Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Michael J. Black,et al.  A Fully-Connected Layered Model of Foreground and Background Flow , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Brian Taylor,et al.  Semantic Video Segmentation from Occlusion Relations within a Convex Optimization Framework , 2013, EMMCVPR.

[14]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[15]  David W. Murray,et al.  Scene Segmentation from Visual Motion Using Global Optimization , 1987, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Michael J. Black,et al.  Mixture models for optical flow computation , 1993, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Stefan Roth,et al.  Stixmantics: A Medium-Level Model for Real-Time Semantic Scene Understanding , 2014, ECCV.

[18]  David J. Fleet,et al.  Probabilistic Detection and Tracking of Motion Boundaries , 2000, International Journal of Computer Vision.

[19]  Yair Weiss,et al.  Smoothness in layers: Motion segmentation using nonparametric mixture estimation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[20]  Patrick Pérez,et al.  Dense estimation and object-based segmentation of the optical flow with robust techniques , 1998, IEEE Trans. Image Process..

[21]  Edward H. Adelson,et al.  On seeing stuff: the perception of materials by humans and machines , 2001, IS&T/SPIE Electronic Imaging.

[22]  Daphne Koller,et al.  Learning Spatial Context: Using Stuff to Find Things , 2008, ECCV.

[23]  Michael J. Black,et al.  Estimating Optical Flow in Segmented Images Using Variable-Order Parametric Models With Local Deformations , 1996, IEEE Trans. Pattern Anal. Mach. Intell..

[24]  Richard Szeliski,et al.  A Database and Evaluation Methodology for Optical Flow , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[25]  Jiaolong Yang,et al.  Dense, accurate optical flow estimation with piecewise parametric model , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Andreas Geiger,et al.  Object scene flow for autonomous vehicles , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Carsten Rother,et al.  FusionFlow: Discrete-continuous optimization for optical flow estimation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[29]  Michael J. Black,et al.  Efficient sparse-to-dense optical flow estimation using a learned basis and layers , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Jitendra Malik,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence Segmentation of Moving Objects by Long Term Video Analysis , 2022 .

[31]  Harpreet S. Sawhney,et al.  Layered representation of motion video using robust maximum-likelihood estimation of mixture models and MDL encoding , 1995, Proceedings of IEEE International Conference on Computer Vision.

[32]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[33]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Ce Liu,et al.  Scene Collaging: Analysis and Synthesis of Natural Images with Semantic Layers , 2013, 2013 IEEE International Conference on Computer Vision.

[35]  Michael J. Black,et al.  Layered segmentation and optical flow estimation over time , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Vladlen Koltun,et al.  Efficient Nonlocal Regularization for Optical Flow , 2012, ECCV.

[37]  Deqing Sun,et al.  Local Layering for Joint Motion Estimation and Occlusion Detection , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Raquel Urtasun,et al.  Robust Monocular Epipolar Flow Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  W. F. Clocksin,et al.  Joint Optimization for Object Class Segmentation and Dense Stereo Reconstruction , 2012, International Journal of Computer Vision.

[40]  Hai Tao,et al.  A background layer model for object tracking through occlusion , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[41]  Cordelia Schmid,et al.  EpicFlow: Edge-preserving interpolation of correspondences for optical flow , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Stéphane Mallat,et al.  A Wavelet Tour of Signal Processing - The Sparse Way, 3rd Edition , 2008 .

[43]  Marc Pollefeys,et al.  Joint 3D Scene Reconstruction and Class Segmentation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[44]  Edward H. Adelson,et al.  Representing moving images with layers , 1994, IEEE Trans. Image Process..

[45]  P. Anandan,et al.  Accurate computation of optical flow by using layered motion representations , 1994, Proceedings of 12th International Conference on Pattern Recognition.

[46]  Brendan J. Frey,et al.  Learning flexible sprites in video layers , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[47]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[48]  Michael J. Black,et al.  A Naturalistic Open Source Movie for Optical Flow Evaluation , 2012, ECCV.

[49]  P. Anandan,et al.  A unified approach to moving object detection in 2D and 3D scenes , 1996, Proceedings of 13th International Conference on Pattern Recognition.

[50]  Mei Han,et al.  Efficient hierarchical graph-based video segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[51]  Chenliang Xu,et al.  Streaming Hierarchical Video Segmentation , 2012, ECCV.

[52]  Christian Heipke,et al.  Discrete Optimization for Optical Flow , 2015, GCPR.

[53]  David J. Fleet,et al.  A Layered Motion Representation with Occlusion and Compact Spatial Support , 2002, ECCV.

[54]  Martial Hebert,et al.  Occlusion Boundaries from Motion: Low-Level Detection and Mid-Level Reasoning , 2009, International Journal of Computer Vision.

[55]  Nikos Paragios,et al.  Segmentation, ordering and multi-object tracking using graphical models , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[56]  E. Adelson,et al.  The Plenoptic Function and the Elements of Early Vision , 1991 .

[57]  A. Pentland,et al.  Robust estimation of a multi-layered motion representation , 1991, Proceedings of the IEEE Workshop on Visual Motion.

[58]  Marc Pollefeys,et al.  Learning a Confidence Measure for Optical Flow , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[59]  Thomas Brox,et al.  A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis , 2013, 2013 IEEE International Conference on Computer Vision.