FlowNet: Learning Optical Flow with Convolutional Networks

Convolutional neural networks (CNNs) have recently been very successful in a variety of computer vision tasks, especially on those linked to recognition. Optical flow estimation has not been among the tasks CNNs succeeded at. In this paper we construct CNNs which are capable of solving the optical flow estimation problem as a supervised learning task. We propose and compare two architectures: a generic architecture and another one including a layer that correlates feature vectors at different image locations. Since existing ground truth data sets are not sufficiently large to train a CNN, we generate a large synthetic Flying Chairs dataset. We show that networks trained on this unrealistic data still generalize very well to existing datasets such as Sintel and KITTI, achieving competitive accuracy at frame rates of 5 to 10 fps.

[1]  Berthold K. P. Horn,et al.  Determining Optical Flow , 1981, Other Conferences.

[2]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[3]  David J. Fleet,et al.  Learning parameterized models of image motion , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  Patrick Pérez,et al.  Dense estimation and object-based segmentation of the optical flow with robust techniques , 1998, IEEE Trans. Image Process..

[5]  Thomas Brox,et al.  High Accuracy Optical Flow Estimation Based on a Theory for Warping , 2004, ECCV.

[6]  Richard Szeliski,et al.  A Database and Evaluation Methodology for Optical Flow , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[7]  Michael J. Black,et al.  Learning Optical Flow , 2008, ECCV.

[8]  Alexei A. Efros,et al.  IM2GPS: estimating geographic information from a single image , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Daniel Cremers,et al.  Structure- and motion-adaptive regularization for high accuracy optic flow , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[10]  Yann LeCun,et al.  Convolutional Learning of Spatio-temporal Features , 2010, ECCV.

[11]  Graham W. Taylor,et al.  Adaptive deconvolutional networks for mid and high level feature learning , 2011, 2011 International Conference on Computer Vision.

[12]  Jitendra Malik,et al.  Large Displacement Optical Flow: Descriptor Matching in Variational Motion Estimation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Luca Maria Gambardella,et al.  Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images , 2012, NIPS.

[14]  Cristian Sminchisescu,et al.  Efficient Closed-Form Solution to Generalized Boundary Detection , 2012, ECCV.

[15]  Michael J. Black,et al.  A Naturalistic Open Source Movie for Optical Flow Evaluation , 2012, ECCV.

[16]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[17]  Roland Memisevic,et al.  Unsupervised learning of depth and motion , 2013, ArXiv.

[18]  Camille Couprie,et al.  Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Cordelia Schmid,et al.  DeepFlow: Large Displacement Optical Flow with Deep Matching , 2013, 2013 IEEE International Conference on Computer Vision.

[20]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[21]  Cristian Sminchisescu,et al.  Locally Affine Sparse-to-Dense Matching for Motion and Occlusion Estimation , 2013, 2013 IEEE International Conference on Computer Vision.

[22]  Yair Weiss,et al.  Learning the Local Statistics of Optical Flow , 2013, NIPS.

[23]  Rob Fergus,et al.  Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[24]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[25]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Hailin Jin,et al.  Fast Edge-Preserving PatchMatch for Large Displacement Optical Flow , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[28]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[29]  Alexei A. Efros,et al.  Seeing 3D Chairs: Exemplar Part-Based 2D-3D Alignment Using a Large Dataset of CAD Models , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Camillo J. Taylor,et al.  Optical Flow with Geometric Occlusion Estimation and Fusion of Multiple Frames , 2014, EMMCVPR.

[31]  Thomas Brox,et al.  Descriptor Matching with Convolutional Neural Networks: a Comparison to SIFT , 2014, ArXiv.

[32]  Victor S. Lempitsky,et al.  N^4 -Fields: Neural Network Nearest Neighbor Fields for Image Transforms , 2014, ACCV.

[33]  Yann LeCun,et al.  Computing the stereo matching cost with a convolutional neural network , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Jitendra Malik,et al.  Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[36]  Thomas Brox,et al.  Learning to generate chairs with convolutional neural networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Cordelia Schmid,et al.  EpicFlow: Edge-preserving interpolation of correspondences for optical flow , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.