论文信息 - Temporal Interpolation as an Unsupervised Pretraining Task for Optical Flow Estimation

Temporal Interpolation as an Unsupervised Pretraining Task for Optical Flow Estimation

The difficulty of annotating training data is a major obstacle to using CNNs for low-level tasks in video. Synthetic data often does not generalize to real videos, while unsupervised methods require heuristic losses. Proxy tasks can overcome these issues, and start by training a network for a task for which annotation is easier or which can be trained unsupervised. The trained network is then fine-tuned for the original task using small amounts of ground truth data. Here, we investigate frame interpolation as a proxy task for optical flow. Using real movies, we train a CNN unsupervised for temporal interpolation. Such a network implicitly estimates motion, but cannot handle untextured regions. By fine-tuning on small amounts of ground truth flow, the network can learn to fill in homogeneous regions and compute full optical flow fields. Using this unsupervised pre-training, our network outperforms similar architectures that were trained supervised using synthetic optical flow.

Michael J. Black | Jonas Wulff | Jonas Wulff

[1] Gregory Shakhnarovich,et al. Colorization as a Proxy Task for Visual Understanding , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2] Noah Snavely,et al. Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Hongdong Li,et al. Learning Image Matching by Simply Watching Video , 2016, ECCV.

[4] Jia Xu,et al. Accurate Optical Flow via Direct Cost Volume Processing , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5] Thomas Brox,et al. FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[6] Bingbing Ni,et al. Unsupervised Deep Learning for Optical Flow Estimation , 2017, AAAI.

[7] Thomas Brox,et al. A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8] Vladlen Koltun,et al. Playing for Benchmarks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[9] Oisin Mac Aodha,et al. Unsupervised Monocular Depth Estimation with Left-Right Consistency , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10] Ming-Hsuan Yang,et al. Semi-Supervised Learning for Optical Flow with Generative Adversarial Networks , 2017, NIPS.

[11] Ioannis Patras,et al. Unsupervised convolutional neural networks for motion estimation , 2016, 2016 IEEE International Conference on Image Processing (ICIP).

[12] Andreas Geiger,et al. Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[13] Michael J. Black,et al. A Naturalistic Open Source Movie for Optical Flow Evaluation , 2012, ECCV.

[14] Thomas Brox,et al. FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Chen Sun,et al. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[16] Christian Heipke,et al. Discrete Optimization for Optical Flow , 2015, GCPR.

[17] Jan Kautz,et al. PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18] Andrew Zisserman,et al. Spatial Transformer Networks , 2015, NIPS.

[19] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.

[20] Antonio Torralba,et al. Anticipating Visual Representations from Unlabeled Video , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21] Michael J. Black,et al. Slow Flow: Exploiting High-Speed Cameras for Accurate and Diverse Optical Flow Reference Data , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22] Michael Möller,et al. Learning Proximal Operators: Using Denoising Networks for Regularizing Inverse Imaging Problems , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[23] Eero P. Simoncelli,et al. Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[24] Michael J. Black,et al. Optical Flow Estimation Using a Spatial Pyramid Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25] Konstantinos G. Derpanis,et al. Back to Basics: Unsupervised Learning of Optical Flow via Brightness Constancy and Motion Smoothness , 2016, ECCV Workshops.

[26] Michael J. Black,et al. A Quantitative Analysis of Current Practices in Optical Flow Estimation and the Principles Behind Them , 2013, International Journal of Computer Vision.

[27] Cordelia Schmid,et al. Self-Supervised Learning of Structure and Motion from Video , 2017 .

[28] Alexei A. Efros,et al. Unsupervised Visual Representation Learning by Context Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[29] Simone Calderara,et al. TransFlow: Unsupervised Motion Flow by Joint Geometric and Pixel-level Estimation , 2017, ArXiv.

[30] Nima Sedaghat. Next-Flow: Hybrid Multi-Tasking with Next-Frame Prediction to Boost Optical-Flow Estimation in the Wild , 2016, ArXiv.

[31] Edward H. Adelson,et al. Human-assisted motion annotation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[32] Thomas Brox,et al. Hybrid Learning of Optical Flow and Next Frame Prediction to Boost Optical Flow in the Wild , 2017 .

[33] Michael J. Black,et al. Layered segmentation and optical flow estimation over time , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[34] Feng Liu,et al. Video Frame Interpolation via Adaptive Separable Convolution , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[35] Francisco Angel Moreno,et al. The Málaga urban dataset: High-rate stereo and LiDAR in a realistic urban scenario , 2014, Int. J. Robotics Res..

[36] Andreas Geiger,et al. Deep Discrete Flow , 2016, ACCV.

[37] Stefan Roth,et al. UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss , 2017, AAAI.