Intrinsic Depth: Improving Depth Transfer with Intrinsic Images

We formulate the estimation of dense depth maps from video sequences as a problem of intrinsic image estimation. Our approach synergistically integrates the estimation of multiple intrinsic images: depth, albedo, shading, optical flow, and surface contours. We build upon an example-based framework for depth estimation that uses label transfer from a database of paired RGB and depth images, and combine it with a method that extracts temporally consistent albedo and shading from video. In contrast to raw RGB values, albedo and shading provide a richer, more physical foundation for depth transfer. Additionally, we train a new contour detector to predict surface boundaries from albedo, shading, and pixel values, and use it to improve the estimation of depth boundaries. We also integrate sparse structure from motion into our method to improve the metric accuracy of the estimated depth maps. We evaluate our Intrinsic Depth method quantitatively by estimating depth from videos in the NYU RGB-D and SUN3D datasets, and find that jointly estimating multiple intrinsic images improves depth estimation relative to the baseline method.
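The example-based depth-transfer idea underlying the framework can be illustrated with a minimal sketch: retrieve the nearest RGB-D exemplars for a query frame using a global image descriptor, then fuse the retrieved depth maps per pixel. The descriptor and fusion below are deliberate simplifications (a coarse intensity grid standing in for GIST-style features, and a per-pixel median instead of the full SIFT-flow warping and optimization); function names are illustrative, not from the paper.

```python
import numpy as np

def global_descriptor(image, grid=4):
    """Coarse grid of mean intensities as a stand-in for a GIST-style
    global feature (a hypothetical simplification of the real pipeline)."""
    h, w = image.shape
    gh, gw = h // grid, w // grid
    feats = [image[i*gh:(i+1)*gh, j*gw:(j+1)*gw].mean()
             for i in range(grid) for j in range(grid)]
    return np.asarray(feats)

def transfer_depth(query, database, k=3):
    """Retrieve the k nearest RGB-D exemplars by global descriptor
    distance and fuse their depth maps with a per-pixel median.
    `database` is a list of (image, depth) array pairs."""
    q = global_descriptor(query)
    dists = [np.linalg.norm(global_descriptor(img) - q) for img, _ in database]
    nearest = np.argsort(dists)[:k]
    candidates = np.stack([database[i][1] for i in nearest])
    return np.median(candidates, axis=0)
```

In the full method, the retrieved exemplars are densely aligned to the query before fusion, and the paper's contribution is to perform this transfer on albedo and shading rather than raw RGB.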
