3D Traffic Scene Understanding From Movable Platforms

In this paper, we present a novel probabilistic generative model for multi-object traffic scene understanding from movable platforms. The model reasons jointly about the 3D scene layout and the location and orientation of objects in the scene. In particular, the scene topology, geometry, and traffic activities are inferred from short video sequences. Inspired by the impressive driving capabilities of humans, our model does not rely on GPS, lidar, or map knowledge. Instead, it takes advantage of a diverse set of visual cues in the form of vehicle tracklets, vanishing points, semantic scene labels, scene flow, and occupancy grids. For each of these cues, we propose likelihood functions that are integrated into the probabilistic generative model. We learn all model parameters from training data using contrastive divergence. Experiments conducted on videos of 113 representative intersections show that our approach successfully infers the correct layout in a variety of very challenging scenarios. To evaluate the importance of each feature cue, we conduct experiments using different feature combinations. Furthermore, we show that context derived from the proposed method improves over the state of the art in object detection and object orientation estimation in challenging and cluttered urban environments.
