Variational State-Space Models for Localisation and Dense 3D Mapping in 6 DoF

We cast 6-DoF localisation and dense 3D reconstruction in spatial environments as approximate Bayesian inference in a deep generative model that combines learned and engineered components. This principled treatment of uncertainty and probabilistic inference overcomes a shortcoming of current state-of-the-art solutions, which rely on heavily engineered, heterogeneous pipelines. Variational inference lets us use neural networks for system identification, while a differentiable raycaster serves as the emission model. Together, these choices make the model amenable to end-to-end gradient-based optimisation. We evaluate our approach on realistic unmanned aerial vehicle flight data, where it approaches the performance of a state-of-the-art visual-inertial odometry system. We also investigate the applicability of the learned model to downstream tasks such as generative prediction and planning.
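
As a rough illustration of the kind of objective such a formulation optimises, the generic evidence lower bound for a state-space model with a global map can be written as below. The notation is an assumption introduced here and not taken from the abstract: x_{1:T} are observations, u_{1:T} controls, z_{1:T} the 6-DoF poses, M the map, and q the variational approximation. The exact factorisation used in the paper may differ.

```latex
% Generic ELBO sketch for a state-space model with a global map M.
% All symbols are illustrative assumptions, not the paper's notation.
\begin{align}
\log p(x_{1:T} \mid u_{1:T})
  \;\geq\;&
  \mathbb{E}_{q(z_{1:T}, M)}\!\left[ \sum_{t=1}^{T} \log p(x_t \mid z_t, M) \right] \nonumber \\
  &- \mathrm{KL}\!\left( q(z_{1:T}, M) \,\middle\|\, p(z_{1:T} \mid u_{1:T})\, p(M) \right),
\\
p(z_{1:T} \mid u_{1:T})
  \;=\;&
  p(z_1) \prod_{t=1}^{T-1} p(z_{t+1} \mid z_t, u_t).
\end{align}
```

In this reading, the emission term p(x_t | z_t, M) is where a differentiable raycaster would enter, and the transition term p(z_{t+1} | z_t, u_t) is where a neural network would be learned; because every term is differentiable, the bound can be maximised end to end by gradient ascent.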
