Efficient Sampling in POMDPs with Lipschitz Bandits for Motion Planning in Continuous Spaces

Decision making under uncertainty can be framed as a partially observable Markov decision process (POMDP). Exact solutions of POMDPs are generally computationally intractable, but they can be approximated by sampling-based approaches. These sampling-based POMDP solvers rely on multi-armed bandit (MAB) heuristics, which assume that the outcomes of different actions are uncorrelated. In some applications, such as motion planning in continuous spaces, similar actions yield similar outcomes. In this paper, we use variants of MAB heuristics that make Lipschitz continuity assumptions on the outcomes of actions to improve the efficiency of sampling-based planning approaches. We demonstrate the effectiveness of this approach in the context of motion planning for automated driving.
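The core idea can be illustrated with a small action-selection sketch. The following minimal Python example shows a UCB-style rule, as used inside sampling-based planners, tightened by a Lipschitz bound; it assumes a one-dimensional action space (e.g. candidate accelerations), a known Lipschitz constant L, and an exploration weight c, none of which are specified in the abstract, so the function and its parameters are illustrative rather than the paper's actual algorithm.

```python
import math

def lipschitz_ucb_select(actions, means, counts, total_pulls, L, c=1.0):
    """Pick the next action to sample using a Lipschitz-tightened UCB rule.

    actions:     candidate actions, here scalars (e.g. candidate accelerations)
    means:       dict mapping action -> running mean of observed returns
    counts:      dict mapping action -> number of times the action was sampled
    total_pulls: total number of samples drawn so far
    L:           assumed Lipschitz constant of the value over the action space
    c:           exploration weight (hypothetical choice, not from the paper)
    """
    def ucb(a):
        # Standard UCB1-style upper confidence bound for a single action.
        n = counts.get(a, 0)
        if n == 0:
            return math.inf
        return means[a] + c * math.sqrt(math.log(total_pulls) / n)

    def lipschitz_ucb(a):
        # Under |Q(a) - Q(a')| <= L * |a - a'|, every sampled neighbour a'
        # also upper-bounds Q(a); take the tightest of all such bounds.
        neighbour_bounds = [
            ucb(b) + L * abs(a - b) for b in actions if counts.get(b, 0) > 0
        ]
        return min([ucb(a)] + neighbour_bounds) if neighbour_bounds else math.inf

    # Before anything has been sampled all bounds are infinite; afterwards,
    # even unsampled actions inherit finite bounds from sampled neighbours.
    return max(actions, key=lipschitz_ucb)
```

Because nearby actions share information, the rule favours promising neighbourhoods rather than individual sampled actions, which is what makes the Lipschitz assumption pay off when the action space is continuous.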
