Integrated perception and planning in the continuous space: A POMDP approach

The partially observable Markov decision process (POMDP) provides a principled mathematical model for integrating perception and planning, a major challenge in robotics. While efficient algorithms exist for moderately large discrete POMDPs, continuous models are often more natural for robotic tasks, and no practical algorithms currently handle continuous POMDPs at an interesting scale. This paper presents an algorithm for continuous-state, continuous-observation POMDPs. We provide experimental results demonstrating its potential for robot planning and learning under uncertainty, together with a theoretical analysis of its performance. A direct benefit of the algorithm is simplified model construction.
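For context, the quantities that a POMDP planner manipulates can be stated compactly; the notation below is standard background and is not drawn from the paper's own algorithm. In the continuous setting the sums over states become integrals, which is the source of the computational difficulty the abstract refers to. After taking action a in belief b and receiving observation o, the belief over next states s' is updated by

b'(s') = \eta \, Z(o \mid s', a) \int_{S} T(s' \mid s, a)\, b(s)\, ds ,

and the planner seeks a policy \pi mapping beliefs to actions that maximizes the expected total discounted reward

V^{\pi}(b_0) = \mathbb{E}\Big[ \sum_{t=0}^{\infty} \gamma^{t} R\big(s_t, \pi(b_t)\big) \;\Big|\; b_0 \Big] ,

where T is the transition model, Z the observation model, \eta a normalizing constant, R the reward function, and \gamma \in (0,1) the discount factor.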
