An On-Line Planner for POMDPs with Large Discrete Action Space: A Quantile-Based Approach

Partially Observable Markov Decision Processes (POMDPs) provide a principled framework for decision making under uncertainty. Despite tremendous advances in POMDP solvers, finding good policies when the action space is large remains difficult. To alleviate this difficulty, this paper presents an on-line approximate solver, called Quantile-Based Action Selector (QBASE). It uses quantile statistics to adaptively evaluate only a small subset of the action space, with little loss in the quality of the generated decision strategies. Experiments on four different robotics tasks with up to 10,000 actions indicate that QBASE can generate substantially better strategies than a state-of-the-art method.
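The core idea, evaluating only a quantile-selected fraction of a large action set, can be illustrated with a short sketch. The Python code below is a minimal illustration of quantile-based action selection under stated assumptions, not the authors' QBASE implementation: the simulator interface simulate(belief, action), the rollout counts, and all parameter values are hypothetical choices made for this example.

    import random
    import numpy as np

    def quantile_based_action_selection(actions, simulate, belief,
                                        n_rounds=10, n_samples=100,
                                        quantile=0.9, n_rollouts=5):
        # Illustrative sketch (not the authors' QBASE): repeatedly sample a
        # small subset of a large action set, estimate each sampled action's
        # value by Monte Carlo rollouts from the current belief, and keep
        # only actions whose estimate reaches the batch's empirical quantile.
        candidates = list(actions)
        for _ in range(n_rounds):
            # Evaluate only a small random subset of surviving candidates.
            batch = random.sample(candidates, min(n_samples, len(candidates)))
            values = {a: np.mean([simulate(belief, a)
                                  for _ in range(n_rollouts)])
                      for a in batch}
            # Keep actions at or above the empirical quantile of this batch.
            threshold = np.quantile(list(values.values()), quantile)
            survivors = [a for a, v in values.items() if v >= threshold]
            if survivors:
                candidates = survivors
            if len(candidates) == 1:
                break
        # Return the best surviving action under a fresh Monte Carlo estimate.
        return max(candidates,
                   key=lambda a: np.mean([simulate(belief, a)
                                          for _ in range(n_rollouts)]))

The design choice worth noting is that the empirical quantile, rather than a fixed threshold, decides how many actions survive each round, so the per-step planning cost scales with the evaluated subset rather than with the full action space.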
