Solving Nonlinear Continuous State-Action-Observation POMDPs for Mechanical Systems with Gaussian Noise

In this paper, we introduce a novel model-based approach to solving the important subclass of partially observable Markov decision processes (POMDPs) with Gaussian noise in continuous states, actions, and observations. This kind of POMDP frequently appears in robotics and many other real-world control problems. However, except for the linear quadratic Gaussian case, no efficient ways of computing optimal controllers are known. We propose a novel method for efficiently approximating optimal solutions of nonlinear stochastic continuous state-action-observation POMDPs in high dimensions. We use Gaussian processes (GPs) to model both the latent transition dynamics and the measurement mapping. By explicitly marginalizing over the GP posteriors, our method is robust to model errors and can be used for principled belief space inference, policy learning, and policy execution.
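The central computation the abstract describes is propagating a Gaussian belief over the latent state through a learned GP dynamics model while marginalizing over the GP posterior. The following is a minimal sketch only, not the paper's analytic moment-matching scheme: it uses Monte Carlo marginalization to keep the example short, and all function names, hyperparameters, and the toy pendulum-like dynamics are illustrative assumptions.

```python
# Sketch: pushing a Gaussian state belief through a GP dynamics posterior.
# Monte Carlo stand-in for the analytic moment matching used in the paper.
import numpy as np

def se_kernel(X1, X2, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel matrix between two input sets."""
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_predict(X_train, y_train, X_test, noise_var=1e-2, **kern):
    """Standard GP posterior mean and variance at the test inputs."""
    K = se_kernel(X_train, X_train, **kern) + noise_var * np.eye(len(X_train))
    Ks = se_kernel(X_test, X_train, **kern)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = se_kernel(X_test, X_test, **kern).diagonal() - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 1e-12)

def propagate_belief(mu, var, X_train, y_train, n_samples=2000, rng=None):
    """Propagate a 1-D Gaussian belief N(mu, var) one step through the GP
    dynamics model, marginalizing over both the input uncertainty and the
    GP posterior, and return the moments of the successor belief."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.normal(mu, np.sqrt(var), size=(n_samples, 1))   # sample the belief
    m, s2 = gp_predict(X_train, y_train, x)                  # GP posterior at samples
    f = rng.normal(m, np.sqrt(s2))                           # sample the GP posterior
    return f.mean(), f.var()

# Toy usage (assumed dynamics): learn x' = x + 0.1*sin(x) from noisy data,
# then propagate an uncertain initial belief one step ahead.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = (X + 0.1 * np.sin(X)).ravel() + 0.05 * rng.standard_normal(50)
print(propagate_belief(mu=0.5, var=0.2, X_train=X, y_train=y, rng=rng))
```

Because the successor belief reflects the GP posterior uncertainty rather than a single point estimate of the dynamics, downstream filtering and policy evaluation remain conservative where the model has little data, which is the sense in which the approach is robust to model errors.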
