Particle Filter-based Policy Gradient in POMDPs

Our setting is a Partially Observable Markov Decision Process (POMDP) with continuous state, observation, and action spaces. Decisions are based on a particle filter that estimates the belief state from past observations. We consider a policy-gradient approach to optimizing a parameterized policy. For that purpose, we investigate sensitivity analysis of the performance measure with respect to the policy parameters, focusing on Finite Difference (FD) techniques. We show that the naive FD estimator suffers from variance explosion because of the non-smoothness of the resampling procedure. We propose a more sophisticated FD method that overcomes this problem and establish its consistency.
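
To make the variance issue concrete, here is a minimal, self-contained sketch (not the authors' algorithm) of a naive central FD policy-gradient estimate for a policy acting on a particle-filter belief. The toy linear-Gaussian model, the linear policy, and the functions `rollout` and `naive_fd_gradient`, together with all parameter values, are invented for illustration only; it shows where the non-smooth resampling step enters the computation.

```python
# Illustrative sketch: naive finite-difference policy gradient with a
# particle-filter belief.  The multinomial resampling step is a non-smooth
# function of the particle weights, so rollouts at theta + eps and theta - eps
# can diverge abruptly even when they share random seeds, which inflates the
# variance of the FD estimate.  Everything below is a toy model, not the
# method proposed in the paper.

import numpy as np

N_PARTICLES = 200
HORIZON = 50
SIGMA_TRANS, SIGMA_OBS = 0.3, 0.5


def rollout(theta, rng):
    """One episode in a toy 1-D linear-Gaussian POMDP with a particle-filter
    belief and a linear policy a = -theta * belief_mean; returns the return."""
    s = rng.normal(0.0, 1.0)                        # hidden state
    particles = rng.normal(0.0, 1.0, N_PARTICLES)   # initial belief particles
    total_reward = 0.0
    for _ in range(HORIZON):
        a = -theta * particles.mean()               # act on the belief mean
        # true dynamics and observation
        s = s + a + rng.normal(0.0, SIGMA_TRANS)
        o = s + rng.normal(0.0, SIGMA_OBS)
        total_reward += -s ** 2                     # reward: keep state near 0
        # particle filter: propagate, weight, resample (multinomial)
        particles = particles + a + rng.normal(0.0, SIGMA_TRANS, N_PARTICLES)
        w = np.exp(-0.5 * ((o - particles) / SIGMA_OBS) ** 2)
        w /= w.sum()
        idx = rng.choice(N_PARTICLES, N_PARTICLES, p=w)   # non-smooth in w
        particles = particles[idx]
    return total_reward


def naive_fd_gradient(theta, eps=1e-2, n_rollouts=20, seed=0):
    """Central FD estimate of dJ/dtheta with common random numbers: each pair
    of perturbed rollouts reuses the same seed, so most of the remaining
    estimator noise comes from the resampling discontinuities."""
    grads = []
    for i in range(n_rollouts):
        j_plus = rollout(theta + eps, np.random.default_rng(seed + i))
        j_minus = rollout(theta - eps, np.random.default_rng(seed + i))
        grads.append((j_plus - j_minus) / (2.0 * eps))
    return np.mean(grads), np.std(grads)


if __name__ == "__main__":
    g, g_std = naive_fd_gradient(theta=0.5)
    print(f"FD gradient estimate: {g:.2f} (std over rollouts: {g_std:.2f})")
```

Even with common random numbers, the spread of the per-rollout FD values reported by this sketch stays large because a small change in theta can flip which particles survive resampling; the paper's contribution is an FD scheme that controls this effect.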
