An Expectation Maximization Algorithm for Continuous Markov Decision Processes with Arbitrary Reward
Nando de Freitas | Arnaud Doucet | Jan Peters | Matthew D. Hoffman
[1] Edward J. Sondik, et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon, 1973, Oper. Res.
[2] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[3] J. Nocedal, et al. A Limited Memory Algorithm for Bound Constrained Optimization, 1995, SIAM J. Sci. Comput.
[4] Sebastian Thrun, et al. Monte Carlo POMDPs, 1999, NIPS.
[5] Geoffrey E. Hinton, et al. Using EM for Reinforcement Learning, 2000.
[6] Michael I. Jordan, et al. PEGASUS: A policy search method for large MDPs and POMDPs, 2000, UAI.
[7] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[8] Jan M. Maciejowski, et al. Predictive control: with constraints, 2002.
[9] Frank L. Lewis, et al. Robot Manipulator Control: Theory and Practice, 2003.
[10] Hagai Attias, et al. Planning by Probabilistic Inference, 2003, AISTATS.
[11] Zoubin Ghahramani, et al. Sparse Gaussian Processes using Pseudo-inputs, 2005, NIPS.
[12] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[13] Pascal Poupart, et al. Point-Based Value Iteration for Continuous POMDPs, 2006, J. Mach. Learn. Res.
[14] Nando de Freitas, et al. Fast particle smoothing: if I had a million particles, 2006, ICML.
[15] Rajesh P. N. Rao, et al. Planning and Acting in Uncertain Environments using Probabilistic Inference, 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[16] Marc Toussaint, et al. Probabilistic inference for solving discrete and continuous state Markov Decision Processes, 2006, ICML.
[17] Marc Toussaint, et al. Probabilistic inference for solving (PO)MDPs, 2006.
[18] Nando de Freitas, et al. Bayesian Policy Learning with Trans-Dimensional MCMC, 2007, NIPS.
[19] Stefan Schaal, et al. Reinforcement Learning for Operational Space Control, 2007, Proceedings 2007 IEEE International Conference on Robotics and Automation.
[20] Arnaud Doucet, et al. On solving integral equations using Markov chain Monte Carlo methods, 2010, Appl. Math. Comput.