Reinforcement Learning in POMDPs Without Resets
[1] Ronald L. Rivest, et al. Inference of finite automata using homing sequences. STOC '89, 1989.
[2] William S. Lovejoy, et al. Computationally Feasible Bounds for Partially Observed Markov Decision Processes. Oper. Res., 1991.
[3] W. Lovejoy. A survey of algorithmic methods for partially observed Markov decision processes. 1991.
[4] Stuart J. Russell, et al. Approximating Optimal Policies for Partially Observable Stochastic Domains. IJCAI, 1995.
[5] Ronen I. Brafman. A Heuristic Variable Grid Solution Method for POMDPs. AAAI/IAAI, 1997.
[6] Xavier Boyen, et al. Tractable Inference for Complex Stochastic Processes. UAI, 1998.
[7] A. Cassandra. Exact and approximate algorithms for partially observable Markov decision processes. 1998.
[8] Yishay Mansour, et al. Approximate Planning in Large POMDPs via Reusable Trajectories. NIPS, 1999.
[9] David A. McAllester, et al. Approximate Planning for Factored POMDPs using Belief State Simplification. UAI, 1999.
[10] Kee-Eung Kim, et al. Learning to Cooperate via Policy Search. UAI, 2000.
[11] Judy Goldsmith, et al. Nonapproximability Results for Partially Observable Markov Decision Processes. Universität Trier, Mathematik/Informatik, Forschungsbericht, 2011.
[12] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time. Machine Learning, 1998.