Solving POMDPs by Searching the Space of Finite Policies
Kee-Eung Kim | Leslie Pack Kaelbling | Nicolas Meuleau | Anthony R. Cassandra
[1] Ronald A. Howard et al. Dynamic Programming and Markov Processes, 1960.
[2] Karl Johan Åström et al. Optimal control of Markov processes with incomplete state information, 1965.
[3] E. J. Sondik et al. The Optimal Control of Partially Observable Markov Decision Processes, 1971.
[4] J. Satia et al. Markovian Decision Processes with Probabilistic Observation of States, 1973.
[5] Edward J. Sondik et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon, 1973, Oper. Res.
[6] Edward J. Sondik et al. The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs, 1978, Oper. Res.
[7] C. Watkins. Learning from delayed rewards, 1989.
[8] Andrew McCallum et al. Overcoming Incomplete Perception with Utile Distinction Memory, 1993, ICML.
[9] Leslie Pack Kaelbling et al. Acting Optimally in Partially Observable Stochastic Domains, 1994, AAAI.
[10] Michael L. Littman et al. Memoryless policies: theoretical limitations and practical results, 1994.
[11] Martin L. Puterman et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[12] Michael I. Jordan et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, 1994, NIPS.
[13] Andrew McCallum et al. Reinforcement learning with selective perception and hidden state, 1996.
[14] Stuart J. Russell et al. Reinforcement Learning with Hierarchies of Machines, 1997, NIPS.
[15] Jürgen Schmidhuber et al. HQ-Learning, 1997, Adapt. Behav.
[16] Eric A. Hansen et al. An Improved Policy Iteration Algorithm for Partially Observable MDPs, 1997, NIPS.
[17] Milos Hauskrecht et al. Planning and control in stochastic domains with imperfect information, 1997.
[18] Leslie Pack Kaelbling et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.
[19] Eric A. Hansen et al. Solving POMDPs by Searching in Policy Space, 1998, UAI.
[20] Andrew W. Moore et al. Gradient Descent for General Reinforcement Learning, 1998, NIPS.
[21] Richard S. Sutton et al. Introduction to Reinforcement Learning, 1998.
[22] A. Cassandra et al. Exact and approximate algorithms for partially observable Markov decision processes, 1998.
[23] Shlomo Zilberstein et al. Finite-memory control of partially observable systems, 1998.
[24] Leslie Pack Kaelbling et al. Learning Policies with External Memory, 1999, ICML.
[25] Craig Boutilier et al. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage, 1999, J. Artif. Intell. Res.
[26] Kee-Eung Kim et al. Learning Finite-State Controllers for Partially Observable Environments, 1999, UAI.