Internal-State Policy-Gradient Algorithms for Partially Observable Markov Decision Processes
[1] Craig Boutilier, et al. Computing Optimal Policies for Partially Observable Decision Processes Using Compact Representations, 1996, AAAI/IAAI, Vol. 2.
[2] Daphne Koller, et al. Reinforcement Learning Using Approximate Belief States, 1999, NIPS.
[3] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[4] John N. Tsitsiklis, et al. The Complexity of Markov Decision Processes, 1987, Math. Oper. Res.
[5] Alan Weiss, et al. Sensitivity Analysis for Simulations via Likelihood Ratios, 1989, Oper. Res.
[6] A. Poritz, et al. Hidden Markov models: a guided tour, 1988, ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing.
[7] Leonid Peshkin, et al. Learning from Scarce Experience, 2002, ICML.
[8] Peter L. Bartlett, et al. Estimation and Approximation Bounds for Gradient-Based Reinforcement Learning, 2000, J. Comput. Syst. Sci.
[9] Michael I. Jordan, et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes, 1994, ICML.
[10] Christian R. Shelton, et al. Policy Improvement for POMDPs Using Normalized Importance Sampling, 2001, UAI.
[11] Shigenobu Kobayashi, et al. Reinforcement learning for continuous action using stochastic gradient ascent, 1998.
[12] Michael I. Jordan, et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, 1994, NIPS.
[13] Michael I. Jordan, et al. Reinforcement Learning with Soft State Aggregation, 1994, NIPS.
[14] Andrew W. Moore, et al. Gradient Descent for General Reinforcement Learning, 1998, NIPS.
[15] Katia P. Sycara, et al. Evolutionary Search, Stochastic Policies with Memory, and Reinforcement Learning with Hidden State, 2001, ICML.
[16] Peter W. Glynn, et al. Likelihood ratio gradient estimation for stochastic systems, 1990, CACM.
[17] Craig Boutilier, et al. Vector-space Analysis of Belief-state Approximation for POMDPs, 2001, UAI.
[18] Milos Hauskrecht, et al. Incremental Methods for Computing Bounds in Partially Observable Markov Decision Processes, 1997, AAAI/IAAI.
[19] Anne Greenbaum, et al. Iterative Methods for Solving Linear Systems, 1997, Frontiers in Applied Mathematics.
[20] Lonnie Chrisman, et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach, 1992, AAAI.
[21] Amos Storkey, et al. Advances in Neural Information Processing Systems 20, 2007.
[22] Zhengzhu Feng, et al. Dynamic Programming for POMDPs Using a Factored State Representation, 2000, AIPS.
[23] Leslie Pack Kaelbling, et al. Learning Policies with External Memory, 1999, ICML.
[24] Milos Hauskrecht, et al. Value-Function Approximations for Partially Observable Markov Decision Processes, 2000, J. Artif. Intell. Res.
[25] Ronen I. Brafman, et al. A Heuristic Variable Grid Solution Method for POMDPs, 1997, AAAI/IAAI.
[26] P. Lanzi, et al. Adaptive Agents with Reinforcement Learning and Internal Memory, 2000.
[27] Ilse C. F. Ipsen, et al. The Idea Behind Krylov Methods, 1998.
[28] Katsuhiko Ogata, et al. Modern Control Engineering, 1970.
[29] Leslie Pack Kaelbling, et al. Adaptive Importance Sampling for Estimation in Structured Domains, 2000, UAI.
[30] Peter W. Glynn, et al. Stochastic approximation for Monte Carlo optimization, 1986, WSC '86.
[31] Lawrence R. Rabiner, et al. A tutorial on hidden Markov models and selected applications in speech recognition, 1989, Proc. IEEE.
[32] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[33] P. Marbach. Simulation-Based Methods for Markov Decision Processes, 1998.
[34] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[35] Andrew McCallum, et al. Reinforcement learning with selective perception and hidden state, 1996.
[36] Sebastian Thrun, et al. Monte Carlo POMDPs, 1999, NIPS.
[37] Illah R. Nourbakhsh, et al. DERVISH - An Office-Navigating Robot, 1995, AI Mag.
[38] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[39] Wenju Liu, et al. Planning in Stochastic Domains: Problem Characteristics and Approximation, 1996.
[40] Peter L. Bartlett, et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning, 2001, J. Mach. Learn. Res.
[41] Alan Weiss, et al. Sensitivity analysis via likelihood ratios, 1986, WSC '86.
[42] Peter L. Bartlett, et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[43] John J. Grefenstette, et al. Evolutionary Algorithms for Reinforcement Learning, 1999, J. Artif. Intell. Res.
[44] Dana Ron, et al. The Power of Amnesia, 1993, NIPS.
[45] A. Cassandra, et al. Exact and approximate algorithms for partially observable Markov decision processes, 1998.
[46] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[47] E. J. Sondik, et al. The Optimal Control of Partially Observable Markov Decision Processes, 1971.
[48] Craig Boutilier, et al. Value-directed sampling methods for monitoring POMDPs, 2001, UAI.
[49] Brian Sallans, et al. Learning Factored Representations for Partially Observable Markov Decision Processes, 1999, NIPS.
[50] Shigenobu Kobayashi, et al. Reinforcement Learning in POMDPs with Function Approximation, 1997, ICML.
[51] Kee-Eung Kim, et al. Solving POMDPs by Searching the Space of Finite Policies, 1999, UAI.
[52] J. Douglas Faires, et al. Numerical Analysis, 1981.
[53] Terrence L. Fine, et al. Feedforward Neural Network Methodology, 1999, Information Science and Statistics.
[54] Yoshua Bengio, et al. Input-output HMMs for sequence processing, 1996, IEEE Trans. Neural Networks.
[55] J. Tsitsiklis, et al. Gradient-Based Optimization of Markov Reward Processes: Practical Variants, 2000.
[56] Leslie Pack Kaelbling, et al. Learning Policies for Partially Observable Environments: Scaling Up, 1997, ICML.
[57] Andrew McCallum, et al. Maximum Entropy Markov Models for Information Extraction and Segmentation, 2000, ICML.
[58] Stuart J. Russell, et al. Approximating Optimal Policies for Partially Observable Stochastic Domains, 1995, IJCAI.
[59] Kee-Eung Kim, et al. Learning Finite-State Controllers for Partially Observable Environments, 1999, UAI.
[60] Long Lin, et al. Memory Approaches to Reinforcement Learning in Non-Markovian Domains, 1992.