Planning Delayed-Response Queries and Transient Policies under Reward Uncertainty
[1] Ronald J. Williams, et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions, 1993.
[2] Eyal Amir, et al. Bayesian Inverse Reinforcement Learning, 2007, IJCAI.
[3] Joelle Pineau, et al. Reinforcement learning with limited reinforcement: using Bayes risk for active learning in POMDPs, 2008, ICML '08.
[4] Joelle Pineau, et al. Efficient Planning and Tracking in POMDPs with Large Observation Spaces, 2006.
[5] Satinder Singh, et al. An upper bound on the loss from approximate optimal-value functions, 1994, Machine Learning.
[6] Malcolm J. A. Strens, et al. A Bayesian Framework for Reinforcement Learning, 2000, ICML.
[7] Claudia V. Goldman, et al. Transition-independent decentralized Markov decision processes, 2003, AAMAS '03.
[8] Sridhar Mahadevan, et al. Recent Advances in Hierarchical Reinforcement Learning, 2003, Discret. Event Dyn. Syst.
[9] Edmund H. Durfee, et al. Selecting Operator Queries Using Expected Myopic Gain, 2010, IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology.
[10] Manuela Veloso, et al. What to Communicate? Execution-Time Decision in Multi-agent POMDPs, 2006, DARS.
[11] Nikos A. Vlassis, et al. Multiagent Planning Under Uncertainty with Stochastic Communication Delays, 2008, ICAPS.
[12] Edmund H. Durfee, et al. Comparing Action-Query Strategies in Semi-Autonomous Agents, 2011, AAAI.
[13] Joelle Pineau, et al. Anytime Point-Based Approximations for Large POMDPs, 2006, J. Artif. Intell. Res.
[14] Mausam, et al. Planning with Durative Actions in Stochastic Domains, 2008, J. Artif. Intell. Res.
[15] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[16] Edmund H. Durfee, et al. Influence-Based Policy Abstraction for Weakly-Coupled Dec-POMDPs, 2010, ICAPS.