Thompson Sampling for Learning Parameterized Markov Decision Processes
[1] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[2] H. Robbins, et al. Boundary Crossing Probabilities for the Wiener Process and Sample Sums, 1970.
[3] G. Grimmett, et al. Probability and Random Processes, 2002.
[4] P. R. Kumar, et al. Optimal Control of a Queueing System with Two Heterogeneous Servers, 1984.
[5] Pravin Varaiya, et al. Stochastic Systems: Estimation, Identification, and Adaptive Control, 1986.
[6] L. Tierney, et al. Accurate Approximations for Posterior Moments and Marginal Densities, 1986.
[7] R. Agrawal, et al. Asymptotically Efficient Adaptive Allocation Schemes for Controlled Markov Chains: Finite Parameter Space, 1989.
[8] G. Koole. A Simple Proof of the Optimality of a Threshold Policy in a Two-Server Queueing System, 1995.
[9] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[10] Apostolos Burnetas, et al. Optimal Adaptive Policies for Markov Decision Processes, 1997, Math. Oper. Res.
[11] T. J. Sweeting. Invited Discussion of A. R. Barron: Information-Theoretic Characterization of Bayes Performance and the Choice of Priors in Parametric and Nonparametric Problems, 1998.
[12] David Andre, et al. Model-Based Bayesian Exploration, 1999, UAI.
[13] J. Ghosh, et al. Posterior Consistency of Dirichlet Mixtures in Density Estimation, 1999.
[14] A. van der Vaart, et al. Convergence Rates of Posterior Distributions, 2000.
[15] L. Wasserman, et al. Rates of Convergence of Posterior Distributions, 2001.
[16] Ronen I. Brafman, et al. R-MAX: A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[17] Gábor Lugosi, et al. Concentration Inequalities, 2008, COLT.
[18] Ambuj Tewari, et al. Optimistic Linear Programming Gives Logarithmic Regret for Irreducible MDPs, 2007, NIPS.
[19] T. Lai, et al. Pseudo-Maximization and Self-Normalized Processes, 2007, arXiv:0709.2233.
[20] Peter Auer, et al. Near-Optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[21] Shie Mannor, et al. Efficient Reinforcement Learning in Parameterized Models: Discrete Parameters, 2008, VALUETOOLS.
[22] Elizabeth L. Wilmer, et al. Markov Chains and Mixing Times, 2008.
[23] Sean P. Meyn, et al. An Analysis of Reinforcement Learning with Function Approximation, 2008, ICML.
[24] R. Ramamoorthi, et al. Remarks on Consistency of Posterior Distributions, 2008, arXiv:0805.3248.
[25] Ambuj Tewari, et al. REGAL: A Regularization-Based Algorithm for Reinforcement Learning in Weakly Communicating MDPs, 2009, UAI.
[26] Jean-Yves Audibert, et al. Regret Bounds and Minimax Policies under Partial Monitoring, 2010, J. Mach. Learn. Res.
[27] Daniel A. Braun, et al. A Minimum Relative Entropy Principle for Learning and Acting, 2008, J. Artif. Intell. Res.
[28] Rémi Munos, et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, 2012, ALT.
[29] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.
[30] Devavrat Shah, et al. Computing the Stationary Distribution Locally, 2013, NIPS.
[31] Tor Lattimore, et al. The Sample-Complexity of General Reinforcement Learning, 2013, ICML.
[32] Benjamin Van Roy, et al. (More) Efficient Reinforcement Learning via Posterior Sampling, 2013, NIPS.
[33] Shipra Agrawal, et al. Thompson Sampling for Contextual Bandits with Linear Payoffs, 2012, ICML.
[34] Benjamin Van Roy, et al. Eluder Dimension and the Sample Complexity of Optimistic Exploration, 2013, NIPS.
[35] Rémi Munos, et al. Thompson Sampling for 1-Dimensional Exponential Family Bandits, 2013, NIPS.
[36] Benjamin Van Roy, et al. Near-Optimal Reinforcement Learning in Factored MDPs, 2014, NIPS.
[37] Shie Mannor, et al. Thompson Sampling for Complex Online Problems, 2013, ICML.
[38] Csaba Szepesvári, et al. Bayesian Optimal Control of Smoothly Parameterized Systems: The Lazy Posterior Sampling Algorithm, 2014, arXiv.
[39] Benjamin Van Roy, et al. Model-Based Reinforcement Learning and the Eluder Dimension, 2014, NIPS.