Regret bounds for restless Markov bandits

[1] Phuong Nguyen, et al. Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning, 2013, ICML.

[2] V. Climenhaga. Markov chains and mixing times, 2013.

[3] Ronald Ortner, et al. Online Regret Bounds for Undiscounted Continuous Reinforcement Learning, 2012, NIPS.

[4] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.

[5] Rémi Munos, et al. Selecting the State-Representation in Reinforcement Learning, 2011, NIPS.

[6] Mingyan Liu, et al. Adaptive learning of uncontrolled restless bandits with logarithmic regret, 2011, 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[7] Aurélien Garivier, et al. Optimally Sensing a Single Channel Without Prior Information: The Tiling Algorithm and Regret Bounds, 2011, IEEE Journal of Selected Topics in Signal Processing.

[8] Vittorio Ferrari, et al. Advances in Neural Information Processing Systems 24, 2011.

[9] Jean-Yves Audibert, et al. Minimax Policies for Adversarial and Stochastic Bandits, 2009, COLT.

[10] Ambuj Tewari, et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs, 2009, UAI.

[11] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.

[12] Doina Precup, et al. Bounding Performance Loss in Approximate MDP Homomorphisms, 2008, NIPS.

[13] Marcus Hutter, et al. On the Possibility of Learning in Reactive Environments with Arbitrary Dependence, 2008, Theor. Comput. Sci.

[14] Ian F. Akyildiz, et al. A survey on spectrum management in cognitive radio networks, 2008, IEEE Communications Magazine.

[15] Ronald Ortner, et al. Pseudometrics for State Aggregation in Average Reward Markov Decision Processes, 2007, ALT.

[16] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.

[17] Robert Givan, et al. Equivalence notions and model minimization in Markov decision processes, 2003, Artif. Intell.

[18] J. van Leeuwen, et al. Theoretical Computer Science, 2003, Lecture Notes in Computer Science.

[19] Balaraman Ravindran, et al. Model Minimization in Hierarchical Reinforcement Learning, 2002, SARA.

[20] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.

[21] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[22] D. Aldous. Threshold limits for cover times, 1991.

[23] David J. Aldous, et al. Lower bounds for covering times for reversible Markov chains and random walks on graphs, 1989.

[24] P. Whittle. Restless bandits: activity allocation in a changing world, 1988, Journal of Applied Probability.

[25] J. Walrand, et al. Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays, Part II: Markovian rewards, 1987.

[26] M. Nair. On Chebyshev-Type Inequalities for Primes, 1982.

[27] J. Gittins. Bandit processes and dynamic allocation indices, 1979.

[28] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Adv. Appl. Math.