Adaptive Bandits: Towards the best history-dependent strategy
[1] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1997, EuroCOLT.
[2] S. Hart, et al. A simple adaptive procedure leading to correlated equilibrium, 2000.
[3] Ehud Lehrer, et al. A wide range no-regret theorem, 2003, Games Econ. Behav.
[4] Dean P. Foster, et al. Regret in the On-Line Decision Problem, 1999.
[5] Robert D. Kleinberg, et al. Regret bounds for sleeping experts and bandits, 2010, Machine Learning.
[6] Marcus Hutter, et al. Feature Reinforcement Learning: Part I. Unstructured MDPs, 2009, J. Artif. Gen. Intell.
[7] Gilles Stoltz. Incomplete information and internal regret in prediction of individual sequences, 2005.
[8] Jean-Yves Audibert, et al. Minimax Policies for Adversarial and Stochastic Bandits, 2009, COLT 2009.
[9] Ronald Ortner. Online Regret Bounds for Markov Decision Processes with Deterministic Transitions, 2008, ALT.
[10] Nicolò Cesa-Bianchi, et al. Potential-Based Algorithms in On-Line Prediction and Game Theory, 2003, Machine Learning.
[11] Varun Kanade, et al. Sleeping Experts and Bandits with Stochastic Action Availability and Adversarial Rewards, 2009, AISTATS.
[12] Csaba Szepesvári, et al. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, 2009, Theor. Comput. Sci.
[13] Sébastien Bubeck. Bandits Games and Clustering Foundations, 2010.
[14] Csaba Szepesvári, et al. Variance estimates and exploration function in multi-armed bandit, 2008.
[15] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.
[16] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[17] Marcus Hutter, et al. On the Possibility of Learning in Reactive Environments with Arbitrary Dependence, 2008, Theor. Comput. Sci.
[18] Louis Wehenkel, et al. Clinical data based optimal STI strategies for HIV: a reinforcement learning approach, 2006, Proceedings of the 45th IEEE Conference on Decision and Control.
[19] H. Robbins. Some aspects of the sequential design of experiments, 1952.
[20] Yishay Mansour, et al. From External to Internal Regret, 2005, J. Mach. Learn. Res.
[21] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.