X-Armed Bandits
Sébastien Bubeck | Rémi Munos | Gilles Stoltz | Csaba Szepesvári
[1] David Silver, et al. Combining online and offline knowledge in UCT, 2007, ICML '07.
[2] Rémi Munos, et al. Pure exploration in finitely-armed and continuous-armed bandits, 2011, Theor. Comput. Sci.
[3] Gábor Lugosi, et al. Prediction, learning, and games, 2006.
[4] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[5] Rémi Munos, et al. Pure Exploration in Multi-armed Bandits Problems, 2009, ALT.
[6] Yngvi Björnsson, et al. Simulation-Based Approach to General Game Playing, 2008, AAAI.
[7] David Silver, et al. Achieving Master Level Play in 9×9 Computer Go, 2008, AAAI.
[8] Peter Auer, et al. Improved Rates for the Stochastic Continuum-Armed Bandit Problem, 2007, COLT.
[9] Robert D. Kleinberg. Nearly Tight Bounds for the Continuum-Armed Bandit Problem, 2004, NIPS.
[10] Eric W. Cope, et al. Regret and Convergence Bounds for a Class of Continuum-Armed Bandit Problems, 2009, IEEE Transactions on Automatic Control.
[11] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[12] Rémi Munos, et al. Bandit Algorithms for Tree Search, 2007, UAI.
[13] Csaba Szepesvári, et al. Online Optimization in X-Armed Bandits, 2008, NIPS.
[14] Maarten P. D. Schadd, et al. Addressing NP-Complete Puzzles with Monte-Carlo Methods, 2008.
[15] Yuhong Yang, et al. How Powerful Can Any Regression Learning Procedure Be?, 2007, AISTATS.
[16] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[17] H. Jaap van den Herik, et al. Progressive Strategies for Monte-Carlo Tree Search, 2008.
[18] Y. Freund, et al. The non-stochastic multi-armed bandit problem, 2001.
[19] R. Agrawal. The Continuum-Armed Bandit Problem, 1995.
[20] Eli Upfal, et al. Multi-Armed Bandits in Metric Spaces, 2008.
[21] H. Robbins. Some aspects of the sequential design of experiments, 1952.
[22] J. Bather, et al. Multi-Armed Bandit Allocation Indices, 1990.
[23] Rémi Munos, et al. Open Loop Optimistic Planning, 2010, COLT.
[24] Olivier Teytaud, et al. Modification of UCT with Patterns in Monte-Carlo Go, 2006.
[25] J. Doob. Stochastic processes, 1953.
[26] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.
[27] R. Agrawal. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem, 1995, Advances in Applied Probability.
[28] Elad Hazan, et al. Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization, 2008, COLT.
[29] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.