[1] Rémi Munos,et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis , 2012, ALT.
[2] Osamu Watanabe,et al. Adaptive Sampling Methods for Scaling Up Knowledge Discovery Algorithms , 1999, Discovery Science.
[3] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules , 1985 .
[4] Jürgen Branke,et al. Evolutionary optimization in uncertain environments-a survey , 2005, IEEE Transactions on Evolutionary Computation.
[5] R. Weber. On the Gittins Index for Multiarmed Bandits , 1992 .
[6] Aurélien Garivier,et al. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond , 2011, COLT.
[7] A. E. Eiben,et al. From evolutionary computation to the evolution of things , 2015, Nature.
[8] Shie Mannor,et al. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems , 2006, J. Mach. Learn. Res.
[9] Steven L. Scott,et al. A modern Bayesian look at the multi-armed bandit , 2010 .
[10] Rémi Munos,et al. A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences , 2011, COLT.
[11] Aurélien Garivier,et al. On Bayesian Upper Confidence Bounds for Bandit Problems , 2012, AISTATS.
[12] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples , 1933 .
[13] Lihong Li,et al. An Empirical Evaluation of Thompson Sampling , 2011, NIPS.
[14] Rémi Munos,et al. Pure Exploration in Multi-armed Bandits Problems , 2009, ALT.
[15] Ole-Christoffer Granmo,et al. Solving two-armed Bernoulli bandit problems using a Bayesian learning automaton , 2010, Int. J. Intell. Comput. Cybern.
[16] Shipra Agrawal,et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem , 2011, COLT.
[17] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[18] Moto Kamiura,et al. Optimism in the face of uncertainty supported by a statistically-designed multi-armed bandit algorithm , 2017, Biosyst.
[19] H. Robbins. Some aspects of the sequential design of experiments , 1952 .
[20] John Langford,et al. Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits , 2014, ICML.
[21] H. Chernoff. A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations , 1952 .