Csaba Szepesvári | Shipra Agrawal | Steffen Grünewälder | Ciara Pike-Burke
[1] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples , 1933 .
[2] T. W. Anderson. Sequential Analysis with Delayed Observations , 1964 .
[3] 鈴木 雪夫. On sequential decision problems with delayed observations , 1967 .
[4] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[5] Peter Auer,et al. UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem , 2010, Period. Math. Hung..
[6] Lihong Li,et al. An Empirical Evaluation of Thompson Sampling , 2011, NIPS.
[7] John Langford,et al. Efficient Optimal Learning for Contextual Bandits , 2011, UAI.
[8] Andreas Krause,et al. Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Process Bandit Optimization , 2012, ICML.
[9] András György,et al. Online Learning under Delayed Feedback , 2013, ICML.
[10] Gábor Lugosi,et al. Concentration Inequalities - A Nonasymptotic Theory of Independence , 2013, Concentration Inequalities.
[11] Csaba Szepesvári,et al. Online Markov Decision Processes Under Bandit Feedback , 2010, IEEE Transactions on Automatic Control.
[12] Zoran Popovic,et al. The Queue Method: Handling Delay, Heuristics, Prior Data, and Evaluation in Bandits , 2015, AAAI.
[13] Tor Lattimore,et al. On Explore-Then-Commit strategies , 2016, NIPS.
[14] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules , 1985 .