Rotting Bandits
[1] Rémi Munos, et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, 2012, ALT.
[2] T. L. Lai, Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.
[3] Omar Besbes, et al. Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards, 2014, NIPS.
[4] Shie Mannor, et al. Piecewise-stationary bandit problems with side observations, 2009, ICML '09.
[5] P. Whittle. Arm-Acquiring Bandits, 1981.
[6] Qing Zhao, et al. Extended UCB Policy for Multi-Armed Bandit with Light-Tailed Reward Distributions, 2011, ArXiv.
[7] Ambuj Tewari, et al. Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret, 2012, ICML.
[8] Wael Badawy, et al. Automatic License Plate Recognition (ALPR): A State-of-the-Art Review, 2013, IEEE Transactions on Circuits and Systems for Video Technology.
[9] A. Mandelbaum, et al. Multi-armed bandits in discrete and continuous time, 1998.
[10] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[11] P. Whittle. Restless Bandits: Activity Allocation in a Changing World, 1988.
[12] Nicholas R. Jennings, et al. Efficient Crowdsourcing of Unknown Experts using Multi-Armed Bandits, 2012, ECAI.
[13] Shie Mannor, et al. Thompson Sampling for Complex Online Problems, 2013, ICML.
[14] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[15] Shipra Agrawal, et al. Further Optimal Regret Bounds for Thompson Sampling, 2012, AISTATS.
[16] Elad Hazan, et al. Better Algorithms for Benign Bandits, 2009, J. Mach. Learn. Res.
[17] Deepayan Chakrabarti, et al. Bandits for Taxonomies: A Model-based Approach, 2007, SDM.
[18] Deepak Agarwal, et al. Spatio-temporal models for estimating click-through rate, 2009, WWW '09.
[19] H. Robbins. Some aspects of the sequential design of experiments, 1952.
[20] Aurélien Garivier, et al. On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems, 2008.
[21] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[22] Aurélien Garivier, et al. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond, 2011, COLT.
[23] Frank Thomson Leighton, et al. The value of knowing a demand curve: bounds on regret for online posted-price auctions, 2003, FOCS.
[24] Tao Qin, et al. Time-Decaying Bandits for Non-stationary Systems, 2014, WINE.
[25] Gábor Lugosi, et al. Prediction, learning, and games, 2006.
[26] A. Mandelbaum. Continuous Multi-Armed Bandits and Multiparameter Processes, 1987.
[27] Filip Radlinski, et al. Mortal Multi-Armed Bandits, 2008, NIPS.
[28] J. Gittins. Bandit processes and dynamic allocation indices, 1979.
[29] Mingyan Liu, et al. Online Learning of Rested and Restless Bandits, 2011, IEEE Transactions on Information Theory.
[30] Baruch Awerbuch, et al. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches, 2004, STOC '04.
[31] Eli Upfal, et al. Adapting to a Changing Environment: the Brownian Restless Bandits, 2008, COLT.
[32] Rémi Munos, et al. A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences, 2011, COLT.