Thompson Sampling for Complex Online Problems
[1] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.
[2] Shipra Agrawal, et al. Thompson Sampling for Contextual Bandits with Linear Payoffs, 2012, ICML.
[3] Rémi Munos, et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, 2012, ALT.
[4] Daniel A. Braun, et al. A Minimum Relative Entropy Principle for Learning and Acting, 2008, J. Artif. Intell. Res.
[5] Branko Ristic, et al. Beyond the Kalman Filter: Particle Filters for Tracking Applications, 2004.
[6] J. Bather, et al. Multi-Armed Bandit Allocation Indices, 1990.
[7] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[8] Aurélien Garivier, et al. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond, 2011, COLT.
[9] Rudolf Ahlswede, et al. Maximum Number of Constant Weight Vertices of the Unit n-Cube Contained in a k-Dimensional Subspace, 2003, Comb.
[10] Lihong Li, et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[11] Rémi Munos, et al. Thompson Sampling for 1-Dimensional Exponential Family Bandits, 2013, NIPS.
[12] Benjamin Van Roy, et al. (More) Efficient Reinforcement Learning via Posterior Sampling, 2013, NIPS.
[13] W. R. Thompson. On the Likelihood That One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[14] Thomas P. Hayes, et al. Stochastic Linear Optimization under Bandit Feedback, 2008, COLT.
[15] Andreas Krause, et al. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting, 2009, IEEE Transactions on Information Theory.
[16] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[17] Csaba Szepesvári, et al. X-Armed Bandits, 2011, J. Mach. Learn. Res.
[18] Jean-Yves Audibert, et al. Minimax Policies for Adversarial and Stochastic Bandits, 2009, COLT.
[19] Geoffrey I. Webb, et al. Encyclopedia of Machine Learning, 2011.
[20] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Adv. Appl. Math.
[21] Benjamin Van Roy, et al. Learning to Optimize via Posterior Sampling, 2013, Math. Oper. Res.
[22] Timothy J. Robinson, et al. Sequential Monte Carlo Methods in Practice, 2003.
[23] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[24] Steven L. Scott, et al. A Modern Bayesian Look at the Multi-Armed Bandit, 2010.
[25] Gábor Lugosi, et al. Minimax Policies for Combinatorial Prediction Games, 2011, COLT.