Stochastic bandits with pathwise constraints
[1] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules , 1985 .
[2] I. M. Jacobs,et al. Principles of Communication Engineering , 1965 .
[3] Joseph Mitola,et al. Cognitive radio: making software radios more personal , 1999, IEEE Wirel. Commun..
[4] R. Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem , 1995, Advances in Applied Probability.
[5] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[6] Shie Mannor,et al. Stochastic bandits with pathwise constraints , 2012 .
[7] Keith W. Ross,et al. Randomized and Past-Dependent Policies for Markov Decision Processes with Multiple Constraints , 1989, Oper. Res..
[8] Armand M. Makowski,et al. Implementation Issues for Markov Decision Processes , 1988 .
[9] Shie Mannor,et al. A Geometric Approach to Multi-Criterion Reinforcement Learning , 2004, J. Mach. Learn. Res..
[10] Csaba Szepesvári,et al. Tuning Bandit Algorithms in Stochastic Environments , 2007, ALT.
[11] P. W. Jones,et al. Bandit Problems, Sequential Allocation of Experiments , 1987 .
[12] John N. Tsitsiklis,et al. Online Learning with Sample Path Constraints , 2009, J. Mach. Learn. Res..
[13] Armand M. Makowski,et al. A class of steering policies under a recurrence condition , 1988, Proceedings of the 27th IEEE Conference on Decision and Control.
[14] R. F.,et al. Mathematical Statistics , 1944, Nature.
[15] Wassim Jouini,et al. Multi-armed bandit based policies for cognitive radio's decision making issues , 2009, 2009 3rd International Conference on Signals, Circuits and Systems (SCS).