Local Bandit Approximation for Optimal Learning Problems
[1] Robert E. Kalaba, et al. On adaptive control processes, 1959.
[2] D. Naidu, et al. Optimal Control Systems, 2018.
[3] A. G. Butkovskiy, et al. Optimal control of systems, 1966.
[4] J. MacQueen. A Modified Dynamic Programming Method for Markovian Decision Problems, 1966.
[5] J. K. Satia, et al. Markovian Decision Processes with Uncertain Transition Probabilities, 1973, Oper. Res.
[6] Yaakov Bar-Shalom, et al. Caution, Probing, and the Value of Information in the Control of Uncertain Systems, 1976.
[7] V. Borkar, et al. Adaptive control of Markov chains, I: Finite parameter set, 1979.
[8] V. Borkar, et al. Adaptive control of Markov chains, I: Finite parameter set, 1979, 1979 18th IEEE Conference on Decision and Control including the Symposium on Adaptive Processes.
[9] J. Gittins. Bandit processes and dynamic allocation indices, 1979.
[10] D. R. Robinson. Algorithms for evaluating the dynamic allocation index, 1982, Oper. Res. Lett.
[11] Jean Walrand, et al. Extensions of the multiarmed bandit problem: The discounted case, 1985.
[12] Michael N. Katehakis, et al. The Multi-Armed Bandit Problem: Decomposition and Computation, 1987, Math. Oper. Res.
[13] C. Watkins. Learning from delayed rewards, 1989.
[14] J. Tsitsiklis. A short proof of the Gittins index theorem, 1993, Proceedings of the 32nd IEEE Conference on Decision and Control.
[15] Michael O. Duff, et al. Q-Learning for Bandit Problems, 1995, ICML.
[16] P. Dayan, et al. Exploration bonuses and dual control, 1996.