Efficient reinforcement learning in parameterized models: discrete parameters
[1] Ali Esmaili, et al. Probability and Random Processes, 2005, Technometrics.
[2] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[3] G. Grimmett, et al. Probability and random processes, 2002.
[4] Pravin Varaiya, et al. Stochastic Systems: Estimation, Identification, and Adaptive Control, 1986.
[5] P. Mandl, et al. Estimation and control in Markov chains, 1974, Advances in Applied Probability.
[6] Shie Mannor, et al. Efficient reinforcement learning in parameterized models: discrete parameters, 2008, Valuetools 2008.
[7] J. Andel. Sequential Analysis, 2022, The SAGE Encyclopedia of Research Design.
[8] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[9] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[10] Lihong Li, et al. PAC model-free reinforcement learning, 2006, ICML.
[11] P. Kumar, et al. A new family of optimal adaptive controllers for Markov chains, 1982.
[12] Kazuoki Azuma. Weighted Sums of Certain Dependent Random Variables, 1967.
[13] K. Cocks. Discrete Stochastic Programming, 1968.
[14] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[15] C. Kraft. Some conditions for consistency and uniform consistency of statistical procedures, 1955.
[16] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.
[17] Michael Kearns, et al. Efficient Reinforcement Learning in Factored MDPs, 1999, IJCAI.
[18] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[19] Michael L. Littman, et al. A theoretical analysis of Model-Based Interval Estimation, 2005, ICML.
[20] L. Rogers, et al. Diffusions, Markov processes, and martingales, 1979.
[21] John N. Tsitsiklis, et al. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem, 2004, J. Mach. Learn. Res.
[22] T. Kailath. The Divergence and Bhattacharyya Distance Measures in Signal Selection, 1967.
[23] Marcus Hutter, et al. Asymptotic Learnability of Reinforcement Problems with Arbitrary Dependence, 2006, ALT.