Large Deviations Bounds for Markov Decision Processes Under General Policies
[1] Cyrus Derman,et al. Finite State Markovian Decision Processes , 1970 .
[2] John S. Edwards,et al. Linear Programming and Finite Markovian Control Problems , 1983 .
[3] Pravin Varaiya,et al. Stochastic Systems: Estimation, Identification, and Adaptive Control , 1986 .
[4] E. Altman,et al. Markov decision problems and state-action frequencies , 1991 .
[5] N. Shimkin. Extremal large deviations in controlled i.i.d. processes with applications to hypothesis testing , 1993, Advances in Applied Probability.
[6] Eitan Altman,et al. Rate of Convergence of Empirical Measures and Costs in Controlled Markov Chains and Transient Optimality , 1994, Math. Oper. Res..
[7] Robert G. Gallager,et al. Discrete Stochastic Processes , 1995 .
[8] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[9] John N. Tsitsiklis,et al. Introduction to linear optimization , 1997 .
[10] Amir Dembo,et al. Large Deviations Techniques and Applications , 1998 .
[11] Richard S. Sutton,et al. Dimensions of Reinforcement Learning , 1998 .
[12] P. Glynn,et al. Hoeffding's inequality for uniformly ergodic Markov chains , 2002 .
[13] S. Balaji,et al. Multiplicative ergodicity and large deviations for an irreducible Markov chain .