[1] Huizhen Yu, et al. Weak Convergence Properties of Constrained Emphatic Temporal-difference Learning with Constant and Slowly Diminishing Stepsize, 2015, J. Mach. Learn. Res.
[2] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[3] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.
[4] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[5] Andrew G. Barto, et al. Reinforcement Learning, 1998.
[6] John N. Tsitsiklis, et al. Analysis of Temporal-Difference Learning with Function Approximation, 1996, NIPS.
[7] Yousef Saad, et al. Iterative methods for sparse linear systems, 2003.
[8] S. Meyn. Ergodic theorems for discrete time stochastic systems using a stochastic Lyapunov function, 1989.
[9] Marek Petrik, et al. Finite-Sample Analysis of Proximal Gradient TD Algorithms, 2015, UAI.
[10] Shie Mannor, et al. The Cross Entropy Method for Fast Policy Search, 2003, ICML.
[11] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML.
[12] Motoaki Kawanabe, et al. Generalized TD Learning, 2011, J. Mach. Learn. Res.
[13] E. Seneta. Non-negative Matrices and Markov Chains, 2008.
[14] A. C. Brooms. Stochastic Approximation and Recursive Algorithms with Applications, 2nd edn, by H. J. Kushner and G. G. Yin, 2006.
[15] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[16] Le Song, et al. SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation, 2017, ICML.
[17] C. J. C. H. Watkins. Learning from Delayed Rewards, 1989, PhD thesis, University of Cambridge.
[18] D. Bertsekas, et al. Weighted Bellman Equations and their Applications in Approximate Dynamic Programming, 2012.
[19] Dimitri P. Bertsekas, et al. Error Bounds for Approximations from Projected Linear Equations, 2010, Math. Oper. Res.
[20] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996.
[21] Martha White, et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning, 2015, J. Mach. Learn. Res.
[22] Geoffrey E. Hinton, et al. Feudal Reinforcement Learning, 1992, NIPS.
[23] Huizhen Yu, et al. On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning, 2017, arXiv.
[24] Richard S. Sutton, et al. Multi-step Off-policy Learning Without Importance Sampling Ratios, 2017, arXiv.
[25] Jan Peters, et al. Policy evaluation with temporal differences: a survey and comparison, 2015, J. Mach. Learn. Res.
[26] M. Schäl, et al. Stationary policies and Markov policies in Borel dynamic programming, 1987.
[27] John N. Tsitsiklis, et al. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.
[28] R. Cooke. Real and Complex Analysis, 2011.
[29] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[30] R. Sutton. The Grand Challenge of Predictive Empirical Abstract Knowledge, 2009.
[31] H. Kushner, et al. Stochastic Approximation and Recursive Algorithms and Applications, 2003.
[32] Huizhen Yu, et al. On Convergence of Emphatic Temporal-Difference Learning, 2015, COLT.
[33] Matthieu Geist, et al. Off-policy learning with eligibility traces: a survey, 2013, J. Mach. Learn. Res.
[34] Richard L. Tweedie, et al. Markov Chains and Stochastic Stability, 1993, Communications and Control Engineering Series.
[35] R. S. Randhawa, et al. Combining importance sampling and temporal difference control variates to simulate Markov Chains, 2004, ACM Trans. Model. Comput. Simul.
[36] R. Sutton, et al. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation, 2008, NIPS.
[37] Peter Stone, et al. Reinforcement learning, 2019, Scholarpedia.
[38] Richard S. Sutton, et al. Weighted importance sampling for off-policy learning with linear function approximation, 2014, NIPS.
[39] Charles R. Johnson, et al. Matrix Analysis, 1985.
[40] Huizhen Yu. Some Simulation Results for Emphatic Temporal-Difference Learning Algorithms, 2016, arXiv.
[41] Hamid Reza Maei. Gradient Temporal-Difference Learning Algorithms, 2011, PhD thesis, University of Alberta.
[42] Donald L. Iglehart, et al. Importance sampling for stochastic simulations, 1989.
[43] Bo Liu, et al. Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces, 2014, arXiv.
[44] Bruno Scherrer. Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view, 2010, ICML.
[45] Huizhen Yu, et al. Least Squares Temporal Difference Methods: An Analysis under General Conditions, 2012, SIAM J. Control Optim.
[46] R. M. Dudley. Real Analysis and Probability, 2002.
[47] E. Nummelin. General Irreducible Markov Chains and Non-Negative Operators, 1984.
[48] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[49] Sayan Mukherjee, et al. Bayesian group factor analysis with structured sparsity, 2016, J. Mach. Learn. Res.
[50] Richard S. Sutton, et al. TD Models: Modeling the World at a Mixture of Time Scales, 1995, ICML.
[51] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.