[1] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[2] Shimon Whiteson, et al. GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values, 2020, ICML.
[3] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.
[4] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[5] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[6] Donald E. Kirk, et al. Optimal control theory: an introduction, 1970.
[7] Ali H. Sayed, et al. Distributed Policy Evaluation Under Multiple Behavior Strategies, 2013, IEEE Transactions on Automatic Control.
[8] Richard S. Sutton, et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation, 2008, NIPS.
[9] Sergey Levine, et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models, 2018, NeurIPS.
[10] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[11] Le Song, et al. SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation, 2017, ICML.
[12] Shimon Whiteson, et al. Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation, 2019, ICML.
[13] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[14] Yao Liu, et al. Combining Parametric and Nonparametric Models for Off-Policy Evaluation, 2019, ICML.
[15] Lihong Li, et al. Stochastic Variance Reduction Methods for Policy Evaluation, 2017, ICML.
[16] Gabriel Dulac-Arnold, et al. Challenges of Real-World Reinforcement Learning, 2019, ArXiv.
[17] Qiang Liu, et al. Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning, 2020, ICLR.
[18] Ilya Kostrikov, et al. AlgaeDICE: Policy Gradient from Arbitrary Experience, 2019, ArXiv.
[19] Thorsten Joachims, et al. MOReL: Model-Based Offline Reinforcement Learning, 2020, NeurIPS.
[20] Yifei Ma, et al. Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling, 2019, NeurIPS.
[21] Alborz Geramifard, et al. Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping, 2008, UAI.
[22] Yao Liu, et al. Representation Balancing MDPs for Off-Policy Policy Evaluation, 2018, NeurIPS.
[23] Martha White, et al. Planning with Expectation Models, 2019, IJCAI.
[24] Panos M. Pardalos, et al. Approximate dynamic programming: solving the curses of dimensionality, 2009, Optim. Methods Softw.
[25] Lantao Yu, et al. MOPO: Model-based Offline Policy Optimization, 2020, NeurIPS.
[26] Shie Mannor, et al. Consistent On-Line Off-Policy Evaluation, 2017, ICML.
[27] J. Zico Kolter, et al. The Fixed Points of Off-Policy TD, 2011, NIPS.
[28] Richard S. Sutton, et al. Learning and Planning in Average-Reward Markov Decision Processes, 2020, ArXiv.
[29] Marek Petrik, et al. Finite-Sample Analysis of Proximal Gradient TD Algorithms, 2015, UAI.
[30] Philip S. Thomas, et al. High-Confidence Off-Policy Evaluation, 2015, AAAI.
[31] Qiang Liu, et al. Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation, 2019, ICLR.
[32] Shimon Whiteson, et al. Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning, 2020, AAAI.
[33] Emma Brunskill, et al. Off-Policy Policy Gradient with State Distribution Correction, 2019, UAI.
[34] Huizhen Yu, et al. On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning, 2017, ArXiv.
[35] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[36] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[37] Masatoshi Uehara, et al. Minimax Weight and Q-Function Learning for Off-Policy Evaluation, 2019, ICML.
[38] Geoffrey E. Hinton, et al. Rectified Linear Units Improve Restricted Boltzmann Machines, 2010, ICML.
[39] Lihong Li. A perspective on off-policy evaluation in reinforcement learning, 2019, Frontiers of Computer Science.
[40] Bo Liu, et al. Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces, 2014, ArXiv.
[41] Qiang Liu, et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation, 2018, NeurIPS.
[42] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[43] Bo Liu, et al. A Block Coordinate Ascent Algorithm for Mean-Variance Optimization, 2018, NeurIPS.
[44] Bo Dai, et al. GenDICE: Generalized Offline Estimation of Stationary Values, 2020, ICLR.
[45] Martha White, et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning, 2015, J. Mach. Learn. Res.
[46] Peter L. Bartlett, et al. POLITEX: Regret Bounds for Policy Iteration using Expert Prediction, 2019, ICML.
[47] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[48] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[49] Dilan Görür, et al. A maximum-entropy approach to off-policy evaluation in average-reward MDPs, 2020, NeurIPS.
[50] Dimitri P. Bertsekas, et al. Convergence Results for Some Temporal Difference Methods Based on Least Squares, 2009, IEEE Transactions on Automatic Control.
[51] Bo Dai, et al. DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections, 2019, NeurIPS.
[52] Marc G. Bellemare, et al. Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift, 2019, AAAI.
[53] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML.
[54] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008, Texts and Readings in Mathematics.
[55] Nan Jiang, et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[56] Shalabh Bhatnagar, et al. A Convergent Off-Policy Temporal Difference Algorithm, 2019, ECAI.
[57] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 1995.