Representations for Stable Off-Policy Reinforcement Learning
[1] R. E. Kalman, et al. Linear System Theory: The State Space Approach, 1965.
[2] Pierre Priouret, et al. Adaptive Algorithms and Stochastic Approximations, 1990, Applications of Mathematics.
[3] Etienne Barnard, et al. Temporal-difference methods and Markov models, 1993, IEEE Trans. Syst. Man Cybern.
[4] Peter Dayan, et al. Improving Generalization for Temporal Difference Learning: The Successor Representation, 1993, Neural Computation.
[5] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[6] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[7] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[8] John N. Tsitsiklis, et al. Analysis of Temporal-Difference Learning with Function Approximation, 1996, NIPS.
[9] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[10] Sean P. Meyn, et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning, 2000, SIAM J. Control Optim.
[11] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[12] Steven J. Bradtke, et al. Linear Least-Squares Algorithms for Temporal Difference Learning, 1996, Machine Learning.
[13] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[14] Marek Petrik, et al. An Analysis of Laplacian Methods for Value Function Approximation in MDPs, 2007, IJCAI.
[15] Lihong Li, et al. Analyzing feature generation for value-function approximation, 2007, ICML.
[16] Sridhar Mahadevan, et al. Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes, 2007, J. Mach. Learn. Res.
[17] Lihong Li, et al. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning, 2008, ICML.
[18] Dimitri P. Bertsekas, et al. Basis function adaptation methods for cost approximation in MDP, 2009, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[19] Shalabh Bhatnagar, et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation, 2009, NIPS.
[20] Robert H. Halstead, et al. Matrix Computations, 2011, Encyclopedia of Parallel Computing.
[21] D. Bertsekas. Approximate policy iteration: a survey and some new methods, 2011.
[22] Yimin Wei, et al. A sharp version of Bauer-Fike's theorem, 2012, J. Comput. Appl. Math.
[23] Jan Peters, et al. Policy evaluation with temporal differences: a survey and comparison, 2015, J. Mach. Learn. Res.
[24] Samuel Gershman, et al. Design Principles of the Hippocampal Cognitive Map, 2014, NIPS.
[25] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[26] Alexei A. Efros, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[27] Shie Mannor, et al. Shallow Updates for Deep Reinforcement Learning, 2017, NIPS.
[28] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[29] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[30] Marlos C. Machado, et al. Eigenoption Discovery through the Deep Successor Representation, 2017, ICLR.
[31] Matteo Hessel, et al. Deep Reinforcement Learning and the Deadly Triad, 2018, arXiv.
[32] Shie Mannor, et al. Finite Sample Analyses for TD(0) With Function Approximation, 2017, AAAI.
[33] Yann Ollivier, et al. Approximate Temporal Difference Learning is a Gradient Descent for Reversible Policies, 2018, arXiv.
[34] Marc G. Bellemare, et al. DeepMDP: Learning Continuous Latent Space Models for Representation Learning, 2019, ICML.
[35] Oleg O. Sushkov, et al. A Practical Approach to Insertion with Variable Socket Position Using Deep Reinforcement Learning, 2019, International Conference on Robotics and Automation (ICRA).
[36] Dimitri P. Bertsekas, et al. Feature-based aggregation and deep reinforcement learning: a survey and some new implementations, 2018, IEEE/CAA Journal of Automatica Sinica.
[37] Marek Petrik, et al. Fast Feature Selection for Linear Value Function Approximation, 2019, ICAPS.
[38] Yifan Wu, et al. The Laplacian in RL: Learning Representations with Efficient Approximations, 2018, ICLR.
[39] Nicolas Le Roux, et al. A Geometric Perspective on Optimal Representations for Reinforcement Learning, 2019, NeurIPS.
[40] Joelle Pineau, et al. Combined Reinforcement Learning via Abstract Representations, 2018, AAAI.
[41] Martha White, et al. Two-Timescale Networks for Nonlinear Value Function Approximation, 2019, ICLR.
[42] Karol Hausman, et al. Quantile QT-Opt for Risk-Aware Vision-Based Robotic Grasping, 2019, Robotics: Science and Systems.
[43] L. Trefethen, et al. Spectra and Pseudospectra: The Behavior of Nonnormal Matrices and Operators, 2005, Princeton University Press.