A Geometric Perspective on Optimal Representations for Reinforcement Learning
Nicolas Le Roux | Dale Schuurmans | Tor Lattimore | Marc G. Bellemare | Clare Lyle | Adrien Ali Taïga | Will Dabney | Pablo Samuel Castro | Robert Dadashi
[1] Richard L. Lewis, et al. Discovery of Useful Questions as Auxiliary Tasks, 2019, NeurIPS.
[2] Richard S. Sutton, et al. TD Models: Modeling the World at a Mixture of Time Scales, 1995, ICML.
[3] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[4] Tom Schaul, et al. Successor Features for Transfer in Reinforcement Learning, 2016, NIPS.
[5] Rémi Munos, et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[6] Matteo Hessel, et al. Deep Reinforcement Learning and the Deadly Triad, 2018, ArXiv.
[7] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[8] Patrick M. Pilarski, et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction, 2011, AAMAS.
[9] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Vol. II, 1976.
[10] Sean R. Eddy, et al. What is dynamic programming?, 2004, Nature Biotechnology.
[11] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1995, NIPS.
[12] Marc G. Bellemare, et al. DeepMDP: Learning Continuous Latent Space Models for Representation Learning, 2019, ICML.
[13] Marc G. Bellemare, et al. Dopamine: A Research Framework for Deep Reinforcement Learning, 2018, ArXiv.
[14] Vladlen Koltun, et al. Learning to Act by Predicting the Future, 2016, ICLR.
[15] Marlos C. Machado, et al. State of the Art Control of Atari Games Using Shallow Reinforcement Learning, 2015, AAMAS.
[16] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[17] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.
[18] Arthur L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev..
[19] Oriol Vinyals, et al. Representation Learning with Contrastive Predictive Coding, 2018, ArXiv.
[20] Marlos C. Machado, et al. Eigenoption Discovery through the Deep Successor Representation, 2017, ICLR.
[21] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[22] Shie Mannor, et al. Regularized Policy Iteration with Nonparametric Function Spaces, 2016, J. Mach. Learn. Res..
[23] Marek Petrik, et al. An Analysis of Laplacian Methods for Value Function Approximation in MDPs, 2007, IJCAI.
[24] Marek Petrik, et al. Robust Approximate Bilinear Programming for Value Function Approximation, 2011, J. Mach. Learn. Res..
[25] Nicolas Le Roux, et al. The Value Function Polytope in Reinforcement Learning, 2019, ICML.
[26] Shie Mannor, et al. Basis Function Adaptation in Temporal Difference Reinforcement Learning, 2005, Ann. Oper. Res..
[27] Stephen P. Boyd, et al. Convex Optimization, 2004, Algorithms and Theory of Computation Handbook.
[28] R. Tyrrell Rockafellar, et al. Convex Analysis, 1970, Princeton Landmarks in Mathematics and Physics.
[29] Samuel Gershman, et al. Design Principles of the Hippocampal Cognitive Map, 2014, NIPS.
[30] Gerald Tesauro, et al. Temporal difference learning and TD-Gammon, 1995, CACM.
[31] Dimitri P. Bertsekas, et al. Feature-based aggregation and deep reinforcement learning: a survey and some new implementations, 2018, IEEE/CAA Journal of Automatica Sinica.
[32] Peter Dayan, et al. Improving Generalization for Temporal Difference Learning: The Successor Representation, 1993, Neural Computation.
[33] Lawrence Carin, et al. Linear Feature Encoding for Reinforcement Learning, 2016, NIPS.
[34] Rémi Munos, et al. Performance Bounds in Lp-norm for Approximate Value Iteration, 2007, SIAM J. Control. Optim..
[35] Yifan Wu, et al. The Laplacian in RL: Learning Representations with Efficient Approximations, 2018, ICLR.
[36] Marcello Restelli, et al. Boosted Fitted Q-Iteration, 2017, ICML.
[37] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res..
[38] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[39] Peter Dayan, et al. Structure in the Space of Value Functions, 2002, Machine Learning.
[40] Thomas J. Walsh, et al. Towards a Unified Theory of State Abstraction for MDPs, 2006, AI&M.
[41] Marc G. Bellemare, et al. An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents, 2018, IJCAI.
[42] Shie Mannor, et al. Automatic basis function construction for approximate dynamic programming and reinforcement learning, 2006, ICML.
[43] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[44] Bastian Goldlücke, et al. Variational Analysis, 2014, Computer Vision, A Reference Guide.
[45] Alec Solway, et al. Optimal Behavioral Hierarchy, 2014, PLoS Comput. Biol..
[46] Marcus Hutter, et al. Feature Reinforcement Learning: Part I. Unstructured MDPs, 2009, J. Artif. Gen. Intell..
[47] Sergey Levine, et al. Near-Optimal Representation Learning for Hierarchical Reinforcement Learning, 2018, ICLR.
[48] Richard S. Sutton, et al. Predictive Representations of State, 2001, NIPS.
[49] Lihong Li, et al. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning, 2008, ICML '08.
[50] Doina Precup, et al. Sparse Distributed Memories for On-Line Value-Based Reinforcement Learning, 2004, ECML.
[51] Vivek S. Borkar, et al. Feature Search in the Grassmanian in Online Reinforcement Learning, 2013, IEEE Journal of Selected Topics in Signal Processing.
[52] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell..
[53] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res..
[54] Lihong Li, et al. Analyzing feature generation for value-function approximation, 2007, ICML '07.
[55] Bo Liu, et al. Basis Construction from Power Series Expansions of Value Functions, 2010, NIPS.
[56] O. G. Selfridge, et al. Pandemonium: a paradigm for learning, 1988.
[57] Joelle Pineau, et al. Combined Reinforcement Learning via Abstract Representations, 2018, AAAI.
[58] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[59] Bahram Behzadian, et al. Feature Selection by Singular Value Decomposition for Reinforcement Learning, 2019.
[60] Tom Schaul, et al. Universal Value Function Approximators, 2015, ICML.
[61] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[62] Jens Vygen, et al. The Book Review Column, 2020, SIGACT News.
[63] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[64] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[65] Marlos C. Machado, et al. A Laplacian Framework for Option Discovery in Reinforcement Learning, 2017, ICML.
[66] Sridhar Mahadevan, et al. Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes, 2007, J. Mach. Learn. Res..
[67] Dimitri P. Bertsekas, et al. Basis function adaptation methods for cost approximation in MDP, 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[68] Shie Mannor, et al. Shallow Updates for Deep Reinforcement Learning, 2017, NIPS.
[69] Michael L. Littman, et al. Near Optimal Behavior via Approximate State Abstraction, 2016, ICML.
[70] Martha White, et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning, 2015, J. Mach. Learn. Res..
[71] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[72] Doina Precup, et al. Representation Discovery for MDPs Using Bisimulation Metrics, 2015, AAAI.
[73] Marcin Andrychowicz, et al. Hindsight Experience Replay, 2017, NIPS.
[74] Martha White, et al. Two-Timescale Networks for Nonlinear Value Function Approximation, 2019, ICLR.