Beyond Reward: The Problem of Knowledge and Data