Multi-timescale nexting in a reinforcement learning robot
[1] F. W. Irwin. Purposive Behavior in Animals and Men, 1932, The Psychological Clinic.
[2] W. Brogden. Sensory pre-conditioning, 1939.
[3] Gwilym M. Jenkins, et al. Time series analysis, forecasting and control, 1971.
[4] Michael Cunningham. Intelligence: Its Organization and Development, 1972.
[5] P. Young, et al. Time series analysis, forecasting and control, 1972, IEEE Transactions on Automatic Control.
[6] Roger C. Schank, et al. Computer Models of Thought and Language, 1974.
[7] R. Rescorla. Simultaneous and successive associations in sensory preconditioning, 1980, Journal of Experimental Psychology: Animal Behavior Processes.
[8] Lennart Ljung, et al. System Identification: Theory for the User, 1987.
[9] Richard S. Sutton, et al. Time-Derivative Models of Pavlovian Reinforcement, 1990.
[10] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[11] M. Gabriel, et al. Learning and Computational Neuroscience: Foundations of Adaptive Networks, 1990.
[12] Gary L. Drescher, et al. Made-up minds: a constructivist approach to artificial intelligence, 1991.
[13] Geoffrey E. Hinton, et al. Feudal Reinforcement Learning, 1992, NIPS.
[14] Gerard Casey. Minds and machines, 1992.
[15] Satinder P. Singh, et al. Reinforcement Learning with a Hierarchy of Abstract Models, 1992, AAAI.
[16] Leslie Pack Kaelbling, et al. Learning to Achieve Goals, 1993, IJCAI.
[17] Richard S. Sutton, et al. TD Models: Modeling the World at a Mixture of Time Scales, 1995, ICML.
[18] Michael I. Jordan, et al. An internal model for sensorimotor integration, 1995, Science.
[19] Eduardo F. Camacho, et al. Model predictive control in the process industry, 1995.
[20] Benjamin Kuipers, et al. Map Learning with Uninterpreted Sensors and Effectors, 1995, Artif. Intell.
[21] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[22] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[23] K. Carlsson, et al. Tickling Expectations: Neural Processing in Anticipation of a Sensory Stimulus, 2000, Journal of Cognitive Neuroscience.
[24] Paul R. Cohen, et al. A Method for Clustering the Experiences of a Mobile Robot that Accords with Human Judgments, 2000, AAAI/IAAI.
[25] Richard S. Sutton, et al. Predictive Representations of State, 2001, NIPS.
[26] Sebastian Thrun, et al. Online simultaneous localization and mapping with detection and tracking of moving objects: theory and results from a ground vehicle in crowded urban areas, 2003, IEEE International Conference on Robotics and Automation (ICRA).
[27] Marko Bacic, et al. Model predictive control, 2003.
[28] Martin V. Butz, et al. Anticipatory Behavior in Adaptive Learning Systems, 2003, Lecture Notes in Computer Science.
[29] Olivier Sigaud, et al. Anticipatory Behavior in Adaptive Learning Systems: Foundations, Theories, and Systems, 2003.
[30] J. L. Roux. An Introduction to the Kalman Filter, 2003.
[31] Michael R. James, et al. Predictive State Representations: A New Theory for Modeling Dynamical Systems, 2004, UAI.
[32] H. Sebastian Seung, et al. Stochastic policy gradient reinforcement learning on a simple 3D biped, 2004, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[33] Rick Grush, et al. The emulation theory of representation: Motor control, imagery, and perception, 2004, Behavioral and Brain Sciences.
[34] J. Hawkins, et al. On Intelligence, 2004.
[35] Richard S. Sutton, et al. Temporal-Difference Networks, 2004, NIPS.
[36] Jean-Arcady Meyer, et al. Adaptive Behavior, 2005.
[37] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[38] Richard S. Sutton. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[39] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[40] Steven M. LaValle. Planning algorithms, 2006.
[41] Sebastian Thrun, et al. Stanley: The robot that won the DARPA Grand Challenge, 2006, J. Field Robotics.
[42] D. Levitin. This Is Your Brain on Music, 2006.
[43] C. Stevens, et al. Sweet Anticipation: Music and the Psychology of Expectation, by David Huron (Cambridge, Massachusetts: MIT Press, 2006), 2007.
[44] Pierre-Yves Oudeyer, et al. Intrinsic Motivation Systems for Autonomous Mental Development, 2007, IEEE Transactions on Evolutionary Computation.
[45] Martin V. Butz, et al. Anticipatory Behavior in Adaptive Learning Systems: From Brains to Individual and Social Behavior, 2007, ABiALS book (results of the third workshop on anticipatory behavior in adaptive learning systems, ABiALS 2006, Rome, Italy, September 30, 2006, co-located with SAB 2006).
[46] R. Sutton, et al. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation, 2008, NIPS.
[47] Richard S. Sutton, et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation, 2008, NIPS.
[48] Jun Tani, et al. Emergence of Functional Hierarchy in a Multiple Timescale Neural Network Model: A Humanoid Robot Experiment, 2008, PLoS Comput. Biol.
[49] Giovanni Pezzulo, et al. Coordinating with the Future: The Anticipatory Nature of Representation, 2008, Minds and Machines.
[50] Geoffrey W. Sutton. Stumbling on Happiness, 2008.
[51] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML.
[52] R. Sutton. The Grand Challenge of Predictive Empirical Abstract Knowledge, 2009.
[53] Byron Boots, et al. Closing the learning-planning loop with predictive state representations, 2009, Int. J. Robotics Res.
[54] Richard S. Sutton, et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces, 2010, Artificial General Intelligence.
[55] P. I. Pavlov. Conditioned reflexes: An investigation of the physiological activity of the cerebral cortex, 1929, Annals of Neurosciences.
[56] R. Sutton, et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces, 2010.
[57] Patrick M. Pilarski, et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction, 2011, AAMAS.
[58] Richard S. Sutton, et al. Beyond Reward: The Problem of Knowledge and Data, 2011, ILP.
[59] R. Sutton, et al. Gradient temporal-difference learning algorithms, 2011.
[60] Richard S. Sutton, et al. Multi-timescale Nexting in a Reinforcement Learning Robot, 2012, SAB.
[61] Patrick M. Pilarski, et al. Model-Free reinforcement learning with continuous action in practice, 2012, American Control Conference (ACC).
[62] Richard S. Sutton, et al. Scaling life-long off-policy learning, 2012, IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL).
[63] A. Clark. Whatever next? Predictive brains, situated agents, and the future of cognitive science, 2013, Behavioral and Brain Sciences.
[64] Paul W. Goldberg, et al. Autonomous Agents and Multiagent Systems, 2016, Lecture Notes in Computer Science.