Fast Online Q(λ)

Q(λ)-learning uses TD(λ) methods to accelerate Q-learning. The per-step update complexity of previous online Q(λ) implementations based on lookup tables is bounded by the size of the state/action space. Our faster algorithm's update complexity is bounded by the number of actions. The method is based on the observation that Q-value updates may be postponed until they are needed.
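To make the postponed-update idea concrete, here is a minimal tabular sketch in Python. It is an illustrative assumption rather than the paper's exact algorithm: the LazyQLambda class, its parameters, the use of replacing traces, the absence of trace resets after exploratory actions, and the omission of the periodic renormalisation needed to avoid numerical underflow are all simplifications introduced here. The sketch keeps one global decay factor and one running sum of discounted TD errors, and a Q-value catches up on its postponed updates only when it is read or its state-action pair is revisited.

from collections import defaultdict
import random

class LazyQLambda:
    """Tabular Q(lambda) with postponed ("lazy") eligibility-trace updates.

    Hypothetical sketch: TD errors are accumulated in one global sum, and a
    Q-value is brought up to date only when it is read or revisited.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.95, lam=0.9, epsilon=0.1):
        self.actions = list(actions)
        self.alpha, self.gamma, self.lam, self.epsilon = alpha, gamma, lam, epsilon
        self.q = defaultdict(float)   # lazily maintained Q-values, keyed by (state, action)
        self.phi = 1.0                # global decay factor (gamma * lam) ** t
        self.dsum = 0.0               # running sum of alpha * delta_j * (gamma * lam) ** j
        self.traced = {}              # (s, a) -> (phi at last trace reset, dsum at last sync)

    def value(self, s, a):
        """Return Q(s, a) after applying all updates postponed since the last sync."""
        if (s, a) in self.traced:
            reset_phi, last_sum = self.traced[(s, a)]
            # The pair's trace at step j is (gamma * lam) ** j / reset_phi, so the
            # updates accumulated since the last sync total (dsum - last_sum) / reset_phi.
            self.q[(s, a)] += (self.dsum - last_sum) / reset_phi
            self.traced[(s, a)] = (reset_phi, self.dsum)
        return self.q[(s, a)]

    def act(self, s):
        """Epsilon-greedy action selection; costs O(|A|) lazy look-ups."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.value(s, a))

    def step(self, s, a, r, s_next):
        """One online learning step; the work is bounded by the number of actions."""
        best_next = max(self.value(s_next, b) for b in self.actions)
        delta = r + self.gamma * best_next - self.value(s, a)
        # Reset the visited pair's replacing trace to 1 as of the current step ...
        self.traced[(s, a)] = (self.phi, self.dsum)
        # ... then record the discounted TD error once, globally; every traced pair
        # picks up its share the next time it is synced.
        self.dsum += self.alpha * delta * self.phi
        # phi eventually underflows; the full algorithm renormalises phi and dsum
        # periodically, which this sketch omits.
        self.phi *= self.gamma * self.lam

With this bookkeeping, both action selection and a learning step touch only the Q-values of the current and successor state's actions, which is the sense in which the update complexity is bounded by the number of actions rather than by the size of the state/action space.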
