Exponentiated Gradient Methods for Reinforcement Learning

This paper introduces and evaluates a natural extension of linear exponentiated gradient methods that makes them applicable to reinforcement learning problems. Just as these methods speed up supervised learning, we find that they can also increase the efficiency of reinforcement learning. Comparisons are made with conventional reinforcement learning methods on two test problems using CMAC function approximators and replacing traces. On a small prediction task, exponentiated gradient methods showed no improvement, but on a larger control task (Mountain Car) they improved the learning speed by approximately 25%. A more detailed analysis suggests that the difference may be due to the distribution of irrelevant features.
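The abstract contrasts exponentiated gradient (EG) updates with conventional additive updates for linear TD learning over binary (CMAC-style) features with replacing traces. The sketch below is an illustration only, not the paper's algorithm: it assumes a multiplicative, unnormalized EG±-style rule in the spirit of Kivinen and Warmuth, applied to the eligibility trace in place of the input vector. The class `LinearTD`, its parameters, and the initial weight of 0.5 are hypothetical, and CMAC tile coding is replaced by hand-specified lists of active feature indices.

```python
import math


class LinearTD:
    """Linear TD(lambda) over binary features with replacing traces.

    method="gd": conventional additive (gradient-descent) weight update.
    method="eg": multiplicative EG±-style update, assumed here for illustration;
    each signed weight is w_i = w_pos[i] - w_neg[i] with both parts positive.
    """

    def __init__(self, n_features, alpha, gamma, lam, method="gd", init=0.5):
        self.alpha, self.gamma, self.lam, self.method = alpha, gamma, lam, method
        self.e = [0.0] * n_features  # eligibility traces
        if method == "gd":
            self.w = [0.0] * n_features
        else:
            # Hypothetical choice: equal positive parts, so every weight starts at 0.
            self.w_pos = [init] * n_features
            self.w_neg = [init] * n_features

    def weight(self, i):
        if self.method == "gd":
            return self.w[i]
        return self.w_pos[i] - self.w_neg[i]

    def value(self, active):
        # With binary features, the linear value is the sum of active weights.
        return sum(self.weight(i) for i in active)

    def step(self, active, reward, next_active):
        """Apply one TD(lambda) update; next_active=None marks a terminal transition."""
        v_next = 0.0 if next_active is None else self.value(next_active)
        delta = reward + self.gamma * v_next - self.value(active)
        # Decay all traces, then replace (set to 1) the traces of active features.
        self.e = [self.gamma * self.lam * x for x in self.e]
        for i in active:
            self.e[i] = 1.0
        for i, ei in enumerate(self.e):
            if ei == 0.0:
                continue  # a zero trace leaves the weight unchanged in both rules
            if self.method == "gd":
                self.w[i] += self.alpha * delta * ei
            else:
                # Exponentiated step on each positive part, in opposite directions.
                self.w_pos[i] *= math.exp(self.alpha * delta * ei)
                self.w_neg[i] *= math.exp(-self.alpha * delta * ei)
        if next_active is None:
            self.e = [0.0] * len(self.e)  # clear traces at the end of an episode
        return delta
```

For example, on a two-state chain where feature 0 leads to feature 1 and then to termination with reward 1 (gamma = 1), repeatedly calling `step([0], 0.0, [1])` and `step([1], 1.0, None)` drives both learners' values toward 1. Splitting each weight into two positive parts is what lets a purely multiplicative update represent negative values.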

Doina Precup | Richard S. Sutton