Paper information - Distributed reinforcement learning for a traffic engineering application

This paper addresses a distributed reinforcement learning problem in which the returns of many agents simultaneously update a single shared policy, and it applies novel reinforcement learning techniques to do so. Learning takes place in a traffic simulator. Two new algorithms are introduced: one based on a value function and one that uses direct policy evaluation. Both are shown to perform comparably well.
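To make the "many agents, one shared policy" setting concrete, the following is a minimal sketch in which several agents update a single shared value table. It uses standard tabular Q-learning, not either of the paper's two algorithms, and the 1-D road environment, the constants, and every function name are assumptions made up for illustration; the paper's traffic simulator is not reproduced here.

```python
import random

# Hypothetical toy setup (not from the paper): a 1-D road with positions 0..5,
# where reaching position 5 ends the episode with reward 1.
N_STATES = 6
ACTIONS = (-1, +1)                     # move left or move right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # illustrative constants


def step(state, action):
    """Toy deterministic dynamics standing in for a traffic simulator."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def choose(q, state, rng):
    """Epsilon-greedy action index from the shared table, random tie-breaking."""
    if rng.random() < EPSILON:
        return rng.randrange(len(ACTIONS))
    best = max(q[state])
    return rng.choice([i for i, v in enumerate(q[state]) if v == best])


def train(n_agents=5, n_steps=2000, seed=0):
    rng = random.Random(seed)
    # One table shared by every agent: each agent's experience updates it.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    states = [rng.randrange(N_STATES - 1) for _ in range(n_agents)]
    for _ in range(n_steps):
        for i in range(n_agents):
            s = states[i]
            a = choose(q, s, rng)
            s2, r, done = step(s, ACTIONS[a])
            # Standard one-step Q-learning target.
            target = r if done else r + GAMMA * max(q[s2])
            q[s][a] += ALPHA * (target - q[s][a])
            # Restart a finished agent at a random non-goal position.
            states[i] = rng.randrange(N_STATES - 1) if done else s2
    return q


q = train()
# Greedy action index for each non-goal state under the shared table.
greedy = [max(range(len(ACTIONS)), key=lambda a: q[s][a])
          for s in range(N_STATES - 1)]
```

Here `greedy` holds one action index per non-goal position. Because all agents write to the same table, experience gathered by any one of them immediately shapes the behaviour of the others, which is the property the abstract highlights.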

Mark D. Pendrith

[1] C. Watkins. Learning from delayed rewards, 1989.

[2] Devika Subramanian, et al. A Multistrategy Learning Scheme for Agent Knowledge Acquisition, 1993, Informatica.

[3] Anton Schwartz, et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards, 1993, ICML.

[4] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[5] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.

[6] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.

[7] Andrew McCallum, et al. Reinforcement learning with selective perception and hidden state, 1996.

[8] Luca Maria Gambardella, et al. A Study of Some Properties of Ant-Q, 1996, PPSN.

[9] Mark D. Pendrith, et al. Estimator Variance in Reinforcement Learning: Theoretical Problems and Practical Solutions, 1997.

[10] Rahul Sukthankar, et al. Evolving an intelligent vehicle for tactical reasoning in traffic, 1997, Proceedings of International Conference on Robotics and Automation.

[11] Maja J. Mataric, et al. Using Communication to Reduce Locality in Multi-Robot Learning, 1997, AAAI/IAAI.

[12] Rahim F Benekohal, et al. Lane assignment on automated highway systems, 1997.

[13] Maja J. Mataric, et al. Reinforcement Learning in the Multi-Robot Domain, 1997, Auton. Robots.

[14] Bart Selman, et al. Boosting Combinatorial Search Through Randomization, 1998, AAAI/IAAI.

[15] Pat Langley, et al. Learning Cooperative Lane Selection Strategies for Highways, 1998, AAAI/IAAI.

[16] Kagan Tumer, et al. General principles of learning-based multi-agent systems, 1999, AGENTS '99.

[17] Manuela M. Veloso, et al. Multiagent Systems: A Survey from a Machine Learning Perspective, 2000, Auton. Robots.