Distributed reinforcement learning for a traffic engineering application

This paper describes how a distributed reinforcement learning problem, in which the returns of many agents simultaneously update a single shared policy, is addressed with novel reinforcement learning techniques. Learning takes place in a traffic simulator. Two new algorithms are introduced: one based on a value function and one that uses a direct policy evaluation approach. The two algorithms are shown to perform comparably well.
