Current road-traffic optimisation practice around the world combines hand-tuned policies with a small degree of automatic adaptation. Even state-of-the-art research controllers require good models of the road traffic, which cannot be obtained directly from existing sensors. We use a policy-gradient reinforcement learning approach to optimise the traffic signals directly, mapping currently deployed sensor observations to control signals. Our trained controllers are (theoretically) compatible with the traffic system used in Sydney and many other cities around the world. We apply two policy-gradient methods: (1) the recent natural actor-critic algorithm, and (2) a vanilla policy-gradient algorithm for comparison. Along the way we extend natural actor-critic approaches to work for distributed and online infinite-horizon problems.
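To make the policy-gradient idea concrete, the following is a minimal sketch of a vanilla policy-gradient (REINFORCE-style) update for signal-phase selection from detector observations. The binary loop-detector encoding, the reward function, and the linear-softmax policy are illustrative assumptions for a toy intersection, not the controller or reward used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting (assumed): observations are binary loop-detector readings for
# 4 approaches; actions are 2 signal phases (north-south vs east-west green).
N_OBS, N_ACT = 4, 2

def softmax_policy(theta, obs):
    """Probability over phases given detector observations (linear-softmax)."""
    logits = obs @ theta                        # shape (N_ACT,)
    z = np.exp(logits - logits.max())           # subtract max for stability
    return z / z.sum()

def reward(obs, action):
    """Hypothetical reward: +1 if the chosen green serves the busier pair."""
    ns, ew = obs[0] + obs[1], obs[2] + obs[3]
    busier = 0 if ns >= ew else 1
    return 1.0 if action == busier else 0.0

# Vanilla policy-gradient (REINFORCE) loop over random detector states.
theta = np.zeros((N_OBS, N_ACT))
alpha = 0.5                                     # step size
for _ in range(5000):
    obs = rng.integers(0, 2, size=N_OBS).astype(float)
    probs = softmax_policy(theta, obs)
    a = rng.choice(N_ACT, p=probs)
    r = reward(obs, a)
    # grad of log pi(a|obs) for a linear-softmax policy:
    # d/dtheta[:, k] = obs * (1[k == a] - probs[k])
    grad = -np.outer(obs, probs)
    grad[:, a] += obs
    theta += alpha * r * grad

# After training, the policy should strongly prefer the north-south phase
# when only the north-south detectors are active.
busy_ns = np.array([1.0, 1.0, 0.0, 0.0])
print(softmax_policy(theta, busy_ns)[0])
```

The natural actor-critic variants used in the paper replace this raw gradient with a natural gradient (the gradient preconditioned by the inverse Fisher information of the policy), which is invariant to the policy parameterisation and typically converges faster; the sampling-and-update structure remains the same.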
Silvia Richter, et al. Learning Road Traffic Control: Towards Practical Traffic Control Using Policy Gradients.