What is it about?
Designing efficient traffic signal controllers has long been an important concern in traffic engineering, owing to the complex and uncertain nature of traffic environments. In this context, reinforcement learning has been one of the most successful approaches thanks to its adaptability and online learning ability. Reinforcement learning gives traffic signals the ability to automatically determine the ideal behaviour for achieving their objective (alleviating traffic congestion). Traffic signals based on reinforcement learning can learn and react flexibly to different traffic situations without the need for a predefined model of the environment. In this research, the actor-critic method is used for adaptive traffic signal control (ATSC-AC); actor-critic combines the advantages of both actor-only and critic-only methods. One of the most important issues in reinforcement learning is the trade-off between exploring the traffic environment and exploiting the knowledge already obtained. To tackle this challenge, two direct exploration methods are adapted to traffic signal control and compared with two indirect exploration methods. The results reveal that ATSC-ACs based on direct exploration methods perform best and consistently outperform a fixed-time controller, reducing average travel time by 21%.
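As a rough illustration of how an actor-critic controller can choose green time durations, the sketch below pairs a softmax (Boltzmann) actor with a table-based critic updated by the temporal-difference error. It is a minimal sketch, not the paper's implementation: the state discretisation, the set of candidate green times, the reward signal and the learning rates are all illustrative assumptions.

    import numpy as np

    # Hypothetical discretisation: the controller picks one of these
    # green time durations (in seconds) at the start of each phase.
    GREEN_TIMES = np.array([10, 15, 20, 25, 30, 35, 40])
    N_STATES = 100          # assumed number of discretised traffic states
    ALPHA_ACTOR = 0.01      # actor learning rate (assumed)
    ALPHA_CRITIC = 0.1      # critic learning rate (assumed)
    GAMMA = 0.95            # discount factor (assumed)

    # Actor: preferences over green times per state; critic: state values.
    preferences = np.zeros((N_STATES, len(GREEN_TIMES)))
    values = np.zeros(N_STATES)

    def select_action(state: int) -> int:
        """Sample a green time index from a softmax (Boltzmann) policy."""
        prefs = preferences[state]
        probs = np.exp(prefs - prefs.max())   # numerically stable softmax
        probs /= probs.sum()
        return int(np.random.choice(len(GREEN_TIMES), p=probs))

    def update(state: int, action: int, reward: float, next_state: int) -> None:
        """One-step actor-critic update driven by the TD error."""
        td_error = reward + GAMMA * values[next_state] - values[state]
        values[state] += ALPHA_CRITIC * td_error              # critic update
        preferences[state, action] += ALPHA_ACTOR * td_error  # actor update

Here the reward would typically be derived from a congestion measure at the intersection (for example, negative queue length or delay), so that green times which relieve congestion accumulate higher preference.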
Why is it important?
This paper presents an adaptive traffic signal controller based on the actor-critic method (ATSC-AC). Actor-critic combines the advantages of actor-only and critic-only methods and has better convergence properties than either. Each ATSC-AC tries to alleviate the traffic congestion at its intersection. At the beginning of each phase, the algorithm senses the current traffic condition and selects a green time duration based on the knowledge it has obtained through interaction with the traffic environment. Since the ATSC-ACs do not possess enough knowledge of the traffic environment at the beginning of the simulation, they explore different green time durations regardless of their estimated values. As time goes by and the ATSC-ACs gain enough knowledge, they tend to exploit more, selecting those green times that have a fairly high value; in other words, they trade off exploration against exploitation. To do so, two direct exploration methods were adapted to traffic signal control and their performance was compared with that of two indirect exploration techniques. To evaluate the proposed ATSC-AC, a 3×3 traffic network was employed, although the method can easily be applied to larger networks. The results indicate that ATSC-AC with direct exploration is the best controller and outperforms the fixed-time controller.
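The specific exploration methods compared in the paper are not spelled out in this summary, so the following sketch only illustrates the general distinction: a decaying ε-greedy rule as an example of indirect (undirected, purely random) exploration, and a count-based optimism bonus as an example of direct (knowledge-driven) exploration. The action values, the bonus form and the decay schedule are assumptions made for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    n_actions = 7                  # e.g. seven candidate green times
    q = np.zeros(n_actions)        # learned action values
    visits = np.zeros(n_actions)   # visit counters for the direct method

    def epsilon_greedy(step: int, eps0: float = 1.0, decay: float = 0.999) -> int:
        """Indirect exploration: pick a random green time with a
        probability that decays as knowledge accumulates."""
        eps = eps0 * decay ** step
        if rng.random() < eps:
            return int(rng.integers(n_actions))
        return int(np.argmax(q))

    def count_based(beta: float = 1.0) -> int:
        """Direct exploration: an optimism bonus steers the controller
        towards rarely tried green times (assumed bonus form)."""
        bonus = beta / np.sqrt(visits + 1.0)
        action = int(np.argmax(q + bonus))
        visits[action] += 1.0
        return action

Both rules explore heavily at first and exploit later; the difference is that the direct method uses what the controller already knows (here, visit counts) to decide where to explore, rather than exploring at random.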
Read the Original
This page is a summary of: Developing adaptive traffic signal control by actor-critic and direct exploration methods, Proceedings of the Institution of Civil Engineers - Transport, January 2018, ICE Publishing, DOI: 10.1680/jtran.17.00085.