Sage Journals: Discover world-class research

Abstract

Over the past few decades, numerous adaptive traffic signal control (ATSC) algorithms have been proposed to alleviate traffic congestion and optimize traffic mobility using real-time traffic data, such as data from connected vehicles (CVs). However, most of the existing ATSC algorithms do not consider optimizing traffic safety, likely because of the lack of tools to evaluate safety in real time. In this paper, we propose a novel ATSC algorithm for real-time safety optimization. The algorithm utilizes a traditional reinforcement learning approach (i.e., Q-learning) as well as recently developed extreme value theory (EVT) real-time crash prediction models. The algorithm was validated using real-world traffic video data collected from two signalized intersections in British Columbia. The results indicated that, compared with an existing fully actuated signal controller, the developed algorithm can significantly reduce the real-time crash risk by 43% to 45% at the intersection’s approaches, even at low CV market penetration rates.

Keywords

operations, traffic simulation, automated/autonomous/connected vehicles, microscopic traffic simulation, surrogate safety measures, traffic management and control, vehicle trajectory, traffic control devices, traffic signals, safety performance and analysis, crash prediction models

Adaptive traffic signal control (ATSC) systems have been receiving considerable interest in recent years. This interest is expected to grow with the availability of real-time traffic data from emerging connected vehicles (CVs) and advances in sensing technologies. ATSC systems use real-time traffic data to optimize traffic efficiency and minimize traffic delay. The mobility-oriented ATSC techniques have demonstrated considerable benefits in enhancing traffic efficiency at signalized intersections ( 1 – 9 ). However, the safety impact of ATSC systems was not considered in many of these evaluation studies. Only a few studies considered evaluating the safety impact of mobility-oriented ATSC algorithms, producing inconsistent results ( 10 – 15 ). Some studies have shown that ATSC systems can have considerable safety benefits. The safety improvements were represented by either a reduction in the number of crashes ( 10 , 11 , 13 , 16 ) or traffic conflicts ( 15 , 17 ). On the other hand, other studies showed little impact on crashes or even an increase in surrogate safety measures, such as traffic conflicts, after the implementation of ATSC systems ( 12 , 18 ). This may be because ATSC systems focusing on only optimizing traffic flow and maximizing traffic efficiency may not provide improved traffic safety ( 19 ).

There has been some previous work on optimizing the safety of signalized intersections using microsimulation models and the Surrogate Safety Assessment Model (SSAM) ( 19 – 21 ). The optimization process described in these studies includes tuning parameters such as the cycle length, splits, offsets, and left-turn phase sequence at signalized intersections. Then, several signal designs were proposed and evaluated offline, and the safety level of each design was investigated using SSAM. However, using SSAM to evaluate safety has many limitations and may not produce valid results ( 22 , 23 ). Moreover, it is important to evaluate real-time control strategies using real-time traffic data. In this case, self-learning ATSC optimization techniques can be more effective and reliable than offline methods. They can adapt quickly to real-time traffic changes and cover all possible traffic states. The real-time safety optimization of traffic signals has not been generally considered in existing ATSC systems. This is possibly because of the unavailability of adequate real-time safety evaluation tools.

Recently, prediction models for real-time safety evaluation have been developed and validated ( 24 – 26 ). These models predict the number of traffic conflicts (as a safety surrogate) for a short time period (i.e., signal cycles) using several dynamic traffic variables (e.g., traffic volume, shock wave area, shock wave speed, queue length, and platoon ratio). The models can quantitatively assess the safety level of dynamic traffic conditions per signal cycle. Although these models are useful for real-time safety evaluation, the use of traffic conflicts as a safety measure is generally associated with two main shortcomings. First, as reducing crashes is the ultimate goal of safety improvements, measures based on crash risk should be used in the real-time safety evaluation of signalized intersections. Second, to identify a severe conflict event, a threshold for a conflict indicator (e.g., time-to-collision of 1.5 s) should be defined; the results of the safety evaluation can vary depending on the selection of this threshold.

Acknowledging the above-noted limitations of using traffic conflicts, researchers have proposed use of extreme value theory (EVT) for conflict-based crash-risk estimation ( 27 – 31 ). In the EVT approach, traffic crashes can be estimated from the extreme value distribution of traffic conflicts. Recent research has applied the EVT approach to develop an advanced method for real-time safety evaluation of signalized intersections ( 32 , 33 ). In this method, safety indices (e.g., the risk of collision) can be obtained and related to dynamic traffic parameters, reflecting the real-time safety level of the signalized intersection ( 32 , 33 ). Such a method enables development of new ATSC strategies to directly minimize crash risk and optimize safety of signalized intersections in real time.

In this paper, we propose a Q-learning-based adaptive signal control algorithm for real-time safety optimization (QASCS) to minimize crash risk. EVT-based real-time crash prediction models for signalized intersections ( 33 ) were applied to define the reward and evaluate safety. Using these EVT models, two real-time safety measures, the risk of collision (ROC) and the return level of a cycle (RLC), can be extracted to estimate the safety level of each signal cycle based on dynamic traffic parameters. Real-time data from CVs and dynamic traffic changes were considered in the algorithm. Moreover, validation of the trained algorithm was performed using real-world video data collected from two signalized intersections in the city of Surrey, British Columbia, Canada. The novel contribution of this study is the real-time safety optimization of traffic signals using direct crash-risk measures.

Literature Review

Real-Time Crash Prediction Models

Recently, considerable research has been conducted in modeling real-time crash risk to manage and evaluate traffic safety proactively. For example, Abdel-Aty and Abdalla developed a model to predict daytime crashes on a freeway using real-time roadway geometric features and traffic flow characteristics ( 34 ). Other studies introduced real-time safety performance functions for signalized intersections to evaluate traffic safety ( 35 , 36 ). In these studies, crash risk was related to several traffic parameters in a period of less than 1 h. Considering a shorter time period, other studies related the number of traffic conflicts to dynamic traffic parameters (volume, shock wave area, shock wave speed, queue length) at the signal cycle level ( 24 , 25 ). A comprehensive review of the existing real-time crash prediction models was conducted by Hossain et al. ( 37 ).

EVT Models

There has been significant interest in using traffic conflicts to estimate crashes and develop crash-risk measures from observable non-extreme frequent events (i.e., conflicts). This can be realized by applying EVT, in which models are developed to extrapolate from observed levels to unobserved levels of a stochastic phenomenon. The application of EVT for road safety was first proposed by Campbell et al. and Songchitruksa and Tarko ( 38 , 39 ), who developed several EVT models that were the foundation for most of the subsequent EVT studies in road safety. Other important studies can be found elsewhere ( 27 – 31 , 40 – 42 ).

Recently, the use of EVT models in safety analysis has witnessed considerable advances, including new methods and applications. New applications included the use of EVT models to conduct before–after safety evaluations ( 42 – 44 ). New methods included the development of bivariate EVT models to integrate more than one conflict indicator ( 32 , 45 , 46 ), and developing Bayesian hierarchical extreme value models to combine conflicts from several sites to account for non-stationarity and unobserved heterogeneity in conflict extremes ( 32 , 47 ).

Zheng and Sayed proposed an approach for real-time crash-risk prediction at signalized intersections within the EVT framework ( 33 ). Generalized extreme value (GEV) models were developed based on conflict extremes. A Bayesian hierarchical model (BHM) that combines traffic conflict extremes from different sites, incorporating the influence of dynamic traffic parameters, was developed and used to estimate real-time safety indices to measure the safety level of signalized intersections at the signal cycle level. These indices include the ROC and RLC. The ROC can directly identify the cycles with crash-prone traffic conditions when it exceeds zero, whereas the RLC can characterize and differentiate the safety levels even for safe cycles (i.e., ROC = 0).

ATSC Algorithms

ATSC algorithms have recently been implemented in many jurisdictions worldwide to alleviate traffic congestion and reduce delays. The Sydney Coordinated Adaptive Traffic System (SCATS) was the earliest ATSC algorithm ( 1 ). Other early ATSC examples include the split, cycle and offset optimization technique (SCOOT) ( 48 ) and the real-time demand-responsive traffic signal control framework “Optimization Policies for Adaptive Control” (OPAC) ( 49 ). More recently, a cost-effective solution for applying adaptive control (ACS-Lite) was proposed by the Federal Highway Administration ( 2 ). All of these ATSC techniques share the objective of improving traffic efficiency and mobility by reducing delays and congestion, even though they operate differently. Although ATSC algorithms provide performance improvements over pre-timed and actuated signals, they have several operational limitations ( 7 ). These include the difficulty of obtaining adequate microscopic data from available sensors such as loop detectors, and of accounting for the large variations in traffic. Another main limitation of existing ATSCs is that they do not consider optimizing traffic safety in relation to crash-risk reduction.

Traffic Signal Optimization Using CV Data

With the increasing emergence of CV technology, numerous traffic signal control algorithms have recently been proposed to optimize traffic efficiency using real-time data from CVs. Some studies, for example, proposed various algorithms to optimize and coordinate traffic movement in road intersections without using any traffic lights, assuming that all vehicles are connected and autonomous ( 50 – 53 ). More realistically, other studies assumed various market penetration rates (MPRs) of CVs to develop and test ATSC algorithms. The developed algorithms generally aim at minimizing the total delay ( 54 – 58 ). Several studies have also considered multiple objectives, such as minimizing the total delay and the number of stops ( 59 ), or minimizing the total delay and the queue length ( 60 ). Most of the existing algorithms optimize the traffic signal timing based on real-time vehicle information, assuming one-way vehicle-to-infrastructure (V2I) communications. Some algorithms, however, optimize both the traffic signal timing and vehicle trajectories, assuming a specific percentage of autonomous vehicles and bidirectional V2I communications (56, 61 –64). Although the majority of previous studies have mainly focused on adapting traffic signals to improve mobility, a limited number of studies have considered optimizing traffic signals to reduce traffic emissions and fuel consumption ( 62 – 64 ). On the other hand, optimizing traffic safety has generally been disregarded. More details and a systematic review of research on using real-time CV data for urban traffic signal control can be found in a recent study by Guo et al. ( 65 ).

Use of Reinforcement Learning for Traffic Control

Reinforcement learning (RL) is a machine learning approach that analyzes how agents can take actions to maximize their cumulative reward. RL techniques have been proposed in the literature for signal control because they are suited to stochastic and dynamic traffic environments. These techniques can learn the control policy by interacting with the environment directly, without the need for a model of the traffic environment or human intervention ( 66 – 68 ). Several studies have evaluated ATSC systems using RL with the objective of organizing traffic movements and reducing delays and congestion. Self-learning ATSCs were implemented using different techniques and methods, including model-based Q-learning and multi-agent self-learning ( 6 , 69 ), Q-learning ( 3 – 5 , 70 – 73 ), and the Deep Q-Network methodology ( 8 , 9 ). The primary objective of all the above-mentioned studies is minimizing traffic delays and travel times to optimize mobility. Although these techniques have demonstrated substantial traffic mobility improvements, safety optimization was not included in previous RL-ATSC techniques.

Methodology

RL Formulation

RL is an area of machine learning that analyzes how an agent interacts with the surrounding environment to realize a goal ( 74 ). The agent discovers by guided trial and error how to act to obtain the most reward, instead of being given explicit examples of the desired actions as in supervised machine learning. In this study, RL is applied to develop the proposed algorithm for signal control to reduce the risk of crashes. The agent or the decision maker in this application is represented by the signal control unit, and the surrounding environment is represented by dynamic traffic environment changes in the intersection area. During the learning process, the agent selects randomly from a set of actions (i.e., signal phases) given a specific state of the surrounding environment, then it learns by receiving a reward or penalty for the selected action. The agent iteratively seeks to maximize the total reward it receives over time. Therefore, the outputs that maximize the received rewards over time are selectively retained by the RL algorithm (Figure 1). The control policy is defined in the algorithm as a mapping of the perceived states of the environment to the pre-defined possible actions corresponding to these states. Thus, the agent learns the control policy over time by trial and error to maximize the total cumulative reward in the long run by performing appropriate actions. Consequently, the selected actions affect not only the immediate reward but also the future rewards ( 74 ).

Figure 1.

Reinforcement learning framework.

Modeling the Environment

A signalized intersection in Surrey, British Columbia was simulated in this study using the microsimulation platform VISSIM 7 ( 76 ) to mimic the CV environment and to investigate the performance of the proposed QASCS algorithm. The modeled intersection consists of four approaches, each with two through lanes and one left-turn lane. In the QASCS algorithm, the decision maker (i.e., the agent) receives real-time information (i.e., V2I) from all CVs located within a pre-defined distance from the stop lines. This distance defines the standard V2I dedicated short-range communication (DSRC) zone. The typical DSRC range is less than 1,000 m ( 77 ). Previous research on CV-based applications has used DSRC ranges between 150 and 300 m. We selected the midpoint of this range (i.e., 225 m). To simulate CVs in VISSIM, a new vehicle class (i.e., Connected Vehicle) was created. Various traffic composition percentages were defined, representing the market penetration rates of CVs. For the non-connected vehicles, loop detectors were installed to provide the traffic controller with real-time traffic counts for all lanes.

Q-Learning

There are three main methods for formulating the RL algorithms and learning the optimal policy: dynamic programming (DP), Monte Carlo techniques (MC), and temporal difference (TD) learning. The three methods share some similarities and distinctions. For example, the DP method requires a model of the environment to be defined, whereas in the MC and TD methods, the RL algorithm can learn directly from interacting with the surrounding environment. Likewise, the DP and TD methods share an advantage of updating the estimates at each time step without waiting for the final outcome such as in the MC method ( 66 , 75 ). TD methods are the most convenient for solving the ATSC control problem ( 7 , 66 ). There are two main types of TD methods: the on-policy SARSA method and the off-policy Q-learning method. In this study, the 1-step off-policy Q-learning method was selected. This method uses the experience of each state to update one element in the Q-table. This entry in the table is a Q-value for a specific state–action pair $Q (s, a)$ . When the agent performs action $a_{t}$ at state $s_{t}$ , leading to a new state $s_{t + 1}$ and a reward $r_{t + 1}$ , the Q-learning algorithm improves its policy by updating the Q-table according to Bellman’s equation as follows:

Q_{t + 1} (s_{t}, a_{t}) = Q_{t} (s_{t}, a_{t}) + α_{t + 1} \left[ r_{t + 1} + γ \max_{a_{t + 1}} Q_{t} (s_{t + 1}, a_{t + 1}) - Q_{t} (s_{t}, a_{t}) \right]

(1)

The learning rate ( $α_{t + 1}$ ) in the previous equation depends on the number of visits the agent has made to each state–action pair $(s_{t}, a_{t})$ . This rate is estimated at each time step as follows ( 7 , 75 ):

α_{t + 1} = \frac{1}{V_{t + 1} (s_{t}, a_{t})}

(2)

where

$s_{t}$ , $a_{t}$ : the current state and the selected action at the current state;

$Q_{t + 1}$ , $Q_{t}$ : the updated and the old Q-value;

$r_{t + 1}$ : the reward of applying action $a_{t}$ at state $s_{t}$ ;

$s_{t + 1}$ , $a_{t + 1}$ : the new state and the best action at the new state;

$α_{t + 1}$ : the learning rate;

$V_{t + 1}$ : the number of visits to a particular state–action pair;

γ: the discount factor; a value of 0.5 was chosen.
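Equations 1 and 2 can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the dictionary-based Q-table, its zero initialization, the integer state labels, and the two placeholder action labels are all assumptions made for illustration. Only the discount factor of 0.5 and the visit-count learning rate come from the text.

```python
from collections import defaultdict

GAMMA = 0.5  # discount factor stated in the text


def q_update(q, visits, state, action, reward, next_state, actions):
    """Apply one Q-learning step (Equations 1 and 2) in place; return the new Q-value."""
    visits[(state, action)] += 1
    alpha = 1.0 / visits[(state, action)]                 # Equation 2
    best_next = max(q[(next_state, a)] for a in actions)  # max over next actions
    td_error = reward + GAMMA * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error                # Equation 1
    return q[(state, action)]


q = defaultdict(float)     # Q-table, initialised to zero (an assumption)
visits = defaultdict(int)  # visit counts V(s, a)
actions = ("A1", "A2")     # placeholder labels for the two possible actions

first = q_update(q, visits, 0, "A1", reward=-0.2, next_state=1, actions=actions)
second = q_update(q, visits, 0, "A1", reward=-0.4, next_state=1, actions=actions)
```

Because the learning rate is the reciprocal of the visit count, the first visit overwrites the initial value and later visits move the estimate by progressively smaller steps, so with a constant bootstrap term each Q-value behaves like a running average of its targets.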

State Definition

In Q-learning, a tabular form is usually used to represent all state–action pairs. This approach of storing the states in a look-up table is questionable, especially when RL is applied to stochastic environments that possibly include an infinite number of states. Including many states in the Q-matrix will result in most states not being experienced by the agent. Potentially, a generalization from the states that were visited previously to the ones that have never been experienced (i.e., function approximation) may be helpful to solve this issue. Popular methods of generalization include artificial neural networks and statistical curve fitting ( 66 , 75 ). However, there are many negative consequences of the imperfect value estimations by the generalization of the states, such as the divergence of Q-estimates ( 66 ). A simpler discrete table (i.e., the Q-table) can be created to define the possible states of the environment. In addition, the states can then be divided into ranges and defined in the Q-matrix. This discretization of the states might solve the problem of having an infinite number of states. A Q-matrix with discretized ranges of states was successfully introduced in several previous studies (3, 4, 7, 69, 70, 72). The discretization method was applied in this study for state representation.

The state is represented by the current green phase and the status of the total number of vehicles within a range (i.e., DSRC = 225 m) on its incoming approaches upstream of the stop line. The overall objective of the proposed algorithm in this research is to optimize traffic safety. Therefore, an arrival-queue factor that represents the positions and speeds of vehicles and the real-time traffic condition is introduced. The arrival-queue factor of an approach is a weighted sum of the number of vehicles on this approach. This weighted sum considers the position and speed of every vehicle. If the vehicle is stopped or moving at a speed of 5 km/h or less (i.e., the vehicle is in a queue), it is counted as one vehicle. Otherwise, it is counted as a fraction (i.e., between zero and one). The value of this fraction depends on the distance from the vehicle position to the end of the queue or to the stop line, whichever is shorter. To cover most of the possible states of the environment, the value of the arrival-queue factor is divided into 15 ranges to create a Q-matrix. The factor for each approach is calculated as follows:

f_{arrival - queue (App)} = \sum_{i = 1}^{n} f_{arrival - queue (i)}

(3)

f_{arrival - queue (i)} = \begin{cases} 1 & \text{if } S_{i} \leq 5 \text{ km/h} \\ 1 / \exp (a \cdot D_{i}) & \text{otherwise} \end{cases}

(4)

where

$f_{arrival - queue (App)}$ : the arrival-queue factor for the approach;

$f_{arrival - queue (i)}$ : the arrival-queue factor for vehicle i;

n: the number of vehicles that exist on the approach;

Di: the distance from the stop line or the end of the queue to vehicle i;

Si: the speed of vehicle i;

a: constant.
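The arrival-queue factor of Equations 3 and 4 and its discretization into 15 ranges can be sketched as follows. The value of the constant a and the width of each range are not given in the text, so `A_CONST` and `RANGE_WIDTH` below are hypothetical placeholders, as is the equal-width binning rule.

```python
import math

QUEUE_SPEED_KMH = 5.0  # speed threshold from Equation 4
A_CONST = 0.02         # hypothetical value of the constant a (per metre)
N_RANGES = 15          # number of ranges stated in the text
RANGE_WIDTH = 2.0      # hypothetical width of each range


def vehicle_factor(speed_kmh, distance_m):
    """Equation 4: queued vehicles count as 1, others decay with distance."""
    if speed_kmh <= QUEUE_SPEED_KMH:
        return 1.0
    return 1.0 / math.exp(A_CONST * distance_m)


def approach_factor(vehicles):
    """Equation 3: sum the per-vehicle factors over (speed, distance) pairs."""
    return sum(vehicle_factor(speed, dist) for speed, dist in vehicles)


def factor_range(factor):
    """Map a factor to one of N_RANGES equal-width ranges (assumed binning)."""
    return min(int(factor // RANGE_WIDTH), N_RANGES - 1)


# Two queued vehicles and two approaching vehicles at 50 m and 150 m.
vehicles = [(0.0, 0.0), (3.0, 0.0), (40.0, 50.0), (50.0, 150.0)]
f = approach_factor(vehicles)
state_index = factor_range(f)
```

With these placeholder values, a moving vehicle 50 m from the queue contributes about 0.37 of a vehicle and one 150 m away about 0.05, so vehicles close to the stop line or queue dominate the state.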

Action Definition

In RL-ATSC algorithms, defining the next green phase is the action that is taken by the signal controller. The number of possible actions that the signal controller can choose from varies depending on the phasing sequence scheme. Two phasing sequence schemes were defined in the literature: the fixed phasing sequence ( 3 , 4 , 7 , 70 , 72 ) and the variable phasing sequence (5, 7 –9, 69, 73, 78). If the phasing sequence is fixed, there are only two actions in the action space, either extending the green time for the current green phase or switching the green light to the next phase. In the variable phasing sequence, the action space consists of N actions, where N is the number of phases. In this paper, the fixed phasing sequence was adopted for the proposed algorithm.

At each time step, the agent of the proposed algorithm implements one of two actions, either extending the green time (A1) or switching the green light to the next phase (A2). In the case in which the agent selects A1, the current green phase of the through movements is extended by a time (t). If A2 is selected, the green light will be switched to the next phase and its minimum green time ( $G_{\min}$ ) applied after accounting for the yellow (Y) and the all-red (AR) times. The time interval (t) between the decision points for the proposed algorithm can be estimated using:

t = \begin{cases} 5 & \text{if A1} \\ Y + AR + G_{\min} & \text{if A2} \end{cases}

(5)

where

t: the time interval between the decision points in seconds;

Y: the yellow time in seconds;

AR: the all-red time in seconds;

Gmin: the minimum green time in seconds.

Following the standard signal timing manual ( 79 ), standard constraint values for minimum green, all-red, and yellow times are applied in the QASCS algorithm so that it can feasibly be implemented in the real world. The assumed values of $G_{\min}$ , Y, and AR are 10 s, 4 s, and 2 s, respectively. In addition, a typical value of 70 s is assumed for the maximum green time ( 79 ). The signal control algorithm is not allowed to apply action A1 (i.e., extending the green time) once the maximum green time is reached.

Selecting the update time interval (t) required care. With too short an update time interval, it is difficult to estimate the immediate reward of the applied action, as the new state of the environment will be almost the same as the old state. On the other hand, a relatively long update time interval does not enable the algorithm to capture the variation in the environment’s state between consecutive actions, making the algorithm less adaptive to real-time traffic conditions. Therefore, after several preliminary trials, we assumed a reasonable value of 5 s for the update time interval (if A1 is selected). However, investigating the sensitivity of the results to the update time interval is a recommended area of future research.
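The timing rules above can be summarized in a short sketch. The constants are the values stated in the text; the function names and the exact boundary test for the maximum green (an extension is refused if it would exceed 70 s) are assumptions.

```python
G_MIN, YELLOW, ALL_RED, G_MAX = 10, 4, 2, 70  # seconds, values stated in the text
EXTENSION = 5                                 # seconds, green extension under A1


def allowed_actions(elapsed_green):
    """Return the actions available after `elapsed_green` seconds of green.

    Assumed rule: A1 is refused once another extension would exceed G_MAX.
    """
    if elapsed_green + EXTENSION > G_MAX:
        return ["A2"]
    return ["A1", "A2"]


def decision_interval(action):
    """Equation 5: seconds until the next decision point."""
    if action == "A1":
        return EXTENSION
    if action == "A2":
        return YELLOW + ALL_RED + G_MIN
    raise ValueError(f"unknown action: {action}")
```

Under these values, switching phases fixes the next decision point 16 s ahead (4 s yellow, 2 s all-red, and 10 s minimum green), whereas an extension brings the next decision after 5 s.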

Action Selection Strategy

In RL, the agent accumulates the maximum reward by exploiting the best rewarding actions. It also needs to explore new actions to make better action selections in the future. Exploration enables the agent to visit more state–action pairs and converge to the optimal policy ( 7 ). The agent should both exploit and explore to obtain the optimal policy. Therefore, an action selection strategy should be applied to balance exploration and exploitation. Typically, ε-greedy and SoftMax algorithms have been adopted as action selection strategies in most previous research ( 75 ).

In this study, the ε-greedy method was employed as the action selection strategy. In this method, the greedy action is selected most of the time; with probability ε, an action is instead selected uniformly at random. At the beginning of the learning process, the rate of exploration is higher than the rate of exploitation, as the agent does not know much about the environment. Then, the agent exploits more until the end of the learning process as it converges to the optimal policy ( 75 ). Thus, the exploration rate was assumed to decrease gradually according to an exponentially decreasing function ( 7 ) as follows:

ε = e^{- E n}

(6)

where E is a constant, and n is the age of the agent (i.e., the iteration number).
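A minimal sketch of this selection rule with the exponential decay of Equation 6 is given below. The value of the constant E is not reported in the text, so `E_CONST` is a hypothetical placeholder, and the dictionary of Q-values keyed by action label is an illustrative data structure.

```python
import math
import random

E_CONST = 0.01  # hypothetical value of the constant E


def exploration_rate(n):
    """Equation 6: exploration probability at iteration n."""
    return math.exp(-E_CONST * n)


def select_action(q_values, n, rng=random):
    """Pick a uniformly random action with probability epsilon, else the greedy one.

    `q_values` maps each available action label to its current Q-value.
    """
    if rng.random() < exploration_rate(n):
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)


rng = random.Random(0)  # seeded generator so that runs are repeatable
early = select_action({"A1": -0.3, "A2": -0.1}, n=0, rng=rng)     # always explores
late = select_action({"A1": -0.3, "A2": -0.1}, n=10**6, rng=rng)  # effectively greedy
```

With this placeholder E, the exploration rate starts at 1 and falls below 2% by iteration 400, so early iterations explore almost exclusively and late iterations are almost purely greedy.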

Real-Time Collision Prediction Models

The real-time collision prediction models developed by Zheng and Sayed ( 33 ) are utilized in this research for reward definition and for evaluating the safety effectiveness of the proposed QASCS algorithm. These real-time collision prediction models are Bayesian hierarchical EVT models developed using traffic conflicts as an intermediate for real-time crash prediction ( 32 , 47 ) (Figure 2). The models can be used to predict the ROC using cycle-level dynamic traffic parameters (i.e., traffic volume, shock wave area, and platoon ratio) as covariates. The EVT models were originally developed using real-world video data from four signalized intersections in the city of Surrey, British Columbia, Canada. Traffic conflicts were identified by the modified time-to-collision (MTTC) conflict indicator ( 80 ). The cycle-level dynamic traffic variables (i.e., traffic volume [V], shock wave area [A], and platoon ratio [P]) were extracted from the real-world video data using computer vision techniques. Three Bayesian hierarchical GEV models (BHM_GEV) were developed in Zheng and Sayed ( 33 ), with the three dynamic traffic covariates added to the location parameter, the scale parameter, and both the location and scale parameters, respectively. Furthermore, the ROC and RLC measures were obtained from the GEV distribution. The model parameters are estimated as follows:

\left\{ \begin{matrix} μ_{ij} = α_{μ0} + α_{μ} X + ε_{μj} \\ ϕ_{ij} = α_{ϕ0} + α_{ϕ} X + ε_{ϕj} \\ ξ_{ij} = α_{ξ0} + ε_{ξj} \end{matrix} \right.

(7)

where $α_{μ0}$ , $α_{ϕ0}$ , and $α_{ξ0}$ are the three intercept terms corresponding to the three model parameters location, scale, and shape, respectively; $ε_{μj}$ , $ε_{ϕj}$ , and $ε_{ξj}$ are random error terms that account for additional heterogeneity not directly addressed by the covariates; X is the vector of covariates; and $α_{μ}$ and $α_{ϕ}$ are the vectors of regression parameters. The random error terms capture the variation between different sites. The three intercept terms can be added to the three random error terms in Equation 7, resulting in three random intercept terms, $α_{μj}$ , $α_{ϕj}$ , and $α_{ξj}$ . Therefore, the previous equation can be written as a random intercept model as follows:

\left\{ \begin{matrix} μ_{ij} = α_{μj} + α_{μ} X \\ ϕ_{ij} = α_{ϕj} + α_{ϕ} X \\ ξ_{ij} = α_{ξj} \end{matrix} \right.

(8)

The parameters of the utilized best-fitted model are shown in Table 1.

Figure 2.

Real-time Bayesian Hierarchical Models (BHM) at the cycle level.

Table 1.

Parameters of the Best-Fitted Model ( 33 )

Parameter	Coefficient	First intersection (128 St & 72 Ave)		Second intersection (132 St & 72 Ave)	
		Mean	SD	Mean	SD
$μ$	$α_{μj}$	−1.4330	0.1090	−1.7440	0.1079
	$α_{μ}$(V)	0.0390	0.0062	0.0390	0.0062
	$α_{μ}$(A)	0.1086	0.0260	0.1086	0.0260
	$α_{μ}$(P)	−0.3623	0.0839	−0.3623	0.0839
$ϕ$	$α_{ϕj}$	−0.8953	0.1082	−0.6276	0.0996
	$α_{ϕ}$(A)	−0.1438	0.0485	−0.1438	0.0485
$ξ$	$α_{ξj}$	−0.3850	0.0535	−0.3589	0.0785

Note: SD = standard deviation.
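As an illustration of how the random intercept model is evaluated, the sketch below plugs the posterior means reported in Table 1 for the first intersection into the linear predictors. The covariate values are hypothetical and chosen only to show the arithmetic; the units in which V, A, and P enter the model are not stated in this section. The mapping from ϕ to the scale σ is also not stated here; if ϕ is the logarithm of the scale, as is common in such models, then σ = exp(ϕ), but that is an assumption.

```python
# Posterior means from Table 1, first intersection (128 St & 72 Ave).
FIRST_INTERSECTION = {
    "mu_intercept": -1.4330, "mu_V": 0.0390, "mu_A": 0.1086, "mu_P": -0.3623,
    "phi_intercept": -0.8953, "phi_A": -0.1438,
    "xi": -0.3850,
}


def gev_parameters(coef, volume, shock_wave_area, platoon_ratio):
    """Evaluate the linear predictors for location (mu), scale term (phi), and shape (xi)."""
    mu = (coef["mu_intercept"]
          + coef["mu_V"] * volume
          + coef["mu_A"] * shock_wave_area
          + coef["mu_P"] * platoon_ratio)
    phi = coef["phi_intercept"] + coef["phi_A"] * shock_wave_area
    xi = coef["xi"]
    return mu, phi, xi


# Hypothetical covariate values for one cycle (illustration only).
mu, phi, xi = gev_parameters(FIRST_INTERSECTION, volume=10,
                             shock_wave_area=2.0, platoon_ratio=1.0)
```

For these made-up inputs the predictors evaluate to μ = −1.1881 and ϕ = −1.1829, with ξ fixed at −0.3850. The signs of the coefficients indicate that the location parameter increases with volume and shock wave area and decreases with platoon ratio.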

Reward Definition

The main objective of this research is to optimize the safety of signalized intersections by minimizing the ROC in real time. Therefore, quantitative measures were needed that reflect the fluctuating safety levels of dynamic traffic conditions cycle-by-cycle, to represent the algorithm’s reward or penalty. The ROC and RLC were selected as two RL rewards. In the proposed QASCS algorithm, the ROC and RLC were estimated from real-time collision prediction models ( 28 , 33 ) using cycle-level dynamic traffic parameters (i.e., traffic volume, shock wave area, and platoon ratio) as covariates. Although two safety indices were used as rewards in this study, each index was introduced to the algorithm separately.

The ROC is a non-negative indicator. A value of zero indicates a safe cycle with no risk of crash, whereas a ROC greater than zero indicates a positive crash risk. The RLC is a standard prediction in extreme value analysis that also reflects the safety level of a cycle. A value greater than or equal to zero for RLC indicates a positive ROC for the cycle, and an RLC less than zero implies that no crash risk is predicted. It is worth mentioning that the ROC and RLC are positively correlated ( 28 , 33 ). Also, smaller values of ROC and RLC indicate safer signal cycles. Thus, the reward function for each state–action pair is defined by the ROC and RLC as a penalty. The ROC was estimated at each lane of the four approaches, and the sum over all the lanes of all the approaches was then obtained for each cycle. For the other safety indicator, a value of RLC was obtained for each lane of each approach, then a weighted average of RLC for all lanes of all the approaches was estimated cycle-by-cycle. Both indicators were then input to the algorithm as a penalty at the end of each cycle. Finally, the penalty was distributed equally, as a delayed penalty $(r_{t + 1})$ , among the n actions taken during the cycle to update the Q-value. The two safety measures as well as the reward value are estimated as follows:

ROC_{ci} = \Pr \{ z_{i} \geq 0 \} = 1 - G_{i} (0) = \begin{cases} 1 - \exp \left\{ - \left[ 1 - ξ_{i} \frac{μ_{i}}{σ_{i}} \right]^{- \frac{1}{ξ_{i}}} \right\} & \text{for } ξ_{i} \neq 0 \\ 1 - \exp \left[ - \exp \left( \frac{μ_{i}}{σ_{i}} \right) \right] & \text{for } ξ_{i} = 0 \end{cases}

(9)

RLC_{pi} = \begin{cases} μ_{i} - \frac{σ_{i}}{ξ_{i}} \left[ 1 - \{ - \log (1 - p) \}^{- ξ_{i}} \right] & \text{for } ξ_{i} \neq 0 \\ μ_{i} - σ_{i} \log \{ - \log (1 - p) \} & \text{for } ξ_{i} = 0 \end{cases}

(10)

r_{t + 1} = - \frac{\sum_{l = 1}^{k} RO C_{ci}}{n}

(11)

RL C_{WA} = \frac{\sum_{l = 1}^{k} RL C_{pi} * V}{\sum_{l = 1}^{k} V}

(12)

r_{t + 1} = - \frac{RL C_{WA}}{n}

(13)

where

$RO C_{ci}$ : the risk of collision of cycle (i), at each lane (l).

$RL C_{pi}$ : the return level of a cycle (i) associated with the return period $\frac{1}{p}$ , at each lane (l), given that G ( $RL C_{pi}$ ) = 1 − p.

$RL C_{WA}$ : weighted average RLC of all lanes of all approaches.

$V$ : the lane traffic volume.

n: the number of cycle actions.

k: the number of lanes.

$ξ i$ , $μ i$ , $σ i$ : the parameters of the EVT model.

X: the vector of model covariates.
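The two safety indices and the per-action rewards of Equations 9 to 13 can be sketched as below. The function names are illustrative. Two points go beyond what the equations spell out and should be read as assumptions: when zero falls outside the support of the GEV distribution, the ROC is set to its limiting value (0 for a negative shape parameter, 1 for a positive one), and the ξ = 0 branch of the return level uses the standard Gumbel form μ − σ log{−log(1 − p)}, which follows from solving G(RLC) = 1 − p.

```python
import math


def roc(mu, sigma, xi):
    """Equation 9: probability that the GEV-distributed extreme is >= 0."""
    if xi == 0:
        return 1.0 - math.exp(-math.exp(mu / sigma))
    base = 1.0 - xi * mu / sigma
    if base <= 0:
        # Zero lies outside the GEV support (assumed handling).
        return 0.0 if xi < 0 else 1.0
    return 1.0 - math.exp(-base ** (-1.0 / xi))


def rlc(mu, sigma, xi, p):
    """Equation 10: return level z satisfying G(z) = 1 - p."""
    y = -math.log(1.0 - p)
    if xi == 0:
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


def roc_reward(lane_rocs, n_actions):
    """Equation 11: negative summed ROC, split equally over the cycle actions."""
    return -sum(lane_rocs) / n_actions


def rlc_reward(lane_rlcs, lane_volumes, n_actions):
    """Equations 12 and 13: negative volume-weighted RLC, split over the cycle actions."""
    weighted = sum(r * v for r, v in zip(lane_rlcs, lane_volumes)) / sum(lane_volumes)
    return -weighted / n_actions
```

With a negative shape parameter the GEV distribution has a finite upper endpoint at μ − σ/ξ. Whenever that endpoint is below zero the ROC is exactly zero, which is why the RLC is useful for ranking cycles that all have ROC = 0.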

Training the Algorithm

The proposed QASCS algorithm was trained using the simulation platform VISSIM to find the optimal policy. The simulation was run for 500 iterations for both safety indices (i.e., ROC, RLC). Each iteration was divided into a 1,000-s warming-up period, a 500-s cooling-down period, and a 3,600-s (i.e., an hour) training period. The total training time for each safety index was more than two million seconds. It was observed that after 400 iterations, the proposed algorithm converged to the optimal policy. At each time interval (t) as shown in Equation 5, a new state of the environment is defined after pausing the simulation, the agent selects the best action and applies it, and finally the Q-value is updated. Afterwards, a reward is received at the end of each cycle as a delayed reward and divided backward equally to the cycle actions. For each signal controller, 10 different random seeds were applied, and the results were then averaged. The minimum required number of random seeds to compare the performance measures of the two alternatives (i.e., the proposed QASCS and the ASC benchmark) was estimated, following the methodology provided in Dowling et al. ( 81 ). The statistical analysis showed that 10 simulation runs are sufficient to reject the null hypothesis at 95% confidence level. This means the differences in the performance measures are caused by using two different alternatives and not just a result of using different random seeds. As well, to enable the algorithm to visit more states, the traffic volume at each approach was defined as a random volume between 200 and 1,600 vehicles/hour. It should be noted that although the traffic microsimulation VISSIM was used to train the algorithm, the ROC in this research is not based on the driving behavior in the simulation model. Rather, the previously mentioned EVT models are used for the crash-risk prediction. 
These EVT models include cycle-level dynamic traffic parameters (i.e., traffic volume, shock wave area, and platoon ratio) as covariates. Previous research has shown that these dynamic traffic parameters can be estimated from traffic simulation with a reasonable accuracy ( 23 ). Traffic conflicts from VISSIM are not used in the analysis because of the simulation model’s inability to capture the actual driving behavior and to simulate drivers’ mistakes.
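The delayed-reward scheme described above can be illustrated with a minimal sketch. It is one possible reading of the text, not the authors' implementation: the learning rate `alpha`, the discount factor `gamma`, the placeholder state labels, and the function names are all assumptions introduced here, and the update rule is the standard one-step Q-learning update rather than the paper's Equation 5.

```python
from collections import defaultdict

# The two actions described in the paper: extend the current green or switch phase.
ACTIONS = ("extend", "switch")


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update (alpha and gamma are assumed values)."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])


def apply_cycle_reward(q, transitions, cycle_reward, alpha=0.1, gamma=0.9):
    """Split a cycle-level reward equally over the cycle's actions, then update Q.

    transitions: list of (state, action, next_state) recorded during one cycle.
    cycle_reward: the delayed reward (a penalty, hence negative) for that cycle.
    """
    if not transitions:
        return
    share = cycle_reward / len(transitions)
    for state, action, next_state in transitions:
        q_update(q, state, action, share, next_state, alpha, gamma)


# Hypothetical cycle with three decisions; states are placeholder labels.
q_table = defaultdict(float)
cycle = [("s0", "extend", "s1"), ("s1", "extend", "s2"), ("s2", "switch", "s3")]
apply_cycle_reward(q_table, cycle, cycle_reward=-0.3)
```

In this sketch each of the three actions receives one third of the cycle penalty, so an unsafe cycle lowers the Q-value of every decision that contributed to it.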

Validation of the Proposed Algorithm

Real-world traffic data from two signalized intersections in the city of Surrey, British Columbia, Canada, were used to validate the proposed algorithm. The first intersection is 72nd Avenue and 128 Street, and the second intersection is 72nd Avenue and 132 Street. Figure 3 shows the two signalized intersections and the studied approaches. Both intersections are urban signalized intersections and are controlled by a typical fully actuated signal control (ASC). The trained QASCS algorithm and the existing ASC were both simulated in a VISSIM model for each intersection.

Figure 3.

Study intersections and approaches.

VISSIM models of the two selected intersections came from previous studies ( 22 , 82 ). The VISSIM models were built to accurately match actual field conditions in relation to intersection geometry, traffic volumes, traffic composition, and traffic signal settings (i.e., the actuated signal controller). The real-world ASC was defined in VISSIM using the Ring Barrier Controller (RBC) module. Visual inspection was also performed to ensure that there are no abnormal movements of the simulated vehicles. In addition, the VISSIM models were precisely calibrated in Essa and Sayed ( 22 , 82 ) using a comprehensive two-step calibration procedure. The first calibration step aimed to match the simulated delay times with the field-observed delay times. This was achieved by matching the arrival pattern and the desired speed to the field conditions. The second calibration step aimed at enhancing the correlation between field-observed and simulated traffic conflicts by calibrating the VISSIM parameters. First, important VISSIM parameters that had the most significant effect on the simulated conflicts were determined through a sensitivity analysis. Subsequently, a Genetic Algorithm was applied to estimate the best values of these parameters with the objective of enhancing the correlation between field-observed and simulated conflicts. Table 2 shows the selected VISSIM parameters and their calibrated values at each intersection ( 22 , 82 ).

Table 2.

Calibrated VISSIM Parameters ( 22 , 82 )

Parameter	Description	Unit	Default value	Calibrated value (128 St & 72 Ave)	Calibrated value (132 St & 72 Ave)
Standstill distance	The desired distance between stopped vehicles	m	1.50	2.50	2.10
Headway time	The time that a driver wants to keep	s	0.90	1.30	1.30
Following thresholds	The thresholds which control the speed differences during the “Following” state	NA	±0.35	±0.25	±1.10
Reduction factor for safety distance close to stop line	This reduction factor defines the vehicle behavior close to stop line at signalized intersections	NA	0.60	0.75	0.60
Start upstream of stop line	Distance upstream of the stop line of signalized intersection	m	100	110	100
Desired deceleration	Desired deceleration is used as the maximum for: the deceleration caused by a desired speed decision; the deceleration in case of Stop & Go traffic, when closing up to a preceding vehicle; the deceleration toward an emergency stop position (route); and for co-operative braking	m/s²	−2.80	−2.80	−2.80

Note: NA = Not Applicable.

After developing the simulation models for both intersections using ASC and QASCS algorithm, the two measures ROC and RLC were estimated and compared for both signal controllers. The calibrated simulation models were run separately for each signal controller for 9 h (i.e., the available real-world video data are from 9:00 a.m. to 6:00 p.m.). Table 3 shows the location, date of the video data collection, number of lanes, and traffic volume statistics for each intersection. The ASC was simulated using the RBC module, whereas the QASCS algorithm was represented by an external supporting code. Simulated traffic data were constantly extracted and saved for each simulation run, such as position and speed of each vehicle crossing the intersection, the vehicle type (e.g., connected or non-connected), and the indication of all signal heads.

Table 3.

Study Intersections and Camera Scenes

City (province)	Intersected roads	Video recording date	Number of lanes per approach	Traffic signal timing (s)	Traffic volume statistics
Surrey (BC)	72 Ave & 128 St	March 28th, 2012	1 (left)	Red: 31–57	Total volume (9 h) = 29,610 vehicles
			2 (through)	Yellow: 4*	Max. hourly volume = 4,186 vph
				Green: 29–64	Min. hourly volume = 2,343 vph
					Average hourly volume = 3,290 vph
Surrey (BC)	72 Ave & 132 St	April 3rd, 2012	1 (left)	Red: 17–49	Total volume (9 h) = 25,197 vehicles
			2 (through)	Yellow: 4*	Max. hourly volume = 3,353 vph
				Green: 37–69	Min. hourly volume = 2,142 vph
					Average hourly volume = 2,800 vph

Note: vph = vehicles per hour; Min. = minimum; Max. = maximum.

Dynamic traffic parameters (e.g., shock wave area, platoon ratio, and traffic volume) were extracted for each signal cycle. These dynamic parameters were then used in the EVT model ( 33 ) shown in Table 1 and in Equations 9 and 10 to estimate the ROC and RLC for each cycle. Furthermore, the annual frequencies of crashes and severe conflicts for the whole intersection were obtained. Equations 9 and 14 were used to calculate the annual crash frequency, and Equations 9 and 15 were employed to estimate the number of extreme conflicts per year. The value of $δ$ in Equation 15 is a small value chosen to ensure that the risk of severe conflicts is greater than zero ( $δ$ = −0.5 is used in this study) ( 42 ).

N_{t} = \frac{T}{t} \sum_{i = 1}^{m} ROC_{i}

(14)

ROC_{ci} = \Pr \{ z_{i} \geq 0 \} = 1 - G_{i}(0) = \begin{cases} 1 - \exp \left\{ - \left[ 1 - ξ_{i} \frac{(δ - μ_{i})}{σ_{i}} \right]^{- \frac{1}{ξ_{i}}} \right\}, & \text{for } ξ_{i} \neq 0 \\ 1 - \exp \left[ - \exp \left( \frac{δ - μ_{i}}{σ_{i}} \right) \right], & \text{for } ξ_{i} = 0 \end{cases}

(15)

where

t: the conflict observation period (daytime hours only);

m: the number of blocks (cycles) corresponding to the observation period;

T: a long period (a year);

$N_{t}$ : the estimated annual number of crashes.
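As a hedged illustration of Equation 14, the sketch below scales the summed cycle-level ROC values from the observation period up to a year. The function name, the example ROC values, and the choice to express both T and t in daytime hours (9 h per day over 365 days) are assumptions made for this example only; they are not taken from the paper.

```python
def annual_crash_estimate(cycle_roc, observation_hours, hours_per_year):
    """Equation 14: N_t = (T / t) * sum of ROC_i over the m observed cycles.

    cycle_roc: ROC_i for each of the m cycles in the observation period.
    observation_hours: t, the conflict observation period.
    hours_per_year: T, expressed in the same unit as t.
    """
    if observation_hours <= 0:
        raise ValueError("the observation period must be positive")
    return (hours_per_year / observation_hours) * sum(cycle_roc)


# Hypothetical ROC values for four cycles observed over one 9-h day.
roc_values = [0.0, 1e-4, 0.0, 3e-4]
n_t = annual_crash_estimate(
    roc_values,
    observation_hours=9.0,
    hours_per_year=9.0 * 365,  # assumed: daytime hours only, every day of the year
)
```

Because T/t is large when only a few hours are observed, small changes in the summed ROC are magnified in the annual estimate.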

Validation Results

A comparison between the existing real-world ASC and the trained QASCS algorithm was conducted. The results indicated that the proposed algorithm improved traffic safety considerably at both intersections. The number of cycles with ROC, as estimated using Equation 9, was reduced from 103 cycles to 18 and 13 cycles for the first intersection using ROC and RLC as reward functions, respectively. For the second intersection, the number of cycles with ROC was reduced from 139 cycles to 22 and 29 cycles using ROC and RLC as reward functions, respectively. Furthermore, taking into consideration the strong correlation between ROC and RLC, the weighted average RLC was estimated using Equation 12 and compared for each hour of the day for the ASC and the proposed QASCS (Figures 4 and 5). Reductions of 48% and 41% in the weighted average RLC were observed at the first intersection when using ROC and RLC as reward functions, respectively. For the second intersection, a 46% reduction was obtained with each of the two reward functions (i.e., ROC and RLC) applied separately.

Figure 4.

Weighted average return level of a cycle (RLC) at the two studied locations before and after implementing the proposed algorithm with risk of collision (ROC) as a reward.

Figure 5.

Weighted average return level of a cycle (RLC) at the two studied locations before and after implementing the proposed algorithm with RLC as a reward.

The real-time variation of RLC is shown in Figures 6–9 for both locations and both safety rewards. These values were calculated using Equation 10; positive RLC values imply that a positive crash frequency is expected, whereas negative RLC values indicate that the cycle is safe, and lower values of RLC represent safer signal cycles. As shown in the following figures, a reduction in RLC was observed after applying the proposed QASCS algorithm in most of the approaches and for both reward functions. Although the value of RLC has not improved significantly for some cycles, these cycles are still safe, as RLC remains below zero.

Figure 6.

Cycle-by-cycle fluctuation of RLC at each approach of the first intersection (72 Ave and 128 St) before and after implementing the QASCS algorithm using ROC reward.

Figure 7.

Cycle-by-cycle fluctuation of RLC at each approach of the second intersection (72 Ave and 132 St) before and after implementing the QASCS algorithm using ROC reward.

Figure 8.

Cycle-by-cycle fluctuation of RLC at each approach of the first intersection (72 Ave and 128 St) before and after implementing the QASCS algorithm using RLC reward.

Figure 9.

Cycle-by-cycle fluctuation of RLC at each approach of the second intersection (72 Ave and 132 St) before and after implementing the QASCS algorithm using RLC reward.

The results shown in Figures 6–9 indicate that the RLC values of the QASCS generally have higher variability than those of the ASC. The reason is that the two controllers have completely different operating mechanisms. The QASCS utilizes dense and detailed traffic data from CVs, whereas the ASC relies on relatively limited traffic information captured by loop detectors. Thus, the QASCS is more adaptive to the real-time variation in traffic conditions, which results in higher variability among consecutive signal cycles in the optimized signal-timing plan (e.g., cycle length) and, subsequently, in the RLC value.

Tables 4 and 5 summarize the validation results. It was also noted that the number of extreme conflicts and the number of crashes can be calculated and compared for the QASCS and the ASC. In this case, the results would show reductions in extreme conflicts and crashes reaching more than 95%. However, given the short time period (limited number of hours), the calculation of these values is subject to very large uncertainty, and they are therefore not reported in Tables 4 and 5.

Table 4.

Results of the Proposed QASCS Algorithm Compared with the ASC for the First Intersection

First intersection (128 St & 72 Ave)
Measure	ASC	Proposed QASCS (using ROC)	Proposed QASCS (using RLC)	% Reduction (using ROC)	% Reduction (using RLC)
Overall volume	29,610
The number of cycles with ROC	103	18	13	82%	87%
Weighted average return level of cycle (WA RLC)	−0.37	−0.707	−0.626	48%	41%

Note: ASC = actuated signal control; QASCS = Q-learning-based actuated signal control; RLC = return level of cycle; ROC = risk of collision.

Table 5.

Results of the Proposed QASCS Algorithm Compared with the ASC for the Second Intersection

Second intersection (132 St & 72 Ave)
Measure	ASC	Proposed QASCS (using ROC)	Proposed QASCS (using RLC)	% Reduction (using ROC)	% Reduction (using RLC)
Overall volume	25,197
The number of cycles with ROC	139	22	29	84%	79%
Weighted average return level of cycle (WA RLC)	−0.34	−0.62	−0.62	46%	46%

Note: ASC = actuated signal control; QASCS = Q-learning-based actuated signal control; RLC = return level of cycle; ROC = risk of collision.
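The percentage reductions in the number of cycles with ROC listed in Tables 4 and 5 can be checked directly from the cycle counts. The short sketch below does this; the function and dictionary names are introduced here for illustration, and only the cycle counts come from the tables.

```python
def percent_reduction(before, after):
    """Percentage reduction of `after` relative to `before`."""
    return 100.0 * (before - after) / before


# Number of cycles with ROC, taken from Tables 4 and 5.
cycles_with_roc = {
    "first intersection": {"ASC": 103, "ROC reward": 18, "RLC reward": 13},
    "second intersection": {"ASC": 139, "ROC reward": 22, "RLC reward": 29},
}

reductions = {
    site: {
        reward: percent_reduction(counts["ASC"], counts[reward])
        for reward in ("ROC reward", "RLC reward")
    }
    for site, counts in cycles_with_roc.items()
}

for site, values in reductions.items():
    for reward, value in values.items():
        print(f"{site}, {reward}: {value:.1f}% fewer cycles with ROC")
```

The computed values agree with the tabulated 82%, 87%, 84%, and 79% to within one percentage point.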

In addition to the safety impact of the proposed algorithm, the algorithm’s effect on mobility was also evaluated. Even though the delay/travel time was not the primary objective function, the proposed algorithm improved mobility and reduced the total travel time at both intersections. The results indicated that the total travel time per vehicle was decreased by an average of 16% after applying QASCS using the ROC as an objective function for the two intersections. A reduction of 7% in the total travel time per vehicle was observed for the two intersections after using the RLC as a reward. Other performance metrics were also improved, including the queue length and the number of stops. Specifically, the maximum queue length, the 95th percentile of queue length, and the number of stops were reduced by 14%, 39%, and 32% using ROC as a reward, and by 16%, 40%, and 10% using RLC as a reward, respectively.

Thus, the proposed algorithm improves both the safety and operational performance. In other words, the algorithm optimizes safety (i.e., minimizes ROC and RLC) without deteriorating mobility. Traffic delay is undoubtedly an essential issue, as congestion occurs more frequently and leads to significant economic and environmental costs. Traffic safety is also a fundamental issue because of the high collision frequencies and severities at signalized intersections and their enormous associated social and economic costs. Therefore, both traffic safety and mobility are fundamental optimization objectives. As previous research has focused on optimizing delays (i.e., mobility) ( 1 – 9 ), the main contribution of this paper is to present a new algorithm that optimizes safety without deteriorating mobility (without increasing delays). The developed algorithm can further be modified to incorporate both safety and mobility in a multi-objective optimization problem. In such a problem, a weight can be assigned to each objective based on its associated cost (e.g., savings resulting from decreasing delays or collisions). These weights can vary among different locations and jurisdictions. These issues are potential areas of future research.

Effect of CV MPRs

The prevalence of CV technology is expected to increase gradually over the coming years. Before the full deployment of this technology, CVs will constitute a percentage of the total number of vehicles. Therefore, the validation of the proposed algorithm should be conducted based on various MPRs of CVs. The performance of the proposed algorithm was evaluated and compared using MPRs ranging from 10% to 100% at both intersections. The MPRs were represented in the VISSIM model by creating a new vehicle class called “connected vehicle” and varying the traffic composition percentages at each traffic input point. When implementing the algorithm with a specific MPR value, instantaneous vehicle information was captured from vehicles of the “connected vehicle” class only. The arrival-queue factor of each approach was estimated from CV data. To determine the real-time state for the algorithm, the estimated arrival-queue factor was multiplied by a correction factor (i.e., magnification factor) to represent all vehicle classes (CVs and conventional vehicles). This factor equals the reciprocal of the MPR value. The exact MPR value is estimated in real time, given the number of CVs from the V2I communications and the total traffic counts from the counting detectors upstream of each approach of the intersection.
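The correction step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the clipping of the MPR at 1.0, and the example counts are hypothetical, and the arrival-queue factor is treated here as a single number per approach.

```python
def estimate_mpr(cv_count, detector_count):
    """Estimate the MPR as the share of connected vehicles in the detector count.

    cv_count: number of CVs reported through V2I communications.
    detector_count: total vehicles counted by the upstream detectors.
    """
    if detector_count <= 0:
        raise ValueError("the detector count must be positive")
    # Clipping at 1.0 is an assumption to guard against counting mismatches.
    return min(1.0, cv_count / detector_count)


def scale_to_all_vehicles(cv_based_value, mpr):
    """Multiply a CV-based estimate by the reciprocal of the MPR."""
    if mpr <= 0:
        raise ValueError("the MPR must be positive")
    return cv_based_value * (1.0 / mpr)


# Hypothetical approach: 12 CVs reported out of 40 vehicles counted.
mpr = estimate_mpr(cv_count=12, detector_count=40)
arrival_queue_factor = scale_to_all_vehicles(6.0, mpr)
```

With an MPR of 0.3 the magnification factor is about 3.3, which also shows why very low MPRs are problematic: a small error in the CV-based estimate is amplified by a large factor.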

The results showed that the proposed QASCS algorithm can lead to considerable safety improvements even under lower MPRs of CVs. For example, compared with the benchmark ASC, a reduction of 43% in the weighted average RLC was achieved when the QASCS was applied with an MPR of 50%. Generally, the higher the MPR value, the greater the safety effectiveness of the algorithm. It should also be noted that MPR values less than 20% may not lead to significant safety benefits, as the algorithm cannot define the environment state with reasonable accuracy because of the lack of real-time information on vehicle positions and speeds.

Summary and Conclusions

This study introduces an ATSC algorithm (i.e., QASCS) to optimize traffic safety in real time by directly minimizing crash risk. Reward representation and safety evaluation of the algorithm were based on real-time crash prediction models for signalized intersections developed in a recent study ( 33 ). The models used traffic conflicts extracted from vehicle trajectories and considered cycle-level traffic parameters (e.g., shock wave area, shock wave speed, traffic volume, and platoon ratio) as covariates for crash prediction within the EVT framework. Using these models, two real-time crash-risk measures, the ROC and the RLC, can be obtained using the GEV distribution. The RL framework was applied to formulate the proposed QASCS algorithm. Moreover, real-time data from CVs and variables representing traffic dynamic changes were considered in the analysis. To the best of the authors’ knowledge, this is the first study that applies ATSC for real-time safety optimization by reducing crash risk at signalized intersections.

The TD RL method (particularly, the off-policy Q-learning method) was used in this study. In this method, the environment was simulated using a VISSIM model. The state of the environment was represented by the position and the speed of each vehicle approaching the intersection within the DSRC range (i.e., 225 m). A fixed phasing sequence was adopted for the action definition. This includes two actions: either extending the green time of the phase in effect or switching to the next phase. Moreover, two real-time crash-risk measures, ROC and RLC, were employed separately to define the reward function as a penalty. Constraints such as the yellow, the minimum green, the maximum green, and the all-red times were considered to ensure the feasibility of implementing the proposed technique in the real world.
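One way the two-action scheme can interact with the minimum and maximum green constraints is sketched below. The paper does not state how the constraints are enforced, so the masking logic, the function name, and the 10-s and 60-s limits are assumptions made purely for illustration.

```python
def feasible_actions(green_elapsed, min_green, max_green):
    """Return the actions allowed for the phase in effect.

    green_elapsed: seconds of green already served by the current phase.
    min_green, max_green: timing constraints in seconds (hypothetical values below).
    """
    if green_elapsed < min_green:
        return ("extend",)            # the phase must run at least min_green
    if green_elapsed >= max_green:
        return ("switch",)            # the phase may not exceed max_green
    return ("extend", "switch")       # both actions are open to the agent


# Hypothetical limits: 10-s minimum green and 60-s maximum green.
early = feasible_actions(green_elapsed=4, min_green=10, max_green=60)
middle = feasible_actions(green_elapsed=25, min_green=10, max_green=60)
late = feasible_actions(green_elapsed=60, min_green=10, max_green=60)
```

Under this reading, the agent only makes a real choice between the minimum and maximum green; the yellow and all-red intervals would then be inserted automatically whenever the action is to switch.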

The algorithm was trained using a real-world intersection modeled and simulated by VISSIM to learn the optimal policy. Traffic volumes at each intersection were randomized to run the simulation model for 500 iterations for both reward functions (i.e., ROC, RLC). Each iteration was divided into a 1,000-s warming-up period, a 500-s cooling-down period, and a 3,600-s training period. It was observed that after 400 iterations, the proposed algorithm converged to the optimal policy.

Validation of the trained algorithm was investigated using two separate signalized intersections in the city of Surrey, British Columbia, Canada. Additionally, the safety performance of the proposed QASCS algorithm and the field fully actuated traffic signal controller was compared. Important safety performance measures were evaluated for the two algorithms, including the number of cycles with ROC, and the weighted average RLC. Generally, the validation results showed that the proposed QASCS algorithm reduced ROC at the two signalized intersections significantly compared with the existing ASC. A drop of 82% and 87% in the number of cycles with ROC was obtained after implementing the QASCS algorithm at the first intersection using ROC and RLC as reward functions, respectively. For the second intersection, the number of cycles was reduced from 139 cycles to 22 and 29 for ROC and RLC as reward functions, respectively. The findings also illustrated that the weighted average RLC was reduced by 48% using ROC and 41% using RLC for the first intersection, whereas it was reduced by 46% for both reward functions at the second intersection.

Furthermore, the algorithm’s effect on mobility was evaluated at the two intersections. Despite the delay/travel time not being the primary objective function, the proposed algorithm improved mobility and reduced the total travel time at both intersections. The results indicated that the total travel time per vehicle was decreased by an average of 16% after applying QASCS using the ROC as an objective function for the two intersections. When using the RLC as a reward, a reduction of 7% in the total travel time per vehicle was observed for the two intersections. This reduction cannot be considered the optimal outcome for mobility improvement, as the primary objective of the proposed QASCS algorithm is optimizing traffic safety by reducing the ROC.

Additionally, the performance of the proposed algorithm was investigated under various MPRs of CVs. Results indicated that reasonable safety improvements can be realized at MPR values lower than 100%. A reduction of approximately 43% in the weighted average RLC was obtained at an MPR of 50%, compared with the existing ASC system.

Several areas of future research can be pursued to improve the effectiveness of the proposed QASCS algorithm and address the study limitations. First, this study used only two intersections for validation. Future studies may consider a larger number of intersections to investigate the safety and mobility performance of the proposed algorithm. Second, future research may consider replacing the discrete Q-table that defines the possible states with a continuous state space by using a deep neural network to describe the infinite possible states of the environment. Third, investigating the sensitivity of the results to the assumed parameters, such as the discount factor, the update time interval, and the V2I DSRC domain, is recommended. Fourth, the improvement in the safety performance in this research was achieved by using CV technology and the RL technique combined. Investigating the separate effect of each of them is an interesting area that deserves future investigation. Fifth, it is suggested to test the algorithm’s performance in other jurisdictions (e.g., developing countries) with different traffic conditions and driving cultures, as well as to compare the algorithm’s performance with other ATSC algorithms. Most importantly, safety and mobility can be considered in the algorithm as two primary objectives for multi-objective real-time traffic signal optimization.

Footnotes

Author Contributions

The authors confirm contribution to the paper as follows: study conception and design: P. Reyad, T. Sayed; data collection: P. Reyad, M. Essa; analysis and interpretation of results: P. Reyad, M. Essa, L. Zheng; draft manuscript preparation: P. Reyad, T. Sayed. All authors reviewed the results and approved the final version of the manuscript.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Passant Reyad

Tarek Sayed

Mohamed Essa

Lai Zheng

References

1. Sims. The Sydney Coordinated Adaptive Traffic System. Proc., Engineering Foundation Conference on Research Directions in Computer Control of Urban Traffic Systems, Pacific Grove, CA, 1979.

2. Luyanda, Gettman, Head, Shelby, Bullock, and Mirchandani. ACS-Lite Algorithmic Architecture: Applying Adaptive Control System Technology to Closed-Loop Traffic Signal Control Systems. Transportation Research Record: Journal of the Transportation Research Board, 2003. 1856: 175–184.

3. Abdulhai, Pringle, and Karakoulas. Reinforcement Learning for True Adaptive Traffic Signal Control. Journal of Transportation Engineering, Vol. 129, 2003, pp. 278–285.

4. Shoufeng, Ximin, and Shiqiang. Q-Learning for Adaptive Traffic Signal Control Based on Delay Minimization Strategy. Proc., IEEE International Conference on Networking, Sensing and Control, Sanya, China, 2008, pp. 687–691.

5. Salkham, Cunningham, Garg, and Cahill. A Collaborative Reinforcement Learning Approach to Urban Traffic Control Optimization. Proc., IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Transportation Systems, Washington, D.C., 2008.

6. El-Tantawy, Abdulhai, and Abdelgawad. Multiagent Reinforcement Learning for Integrated Network of Adaptive Traffic Signal Controllers (MARLIN-ATSC): Methodology and Large-Scale Application on Downtown Toronto. IEEE Transactions on Intelligent Transportation Systems, Vol. 14, No. 3, 2013, pp. 1140–1150.

7. El-Tantawy, Abdulhai, and Abdelgawad. Design of Reinforcement Learning Parameters for Seamless Application of Adaptive Traffic Signal Control. Journal of Intelligent Transportation Systems, Vol. 18, No. 3, 2014, pp. 227–245.

8. Shabestary and Abdulhai. Deep Learning vs. Discrete Reinforcement Learning for Adaptive Traffic Signal Control. Proc., 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, Hawaii, 2018, pp. 286–293.

9. Gong, Abdel-Aty, Cai, and Rahman. Decentralized Network Level Adaptive Signal Control by Multi-Agent Deep Reinforcement Learning. Transportation Research Interdisciplinary Perspectives, Vol. 1, 2019, p. 100020.

10. Anžek, Kavran, and Badanjak. Adaptive Traffic Control as Function of Safety. Proc., 12th World Congress on Intelligent Transport Systems and Services, San Francisco, 2005.

11. Midenet, Saunier, and Boillot. Exposure to Lateral Collision in Signalized Intersections With Protected Left Turn Under Different Traffic Control Strategies. Accident Analysis and Prevention, Vol. 43, No. 6, 2011, pp. 1968–1978.

12. Tageldin, Sayed, Zaki, and Azab. A Safety Evaluation of an Adaptive Traffic Signal Control System Using Computer Vision. Advances in Transportation Studies, Vol. 2, No. Special Issue, 2014, pp. 83–96.

13. Fontaine, Zhou, Hale, and Clements. Estimation of the Safety Effects of an Adaptive Traffic Signal Control System. Presented at 94th Annual Meeting of the Transportation Research Board, Washington, D.C., 2015.

14. Sabra, Gettman, Henry, and Nallamothu. Balancing Safety and Capacity in an Adaptive Signal Control System—Phase 1. No. FHWA-HRT-10-038, Federal Highway Administration, Washington, D.C., 2010.

15. Stevanovic, Kergaye, and Haigwood. Assessment of Surrogate Safety Benefits of an Adaptive Traffic Control System. Proc., 3rd International Conference on Road Safety and Simulation, Indiana, 2011.

16. Khattak, Fontaine, and Boateng. Evaluating the Impact of Adaptive Signal Control Technology on Driver Stress and Behavior Using Real-World Experimental Data. Transportation Research Part F: Traffic Psychology and Behaviour, Vol. 58, No. 1, 2018, pp. 133–144.

17. Fyfe and Sayed. Safety Evaluation of Connected Vehicles for a Cumulative Travel Time Adaptive Signal Control Microsimulation Using the Surrogate Safety Assessment Model. Presented at 96th Annual Meeting of the Transportation Research Board, Washington, D.C., 2017.

18. Lodes and R. F. Benekohal. Safety Benefits of Implementing Adaptive Signal Control Technology: Survey Results. Illinois Center for Transportation, Urbana, IL, 2013.

19. Sabra, Gettman, Henry, and Nallamothu. Enhancing Safety and Capacity in an Adaptive Signal Control System—Phase 2. FHWA-PROJ-10, 37. Federal Highway Administration, McLean, VA, 2013.

20. Stevanovic and Kergaye. Optimization of Traffic Signal Timings Based on Surrogate Measures of Safety. Transportation Research Part C: Emerging Technologies, Vol. 32, 2013, pp. 159–178.

21. Stevanovic and Ostojic. Multi-Criteria Optimization of Traffic Signals: Mobility, Safety, and Environment. Transportation Research Part C: Emerging Technologies, Vol. 55, 2015, pp. 46–68.

22. Essa and Sayed. Simulated Traffic Conflicts: Do They Accurately Represent Field-Measured Conflicts? Transportation Research Record: Journal of the Transportation Research Board, 2015. 2514: 48–57.

23. Essa and Sayed. Comparison Between Surrogate Safety Assessment Model and Real-Time Safety Models in Predicting Field-Measured Conflicts at Signalized Intersections. Transportation Research Record: Journal of the Transportation Research Board, 2020. 2674: 100–112.

24. Essa and Sayed. Traffic Conflict Models to Evaluate the Safety of Signalized Intersections at the Cycle Level. Transportation Research Part C: Emerging Technologies, Vol. 89, 2018, pp. 289–302.

25. Essa and Sayed. Full Bayesian Conflict-Based Models for Real Time Safety Evaluation of Signalized Intersections. Accident Analysis and Prevention, Vol. 129, 2019, pp. 367–381.

26. Essa, Sayed, and Reyad. Transferability of Real-Time Safety Performance Functions for Signalized Intersections. Accident Analysis and Prevention, Vol. 129, 2019, pp. 263–276.

27. Tarko. Use of Crash Surrogates and Exceedance Statistics to Estimate Road Safety. Accident Analysis and Prevention, Vol. 45, 2012, pp. 230–240.

28. Zheng, Ismail, and Meng. Freeway Safety Estimation Using Extreme Value Theory Approaches: A Comparative Study. Accident Analysis and Prevention, Vol. 62, 2014, pp. 32–41.

29. Åsljung, Nilsson, and Fredricsson. Using Extreme Value Theory for Vehicle Level Safety Validation and Implications for Autonomous Vehicles. IEEE Transactions on Intelligent Vehicles, Vol. 2, No. 4, 2017, pp. 288–297.

30. Farah and Azevedo. Safety Analysis of Passing Maneuvers Using Extreme Value Theory. IATSS Research, Vol. 41, No. 1, 2017, pp. 12–21.

31. Wang, Xia, and Qian. A Combined Use of Microscopic Traffic Simulation and Extreme Value Methods for Traffic Safety Evaluation. Transportation Research Part C: Emerging Technologies, Vol. 90, 2018, pp. 281–291.

32. Zheng and Sayed. A Bivariate Bayesian Hierarchical Extreme Value Model for Traffic Conflict-Based Crash Estimation. Analytic Methods in Accident Research, Vol. 25, No. 4, 2020, p. 100111.

33. Zheng and Sayed. A Novel Approach for Real Time Crash Prediction at Signalized Intersections. Transportation Research Part C: Emerging Technologies, Vol. 117, 2020, p. 102683.

34. Abdel-Aty and Abdalla. Linking Roadway Geometrics and Real-Time Traffic Characteristics to Model Daytime Freeway Crashes: Generalized Estimating Equations for Correlated Data. Transportation Research Record: Journal of the Transportation Research Board, 2004. 1897: 106–115.

35. Theofilatos. Incorporating Real-Time Traffic and Weather Data to Explore Road Accident Likelihood and Severity in Urban Arterials. Journal of Safety Research, Vol. 61, 2017, pp. 9–21.

36. Yuan and Abdel-Aty. Approach-Level Real-Time Crash Risk Analysis for Signalized Intersections. Accident Analysis and Prevention, Vol. 119, 2018, pp. 274–289.

37. Hossain, Abdel-Aty, Quddus, Muromachi, and Sadeek. Real-Time Crash Prediction Models: State-of-the-Art, Design Pathways and Ubiquitous Requirements. Accident Analysis and Prevention, Vol. 124, 2019, pp. 66–84.

38. Campbell, K. L., H. C. Joksch, and P. E. Green. A Bridging Analysis for Estimating the Benefits of Active Safety Technologies. The University of Michigan, Ann Arbor, MI, 1996.

39. Songchitruksa and Tarko. The Extreme Value Theory Approach to Safety Estimation. Accident Analysis and Prevention, Vol. 38, No. 4, 2006, pp. 811–822.

40. Tarko. Surrogate Measures of Safety. Safe Mobility: Challenges, Methodology and Solutions. Emerald Publishing Limited, Bingley, 2018.

41. Orsini, Gecchele, Gastaldi, and Rossi. Collision Prediction in Roundabouts: A Comparative Study of Extreme Value Theory Approaches. Transportmetrica A: Transport Science, Vol. 15, No. 2, 2019, pp. 556–572.

42. Zheng and Sayed. Application of Extreme Value Theory for Before-After Road Safety Analysis. Transportation Research Record: Journal of the Transportation Research Board, 2019. 2673: 1001–1010.

43. Zheng, Sayed, and Tageldin. Before-After Safety Analysis Using Extreme Value Theory: A Case of Left-Turn Bay Extension. Accident Analysis and Prevention, Vol. 121, 2018, pp. 258–267.

44. Zheng and Sayed. A Full Bayes Approach for Traffic Conflict-Based Before–After Safety Evaluation Using Extreme Value Theory. Accident Analysis and Prevention, Vol. 131, 2019, pp. 308–315.

45. Zheng, Sayed, and Essa. Validating the Bivariate Extreme Value Modeling Approach for Road Safety Estimation With Different Traffic Conflict Indicators. Accident Analysis and Prevention, Vol. 123, 2019, pp. 314–323.

46. Wang and Dai. A Crash Prediction Method Based on Bivariate Extreme Value Theory and Video-Based Vehicle Trajectory Data. Accident Analysis and Prevention, Vol. 123, 2019, pp. 365–373.

47. Zheng, Sayed, and Essa. Bayesian Hierarchical Modeling of the Non-Stationary Traffic Conflict Extremes for Crash Estimation. Analytic Methods in Accident Research, Vol. 23, 2019, p. 100100.

48. Hunt, Robertson, Bretherton, and Winton. SCOOT-A Traffic Responsive Method of Coordinating Signals. Technical Report No. LR 1014 Monograph. Transport and Road Research Laboratory, Wokingham, Berkshire, 1981.

49. Gartner. A Demand-Responsive Strategy for Traffic Signal Control (OPAC). Transportation Research Record: Journal of the Transportation Research Board, 1983. 906: 75–81.

50. Lee and Park. Development and Evaluation of a Cooperative Vehicle Intersection Control Algorithm Under the Connected Vehicles Environment. IEEE Transactions on Intelligent Transportation Systems, Vol. 13, 2012, pp. 81–90.

51. Lee, Park, and Malakorn. Sustainability Assessments of Cooperative Vehicle Intersection Control at an Urban Corridor. Transportation Research Part C: Emerging Technologies, Vol. 32, 2013, pp. 193–206.

52. Kamal, Imura, Ohata, Hayakawa, and Aihara. Coordination of Automated Vehicles at a Traffic-Lightless Intersection. Proc., IEEE Intelligent Transportation Systems Conference, The Hague, The Netherlands, 2013, pp. 922–927.

53. Mirheli, Tajalli, Hajibabai, and Hajbabaie. A Consensus-Based Distributed Trajectory Control in a Signal-Free Intersection. Transportation Research Part C: Emerging Technologies, Vol. 100, 2019, pp. 161–176.

54. Lee, Park, and Yun. Cumulative Travel-Time Responsive Real-Time Intersection Control Algorithm in the Connected Vehicle Environment. Journal of Transportation Engineering, Vol. 139, No. 10, 2013, pp. 1020–1029.

55. Guler, Menendez, and Meier. Using Connected Vehicle Technology to Improve the Efficiency of Intersections. Transportation Research Part C: Emerging Technologies, Vol. 46, 2014, pp. 121–131.

56. Yang, Guler, and Menendez. Isolated Intersection Control for Various Levels of Vehicle Technology: Conventional, Connected, and Automated Vehicles. Transportation Research Part C: Emerging Technologies, Vol. 72, 2016, pp. 109–129.

57. Liang, Guler, and Gayah. An Equitable Traffic Signal Control Scheme at Isolated Signalized Intersections Using Connected Vehicle Technology. Transportation Research Part C: Emerging Technologies, Vol. 110, 2020, pp. 81–97.

58. Rafter, Anvari, Box, and Cherrett. Augmenting Traffic Signal Control Systems for Urban Road Networks With Connected Vehicles. IEEE Transactions on Intelligent Transportation Systems, Vol. 21, No. 4, 2020, pp. 1728–1740.

59. Goodall, Smith, and Park. Traffic Signal Control With Connected Vehicles. Transportation Research Record: Journal of the Transportation Research Board, 2013. 2381: 65–72.

60. Feng, Head, Khoshmagham, and Zamanipour. A Real-Time Adaptive Signal Control in a Connected Vehicle Environment. Transportation Research Part C: Emerging Technologies, Vol. 55, 2015, pp. 460–473.

61. Al Islam and Hajbabaie. Distributed Coordinated Signal Timing Optimization in Connected Transportation Networks. Transportation Research Part C: Emerging Technologies, Vol. 80, 2017, pp. 272–285.

62. Jiang, Wang, and Park. Eco Approaching at an Isolated Signalized Intersection Under Partially Connected and Automated Vehicles Environment. Transportation Research Part C: Emerging Technologies, Vol. 79, 2017, pp. 290–307.

63. Ban, Bian, and Wang. V2I Based Cooperation Between Traffic Signal and Approaching Automated Vehicles. Proc., 2017 IEEE Intelligent Vehicles Symposium (IV), Los Angeles, CA, 2017, pp. 1658–1664.

64. Guo and Xiong. Joint Optimization of Vehicle Trajectories and Intersection Controllers With Connected Automated Vehicles: Combined Dynamic Programming and Shooting Heuristic Approach. Transportation Research Part C: Emerging Technologies, Vol. 98, 2019, pp. 54–72.

65. Guo and Ban. Urban Traffic Signal Control With Connected and Automated Vehicles: A Survey. Transportation Research Part C: Emerging Technologies, Vol. 101, 2019, pp. 313–334.

66. Abdulhai and Kattan. Reinforcement Learning: Introduction to Theory and Potential for Transport Applications. Canadian Journal of Civil Engineering, Vol. 30, 2003, pp. 981–991.

67. El-Tantawy and Abdulhai. Towards Multi-Agent Reinforcement Learning for Integrated Network of Optimal Traffic Controllers (MARLIN-OTC). Transportation Letters, Vol. 2, No. 2, 2010, pp. 89–110.

68. Chen and Cheng. A Review of the Applications of Agent Technology in Traffic and Transportation Systems. IEEE Transactions on Intelligent Transportation Systems, Vol. 11, No. 2, 2010, pp. 485–497.

69. Wiering. Multi-Agent Reinforcement Learning for Traffic Light Control. Proc., 17th International Conference on Machine Learning (ICML), Stanford, CA, 2000, pp. 1151–1158.

70. Camponogara and Kraus. Distributed Learning Agents in Urban Traffic Control. Proc., 11th Portuguese Conference on Artificial Intelligence, Beja, Portugal, 2003, pp. 324–335.

71. de Oliveira, Bazzan, da Silva, Basso, Nunes, Rossetti, and Lamb. Reinforcement Learning Based Control of Traffic Lights in Non-stationary Environments: A Case Study in a Microscopic Simulator. Proc., 4th European Workshop on Multi-Agent Systems EUMAS’06, Lisbon, Portugal, 2006.

72. Balaji, German, and Srinivasan. Urban Traffic Signal Control Using Reinforcement Learning Agents. IET Intelligent Transport Systems, Vol. 4, 2010, pp. 177–188.

73. Arel, Liu, Urbanik, and Kohls. Reinforcement Learning-Based Multi-Agent System for Network Traffic Signal Control. IET Intelligent Transport Systems, Vol. 4, 2010, pp. 128–135.

74. Sutton and Barto. Reinforcement Learning. Journal of Cognitive Neuroscience, Vol. 11, No. 1, 1999, pp. 126–134.

75. Sutton and Barto. Introduction to Reinforcement Learning. MIT Press, Cambridge, MA, 1998.

76. PTV AG. Vissim User Manual. Planung Transport Verkehr AG, Karlsruhe, Germany, 2015.

77. US DOT. ITS Research Fact Sheets 2009. https://www.standards.its.dot.gov/Factsheets/Factsheet/66.

78. Richter and Aberdeen. Natural Actor-Critic for Road Traffic Optimisation. In Advances in Neural Information Processing Systems (Schoelkopf, J. C. Platt, and Hofmann, eds.), MIT Press, Cambridge, MA, 2007, pp. 1169–1176.

79. Hedges, C. J. National Cooperative Highway Research Program. 2015.

80. Ozbay, Yang, Bartin, and Mudigonda. Derivation and Validation of New Simulation-Based Surrogate Safety Measure. Transportation Research Record: Journal of the Transportation Research Board, 2008. 2083: 105–113.

81. Dowling, Skabardonis, and Alexiadis. Traffic Analysis Toolbox, Volume III: Guidelines for Applying Traffic Microsimulation Modeling Software. FHWA-HRT-04-040. Federal Highway Administration, McLean, VA, 2004.

82. Essa and Sayed. Transferability of Calibrated Microsimulation Model Parameters for Safety Assessment Using Simulated Conflicts. Accident Analysis and Prevention, Vol. 84, 2015, pp. 41–53.

Real-Time Crash-Risk Optimization at Signalized Intersections
