
Abstract

The benefits of controlling a freeway bottleneck using reinforcement-learning (RL)-based ramp metering (RM) and/or variable speed limit (VSL) controllers are well established. However, when both RM and VSL are used to control the freeway, it is not clear how each method benefits the traffic stream relative to the other. We argue that, depending on traffic conditions, it may be better to use one and not both, or more importantly, to dynamically switch between the two. Moreover, a learning agent can automate the switch when warranted. In this paper, we offer intensive analysis and performance evaluations of RL-based as well as regulator-based RM and VSL controllers applied, under different levels of demand, to both an Aimsun-simulated hypothetical freeway network from the literature and a real-world freeway on-ramp extracted from the Queen Elizabeth Way (QEW) in Ontario, Canada. The findings indicate that RM is more effective and beneficial than VSL in heavily congested scenarios, whereas VSL can be beneficial in moderately and lightly congested scenarios. We also show that RL has the advantage of automatically prioritizing one control method over the other depending on traffic conditions. We demonstrate that in heavy congestion scenarios, the RL control agent that manages both RM and VSL clearly chooses RM over VSL.

Keywords

operations, freeway operations, bottlenecks, dynamic speed limits, freeway traffic control, ramp metering

Bottlenecks represent a primary source of congestion on freeways. These bottlenecks, sometimes latent, manifest when the upstream maximum flow capacity exceeds the downstream maximum flow capacity at a particular location. The nominal bottleneck capacity signifies the sustainable flow capacity at the bottleneck’s location when the traffic arriving from upstream matches the downstream capacity. If the incoming flow surpasses the downstream capacity, it activates the bottleneck, resulting in congestion formation. The congestion head emerges at the bottleneck, while the congestion tail gradually moves upstream. This congestion at the active bottleneck leads to two significant adverse consequences: 1) a capacity drop (CD), clearly visible in the flow/density fundamental diagram (FD), in which the capacity of the freeway is reduced because of the congestion, potentially doubling motorists’ time spent in congestion, and 2) obstruction of upstream off-ramp entries, further exacerbating the issue ( 1 , 2 ).

To mitigate these issues, freeway control methods are essential to pace traffic onto the freeway and avoid triggering congestion. Among the most extensively researched and implemented strategies are ramp metering (RM) and variable speed limits (VSL). RM manages freeway entry by controlling the release of vehicles from on-ramps, while VSL dynamically adjusts speed limits to optimize traffic flow and enhance safety.

Despite the effectiveness of these control strategies, the interplay between RM and VSL remains an open question. Specifically, it is unclear under what conditions one method is preferable over the other or whether dynamically switching between the two offers superior performance. Recent advancements in artificial intelligence, particularly reinforcement learning (RL), provide an opportunity to automate this decision-making process. RL-based controllers can adaptively choose between RM and VSL based on real-time traffic conditions, potentially leading to more efficient congestion management. We pay particular attention to observing how the RL agent chooses between RM and VSL, as will be discussed. This study aims to systematically analyze and compare the performance of RL-based RM and VSL controllers against traditional regulator-based controllers as well as the no-control case (base case). The comparison highlights the benefits and deteriorations that each controller causes for each group of travelers (mainstream and on-ramp travelers) and the effectiveness of each controller in different scenarios. We evaluate their effectiveness using both a hypothetical freeway network and a real-world freeway segment from the Queen Elizabeth Way (QEW) in Ontario, Canada. Our contributions include:

Developing RL-based RM, VSL, and RMVSL (integrated RM and VSL) controllers and comparing them with traditional proportional–integral–derivative (PID)-based controllers.

Evaluating the effectiveness of these controllers under different congestion scenarios to determine their impact on freeway efficiency and traveler experiences.

Analyzing the RL agent’s decision-making process in dynamically selecting between RM and VSL based on prevailing traffic conditions.

Our findings highlight the strengths and limitations of each control method under varying congestion levels and emphasize the adaptive advantage of RL-based control. We show that RM is more effective than VSL in highly congested scenarios, while the effectiveness of VSL increases in moderately and lightly congested scenarios. Most importantly, we show that the RL agent applying both methods concurrently focuses on RM for some time and switches to VSL later depending on traffic conditions, which we find interesting and insightful.

The remainder of the paper is organized as follows: the second section reviews related literature on RM, VSL, and RL-based traffic control. The third section details the methodology for designing the proposed deep RL (DRL) and PID controllers. The fourth section presents the case study, including the simulator and network setup. The fifth section discusses the results and analysis, and the final section provides concluding remarks.

Related Work

Ramp Metering (RM)

RM stands as one of the earliest proposed freeway control strategies and has been a subject of extensive research in transportation engineering and intelligent transportation systems. It aims to optimize and improve traffic flow, reducing congestion and enhancing the overall transportation system performance by preventing bottleneck activation, which would otherwise lead to CD and congestion spillback ( 3 , 4 ). Additionally, RM indirectly benefits traffic systems by influencing travelers’ route choices, thereby achieving a desired traffic flow distribution across the entire network ( 3 – 5 ). In Cassidy and Rudjanakanoknad ( 6 ), the authors investigated the impact of RM on the capacity of isolated merges. Their findings provided evidence that RM can positively affect the capacity of isolated merges, shedding light on the potential benefits of RM in specific traffic scenarios. In references ( 3 ) and ( 7 ), the authors provided an analysis and evaluation of RM, offering a comprehensive overview of the historical evolution and the application of new algorithms and engineering principles in RM systems. The Asservissement Linéaire d’Entrée Autoroutière (ALINEA), introduced in Papageorgiou et al. ( 8 ), is a prominent local feedback RM strategy. ALINEA remains one of the most widely implemented RM strategies globally owing to its simplicity, efficiency, and effectiveness in addressing recurrent freeway congestion. It has demonstrated promising results in field applications ( 9 ). Moreover, field results on the A6 motorway in Paris have provided valuable insights into the practical application of ALINEA as a local traffic-responsive strategy for RM ( 10 ).
As a result of the success of the ALINEA controller, it has been the subject of theoretical analyses and simulation studies that have demonstrated its performance in various scenarios and have led to many ALINEA variants for specific scenarios: AD-ALINEA, designed to operate in regions whose critical density/occupancy cannot be estimated beforehand or changes dynamically; AU-ALINEA, designed to operate using upstream measurements ( 11 ); PI-ALINEA, which uses a proportional–integral controller that is more stable in the presence of bottlenecks located far downstream of the controlled ramp ( 12 ); and coordinated ALINEA, which is used to control two or more consecutive on-ramps ( 13 ). Besides these ALINEA versions, in Frejo and De Schutter ( 14 ), the authors proposed a feedforward ALINEA controller that predicts the changes in bottleneck density, showcasing the algorithm’s resilience through simulations. In Shang et al. ( 15 ), the authors extended ALINEA and tested it on various mixed autonomy scenarios to accommodate varying degrees of automation. Beyond the ALINEA family, the research presented in Fartash et al. ( 16 ) focused on a methodology to identify on-ramps for metering, considering system-wide recurrent and non-recurrent congestion. Their work addressed the challenges of RM in the context of congestion management and system-wide traffic control, providing insights into the considerations for effective RM deployment. In addition, in Pooladsanj et al. ( 17 ), the authors proposed an RM strategy to optimize freeway throughput under various safety constraints. Furthermore, in Torné et al. ( 18 ) and Ramezani et al. ( 19 ), the authors proposed the coordination of RM with other active traffic management controllers such as VSL and perimeter control.
Finally, practical implementations and field studies have provided valuable insights into the practical performance of RM systems ( 20 ). The rich literature on RM provides valuable insights for the design and implementation of RM systems to enhance traffic flow efficiency and mitigate congestion.

Variable Speed Limits (VSL)

VSL is a considerably newer control strategy than RM. It has emerged as an intelligent transportation system solution for traffic management, aiming to improve safety and harmonize traffic flow by decreasing speed variation among vehicles across lanes and between upstream and downstream traffic flows ( 21 – 23 ). VSLs are displayed on variable message signs and have shown substantial traffic safety benefits, including reducing the risk of accidents by temporarily reducing speed limits during risky traffic conditions ( 24 ). Evaluation of VSLs to improve traffic safety has been a key focus, with studies investigating the benefits of VSL implementation for real-time crash risk reduction ( 25 ). Moreover, research on the interaction between system design and operations of VSL systems in work zones has been limited, despite the potential for VSLs to improve safety and operations in such areas ( 26 ). Furthermore, the environmental effects of changing speed limits, including the introduction of variable speed systems on metropolitan motorways, have been examined to assess their broader impacts ( 27 ). Over the last 10 years, VSL has gained prominence as a potential method for enhancing freeway efficiency, marking a shift from the earlier focus on safety-related implementations ( 22 , 23 ). The utilization of VSL for mainstream traffic flow control on freeways aims to optimize throughput by regulating the flow of traffic upstream of a bottleneck. Practical implementations and field studies have provided valuable insights into the performance of VSL systems. Results from a study on the dynamics of VSL systems surrounding bottlenecks on the German Autobahn have shed light on the practical implications of VSLs in addressing traffic challenges ( 28 ). Also, macroscopic modeling of VSLs on freeways has been explored, emphasizing the importance of traffic flow models in the design and evaluation of VSL controllers ( 29 ). Additionally, in Müller et al. ( 30 ) the authors conducted a microsimulation analysis of practical aspects of traffic control with VSLs, providing insights into the practical implications of VSLs on traffic control, including bottleneck analysis. Moreover, VSLs have been studied in the context of connected vehicle environments, providing dynamic speed advisory information to optimize traffic flow conditions for freeways and corridors under both recurrent and non-recurrent congestion ( 31 ). Also in the context of connected vehicle environments, in Müller et al. ( 32 ) the authors explored mainstream traffic flow control on freeways with different penetration rates of connected and automated vehicles (CAVs), shedding light on the potential for cooperative strategies to enhance traffic flow on freeways. These studies collectively demonstrate the significance of VSLs in traffic management and control, highlighting their potential to optimize traffic flow, improve safety, and address congestion issues on motorways while proposing various types of controllers. Among the best-known VSL controllers aiming to optimize flow at bottlenecks are the cascaded double-loop regulator ( 1 ) and the ALINEA-like integral feedback controller proposed and tested on a microscopic simulator in Zhang et al. ( 33 ). However, the lack of an acceleration area limited the improvements achieved. Since the acceleration area was proposed, ALINEA-like VSL controllers have been widely used for their simplicity, as in Müller et al. ( 30 , 32 ), where the controller is tested via the Aimsun microscopic simulator. However, these controllers require prior knowledge or estimation of traffic flow parameters like the critical density (to define the set-point), and do not explicitly target an optimal solution.

Both RM and VSL controllers can operate together and with other controllers, as in Carlson et al. ( 34 ), to achieve better flow efficiency at bottlenecks, and as in Tajdari et al. ( 35 ) and Markantonakis et al. ( 36 ), to coordinate with lane changing on freeways. However, these integrated controllers also require prior knowledge or estimation of traffic flow parameters like the critical density (to define the set-point), and do not explicitly target an optimal solution.

Reinforcement Learning-Based Traffic Control

More recently, advancements in artificial intelligence techniques have led to a growing interest in RL-based approaches for traffic control, mainly model-free RL algorithms. These RL-based traffic control methods have gained attention since they do not need an explicit model of the environment and are able to learn good policies purely by interacting with the environment. Initially, the application of RL in road traffic control was primarily explored in the context of optimizing traffic signal settings within urban traffic networks, as in El-Tantawy et al. ( 37 ), Zhao et al. ( 38 ), Ozan et al. ( 39 ), and Abdulhai et al. ( 40 ). There are also early applications of RL using the Q-learning algorithm for optimizing traffic flow at a bottleneck: in Rezaee et al. ( 41 ) and Davarynejad et al. ( 42 ), where RM is applied; in Li et al. ( 43 ), where VSL is applied; and in Schmidt-Dumont and van Vuuren ( 44 ), where RMVSL is applied. In the aforementioned studies, a tabular RL approach was employed to train the RL models. More recent advancements in RL have shown promising results by combining RL with deep learning ( 45 ). The integration of deep learning techniques with RL has opened up possibilities for implementing RL-based traffic control strategies in large-scale traffic networks characterized by vast state and action spaces ( 46 – 52 ). However, in some recent implementations of deep RL (DRL), the reward is designed based on the critical density or bottleneck speed ( 50 , 51 , 53 ). This is similar to the PID controllers mentioned above and may not yield optimal performance, as it forces the agent to operate at a certain density.

This study builds on previous work by systematically evaluating RL-based RM and VSL controllers against traditional regulator-based controllers. We analyze their performance across different congestion scenarios, highlighting the advantages of adaptive RL-based control in dynamically selecting the appropriate control strategy based on real-time traffic conditions.

Methodology

Reinforcement Learning

RL ( 54 ) is an area of machine learning in which a controller is trained to act well by iterative adjustment of the controller parameters in response to the observed effects of the controller’s actions. The controller is frequently referred to as an agent. The agent takes actions, receives observations from its environment, and receives rewards. The changes of the environment that result from the agent’s actions may be stochastic, and so may the rewards that the agent receives. The agent’s goal is to learn to act in a manner that optimizes its expected sum of future rewards.

The environment and reward dynamics are frequently assumed to depend only on the current observation and action (and not on how the controller and environment got to the current state, that is, the full observation-action history). This is a simplifying assumption, called the Markov property. A process of agent-environment interactions for which the Markov property holds true (and in which the observations describe the entire agent-environment state) is formalized in the concept of a fully observable Markov decision process (MDP). An MDP is a six-tuple $(S, A, R, P, γ, i)$ . Here, $S$ is the set of possible environment states, $A$ is the set of possible agent actions in each environment state, $P (s, a, s^{'})$ is the probability of transitioning from $s$ to $s^{'}$ given action $a$ , $R (s, a, s^{'})$ is the reward obtained as a result of a transition from $s$ to $s^{'}$ given action $a$ , $γ \in [0, 1)$ is the discount factor and $i (s)$ is the initial probability of the MDP starting from state $s$ .

An agent’s policy is defined to be a function $π (a, s)$ that gives the probability of selecting action $a$ from state $s$ . The agent’s goal is to find a policy that maximizes the expected cumulative reward

$$J (s_{0}, π) = E_{\begin{matrix} s_{t + 1} \sim P (s_{t}, a_{t}, \cdot) \\ a_{t} \sim π (\cdot, s_{t}) \end{matrix}} \left[ \sum_{t = 0}^{\infty} γ^{t} R (s_{t}, a_{t}, s_{t + 1}) \right] \tag{1}$$

at each state $s_{0}$ . Note that the discount factor $γ$ controls how myopic an agent is: a $γ$ close to 0 makes the agent give higher priority to more immediate rewards, while a $γ$ closer to 1 makes the agent give higher priority to rewards over longer horizons.
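
The effect of the discount factor can be seen on a small numerical example. The following is a minimal sketch that evaluates a finite-horizon version of the sum in Equation 1; the function name and the reward sequence are illustrative assumptions and are not taken from the experiments.

```python
def discounted_return(rewards, gamma):
    """Finite-horizon version of the sum in Equation 1: sum_t gamma^t * r_t."""
    total = 0.0
    # Accumulate backwards so each earlier step applies one more factor of gamma.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical reward sequence: a small reward now, a large reward three steps later.
rewards = [1.0, 0.0, 0.0, 10.0]

myopic = discounted_return(rewards, gamma=0.1)      # the late reward barely counts
farsighted = discounted_return(rewards, gamma=0.9)  # the late reward dominates
```

With these made-up rewards, the delayed reward contributes $0.1^{3} \cdot 10 = 0.01$ to the return for $γ = 0.1$ but $0.9^{3} \cdot 10 = 7.29$ for $γ = 0.9$.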

In this work, we apply the deep Q-networks (DQN) algorithm ( 45 ). The DQN algorithm is model-free (meaning that it does not use the explicit model of the environment transitions $P (s, a, s^{'})$ during training, or learn $P$ from data, but instead learns the policy from collected experience) and off-policy (meaning that it can use data that was not generated by the current policy to train). DQN was one of the first algorithms published in DRL and remains a strong method for discrete action problems. (At the outset of this research, we have also attempted to apply proximal policy optimization [PPO] [ 55 ], as PPO is arguably more appropriate for continuous action problems, but found that DQN with discretized actions achieved better performance.)

To describe the DQN algorithm, it is necessary to introduce some additional concepts and notation. We denote the $t$ th reward along a trajectory by $r_{t} = R (s_{t}, a_{t}, s_{t + 1})$ , and more generally write $r = R (s, a, s^{'})$ when the states and action are clear from context. The quality $Q^{π} (s, a)$ of an action $a$ from state $s$ under policy $π$ is defined as:

$$E_{\begin{matrix} s_{t + 1} \sim P (s_{t}, a_{t}, \cdot) \\ a_{t + 1} \sim π (\cdot, s_{t + 1}) \end{matrix}} \left[ \sum_{t = 0}^{\infty} γ^{t} r_{t} \,\middle|\, s_{0} = s, a_{0} = a \right] . \tag{2}$$

Intuitively, $Q$ measures the expected outcome by taking action $a$ at state $s$ , if further actions are going to be taken based on the policy $π$ . The Bellman optimality condition states that the equality

$$Q^{π} (s, a) = \sum_{s^{'}} P (s, a, s^{'}) \left( r + γ \max_{a^{'}} Q^{π} (s^{'}, a^{'}) \right) \tag{3}$$

holds for all $s$ , $a$ if and only if the policy $π$ maximizes $J (s_{0}, π)$ for all $s_{0}$ . Inspired by the earlier tabular $Q$ -learning algorithm, DQN continually updates the policy to bring $Q^{π} (s, a)$ closer to the value target

$$r + γ \max_{a^{'}} Q^{π} (s^{'}, a^{'}) . \tag{4}$$

The training loop for the DQN algorithm proceeds as follows. At every environment step, the agent takes the greedy action $a = \operatorname{argmax}_{a^{'}} Q_{θ} (s, a^{'})$ with probability $1 - ϵ$ and a uniformly random action with probability $ϵ$ , where $ϵ$ is an exploration factor that may follow a schedule during training ( $ϵ$ -greedy action selection). The resulting tuple of experience $(s, a, r, s^{'})$ is collected and stored in a replay buffer. A minibatch $B$ of experience tuples is then sampled from the replay buffer, and several iterations of stochastic gradient descent are carried out to minimize the temporal difference (TD) loss function $L_{TD} (θ)$ defined as:

$$\frac{1}{2 | B |} \sum_{(s, a, r, s^{'}) \in B} {\left( Q_{θ} (s, a) - \left( r + γ \max_{a^{'}} Q_{θ^{'}} (s^{'}, a^{'}) \right) \right)}^{2} . \tag{5}$$

The gradient descent is attempting to bring $Q_{θ} (s, a)$ closer to satisfying the Bellman optimality condition, and, therefore, closer to representing the $Q$ -function of an optimal policy. The value target is estimated with weights $θ^{'}$ , different from $θ$ in general, but periodically updated to be equal to $θ$ . The network $Q_{θ^{'}} (s, a)$ is called the target network.

The replay buffer shuffles and draws samples to train the $Q$ -function, making samples less likely to be correlated with each other, while the target network gives a more stable regression target. Both techniques are used to improve training stability. In deployment, the greedy deterministic policy is derived from the online network $Q_{θ}$ : $π (s) = \operatorname{argmax}_{a} Q_{θ} (s, a)$ .
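
The two data-handling ingredients of the training loop described above, $ϵ$ -greedy action selection and the replay buffer, can be sketched as follows. This is a minimal illustration using only the Python standard library; the class and function names are hypothetical, and it is not the implementation used in the experiments.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s_next) tuples; oldest entries are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement breaks the temporal ordering of experience.
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return a uniformly random action index with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Here `q_values` stands for the list of $Q_{θ} (s, a)$ outputs for the current state, one entry per discrete action.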

We have used a further improvement of DQN called double DQN (DDQN) ( 56 ). The motivating observation is that the sampled maximum is a biased estimate of the true maximum, when using the target network in both action-selection and action-evaluation steps in Equation 5:

$$r + γ Q_{θ^{'}} (s^{'}, \underset{a^{'}}{\operatorname{argmax}} Q_{θ^{'}} (s^{'}, a^{'})) . \tag{6}$$

Therefore, the DQN target tends to overestimate. DDQN attempts to address this deficiency by decoupling the computation of these two steps by replacing the target network $θ^{'}$ in the action-selection step with the online network $θ$ : $a^{'} = {argmax}_{a^{'}} Q_{θ} (s^{'}, a^{'})$ . This is a simple change to the algorithm that often substantially improves the performance.
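
The difference between the two targets is easiest to see side by side. The sketch below computes the DQN target of Equation 6 and the DDQN target for a single transition; the function names and the Q-value estimates are made up for illustration.

```python
def dqn_target(r, gamma, q_target_next):
    """Equation 6: the target network both selects and evaluates the next action."""
    return r + gamma * max(q_target_next)


def ddqn_target(r, gamma, q_online_next, q_target_next):
    """DDQN: the online network selects the action, the target network evaluates it."""
    a_star = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return r + gamma * q_target_next[a_star]


# Hypothetical Q-value estimates for three actions in the next state s'.
q_online_next = [1.0, 2.0, 1.5]   # Q_theta(s', .)
q_target_next = [1.2, 1.8, 2.5]   # Q_theta'(s', .)

y_dqn = dqn_target(r=1.0, gamma=0.9, q_target_next=q_target_next)
y_ddqn = ddqn_target(r=1.0, gamma=0.9, q_online_next=q_online_next,
                     q_target_next=q_target_next)
```

In this example the DQN target is $1 + 0.9 \cdot 2.5 = 3.25$, while the DDQN target is $1 + 0.9 \cdot 1.8 = 2.62$. Because the DDQN target evaluates a single action under the target network, it can never exceed the DQN target, which takes the maximum over all actions.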

In the experiments, a feed forward neural network with three hidden layers is adopted to model $Q_{θ}$ . We have used the DDQN parameter settings listed in Table 1.

Table 1.

Double Deep Q-Networks Parameter Settings

Parameter	Value
Learning rate	0.001
Exploration factor ( $ϵ$ ) schedule	1 to 0.005 linear ramp over 50,000 steps
Optimizer	Adam
Additional optimizer parameters (for Adam)	$β_{1} = 0.9, β_{2} = 0.999,$ $ϵ = 10^{- 7}$
Replay buffer size	1,200,000
Discount factor ( $γ$ )	0.9
Nodes in the $Q_{θ}$ and $Q_{θ^{'}}$ hidden layers	32 × 32 × 32
Target network $Q_{θ^{'}}$ update frequency	900
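
The exploration schedule in Table 1 can be written as a small function. This is a sketch under one assumption that the table does not state: after the 50,000-step ramp, $ϵ$ is held at its final value. The function and parameter names are illustrative.

```python
def epsilon_at(step, eps_start=1.0, eps_end=0.005, ramp_steps=50_000):
    """Linear ramp from eps_start to eps_end over ramp_steps.

    Assumption (not stated in Table 1): epsilon stays at eps_end after the ramp.
    """
    fraction = min(max(step, 0), ramp_steps) / ramp_steps
    return eps_start + fraction * (eps_end - eps_start)
```

For example, `epsilon_at(25_000)` gives 0.5025, halfway between the start and end values.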

PID Controllers

PID controllers are feedback controllers that are frequently used in practice. PID controllers are comparatively simple to implement, yet often have good performance, especially in systems that are close to being linear.

We now describe the general form of a discrete-time PID controller. We denote the times when the control action will be applied by $t_{k}, (k = 0, 1, 2, \dots)$ , with period $Δ t = t_{k} - t_{k - 1}$ . The goal of the discrete-time PID controller is to drive the controlled system toward a time-dependent set-point $r (t_{k})$ . To achieve this, each time-step, given the system output $y (t_{k})$ , the error $e (t_{k}) = r (t_{k}) - y (t_{k})$ is computed, and the control function is computed as:

$$u (t_{k}) = u (t_{k - 1}) + C_{0} e (t_{k}) + C_{1} e (t_{k - 1}) + C_{2} e (t_{k - 2}), \tag{7}$$

where the coefficients are

$$C_{0} = K_{P} + K_{I} Δ t + \frac{K_{D}}{Δ t}, \tag{8}$$

$$C_{1} = - K_{P} - \frac{2 K_{D}}{Δ t}, \tag{9}$$

$$C_{2} = \frac{K_{D}}{Δ t} . \tag{10}$$

$K_{P}, K_{I}, K_{D}$ denote the gains for the proportional, integral, and derivative terms of the controller, respectively.
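
The update rule of Equations 7 to 10 translates directly into code. The following is a minimal sketch; the class name is hypothetical, and the past errors are assumed to be zero before the first step, which the text does not specify.

```python
class DiscretePID:
    """Incremental (velocity-form) PID following Equations 7 to 10."""

    def __init__(self, kp, ki, kd, dt, u0=0.0):
        self.c0 = kp + ki * dt + kd / dt   # Equation 8
        self.c1 = -kp - 2.0 * kd / dt      # Equation 9
        self.c2 = kd / dt                  # Equation 10
        self.u = u0          # u(t_{k-1})
        self.e_prev1 = 0.0   # e(t_{k-1}), assumed zero before the first step
        self.e_prev2 = 0.0   # e(t_{k-2}), assumed zero before the first step

    def step(self, setpoint, measurement):
        """Apply Equation 7 once and return the new control value u(t_k)."""
        e = setpoint - measurement
        self.u += self.c0 * e + self.c1 * self.e_prev1 + self.c2 * self.e_prev2
        self.e_prev2, self.e_prev1 = self.e_prev1, e
        return self.u
```

With only the proportional gain set and a constant error, the output settles at $K_{P} e$ after the first step, because the $C_{0}$ and $C_{1}$ terms cancel from the second step onward.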

In practice, based on the desired characteristics of the controller, one or two of the Proportional, Integral, and Derivative terms in the update rule described in Equation 7 may be omitted. In applications to RM and VSL control (for example, as in Papageorgiou et al. ( 8 ) and Müller et al. [ 30 ]), it is common to use a discrete-time integral controller, with the simplified update rule

$$u (t_{k}) = u (t_{k - 1}) + K_{I} (r (t_{k}) - y (t_{k})) Δ t . \tag{11}$$

For both RM and VSL control applications, the system output is the average detector occupancy $y (t_{k}) = o (t_{k})$ on the bottleneck section, and the set point $r (t_{k}) = \hat{o}$ is the critical occupancy of the fundamental diagram of the highway around the ramp. The control functions are the RM rate $R (t)$ for RM and the VSL rate $b (t)$ for VSL. The RM rate is the flow released by the ramp meter, while the VSL rate is the speed limit applied in the VSL application area divided by the legal speed limit without VSL. As in Müller et al. ( 30 ), gain scheduling is used to overcome the nonlinearity between the speed limit and the resulting flow output from the application area.
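
One step of the integral controller in Equation 11, driven by the occupancy error, could look as follows. This is a sketch only: the clipping to a feasible range, the gain, and all numerical values are assumptions added for illustration, and gain scheduling is not included.

```python
def integral_update(u_prev, occupancy, critical_occupancy, k_i, dt, u_min, u_max):
    """One step of Equation 11, clipped to [u_min, u_max].

    The clipping is an added assumption; Equation 11 itself is unbounded.
    """
    u = u_prev + k_i * (critical_occupancy - occupancy) * dt
    return min(max(u, u_min), u_max)


# Hypothetical values: measured occupancy (30%) above the critical occupancy (25%),
# so the metering rate is reduced from 1,000 vph.
new_rate = integral_update(u_prev=1000.0, occupancy=30.0, critical_occupancy=25.0,
                           k_i=1.0, dt=60.0, u_min=200.0, u_max=2000.0)
```

When the measured occupancy exceeds the set point, the error is negative and the control value decreases; when it is below the set point, the control value increases.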

Case Study

Freeway Network and Traffic Demand

Hypothetical Freeway Network from Literature: To replicate and compare with the literature and study the effects of different controllers on a simple freeway bottleneck, the simple hypothetical network shown in Figure 1 is implemented with the same dimensions and parameters as in Müller et al. ( 30 ).

Figure 1.

Hypothetical freeway network geometry ( 30 , 32 ).

The network models a 4.3 km two-lane freeway that adjoins one on-ramp 300 m upstream from the end. Immediately after the on-ramp nose, there is a 200-m-long acceleration and merging lane. Then, there is a lane drop that forms a bottleneck. The RM signal is located at the end of the ramp section just before the start of the acceleration lane. For VSL, an application area of 100 m is located 175 m upstream of the on-ramp ( 30 ).

The demand was designed to be high enough to trigger congestion in the no-control case as in Müller et al. ( 30 , 32 ). However, in our case the ramp demand was slightly edited as shown in Figure 2 to extend the time where both mainline and ramp demands peak simultaneously, between 0.5 and 1.5 h. The entering vehicle time-headways are sampled following an exponential distribution.

Figure 2.

Hypothetical freeway network demand profile.

Traffic is modeled exclusively with passenger cars. Most parameters were set to the simulator’s default values, except for the reaction time (0.5 s) and vehicle acceleration (1.5 m/s²), which were adjusted to calibrate the freeway capacity. Additionally, parameters related to the two-lane car-following model were modified to accommodate a maximum speed difference of 30 km/h between the mainstream lanes and 50 km/h between the rightmost and middle lanes in the three-lane section. These adjustments aim to facilitate smoother lane-changing maneuvers.

All other simulation settings and parameters follow those in Müller et al. ( 30 ), except for the cooperation rate at the merging area, which was set to 50%. This means that 50% of drivers in the mainstream lane are willing to adjust their speed to facilitate merging from the acceleration lane. This cooperation rate helps balance the ramp and mainstream demand in the merging area.

Real World Freeway Network: To test the controllers on a network with real geometry, the eastbound (EB) on-ramp at Winston Churchill Boulevard located in the city of Mississauga, Ontario, Canada, shown in Figure 3 is used as a real geometry one-ramp testbed for different controllers.

Figure 3.

Queen Elizabeth Way eastbound (QEW EB) Winston Churchill on-ramp network geometry.

The network consists of a 4.8 km three-lane mainstream section; a 275 m merging area located 477 m upstream from the downstream end of the network; north and south on-ramps that merge into a one-lane section, which then merges onto the freeway at the aforementioned merging area; and application and acceleration areas designed exactly as in the hypothetical network ( 30 , 32 ).

The demand profile is set as shown in Figure 4. It follows the same general design as the demand for the hypothetical network but with different peaks for both mainstream and on-ramp demands. These peaks were chosen to test the performance of the controllers with excessively high demands arriving from the mainstream and the on-ramp simultaneously. The simulation continues for 30 min after the end of demand to allow the network to drain out all vehicles. All microscopic modeling is conducted using the Aimsun Microscopic Simulator ( 57 ), with all simulation settings and parameters kept as in the hypothetical network.

Figure 4.

Winston Churchill network demand profile.

Definitions of the MDPs for RM, VSL, and RMVSL

This section defines the states, actions and rewards used to train the RL controllers for RM, VSL, and RMVSL. The state space and reward are the same in the three cases, but the action spaces differ. In all three cases, the actions are applied with a period of 60 simulator seconds. The 60 s in-between controller actions is called the control interval.

State Space: The observation is composed of a collection of statistics: the traffic flow through the acceleration section and the on-ramp section; the flow, speed, and density on the bottleneck section; and the occupancy of two detectors placed at the bottleneck section. All statistics collected for the state representation are averaged over the 60 s control interval.

RM Action Space: The action consists of setting the RM rate $R (t_{k})$ . The action space is discrete, and the possible metering rates are: $200, 250, \dots, 2{,}000$ vph (vehicles per hour).

VSL Action Space: The action consists of setting the variable speed limit $V (t_{k})$ in the VSL application area (two options: whole section, or right lane). The action space is discrete, and the possible actions are: $15, 20, 25, \dots, 100$ km/h.

Simultaneous (RMVSL) Action Space: The RM rate $R (t_{k})$ and the variable speed limit $V (t_{k})$ are chosen simultaneously, so that a simultaneous action is a two-tuple $(R (t_{k}), V (t_{k}))$ . The possible values of $R (t_{k})$ and $V (t_{k})$ are the same as in the individual RM and VSL experiments described above.

Reward: The immediate reward is the average output flow from the bottleneck section during the 60 s control interval.
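
The discrete action spaces defined above can be enumerated to see their sizes. The sketch below assumes that the RMVSL agent uses the full Cartesian product of the two sets as a flat list of actions, which the text does not state explicitly; the variable and function names are illustrative.

```python
from itertools import product

# Discrete action sets as listed in the text.
rm_rates_vph = list(range(200, 2001, 50))   # 200, 250, ..., 2,000 vph
vsl_limits_kmh = list(range(15, 101, 5))    # 15, 20, ..., 100 km/h

# Joint RMVSL actions as (R, V) pairs; the flat indexing is an illustrative assumption.
rmvsl_actions = list(product(rm_rates_vph, vsl_limits_kmh))


def decode_action(index):
    """Map a flat action index (e.g., the argmax over Q-network outputs) to an (R, V) pair."""
    return rmvsl_actions[index]
```

Under this assumption there are 37 RM actions, 18 VSL actions, and $37 \times 18 = 666$ joint actions.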

Results

In this paper, we test regulator-based and RL-based controllers for RM, VSL, and combined RM and VSL on both the hypothetical and the real-world networks. The results are analyzed and compared with the no-control case to show the effects of each controller on the mainstream and on-ramp travel times. The results are mainly analyzed using the network total travel time (TTT), which Aimsun defines as the total travel time experienced by all the vehicles that have crossed the network by the end of the simulation, including the time spent in virtual queues. To further investigate the effects of the different controllers, this network TTT is broken down into mainstream TTT and on-ramp TTT for vehicles generated from the mainstream and on-ramp centroids, respectively. The TTT results are followed by plots and discussions of total flows, the breakdown of flows into mainstream and on-ramp travelers, and the FD for each control case.

As described in the previous section, although our goal is to minimize TTT, our DRL agent is trained to maximize the outflow from the bottleneck as a surrogate, as the bottleneck outflow is simpler to measure in practice.

RM Results

Hypothetical Freeway Network: Table 2 shows the performance of the ALINEA and RL-RM controllers and compares their travel times with the no-control case for the hypothetical network. The table shows that RL-RM performs best with regard to on-ramp TTT (338 h) and overall network TTT (665 h), while ALINEA has the lowest TTT for mainstream travelers only (309.1 h). This can be explained by considering that RM generally gives priority to mainstream travelers at the expense of on-ramp travelers. However, RL-RM benefits both groups of travelers, whereas ALINEA lowers the TTT for mainstream travelers while increasing the on-ramp TTT.

Table 2.

Hypothetical Network Ramp Metering Controllers Total Travel Time (TTT)

Control	Network TTT (h)	Change (%)	Mainstream TTT (h)	Change (%)	On-ramp TTT (h)	Change (%)
Basecase	1,581.0	na	1,186.8	na	394.9	na
ALINEA	751.5	52.5	309.1	73.9	442.4	−12.0
RL	665.0	57.9	327.0	72.4	338.0	14.3

Note: ALINEA = Asservissement Linéaire d’Entrée Autoroutière; RL = reinforcement-learning; na = not applicable.
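The Change (%) columns appear to report the relative reduction in TTT with respect to the base case, so that positive values indicate an improvement and negative values a deterioration. This reading is inferred from the tabulated values rather than stated explicitly, and the function name below is hypothetical. A short sketch using the ALINEA row of Table 2:

```python
def pct_improvement(base_ttt_h, controlled_ttt_h):
    """Relative TTT reduction versus the no-control case, in percent.

    Positive values mean the controller lowered TTT; negative values mean
    it increased TTT.
    """
    return (base_ttt_h - controlled_ttt_h) / base_ttt_h * 100.0


# Values taken from Table 2 (hypothetical network, ramp metering).
base = {"network": 1581.0, "mainstream": 1186.8, "on_ramp": 394.9}
alinea = {"network": 751.5, "mainstream": 309.1, "on_ramp": 442.4}

for group in base:
    print(group, round(pct_improvement(base[group], alinea[group]), 2))
```

The computed values agree with the tabulated ones to within rounding, including the negative change for on-ramp travelers under ALINEA.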

Figure 5 highlights that both ALINEA and RL-RM controllers can avoid the CD. It is important to keep the demand profile (Figure 2) in mind while analyzing the results. In Figure 5a, without control, congestion occurs, and the CD happens approximately 40 min after the start of the simulation, when both mainline and on-ramp demands peak, as depicted by the sudden drop of the blue line. The congestion persists until around 140 min of simulation, a few minutes before the end. Although neither ALINEA nor RL-RM triggers congestion on the freeway, RL-RM serves the demand more efficiently than ALINEA. The bottleneck flow stays nearer to capacity with RL-RM than with ALINEA, as depicted by the gray and orange lines, respectively. (Color online only.) In addition, when RL-RM is used, the bottleneck flow drops after 120 min only because the total demand has been served. In the ALINEA case, the total demand takes longer to be served, and the flow drops after approximately 135 min. Considering mainstream demand, Figure 5d shows that RL-RM (gray line) and ALINEA (orange line) serve the mainstream flow with comparable efficiency. For the on-ramp demand, however, the RL-RM controller serves the demand faster (at around 120 min) and more efficiently than ALINEA, which serves the demand at around 135 min, as illustrated in Figure 5c.

Figure 5.

Hypothetical network ramp metering (RM) controllers: (a) bottleneck flow, (b) bottleneck flow/density (FD), (c) ramp flow, and (d) upstream mainstream flow. Color online only.

Figure 5b illustrates that ALINEA is generally more conservative than RL-RM as it leads to a lower density and higher speed, depicted by more orange points at the left part of the graph in Figure 5b. We examined several set points for ALINEA to properly select the critical density. However, this is a manual process and therefore is not optimized. In Figure 5b, ALINEA points are either at capacity or at the free-flow side, whereas RL-RM points are either at capacity or near capacity at the congested side.

Winston Churchill Freeway Network: Table 3 compares the performance of each RM controller to the no-control case for the Winston Churchill network. Overall, ALINEA and RL-RM perform similarly with reference to TTT for mainstream travelers (669.1 h and 670.4 h, respectively), whereas RL-RM performs slightly better as regards on-ramp and overall TTT (346 h and 1,016.4 h, respectively). In the case of excessively high demand coming from the mainstream, it is clear that RM gives priority to mainstream travelers at the expense of on-ramp travelers. Although both ALINEA and RL-RM lead to longer TTT for on-ramp travelers, RL-RM causes less deterioration in on-ramp TTT than ALINEA.

Table 3.

Winston Churchill Ramp Metering Controllers Total Travel Time (TTT)

Control	Network TTT (h)	Change (%)	Mainstream TTT (h)	Change (%)	On-ramp TTT (h)	Change (%)
Basecase	1,856.4	na	1,698.3	na	158.1	na
ALINEA	1,026.2	44.7	669.1	60.6	357.0	−125.0
RL	1,016.4	45.3	670.4	60.5	346.0	−119.0

Note: ALINEA = Asservissement Linéaire d’Entrée Autoroutière; RL = reinforcement-learning; na = not applicable.

Similar to the results of the hypothetical network, both ALINEA and RL-RM controllers can avoid the CD, as illustrated in Figure 6. It is important to keep the demand profile (Figure 4) in mind while analyzing the results. In Figure 6a, without control, congestion occurs, and the CD happens approximately 10 min after the start of the simulation, when the sum of mainline and on-ramp demands exceeds the bottleneck capacity, as depicted by the blue line. The congestion persists for approximately 150 min. When applying ALINEA (orange line) or RL-RM (gray line), the freeway bottleneck can serve the demand without triggering congestion. In this case, RL-RM performs slightly better than ALINEA, and both the gray and orange lines stay above the blue line. (Color online only.) Regarding the mainstream demand, Figure 6d shows that RL-RM and ALINEA serve the mainstream flow with nearly the same efficiency. However, the RL-RM controller serves the on-ramp demand slightly faster than ALINEA, as illustrated in Figure 6c.

Figure 6.

Winston Churchill network ramp metering (RM) controllers: (a) bottleneck flow, (b) bottleneck flow/density (FD), (c) ramp flow, and (d) upstream mainstream flow.

For a clearer understanding of the RL-RM controller in both the moderately and severely congested scenarios, Figure 7 illustrates the policy learnt by the RL-RM controller in each case. Generally, the controller performs as expected: it decreases the metering rate when both mainstream and on-ramp demands peak, between approximately 30 and 90 min, and then increases the metering rate when the mainstream demand decreases, after approximately 90 min. The main difference between the RL-based and integral controllers is that the policy learnt by the DRL agent shifts between high and low flows without large deviations in bottleneck density, as shown in Figures 5b and 6b. This suggests that the controller targets not only the critical density but also other bottleneck activation factors, such as merging and lane-changing maneuvers, so it keeps adjusting the number of merging vehicles up and down. Conversely, ALINEA’s control decisions vary smoothly, without significant deviations from the set occupancy. The ALINEA controller reacts only after the occupancy/density has increased, which makes it a reactive controller, unlike RL-based controllers, which can act proactively, as illustrated in Li et al. ( 43 ).
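The reactive nature of ALINEA follows from its integral feedback law, r(k) = r(k−1) + K_R [o_set − o(k)], in which the metering rate changes only in response to the measured occupancy. The sketch below illustrates this with assumed parameter values: the gain, the rate bounds, the set occupancy, and the occupancy trace are illustrative and are not the settings used in the experiments.

```python
def alinea_step(prev_rate_vph, occupancy_pct, set_occupancy_pct,
                gain_vph_per_pct=70.0, min_rate_vph=200.0, max_rate_vph=1800.0):
    """One ALINEA update: r(k) = r(k-1) + K_R * (o_set - o(k)), clipped to bounds."""
    rate = prev_rate_vph + gain_vph_per_pct * (set_occupancy_pct - occupancy_pct)
    return max(min_rate_vph, min(max_rate_vph, rate))


# Illustrative occupancy measurements (%) downstream of the ramp.
measured_occupancy = [18, 18, 22, 26, 24, 20]
set_occupancy = 20  # assumed set point near the critical occupancy

rate = 1000.0  # assumed initial metering rate, veh/h
trace = []
for occ in measured_occupancy:
    rate = alinea_step(rate, occ, set_occupancy)
    trace.append(rate)

print(trace)  # [1140.0, 1280.0, 1140.0, 720.0, 440.0, 440.0]
```

In this trace, the rate starts to fall only once the measured occupancy has already exceeded the set point, which is the reactive behavior described above.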

Figure 7.

Reinforcement learning-ramp metering (RL-RM) applied ramp actions: (a) hypothetical network and (b) Winston Churchill network.

VSL Results

Hypothetical Freeway Network: We compare the performance of each VSL controller to the no-control case for the hypothetical network in Table 4. Again, the overall performance with any control method is better than in the no-control case. This is because VSL gives priority to the on-ramp demand and controls the mainstream flow. Both RL-based VSL controllers perform better than the regulator concerning overall TTT (925.3 h for right-lane control and 905.9 h for section control, compared with 970.8 h for the regulator). However, VSL with integral control results in lower TTT for on-ramp travelers (57.4 h) but higher TTT for mainstream travelers (913.4 h) compared with the RL-based controllers.

Table 4.

Hypothetical Network Variable Speed Limit Controllers Total Travel Time (TTT)

Control	Network TTT (h)	Change (%)	Mainstream TTT (h)	Change (%)	On-ramp TTT (h)	Change (%)
Basecase	1,581.0	na	1,186.8	na	394.9	na
Regulator	970.8	38.6	913.4	23.0	57.4	85.5
RL right lane	925.3	41.5	846.8	28.6	78.5	80.1
RL section	905.9	42.7	828.9	30.2	76.9	80.5

Note: RL = reinforcement learning; na = not applicable.

As illustrated in Figure 8a, all VSL controllers (orange, gray, and yellow lines) mitigate the CD and achieve higher bottleneck flows than the no-control case (blue line). (Color online only.) The better TTT for both mainstream and on-ramp travelers can be clearly observed in Figure 8, c and d, where both ramp flows and upstream mainstream flows improve after the application of any VSL controller.

Figure 8.

Hypothetical network variable speed limit (VSL) controllers: (a) bottleneck flow, (b) bottleneck flow/density (FD), (c) ramp flow, and (d) upstream mainstream flow.

Similar to RM, regulator-based VSL is more conservative for mainstream travelers than RL-based controllers. That can be seen in Figure 8b and Table 4 where the regulator results in a higher speed, a lower density at the bottleneck, higher TTT for mainstream travelers (913.4 h), and lower TTT (57.4 h) for on-ramp travelers.

Winston Churchill Freeway Network: Table 5 compares the performance of each VSL controller to the no-control case on the mainstream and on-ramp TTT for the Winston Churchill freeway network. In the case of very high demand, no VSL controller adds significant benefits to the network-wide TTT. Table 5 also illustrates that VSL controllers mainly give priority to on-ramp demand by decreasing the TTT for on-ramp travelers while controlling the mainstream flow; with the regulator, this increases the TTT for mainstream travelers (1,800.7 h compared with 1,698.3 h in the base case). All VSL controllers perform slightly better than the no-control case as regards overall TTT.

Table 5.

Winston Churchill Network Variable Speed Limit Controllers Total Travel Time (TTT)

Control	Network TTT (h)	Change (%)	Mainstream TTT (h)	Change (%)	On-ramp TTT (h)	Change (%)
Basecase	1,856.4	na	1,698.3	na	158.1	na
Regulator	1,849.6	0.4	1,800.7	−6.0	48.9	69.1
RL right lane	1,793.0	3.4	1,642.6	3.3	150.4	4.9
RL section	1,786.8	3.8	1,638.1	3.5	148.7	6.0

Note: RL = reinforcement learning; na = not applicable.

As illustrated in Figure 9, no VSL controller avoids the CD in the case of excessively high demand from both the mainstream and the on-ramp (all lines are approximately at the same level). Figure 9a shows that the bottleneck flow does not gain noticeably from any of the VSL controllers. However, in Figure 9, c and d, regulator-based VSL (orange line) serves on-ramp demand faster than both RL-based controllers (gray and yellow lines) and the base case (blue line), at the expense of mainstream travelers. (Color online only.) RL-based VSL serves on-ramp demand slightly faster than the no-control case while serving mainstream travelers with the same performance as in the no-control case. The fundamental diagram in Figure 9b shows that all VSL controllers avoid the very highly congested region where the density exceeds 80 veh/km/lane, which explains the slightly better performance of the VSL controllers.

Figure 9.

Winston Churchill Network variable speed limit (VSL) controllers: (a) bottleneck flow, (b) bottleneck flow/density (FD), (c) ramp flow, and (d) upstream mainstream flow.

We analyze the behavior of the RL-based VSL controllers further by looking at the policy learnt by each controller, as illustrated in Figure 10. All controllers learn that very low speed limits are not the optimum solution when both demands peak, because low speeds increase the TTT for the large mainstream demand (3,500 vph and 6,000 vph for the hypothetical network and the Winston Churchill freeway network, respectively) and decrease the TTT for fewer travelers (the on-ramp demand; 1,000 vph and 1,250 vph, respectively). Lower speeds also facilitate the merging of the unmetered platoon while causing an unneeded flow decrease in the left lane(s). For right-lane speed controllers, the agents learn to decrease the speed limit to slightly facilitate the merging of on-ramp travelers, as that behavior does not cause any unwanted metering for left-lane travelers. However, whenever the demand is lower, the agent learns that lowering the speed limit can be effective. This learnt behavior is interesting and cannot be reproduced by the feedback integral controller, which keeps decreasing the speed limit until the minimum allowed speed limit is reached (15 km/h in this case) as long as the occupancy on the freeway is larger than the controller’s set occupancy. In Figures 8b and 9b, most bottleneck density points for the integral controller are larger than the critical density, so the integral controller decreases the speed to its minimum value most of the time.
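The saturation of the integral controller at the minimum speed limit can be illustrated with a simplified sketch. This is not the regulator used in the experiments: the update rule is a generic integral law acting directly on the speed limit, and the gain, the 100 km/h upper bound, the set occupancy, and the occupancy values are assumptions. Only the 15 km/h lower bound is taken from the text.

```python
def integral_vsl_step(prev_limit_kmh, occupancy_pct, set_occupancy_pct,
                      gain_kmh_per_pct=2.0, min_limit_kmh=15.0, max_limit_kmh=100.0):
    """Generic integral update of the speed limit, clipped to the allowed range."""
    limit = prev_limit_kmh + gain_kmh_per_pct * (set_occupancy_pct - occupancy_pct)
    return max(min_limit_kmh, min(max_limit_kmh, limit))


# Occupancy stays above the set point for the whole period (assumed values).
set_occupancy = 20
measured_occupancy = [30, 30, 30, 30, 30, 30]

limit = 100.0  # assumed initial speed limit, km/h
limits = []
for occ in measured_occupancy:
    limit = integral_vsl_step(limit, occ, set_occupancy)
    limits.append(limit)

print(limits)  # [80.0, 60.0, 40.0, 20.0, 15.0, 15.0]
```

Because the error keeps the same sign while the occupancy exceeds the set point, the speed limit can only move downward and remains at the lower bound, whereas a learnt policy is free to hold a higher speed limit under the same conditions.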

Figure 10.

Reinforcement learning-variable speed limit (RL-VSL) applied speed actions: (a) hypothetical network right-lane speed action, (b) hypothetical network section speed action, (c) Winston Churchill network right-lane speed action, and (d) Winston Churchill network section speed action.

The main goal of both RM and VSL is to maximize the freeway output flow at bottleneck locations by controlling the input flow to a given bottleneck. RM can directly restrict the on-ramp flow to the value decided by the controller. VSL controllers, however, control the mainstream flow indirectly: they lower the mainstream speed limit in the application area to create controlled congestion upstream on the freeway before the ramp flow merges into the freeway. To better understand the effects and behavior of the VSL controllers, Figure 11 illustrates the effects of applying different speed limits on the FD of uninterrupted traffic flows (upstream of the bottleneck, without congestion) for each network. The lower the applied speed limit, the lower the flow capacity and the higher the critical density; this behavior is the same as in Papageorgiou et al. ( 22 ) and Carlson et al. ( 58 , 59 ). The change in flow capacity and critical density is not linear in the change in speed limit. For example, the change in flow capacity between the 100 km/h (dark blue) and 90 km/h (orange) speed limits is significantly smaller than the change between the 30 km/h (brown) and 20 km/h (gray) speed limits. (Color online only.) To illustrate this nonlinearity, Figure 12 shows how the capacity flow changes with the speed limit for both networks. The behavior is the same as illustrated in Müller et al. ( 30 , 32 ). Both freeway networks show the same nonlinear relationship between the speed limit and the output flow from the application area. Most important to highlight is the range of flow control that VSL can achieve. For example, in Figure 12b, for the Winston Churchill freeway network, changing the speed limit from 100 km/h (dark blue) to 20 km/h (gray) decreases the uninterrupted flow capacity by approximately 1,600 vph. (Color online only.) This is the maximum decrease in flow, as the flow capacity achieved while merging is less than the flow capacity of the uninterrupted flow. Given that the network has three lanes, the maximum decrease in flow when changing the speed limit from 100 km/h to 20 km/h is approximately 533 vph/lane. The right-lane flow can therefore be decreased by about 533 vph, while the unmetered ramp flow adds 1,250 vph. Thus, the main reasons behind the poor performance of VSL when compared with RM are:

Insufficient flow metering compared with incoming demand from the on-ramp.

The decrease in mainstream flow is distributed among all lanes while it is needed only for the right lane.

Conflicts between right-lane mainstream travelers and the high unmetered flow coming from the on-ramp and merging in platoons.
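The first two reasons can be checked with a back-of-the-envelope calculation using the figures quoted above for the Winston Churchill network. The sketch assumes, as the text does, that the capacity reduction is shared equally among the lanes, and additionally assumes that all ramp vehicles merge into the right lane. The function name is hypothetical.

```python
def right_lane_balance(total_capacity_drop_vph, lanes, ramp_flow_vph):
    """Return the per-lane flow reduction achievable by VSL and the net flow
    still added to the right lane by the unmetered ramp (both in veh/h)."""
    per_lane_drop = total_capacity_drop_vph / lanes
    net_added_right_lane = ramp_flow_vph - per_lane_drop
    return per_lane_drop, net_added_right_lane


# Approx. 1,600 vph capacity reduction (100 -> 20 km/h), three lanes,
# and 1,250 vph of unmetered ramp demand, as quoted in the text.
per_lane, net_added = right_lane_balance(1600, 3, 1250)
print(round(per_lane), round(net_added))  # 533 717
```

Even at the lowest speed limit, the right lane would still have to absorb roughly 717 vph more than VSL can remove from it, which is consistent with the insufficient metering noted in the first item of the list.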

Figure 11.

Flow/density (FD) for different speed limits: (a) hypothetical and (b) Winston Churchill freeway network.

Figure 12.

Flow capacity for different speed limits: (a) hypothetical and (b) Winston Churchill freeway network.

Concurrent RM + VSL Results

In this section, we present the most interesting experiment and results, in which we apply both RM and VSL concurrently. As will be discussed, it is worth noting how the RL agent, which learns with access to both RM and VSL actions, discovers when to favor and prioritize one over the other, showing interesting control dynamics; this is the primary contribution of the paper.

Hypothetical Freeway Network: Table 6 summarizes the performance of the regulator and RL-based RMVSL controllers for the hypothetical network. Any control is better than the no-control case, as also illustrated in Figure 13. RL-based controllers achieve better TTT than the integral controller. In Figure 13, the flow of the RL-based controllers falls, once all the demand has been served, earlier than that of the integral controller. This shows the ability of RL-based controllers to serve all the demand in a shorter time. RL-RMVSL controllers always have better mainstream TTT than the regulator. Moreover, the network TTT of the right-lane RL-RMVSL controller (664.3 h) is significantly lower than that of the integral controller (856.2 h). Although the integral controller is designed to prioritize RM first and then apply VSL when RM alone fails to achieve the goal, as in Carlson et al. ( 34 ), RL-based controllers learn to maintain a higher priority for the mainstream demand and decrease the VSL application.

Table 6.

Hypothetical Network Integrated Ramp Metering and/or Variable Speed Limit (RMVSL) Controllers Total Travel Time (TTT)

Control	Network TTT (h)	Change (%)	Mainstream TTT (h)	Change (%)	On-ramp TTT (h)	Change (%)
Basecase	1,581.0	na	1,186.8	na	394.9	na
Regulator	856.2	45.9	513.5	56.7	342.7	13.2
RL right lane	664.3	58.0	331.4	72.1	332.9	15.7
RL section	679.0	57.1	324.6	72.7	354.4	10.3

Note: RL = reinforcement learning; na = not applicable.

Figure 13.

Hypothetical network integrated ramp metering and/or variable speed limit (RMVSL) bottleneck flow.

Winston Churchill Freeway Network: Table 7 summarizes the performance of the regulator and RL-based RMVSL controllers for the Winston Churchill freeway network. As shown in Figure 14 and Table 7, RL-based controllers achieve higher bottleneck flows (depicted by the gray and yellow lines) and lower overall TTT compared with the integral controller (orange line). (Color online only.)

Table 7.

Winston Churchill Integrated Ramp Metering and/or Variable Speed Limit Controllers Total Travel Time (TTT)

Control	Network TTT (h)	Change (%)	Mainstream TTT (h)	Change (%)	On-ramp TTT (h)	Change (%)
Basecase	1,856.4	na	1,698.3	na	158.1	na
Regulator	1,706.9	8.1	1,421.9	16.3	285.0	−80.1
RL right lane	1,042.7	43.8	726.4	57.2	316.2	−99.9
RL section	1,040.7	43.9	700.3	58.8	340.4	−115.2

Note: RL = reinforcement learning; na = not applicable.

Figure 14.

Winston Churchill network integrated ramp metering and/or variable speed limit (RMVSL) bottleneck flow.

Most interestingly, Figure 15 displays the actions applied by each agent. Generally, RL-RMVSL controllers learn to restrict the on-ramp flow more than the mainstream flow whenever both mainstream and on-ramp demands peak. However, when the demand decreases (after approximately 90 min), the agent learns to decrease the speed limit and allow more flow from the on-ramp. The RL agent learns when and how to switch between RM and VSL depending on the state of the bottleneck and the incoming demand from both the upstream mainstream and the on-ramp. This dynamic switching between controllers is interesting and effective, and it cannot be reproduced by regulators. The learnt policies as well as the results from all controllers show that RM is more effective than VSL in highly congested scenarios, while VSL is effective in moderately and lightly congested scenarios. The RL agent learns automatically which to emphasize and when, depending on the prevailing traffic conditions.

Figure 15.

Integrated ramp metering and/or variable speed limit (RMVSL) actions: (a, b) hypothetical network right-lane control, (c, d) hypothetical network section control, (e, f) Winston Churchill network right-lane control, and (g, h) Winston Churchill network section control.

Although Carlson et al. ( 1 ) report that VSL can be less effective than RM only because of blocking of upstream off-ramps, our analysis of both controllers shows that VSL can be ineffective in heavy congestion for reasons not attributable to any blocking of off-ramps. In fact, there are no off-ramps in our experiment. In heavy congestion, we observe that it is not advisable to meter the mainstream, as clearly demonstrated by the learnt policy of the RL agent.

Conclusion

In this paper, DRL-based RM, VSL, and RMVSL controllers are designed and compared with regulator-based RM, VSL, and RMVSL controllers, as well as the no-control case, to analyze the performance and the effects of each individual controller on mainline and on-ramp TTT. The analysis of the results shows that:

Any kind of control improves substantially over the no-control case, which is expected and consistent with the literature.

All RL-based controllers perform better than the corresponding regulators as RL-based controllers are optimizing controllers and do not rely on manually determined set points as in the regulator approach.

RL-based controllers display a wider range of behaviors than regulators.

The RM controller is more effective than, and clearly preferable to, VSL in highly congested scenarios.

VSL is effective in decreasing TTT in moderately and lightly congested scenarios.

RL-based RMVSL learns how to combine RM and VSL, and when and how to switch between them, based on traffic conditions and the effectiveness of each controller, which we find both interesting and useful.

In this study, we focused on simple freeway networks with hypothetical demands and isolated on-ramps to thoroughly analyze and evaluate the application of RL as well as regulator-based RM and VSL controllers. Applying these control strategies to larger-scale scenarios, such as longer congested urban freeways with consecutive on-ramp bottlenecks, may present additional challenges. For instance, resolving an upstream bottleneck might increase traffic flow toward downstream bottlenecks, potentially resulting in more severe and difficult-to-manage congestion. These complexities will require more sophisticated control strategies. Future research will build on the insights gained in this study to develop and implement effective controllers for managing traffic on large, heavily congested urban freeways with multiple on-ramps and calibrated, realistic demand scenarios, ultimately ensuring more practical and scalable solutions.

Footnotes

Author Contributions

The authors confirm contribution to the paper as follows: study conception and design: O. ElSamadisy; data collection: O. ElSamadisy; analysis and interpretation of results: O. ElSamadisy, I. Smirnov, B. Abdulhai; draft manuscript preparation: O. ElSamadisy, I. Smirnov, X. Wang. All authors reviewed the results and approved the final version of the manuscript.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Omar ElSamadisy

Xiaoyu Wang

Baher Abdulhai

References

1. Carlson R. C., Papamichail, Papageorgiou. Local Feedback-Based Mainstream Traffic Flow Control on Motorways Using Variable Speed Limits. IEEE Transactions on Intelligent Transportation Systems, Vol. 12, No. 4, 2011, pp. 1261–1276.

2. Iordanidou G.-R., Roncoli, Papamichail, Papageorgiou. Feedback-Based Mainstream Traffic Flow Control for Multiple Bottlenecks on Motorways. IEEE Transactions on Intelligent Transportation Systems, Vol. 16, No. 2, 2014, pp. 610–621.

3. Papageorgiou, Kotsialos. Freeway Ramp Metering: An Overview. IEEE Transactions on Intelligent Transportation Systems, Vol. 3, No. 4, 2002, pp. 271–281.

4. Papageorgiou, Papamichail. Overview of Traffic Signal Operation Policies for Ramp Metering. Transportation Research Record: Journal of the Transportation Research Board, 2008. 2047: 28–36.

5. Jin, Zhang. Evaluation of On-Ramp Control Algorithms. California PATH Research Report: UCB-ITS-PRR-2001-36. Institute of Transportation Studies, University of California, Berkeley, 2001.

6. Cassidy M. J., Rudjanakanoknad. Increasing the Capacity of an Isolated Merge by Metering its On-Ramp. Transportation Research Part B: Methodological, Vol. 39, No. 10, 2005, pp. 896–913.

7. Trubia, Curto, Barberi, Severino, Arena, Pau. Analysis and Evaluation of Ramp Metering: From Historical Evolution to the Application of New Algorithms and Engineering Principles. Sustainability, Vol. 13, No. 2, 2021, p. 850.

8. Papageorgiou, Hadj-Salem, Blosseville J.-M., et al. ALINEA: A Local Feedback Control Law for On-Ramp Metering. Transportation Research Record: Journal of the Transportation Research Board, 1991. 1320: 58–67.

9. Papageorgiou, Hadj-Salem, Middelham. ALINEA Local Ramp Metering: Summary of Field Results. Transportation Research Record: Journal of the Transportation Research Board, 1997. 1603: 90–98.

10. Papageorgiou, Blosseville J.-M., Haj-Salem. Modelling and Real-Time Control of Traffic Flow on the Southern Part of Boulevard Périphérique in Paris: Part II: Coordinated On-Ramp Metering. Transportation Research Part A: General, Vol. 24, No. 5, 1990, pp. 361–370.

11. Smaragdis, Papageorgiou, Kosmatopoulos. A Flow-Maximizing Adaptive Local Ramp Metering Strategy. Transportation Research Part B: Methodological, Vol. 38, No. 3, 2004, pp. 251–270.

12. Wang, Kosmatopoulos E. B., Papageorgiou, Papamichail. Local Ramp Metering in the Presence of a Distant Downstream Bottleneck: Theoretical Analysis and Simulation Study. IEEE Transactions on Intelligent Transportation Systems, Vol. 15, No. 5, 2014, pp. 2024–2039.

13. Papamichail, Papageorgiou. Traffic-Responsive Linked Ramp-Metering Control. IEEE Transactions on Intelligent Transportation Systems, Vol. 9, No. 1, 2008, pp. 111–121.

14. Frejo J. R. D., De Schutter B. Feed-Forward ALINEA: A Ramp Metering Control Algorithm for Nearby and Distant Bottlenecks. IEEE Transactions on Intelligent Transportation Systems, Vol. 20, No. 7, 2018, pp. 2448–2458.

15. Shang, Wang, Stern R. E. Extending Ramp Metering Control to Mixed Autonomy Traffic Flow with Varying Degrees of Automation. Transportation Research Part C: Emerging Technologies, Vol. 151, 2023, p. 104119.

16. Fartash, Hadi, Ponnaluri. Methodology to Identify On-Ramps for Metering with Consideration of System-Wide Recurrent and Non-Recurrent Congestion. Transportation Research Record: Journal of the Transportation Research Board, 2018. 2672: 39–49.

17. Pooladsanj, Savla, Ioannou P. A. Ramp Metering to Maximize Freeway Throughput Under Vehicle Safety Constraints. Transportation Research Part C: Emerging Technologies, Vol. 154, 2023, p. 104267.

18. Torné J. M., Soriguera, Geroliminis. Coordinated Active Traffic Management Freeway Strategies Using Capacity-Lagged Cell Transmission Model. In Transportation Research Board Annual Meeting, Vol. 93. Transportation Research Board, Washington, DC, 2014.

19. Ramezani, Haddad, Geroliminis. Macroscopic Traffic Control of a Mixed Urban and Freeway Network. IFAC Proceedings Volumes, Vol. 45, No. 24, 2012, pp. 89–94.

20. Hourdos, Geroliminis, Zitzow, Limniati Y. S. Field Implementation, Testing, and Refinement of Density Based Coordinated Ramp Control Strategy. Final Report 2015-37. Center for Transportation Studies, University of Minnesota, 2015.

21. Ranjitkar. A Fuzzy Logic-Based Variable Speed Limit Controller. Journal of Advanced Transportation, Vol. 49, No. 8, 2015, pp. 913–927.

22. Papageorgiou, Kosmatopoulos, Papamichail. Effects of Variable Speed Limits on Motorway Traffic Flow. Transportation Research Record: Journal of the Transportation Research Board, 2008. 2047: 37–48.

23. Heydecker, Addison J. D. Analysis and Modelling of Traffic Flow Under Variable Speed Limits. Transportation Research Part C: Emerging Technologies, Vol. 19, No. 2, 2011, pp. 206–217.

24. Vaitkus, Strumskys, Jasiūnienė, Jateikienė, Andriejauskas, Skrodenis. Effect of Intelligent Transport Systems on Traffic Safety. The Baltic Journal of Road and Bridge Engineering, Vol. 11, No. 2, 2016, pp. 136–143.

25. Abdel-Aty, Cunningham R. J., Gayah V. V., Hsia. Dynamic Variable Speed Limit Strategies for Real-Time Crash Risk Reduction on Freeways. Transportation Research Record: Journal of the Transportation Research Board, 2008. 2078: 108–116.

26. Fudala N. J., Fontaine M. D. Interaction Between System Design and Operations of Variable Speed Limit Systems in Work Zones. Transportation Research Record: Journal of the Transportation Research Board, 2010. 2169: 1–10.

27. Bel, Bolancé, Guillén, Rosell. The Environmental Effects of Changing Speed Limits: A Quantile Regression Approach. Transportation Research Part D: Transport and Environment, Vol. 36, 2015, pp. 76–85.

28. Bertini R. L., Boice, Bogenberger. Dynamics of Variable Speed Limit System Surrounding Bottleneck on German Autobahn. Transportation Research Record: Journal of the Transportation Research Board, 2006. 1978: 149–159.

29. Frejo J. R. D., Papamichail, Papageorgiou, De Schutter. Macroscopic Modeling of Variable Speed Limits on Freeways. Transportation Research Part C: Emerging Technologies, Vol. 100, 2019, pp. 15–33.

30. Müller E. R., Carlson R. C., Kraus, Papageorgiou. Microsimulation Analysis of Practical Aspects of Traffic Control with Variable Speed Limits. IEEE Transactions on Intelligent Transportation Systems, Vol. 16, No. 1, 2015, pp. 512–523.

31. Lee, Park B. B. Evaluation of Variable Speed Limit Under Connected Vehicle Environment. In 2013 International Conference on Connected Vehicles and Expo (ICCVE), Las Vegas, Nevada, IEEE, Piscataway, New Jersey, 2013, pp. 966–967.

32. Müller E. R., Carlson R. C., Kraus. Cooperative Mainstream Traffic Flow Control on Freeways. IFAC-PapersOnLine, Vol. 49, No. 32, 2016, pp. 89–94.

33. Zhang, Chang, Ioannou P. A. A Simple Roadway Control System for Freeway Traffic. In 2006 American Control Conference, Minneapolis, IEEE, Piscataway, New Jersey, 2006, p. 6.

34. Carlson R. C., Papamichail, Papageorgiou. Integrated Feedback Ramp Metering and Mainstream Traffic Flow Control on Motorways Using Variable Speed Limits. Transportation Research Part C: Emerging Technologies, Vol. 46, 2014, pp. 209–221.

35. Tajdari, Roncoli, Papageorgiou. Feedback-Based Ramp Metering and Lane-Changing Control with Connected and Automated Vehicles. IEEE Transactions on Intelligent Transportation Systems, Vol. 23, No. 2, 2020, pp. 939–951.

36. Markantonakis, Skoufoulas D. I., Papamichail, Papageorgiou. Integrated Traffic Control for Freeways Using Variable Speed Limits and Lane Change Control Actions. Transportation Research Record: Journal of the Transportation Research Board, 2019. 2673: 602–613.

37. El-Tantawy, Abdulhai, Abdelgawad. Multiagent Reinforcement Learning for Integrated Network of Adaptive Traffic Signal Controllers (MARLIN-ATSC): Methodology and Large-Scale Application on Downtown Toronto. IEEE Transactions on Intelligent Transportation Systems, Vol. 14, No. 3, 2013, pp. 1140–1150.

38. Zhao, Jin. Traffic Signal Timing via Parallel Reinforcement Learning. In Smart Transportation Systems 2019, Springer, Singapore, 2019, pp. 113–123.

39. Ozan, Baskan, Haldenbilen, Ceylan. A Modified Reinforcement Learning Algorithm for Solving Coordinated Signalized Networks. Transportation Research Part C: Emerging Technologies, Vol. 54, 2015, pp. 40–55.

40. Abdulhai, Pringle, Karakoulas G. J. Reinforcement Learning for True Adaptive Traffic Signal Control. Journal of Transportation Engineering, Vol. 129, No. 3, 2003, pp. 278–285.

41. Rezaee, Abdulhai, Abdelgawad. Self-Learning Adaptive Ramp Metering: Analysis of Design Parameters on a Test Case in Toronto, Canada. Transportation Research Record: Journal of the Transportation Research Board, 2013. 2396: 10–18.

42. Davarynejad, Hegyi, Vrancken, van den Berg. Motorway Ramp-Metering Control with Queuing Consideration Using Q-Learning. Proc., 2011 14th International IEEE Conference on Intelligent Transportation Systems (ITSC), Washington, D.C., IEEE, Piscataway, New Jersey, 2011, pp. 1652–1658.

43. Liu, Duan, Wang. Reinforcement Learning-Based Variable Speed Limit Control Strategy to Reduce Traffic Congestion at Freeway Recurrent Bottlenecks. IEEE Transactions on Intelligent Transportation Systems, Vol. 18, No. 11, 2017, pp. 3204–3217.

44. Schmidt-Dumont, van Vuuren J. H. Decentralised Reinforcement Learning for Ramp Metering and Variable Speed Limits on Highways. IEEE Transactions on Intelligent Transportation Systems, Vol. 14, No. 8, 2015, p. 1.

45. Mnih, Kavukcuoglu, Silver, Rusu A. A., Veness, Bellemare M. G., Graves, et al. Human-Level Control Through Deep Reinforcement Learning. Nature, Vol. 518, No. 7540, 2015, pp. 529–533.

46. Zhou, Gayah V. V. Model-Free Perimeter Metering Control for Two-Region Urban Networks Using Deep Reinforcement Learning. Transportation Research Part C: Emerging Technologies, Vol. 124, 2021, p. 102949.

47. ElSamadisy, Abdulhai, Xue, Smirnov, Khalil E. B., Abdulhai. SMAC-tuned Deep Q-learning for Ramp Metering. In 2023 IEEE International Conference on Smart Mobility (SM), Thuwal, Saudi Arabia, IEEE, Piscataway, New Jersey, 2023, pp. 65–72.

48. Shabestary S. M. A., Abdulhai. Adaptive Traffic Signal Control with Deep Reinforcement Learning and High Dimensional Sensory Inputs: Case Study and Comprehensive Sensitivity Analyses. IEEE Transactions on Intelligent Transportation Systems, Vol. 23, No. 11, 2022, pp. 20021–20035.

49. Han, Wang, Roncoli, Gao, Liu. A Physics-Informed Reinforcement Learning-Based Strategy for Local and Coordinated Ramp Metering. Transportation Research Part C: Emerging Technologies, Vol. 137, 2022, p. 103584.

50. Liu. A Novel Ramp Metering Algorithm based on Deep Reinforcement Learning. Proc., 2022 2nd International Conference on Algorithms, High Performance Computing and Artificial Intelligence (AHPCAI), Guangzhou, China, IEEE, Piscataway, New Jersey, 2022, pp. 128–133.

51. Maliakal A. H. Intelligent Ramp Metering Control using Federated Reinforcement Learning. Master thesis, School of Computer Science and Statistics, TCD, Dublin, Ireland, 2022.

52. Wang, Taitler, Smirnov, Sanner, Abdulhai. eMARLIN: Distributed Coordinated Adaptive Traffic Signal Control with Topology-Embedding Propagation. Transportation Research Record: Journal of the Transportation Research Board, 2023. 2678: 189–202.

53. Wang, Zhang, Ran. A New Solution for Freeway Congestion: Cooperative Speed Limit Control Using Distributed Reinforcement Learning. IEEE Access, Vol. 7, 2019, pp. 41947–41957.

54.

Sutton

R. S.

Barto

A. G.

Reinforcement Learning: An Introduction. MIT Press, Cambridge, Massachusetts, 2018.

55.

Schulman

Wolski

Dhariwal

Radford

Klimov

Proximal Policy Optimization Algorithms. arXiv Preprint arXiv:1707.06347. 2017.

56.

Van Hasselt

Guez

Silver

Deep Reinforcement Learning with Double Q-Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, Canada, Vol. 30, AAAI Press, Washington, D.C., 2016.

57.

Aimsun. Aimsun Next 24 User's Manual. Barcelona, Spain, aimsun next 24.0.0 ed., 2024.

58.

Carlson

R. C.

Papamichail

Papageorgiou

Messmer

Optimal Mainstream Traffic Flow Control of Large-Scale Motorway Networks. Transportation Research Part C: Emerging Technologies, Vol. 18, No. 2, 2010, pp. 193–212.

59.

Carlson

R. C.

Papamichail

Papageorgiou

Mainstream Traffic Flow Control on Freeways Using Variable Speed Limits. Transportes, Vol. 21, No. 3, 2013, pp. 56–65.

Deep Reinforcement Learning Freeway Controller Chooses Ramp Metering Over Variable Speed Limits
