Abstract
The benefits of controlling a freeway bottleneck using reinforcement learning (RL)-based ramp metering (RM) and/or variable speed limit (VSL) controllers are well established. However, when both RM and VSL are used to control the freeway, it is not clear how each method benefits the traffic stream relative to the other. We argue that, depending on traffic conditions, it may be better to use one method rather than both or, more importantly, to switch dynamically between the two. Moreover, a learning agent can automate the switch when warranted. In this paper, we offer in-depth analysis and performance evaluations of RL-based as well as regulator-based RM and VSL controllers, applied to both a hypothetical freeway network from the literature, simulated in Aimsun, and a real-world freeway on-ramp extracted from the Queen Elizabeth Way (QEW) in Ontario, Canada, under different levels of demand. The findings indicate that RM is more effective than VSL in heavily congested scenarios, whereas VSL can be beneficial in moderately and lightly congested scenarios. We also show that RL has the advantage of automatically prioritizing one control method over the other depending on traffic conditions. In particular, we demonstrate that in heavy congestion scenarios, the RL agent that manages both RM and VSL clearly chooses RM over VSL.
Keywords
Bottlenecks represent a primary source of congestion on freeways. These bottlenecks, sometimes latent, manifest at locations where the upstream maximum flow capacity exceeds the downstream maximum flow capacity. The nominal bottleneck capacity is the sustainable flow at the bottleneck’s location when the traffic arriving from upstream matches the downstream capacity. If the incoming flow exceeds the downstream capacity, the bottleneck is activated and congestion forms. The head of the congestion stays at the bottleneck, while its tail gradually moves upstream. Congestion at an active bottleneck has two significant adverse consequences: 1) a capacity drop (CD), clearly visible in the flow/density fundamental diagram (FD), in which the capacity of the freeway falls once congestion sets in, which can potentially double motorists’ time spent in congestion; and 2) obstruction of upstream off-ramp entries, which further exacerbates the problem ( 1 , 2 ).
To mitigate these issues, freeway control methods are essential to pace traffic onto the freeway and avoid triggering congestion. Among the most extensively researched and implemented strategies are ramp metering (RM) and variable speed limits (VSL). RM manages freeway entry by controlling the release of vehicles from on-ramps, while VSL dynamically adjusts speed limits to optimize traffic flow and enhance safety.
Despite the effectiveness of these control strategies, the interplay between RM and VSL remains an open question. Specifically, it is unclear under what conditions one method is preferable over the other, or whether dynamically switching between the two offers superior performance. Recent advances in artificial intelligence, particularly reinforcement learning (RL), provide an opportunity to automate this decision-making process. RL-based controllers can adaptively choose between RM and VSL based on real-time traffic conditions, potentially leading to more efficient congestion management. As discussed later, we pay particular attention to letting the RL agent choose between RM and VSL and observing its choices. This study systematically analyzes and compares the performance of RL-based RM and VSL controllers against traditional regulator-based controllers and against the no-control case (base case). The comparison highlights the benefits and deteriorations that each controller causes for each group of travelers (mainstream and on-ramp travelers) and thus the effectiveness of each controller in different scenarios. We evaluate the controllers using both a hypothetical freeway network and a real-world freeway segment from the Queen Elizabeth Way (QEW) in Ontario, Canada. Our contributions include:
Developing RL-based RM, VSL, and RMVSL (integrated RM and VSL) controllers and comparing them with traditional proportional–integral–derivative (PID)-based controllers.
Evaluating the effectiveness of these controllers under different congestion scenarios to determine their impact on freeway efficiency and traveler experiences.
Analyzing the RL agent’s decision-making process in dynamically selecting between RM and VSL based on prevailing traffic conditions.
Our findings highlight the strengths and limitations of each control method under varying congestion levels and emphasize the adaptive advantage of RL-based control. We show that RM is more effective than VSL in highly congested scenarios, while the effectiveness of VSL increases in moderately and lightly congested scenarios. Most importantly, we show that the RL agent applying both methods concurrently focuses on RM for some time and later switches to VSL, depending on traffic conditions.
The remainder of the paper is organized as follows: the second section reviews related literature on RM, VSL, and RL-based traffic control. The third section details the methodology for designing the proposed DRL and PID controllers. The fourth section presents the case study, including the simulator and network setup. The fifth section discusses the results and analysis, and the final section provides concluding remarks.
Related Work
Ramp Metering (RM)
RM is one of the earliest proposed freeway control strategies and has been the subject of extensive research in transportation engineering and intelligent transportation systems. It aims to improve traffic flow, reduce congestion, and enhance overall transportation system performance by preventing the bottleneck activation that leads to CD and congestion spillback ( 3 , 4 ). Additionally, RM indirectly benefits traffic systems by influencing travelers’ route choices, thereby achieving a desired traffic flow distribution across the entire network ( 3 – 5 ). In Cassidy and Rudjanakanoknad ( 6 ), the authors investigated the impact of RM on the capacity of isolated merges. Their findings provided evidence that RM can positively affect the capacity of isolated merges, shedding light on the potential benefits of RM in specific traffic scenarios. In references ( 3 ) and ( 7 ), the authors analyzed and evaluated RM, offering a comprehensive overview of its historical evolution and of the new algorithms and engineering principles applied in RM systems. The Asservissement Linéaire d’Entrée Autoroutière (ALINEA), introduced in Papageorgiou et al. ( 8 ), is a prominent local feedback RM strategy. Owing to its simplicity, efficiency, and effectiveness in addressing recurrent freeway congestion, ALINEA remains one of the most widely implemented RM strategies globally, and it has demonstrated promising results in field applications ( 9 ). Moreover, field results from the A6 motorway in Paris have provided valuable insights into the practical application of ALINEA as a local traffic-responsive RM strategy ( 10 ).
Following its success, ALINEA has been the subject of theoretical analyses and simulation studies, which have demonstrated its performance in various scenarios and led to many variants tailored to specific settings. These include AD-ALINEA, designed for regions whose critical density/occupancy cannot be estimated beforehand or changes dynamically; AU-ALINEA, designed to operate using upstream measurements ( 11 ); PI-ALINEA, a proportional–integral version that is more stable when the bottleneck lies far downstream of the controlled ramp ( 12 ); and coordinated ALINEA, used to control two or more consecutive on-ramps ( 13 ). Besides these variants, in Frejo and De Schutter ( 14 ), the authors proposed a feedforward ALINEA controller that predicts changes in bottleneck density and demonstrated the algorithm’s resilience through simulations. In Shang et al. ( 15 ), the authors extended ALINEA and tested it in various mixed-autonomy scenarios to accommodate varying degrees of automation. Beyond ALINEA variants, the research presented in Fartash et al. ( 16 ) focused on a methodology for identifying on-ramps to meter, considering system-wide recurrent and non-recurrent congestion. Their work addressed the challenges of RM in the context of congestion management and system-wide traffic control, providing insights into the considerations for effective RM deployment. In Pooladsanj et al. ( 17 ), the authors proposed an RM strategy that optimizes freeway throughput under various safety constraints. Furthermore, in Torné et al. ( 18 ) and Ramezani et al. ( 19 ), the authors proposed coordinating RM with other active traffic management controllers, such as VSL and perimeter control.
Finally, practical implementations and field studies have provided valuable insights into the practical performance of RM systems ( 20 ). The rich literature on RM provides valuable insights for the design and implementation of RM systems to enhance traffic flow efficiency and mitigate congestion.
Variable Speed Limits (VSL)
VSL is a considerably newer control strategy than RM. It has emerged as an intelligent transportation system solution for traffic management, aiming to improve safety and harmonize traffic flow by reducing speed variation among vehicles across lanes and between upstream and downstream traffic ( 21 – 23 ). VSLs are displayed on variable message signs and have shown substantial traffic safety benefits, including a reduced risk of crashes through temporary speed limit reductions during risky traffic conditions ( 24 ). Evaluating VSLs for traffic safety has been a key focus, with studies investigating the benefits of VSL implementation for real-time crash risk reduction ( 25 ). Research on the interaction between the system design and the operation of VSL systems in work zones has been limited, despite the potential of VSLs to improve safety and operations in such areas ( 26 ). The environmental effects of changing speed limits, including the introduction of variable speed systems on metropolitan motorways, have also been examined to assess their broader impacts ( 27 ). Over the last 10 years, VSL has gained prominence as a potential method for enhancing freeway efficiency, marking a shift from the earlier focus on safety-related implementations ( 22 , 23 ). Using VSL for mainstream traffic flow control on freeways aims to optimize throughput by regulating the flow of traffic upstream of a bottleneck. Practical implementations and field studies have provided valuable insights into the performance of VSL systems. A study of the dynamics of VSL systems around bottlenecks on the German Autobahn has shed light on the practical implications of VSLs in addressing traffic challenges ( 28 ). Macroscopic modeling of VSLs on freeways has also been explored, emphasizing the importance of traffic flow models in the design and evaluation of VSL controllers ( 29 ). Additionally, in Müller et al. ( 30 ), the authors conducted a microsimulation analysis of practical aspects of traffic control with VSLs, providing insights into the practical implications of VSLs for traffic control, including bottleneck analysis.
VSLs have also been studied in connected vehicle environments, where they provide dynamic speed advisory information to optimize traffic flow on freeways and corridors under both recurrent and non-recurrent congestion ( 31 ). Also in connected vehicle environments, in Müller et al. ( 32 ), the authors explored mainstream traffic flow control on freeways with different CAV penetration rates, shedding light on the potential of cooperative strategies to enhance freeway traffic flow. Collectively, these studies demonstrate the significance of VSLs in traffic management and control, highlighting their potential to optimize traffic flow, improve safety, and address congestion on motorways, and they propose various types of controllers. Among the best-known VSL controllers aiming to optimize flow at bottlenecks are the cascaded double-loop regulator ( 1 ) and the ALINEA-like integral feedback controller proposed and tested on a microscopic simulator in Zhang et al. ( 33 ). However, the lack of an acceleration area limited the improvements achieved by the latter. After the acceleration area was proposed, ALINEA-like VSL controllers became widely used for their simplicity, as in Müller et al. ( 30 , 32 ), where the controller is tested in the Aimsun microscopic simulator. However, these controllers require prior knowledge or estimation of traffic flow parameters, such as the critical density (to define the set-point), and do not explicitly target an optimal solution.
RM and VSL controllers can also cooperate with each other and with other controllers, as in Carlson et al. ( 34 ), to achieve better flow efficiency at bottlenecks, and as in Tajdari et al. ( 35 ) and Markantonakis et al. ( 36 ), to coordinate with lane-change control on freeways. These coordinated schemes share the same limitation: they rely on a predefined set-point, such as the critical density, and do not explicitly target an optimal solution.
Reinforcement Learning-Based Traffic Control
More recently, advances in artificial intelligence have led to growing interest in RL-based approaches for traffic control, mainly model-free RL algorithms. These methods have gained attention because they do not need an explicit model of the environment and can learn good policies purely by interacting with it. Initially, RL in road traffic control was explored primarily for optimizing traffic signal settings in urban networks, as in El-Tantawy et al. ( 37 ), Zhao et al. ( 38 ), Ozan et al. ( 39 ), and Abdulhai et al. ( 40 ). There are also early applications of the Q-learning algorithm to optimizing traffic flow at a bottleneck: RM is applied in Rezaee et al. ( 41 ) and Davarynejad et al. ( 42 ), VSL in Li et al. ( 43 ), and RMVSL in Schmidt-Dumont and van Vuuren ( 44 ). The aforementioned studies used a tabular RL approach to train their models. More recent advances have shown promising results by combining RL with deep learning ( 45 ). This integration has opened up possibilities for implementing RL-based traffic control strategies in large-scale traffic networks with vast state and action spaces ( 46 – 52 ). However, some recent deep RL (DRL) implementations design the reward around the critical density or bottleneck speed ( 50 , 51 , 53 ), similarly to the PID controllers mentioned above; this may not yield optimal performance because it forces the agent to operate at a particular density.
This study builds on previous work by systematically evaluating RL-based RM and VSL controllers against traditional regulator-based controllers. We analyze their performance across different congestion scenarios, highlighting the advantages of adaptive RL-based control in dynamically selecting the appropriate control strategy based on real-time traffic conditions.
Methodology
Reinforcement Learning
RL ( 54 ) is an area of machine learning in which a controller is trained to act well by iterative adjustment of the controller parameters in response to the observed effects of the controller’s actions. The controller is frequently referred to as an agent. The agent takes actions, receives observations from its environment, and receives rewards. The changes of the environment that result from the agent’s actions may be stochastic, and so may the rewards that the agent receives. The agent’s goal is to learn to act in a manner that optimizes its expected sum of future rewards.
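As a small illustration of the "sum of future rewards" (not taken from the paper), DQN-style methods usually weight future rewards by a discount factor γ in [0, 1). The value of γ used in this work belongs to the DDQN parameter settings and is not reproduced here, so the numbers below are purely illustrative. The sketch accumulates the discounted return backward over a finite sequence of per-interval rewards.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * rewards[t], accumulated backward in O(n)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g  # G_t = r_t + gamma * G_{t+1}
    return g


# Hypothetical per-interval rewards (e.g., bottleneck outflow in veh/h).
print(discounted_return([3600.0, 3500.0, 3400.0], gamma=0.9))
```

With γ = 1 the function reduces to the plain sum; smaller γ makes the agent favor rewards that arrive sooner.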
The environment and reward dynamics are frequently assumed to depend only on the current observation and action (and not on how the controller and environment got to the current state, that is, the full observation-action history). This is a simplifying assumption, called the Markov property. A process of agent-environment interactions for which the Markov property holds true (and in which the observations describe the entire agent-environment state) is formalized in the concept of a fully observable Markov decision process (MDP). An MDP is a six-tuple
An agent’s policy is defined to be a function
at each state
In this work, we apply the deep Q-networks (DQN) algorithm ( 45 ). The DQN algorithm is model-free (meaning that it does not use the explicit model of the environment transitions
To describe the DQN algorithm, it is necessary to introduce some additional concepts and notation. We denote the
Intuitively,
holds for all
The training loop for the DQN algorithm proceeds as follows. Every environment step, the agent takes the greedy action
The gradient descent is attempting to bring
The replay buffer shuffles and draws samples to train the
We used a further improvement of DQN called double DQN (DDQN) ( 56 ). The motivating observation is that the sampled maximum is a biased estimate of the true maximum when the target network is used in both the action-selection and action-evaluation steps in Equation 5:
Therefore, the DQN target tends to overestimate. DDQN attempts to address this deficiency by decoupling the computation of these two steps by replacing the target network
In the experiments, a feed forward neural network with three hidden layers is adopted to model
Double Deep Q-Networks Parameter Settings
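The DQN and DDQN targets discussed above can be contrasted in a few lines. The sketch below is a minimal illustration, not the authors' implementation: it uses plain Python lists in place of neural-network outputs, a uniform replay buffer built on `collections.deque` and `random.sample`, and hypothetical names (`ReplayBuffer`, `dqn_target`, `ddqn_target`). DQN both selects and evaluates the next action with the target network, whereas DDQN selects the action with the online network and evaluates it with the target network.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO store of (s, a, r, s_next, done) transitions."""

    def __init__(self, capacity, seed=None):
        self._data = deque(maxlen=capacity)  # oldest transitions are evicted first
        self._rng = random.Random(seed)

    def add(self, s, a, r, s_next, done):
        self._data.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # Uniform sampling reduces correlation between consecutive transitions.
        return self._rng.sample(list(self._data), batch_size)

    def __len__(self):
        return len(self._data)


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def dqn_target(r, q_next_target, gamma, done):
    """y = r + gamma * max_a Q_target(s', a): target net selects AND evaluates."""
    if done:
        return r  # no bootstrapping from a terminal state
    return r + gamma * max(q_next_target)


def ddqn_target(r, q_next_online, q_next_target, gamma, done):
    """y = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    if done:
        return r
    a_star = argmax(q_next_online)            # online network selects the action
    return r + gamma * q_next_target[a_star]  # target network evaluates it


# Hypothetical Q-values for the three actions available in s'.
q_online = [1.0, 3.0, 2.0]
q_target = [2.5, 1.5, 2.0]
print(dqn_target(1.0, q_target, 0.9, False))             # uses max(q_target) = 2.5
print(ddqn_target(1.0, q_online, q_target, 0.9, False))  # uses q_target[1] = 1.5
```

In this toy case the DQN target (3.25) exceeds the DDQN target (2.35) because the maximum over the target network's own estimates picks whichever action it happens to overvalue.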
PID Controllers
PID controllers are feedback controllers that are frequently used in practice. PID controllers are comparatively simple to implement, yet often have good performance, especially in systems that are close to being linear.
We now describe the general form of a discrete-time PID controller. We denote the times when the control action will be applied by
where the coefficients are
In practice, based on the desired characteristics of the controller, one or two of the proportional, integral, and derivative terms in the update rule described in Equation 7 may be omitted. In applications to RM and VSL control (for example, as in Papageorgiou et al. ( 8 ) and Müller et al. ( 30 )), it is common to use a discrete-time integral controller, with the simplified update rule
For both RM and VSL control applications, the system output is measured by the average detector occupancy
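Because Equations 7 and 8 are not reproduced in this text, the sketch below assumes one common discrete-time form, the incremental (velocity) PID update; the paper's exact form may differ. With the proportional and derivative gains set to zero, it reduces to an ALINEA-style integral rule u(k) = u(k-1) + K_I·(ô - o(k)), where ô is the occupancy set point and o(k) the measured average detector occupancy. As noted above, both RM and VSL applications commonly use this integral form; for VSL the output would be a speed limit rather than a metering rate. The class name, gains, set point, and bounds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IncrementalPID:
    """u(k) = u(k-1) + Kp*(e(k)-e(k-1)) + Ki*e(k) + Kd*(e(k)-2e(k-1)+e(k-2)),
    with e(k) = setpoint - measurement(k), clipped to [u_min, u_max]."""

    kp: float
    ki: float
    kd: float
    setpoint: float   # e.g., target bottleneck occupancy in %
    u_min: float      # lower bound on the control signal
    u_max: float      # upper bound on the control signal
    u: float          # initial control signal u(0)
    _e1: float = 0.0  # e(k-1)
    _e2: float = 0.0  # e(k-2)

    def step(self, measurement: float) -> float:
        e = self.setpoint - measurement
        du = (self.kp * (e - self._e1)
              + self.ki * e
              + self.kd * (e - 2.0 * self._e1 + self._e2))
        # Clipping the stored output itself also prevents integral windup.
        self.u = min(self.u_max, max(self.u_min, self.u + du))
        self._e2, self._e1 = self._e1, e
        return self.u


# ALINEA-style integral metering (Kp = Kd = 0); all numbers are hypothetical.
alinea = IncrementalPID(kp=0.0, ki=70.0, kd=0.0, setpoint=20.0,
                        u_min=200.0, u_max=1800.0, u=1200.0)
for occ in (25.0, 18.0, 5.0):  # measured occupancy (%) in successive intervals
    print(alinea.step(occ))    # metering rate in veh/h
```

The loop prints 850.0, 990.0, and 1800.0: occupancy above the set point lowers the metering rate, occupancy below it raises the rate, and the last value is clipped at `u_max`. This dependence on a hand-picked set point is exactly the tuning burden discussed in the results.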
Case Study
Freeway Network and Traffic Demand
Hypothetical Freeway Network from Literature: To enable comparison with previous work and to study the effects of different controllers on a simple freeway bottleneck, the simple hypothetical network shown in Figure 1 is implemented with the same dimensions and parameters as in Müller et al. ( 30 ).

The network models a 4.3 km two-lane freeway that adjoins one on-ramp 300 m upstream from the end. Immediately after the on-ramp nose, there is a 200-m-long acceleration and merging lane. Then, there is a lane drop that forms a bottleneck. The RM signal is located at the end of the ramp section just before the start of the acceleration lane. For VSL, an application area of 100 m is located 175 m upstream of the on-ramp ( 30 ).
The demand was designed to be high enough to trigger congestion in the no-control case, as in Müller et al. ( 30 , 32 ). However, in our case the ramp demand was slightly modified, as shown in Figure 2, to extend the period during which both mainline and ramp demands peak simultaneously, between 0.5 and 1.5 h. Entering vehicle headways are sampled from an exponential distribution.
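Exponentially distributed headways correspond to Poisson arrivals. As a sketch, assuming a piecewise-constant demand profile (the actual values in Figure 2 are not reproduced here, and the function name and numbers are hypothetical), arrival times can be generated piece by piece. Restarting the draw at each piece boundary is valid because the exponential distribution is memoryless. This only illustrates the sampling idea; it is not Aimsun's internal vehicle generator.

```python
import random


def sample_arrivals(profile, seed=None):
    """Generate vehicle arrival times (s) for a piecewise-constant demand.

    profile: sequence of (duration_s, flow_veh_per_h) pieces.
    Headways within each piece are Exponential(rate = flow / 3600 per s).
    """
    rng = random.Random(seed)
    arrivals = []
    start = 0.0
    for duration_s, flow_veh_per_h in profile:
        end = start + duration_s
        if flow_veh_per_h > 0:
            rate_per_s = flow_veh_per_h / 3600.0
            t = start + rng.expovariate(rate_per_s)
            while t < end:
                arrivals.append(t)
                t += rng.expovariate(rate_per_s)
        start = end  # memorylessness lets each piece start afresh
    return arrivals


# Hypothetical ramp profile: 30 min at 400 veh/h, then 60 min at 800 veh/h.
times = sample_arrivals([(1800.0, 400.0), (3600.0, 800.0)], seed=1)
print(len(times), "vehicles; first arrival at", round(times[0], 1), "s")
```

The expected count is the integral of the demand (200 + 800 = 1,000 vehicles here), with run-to-run variation that shrinks relative to the mean as demand grows.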

Hypothetical freeway network demand profile.
Traffic is modeled exclusively with passenger cars. Most parameters were set to the simulator’s default values, except for the reaction time (0.5 s) and vehicle acceleration (1.5 m/s²), which were adjusted to calibrate the freeway capacity. Additionally, parameters related to the two-lane car-following model were modified to accommodate a maximum speed difference of 30 km/h between the mainstream lanes and 50 km/h between the rightmost and middle lanes in the three-lane section. These adjustments aim to facilitate smoother lane-changing maneuvers.
All other simulation settings and parameters follow those in Müller et al. ( 30 ), except for the cooperation rate at the merging area, which was set to 50%. This means that 50% of drivers in the mainstream lane are willing to adjust their speed to facilitate merging from the acceleration lane. This cooperation rate helps balance the ramp and mainstream demand in the merging area.
Real-World Freeway Network: To test the controllers on a network with real geometry, the eastbound (EB) on-ramp at Winston Churchill Boulevard in the city of Mississauga, Ontario, Canada, shown in Figure 3, is used as a single-ramp testbed with real-world geometry.

Queen Elizabeth Way eastbound (QEW EB) Winston Churchill on-ramp network geometry.
The network consists of a 4.8 km three-lane mainstream section; a 275 m merging area located 477 m upstream of the downstream end of the network; north and south on-ramps that join into a one-lane section, which then merges with the freeway at the merging area; and application and acceleration areas designed exactly as in the hypothetical network ( 30 , 32 ).
The demand profile is shown in Figure 4. It follows the same general design as the demand for the hypothetical network but with different peaks for both mainstream and on-ramp demands. These peaks were chosen to test the performance of the controllers under excessively high demands coming from the mainstream and the on-ramp simultaneously. The simulation continues for 30 min after the demand ends so that all vehicles can drain out of the network. All microscopic modeling is conducted using the Aimsun Microscopic Simulator ( 57 ), with all simulation settings and parameters kept as in the hypothetical network.

Winston Churchill network demand profile.
Definitions of the MDPs for RM, VSL, and RMVSL
This section defines the states, actions, and rewards used to train the RL controllers for RM, VSL, and RMVSL. The state space and reward are the same in all three cases, but the action spaces differ. In all three cases, actions are applied every 60 simulator seconds; this 60 s period between consecutive controller actions is called the control interval.
State Space: The observation is composed of a collection of statistics: the traffic flow through the acceleration section and the on-ramp section; the flow, speed, and density on the bottleneck section; and the occupancy of two detectors placed at the bottleneck section. All statistics in the state representation are averaged over the 60 s control interval.
RM Action Space: The action consists of setting the RM rate
VSL Action Space: The action consists of setting the variable speed limit
Simultaneous (RMVSL) Action Space: The RM rate
Reward: The immediate reward is the average output flow from the bottleneck section during the 60 s control interval.
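The MDP components above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the feature names, the collection of several readings per control interval, and the discrete RM rates and speed limits are all hypothetical (the actual action sets are not reproduced in this text). Because DQN requires a finite action set, the sketch encodes a simultaneous RMVSL action as one index into the Cartesian product of RM rates and speed limits, which is one common way to do this.

```python
from itertools import product
from statistics import mean

CONTROL_INTERVAL_S = 60.0

# Hypothetical feature names mirroring the state description above.
STATE_KEYS = ("accel_flow", "ramp_flow", "bn_flow", "bn_speed",
              "bn_density", "det1_occ", "det2_occ")

# Hypothetical discrete action sets.
RM_RATES_VEH_H = (200, 600, 1000, 1400, 1800)
VSL_KM_H = (40, 60, 80, 100)
RMVSL_ACTIONS = list(product(RM_RATES_VEH_H, VSL_KM_H))  # joint action table


def build_observation(samples):
    """Average each statistic over the readings taken within one interval."""
    return [mean(s[k] for s in samples) for k in STATE_KEYS]


def interval_reward(vehicles_out, interval_s=CONTROL_INTERVAL_S):
    """Average bottleneck outflow over the interval, expressed in veh/h."""
    return vehicles_out * 3600.0 / interval_s


def decode_rmvsl(action_index):
    """Map a DQN output index to an (RM rate, speed limit) pair."""
    return RMVSL_ACTIONS[action_index]


print(len(RMVSL_ACTIONS))   # number of Q-network outputs for RMVSL
print(decode_rmvsl(7))
print(interval_reward(62))  # 62 vehicles leaving the bottleneck in 60 s
```

A joint index keeps a single Q-network head, at the cost of an action count that grows multiplicatively (5 × 4 = 20 here) with the number of RM and VSL levels.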
Results
In this paper, we test regulator-based and RL-based RM and VSL controllers, as well as controllers that apply RM and VSL together, on both the hypothetical and the real-world networks. The results are compared with the no-control case to show the effect of each controller on mainstream and on-ramp travel times. The results are mainly analyzed using network total travel time (TTT), which Aimsun defines as the total travel time experienced by all vehicles that have crossed the network by the end of the simulation, including time spent in virtual queues. To further investigate the effects of the different controllers, network TTT is broken down into mainstream TTT and on-ramp TTT for vehicles generated at the mainstream and on-ramp centroids, respectively. The TTT results are followed by plots and discussion of total flows, the breakdown of flows into mainstream and on-ramp travelers, and the FD for each control case.
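The TTT breakdown described above amounts to summing per-vehicle travel times by origin. The sketch below assumes hypothetical per-vehicle records with a generation time (so that any time waiting in a virtual queue is counted) and an exit time that is `None` for vehicles still in the network at the end of the simulation. It mirrors the definition given here rather than Aimsun's internal computation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trip:
    origin: str              # "mainstream" or "on-ramp" (hypothetical labels)
    generated_s: float       # generation time, so virtual-queue time is included
    exit_s: Optional[float]  # None if the vehicle had not exited by the end


def ttt_breakdown_hours(trips):
    """Return mainstream, on-ramp, and network TTT (h) over completed trips."""
    totals = {"mainstream": 0.0, "on-ramp": 0.0}
    for trip in trips:
        if trip.exit_s is None:
            continue  # only vehicles that crossed the network are counted
        totals[trip.origin] += (trip.exit_s - trip.generated_s) / 3600.0
    totals["network"] = totals["mainstream"] + totals["on-ramp"]
    return totals


trips = [Trip("mainstream", 0.0, 300.0),
         Trip("on-ramp", 60.0, 540.0),      # 480 s including virtual-queue wait
         Trip("mainstream", 7000.0, None)]  # still in the network: excluded
print(ttt_breakdown_hours(trips))
```

By construction the two groups sum to the network total; for example, the RL-RM mainstream and on-ramp TTTs in Table 3 (670.4 h and 346 h) add up to the reported network TTT of 1,016.4 h.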
As described in the previous section, although our goal is to minimize TTT, our DRL agent is trained to maximize the outflow from the bottleneck as a surrogate, as the bottleneck outflow is simpler to measure in practice.
RM Results
Hypothetical Freeway Network: Table 2 shows the performance of the ALINEA and RL-RM controllers and compares their travel times with the no-control case for the hypothetical network. RL-RM performs best in on-ramp TTT (338 h) and overall network TTT (665 h), while ALINEA yields the lowest TTT for mainstream travelers alone (309.1 h). This is consistent with RM generally giving priority to mainstream travelers at the expense of on-ramp travelers. However, RL-RM benefits both groups of travelers, whereas ALINEA lowers TTT for mainstream travelers but increases on-ramp TTT.
Hypothetical Network Ramp Metering Controllers Total Travel Time (TTT)
Note: ALINEA = Asservissement Linéaire d’Entrée Autoroutière; RL = reinforcement-learning; na = not applicable.
Figure 5 shows that both the ALINEA and RL-RM controllers can avoid the CD. The demand profile (Figure 2) should be kept in mind when analyzing the results. In Figure 5a, without control, congestion occurs and the CD takes place approximately 40 min after the start of the simulation, when both mainline and on-ramp demands peak, as depicted by the sudden drop of the blue line. The congestion persists until a few minutes before the end of the simulation, at around 140 min. Although neither ALINEA nor RL-RM triggers congestion on the freeway, RL-RM serves the demand more efficiently than ALINEA: the bottleneck flow stays nearer to capacity with RL-RM (gray line) than with ALINEA (orange line). (Color online only.) In addition, with RL-RM the bottleneck flow drops after 120 min only because the total demand has been served, whereas with ALINEA the total demand takes longer to serve and the flow drops after approximately 135 min. Regarding mainstream demand, Figure 5d shows that RL-RM (gray line) and ALINEA (orange line) serve the mainstream flow with comparable efficiency. For the on-ramp demand, however, the RL-RM controller serves the demand faster (by around 120 min) and more efficiently than ALINEA, which serves it by around 135 min, as illustrated in Figure 5c.

Hypothetical network ramp metering (RM) controllers: (a) bottleneck flow, (b) bottleneck flow/density (FD), (c) ramp flow, and (d) upstream mainstream flow. Color online only.
Figure 5b illustrates that ALINEA is generally more conservative than RL-RM, as it leads to lower density and higher speed, shown by the larger number of orange points on the left side of the graph. We examined several set points for ALINEA to select the critical density properly; however, this is a manual process and therefore not optimized. In Figure 5b, ALINEA points lie either at capacity or on the free-flow side, whereas RL-RM points lie either at capacity or near capacity on the congested side.
Winston Churchill Freeway Network: Table 3 compares the performance of each RM controller with the no-control case for the Winston Churchill network. ALINEA and RL-RM perform similarly in TTT for mainstream travelers (669.1 h and 670.4 h, respectively), whereas RL-RM performs slightly better in on-ramp and overall TTT (346 h and 1,016.4 h, respectively). With excessively high demand coming from the mainstream, RM clearly gives priority to mainstream travelers at the expense of on-ramp travelers. Although both ALINEA and RL-RM lead to longer TTT for on-ramp travelers, RL-RM causes less deterioration in on-ramp TTT than ALINEA.
Winston Churchill Ramp Metering Controllers Total Travel Time (TTT)
Note: ALINEA = Asservissement Linéaire d’Entrée Autoroutière; RL = reinforcement-learning; na = not applicable.
As in the hypothetical network, both the ALINEA and RL-RM controllers can avoid the CD, as illustrated in Figure 6. The demand profile (Figure 4) should again be kept in mind when analyzing the results. In Figure 6a, without control, congestion occurs and the CD takes place approximately 10 min after the start of the simulation, when the sum of mainline and on-ramp demands exceeds the bottleneck capacity, as depicted by the blue line. The congestion persists for approximately 150 min. When ALINEA (orange line) or RL-RM (gray line) is applied, the freeway bottleneck can serve the demand without triggering congestion: both lines stay above the blue line, with RL-RM performing slightly better than ALINEA. (Color online only.) Regarding the mainstream demand, Figure 6d shows that RL-RM and ALINEA serve the mainstream flow with nearly the same efficiency. For the on-ramp demand, however, the RL-RM controller serves the demand slightly faster than ALINEA, as illustrated in Figure 6c.

Winston Churchill network ramp metering (RM) controllers: (a) bottleneck flow, (b) bottleneck flow/density (FD), (c) ramp flow, and (d) upstream mainstream flow.
For a clearer understanding of the RL-RM controller in both the moderately and the severely congested scenarios, Figure 7 illustrates the policy learned by RL-RM in each case. In general, the controller behaves as expected: it decreases the metering rate when both mainstream and on-ramp demands peak, between approximately 30 and 90 min, and then increases the metering rate when the mainstream demand decreases, after approximately 90 min. The main difference between the RL-based and integral controllers is that the policy learned by the DRL agent shifts between high and low flows without large deviations in bottleneck density, as shown in Figures 5b and 6b. This suggests that the controller targets not only the critical density but also other bottleneck activation factors, such as merging and lane-changing maneuvers, by alternately increasing and decreasing the number of merging vehicles. ALINEA’s control decisions, by contrast, vary smoothly, without significant deviations from the occupancy set point. ALINEA reacts only after occupancy/density has increased, which makes it a reactive controller, unlike RL-based controllers, which can act proactively, as illustrated in Li et al. ( 43 ).

Reinforcement learning-ramp metering (RL-RM) applied ramp actions: (a) hypothetical network and (b) Winston Churchill network.
VSL Results
Hypothetical Freeway Network: Table 4 compares the performance of each VSL controller with the no-control case for the hypothetical network. Again, every control method improves overall performance relative to the no-control case, because VSL gives priority to the on-ramp demand by controlling the mainstream flow. Both RL-based VSL controllers outperform the regulator in overall TTT (925.3 h and 905.9 h, respectively, compared with 970.8 h). However, VSL with integral control results in lower TTT for on-ramp travelers (57.4 h) but higher TTT for mainstream travelers (913.4 h) than the RL-based controllers.
Hypothetical Network Variable Speed Limit Controllers Total Travel Time (TTT)
Note: RL = reinforcement learning; na = not applicable.
As illustrated in Figure 8a, all VSL controllers (orange, gray, and yellow lines) mitigate the CD and achieve higher bottleneck flows than the no-control case (blue line). (Color online only.) The improved TTT for both mainstream and on-ramp travelers can be clearly observed in Figure 8, c and d, where both ramp flows and upstream mainstream flows increase under every VSL controller.

Hypothetical network variable speed limit (VSL) controllers: (a) bottleneck flow, (b) bottleneck flow/density (FD), (c) ramp flow, and (d) upstream mainstream flow.
Similar to RM, regulator-based VSL restricts mainstream travelers more than the RL-based controllers do. This can be seen in Figure 8b and Table 4, where the regulator yields a higher speed and a lower density at the bottleneck, a higher TTT for mainstream travelers (913.4 h), and a lower TTT for on-ramp travelers (57.4 h).
Winston Churchill Freeway Network: Table 5 compares the performance of each VSL controller with the no-control case in terms of mainstream and on-ramp TTT for the Winston Churchill freeway network. Under very high demand, none of the VSL controllers adds significant benefit to the network-wide TTT; all of them perform only slightly better than the no-control case in overall TTT. Table 5 also shows that VSL controllers can only give priority to the on-ramp demand, reducing the TTT for on-ramp travelers by controlling the mainstream flow and thereby increasing the TTT for mainstream travelers.
Winston Churchill Network Variable Speed Limit Controllers Total Travel Time (TTT)
Note: RL = reinforcement learning; na = not applicable.
As illustrated in Figure 9, none of the VSL controllers avoids the CD when demand from both the mainstream and the on-ramp is excessively high (all lines lie at approximately the same level). Figure 9a shows that no VSL method yields an obvious gain in bottleneck flow. However, Figure 9, c and d, shows that regulator-based VSL (orange line) serves on-ramp demand faster than both RL-based controllers (gray and yellow lines) and the base case (blue line), at the expense of mainstream travelers. (Color online only.) RL-based VSL serves on-ramp demand slightly faster than the no-control case while serving mainstream travelers as well as the no-control case does. The fundamental diagram in Figure 9b shows that all VSL controllers avoid the heavily congested region where density exceeds 80 veh/km/lane, which explains their slightly better performance.

Winston Churchill Network variable speed limit (VSL) controllers: (a) bottleneck flow, (b) bottleneck flow/density (FD), (c) ramp flow, and (d) upstream mainstream flow.
We analyze the behavior of the RL-based VSL controllers further by examining the policy learnt by each of them, as illustrated in Figure 10. All controllers learn that very low speed limits are not optimal when both demands peak, because low speeds increase the TTT of the large mainstream demand (3,500 vph and 6,000 vph for the hypothetical and Winston Churchill freeway networks, respectively) while decreasing the TTT of far fewer travelers (on-ramp demand of 1,000 vph and 1,250 vph, respectively). Lower speeds also facilitate the merging of the unmetered on-ramp platoons, but they cause an unneeded flow decrease in the left lane(s). For right-lane speed controllers, the agents learn to decrease the speed limit to slightly facilitate the merging of on-ramp travelers, because doing so does not impose any unwanted metering on left-lane travelers. However, whenever demand is lower, the agent learns that lowering the speed limit can be effective. This learnt behavior is notable and cannot be reproduced by the feedback integral controller, which keeps decreasing the speed limit until the minimum allowed value is reached (15 km/h in this case) as long as the freeway occupancy exceeds the controller's set occupancy. In Figures 8b and 9b, most bottleneck density points for the integral controller lie above the critical density, so the integral controller holds the speed limit at its minimum value most of the time.
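The saturation of the integral VSL regulator can be sketched as follows. This is a minimal illustration, not the regulator implemented in the study: the gain of 5 km/h per percent occupancy, the 100 km/h upper bound, and the constant 24% occupancy trace are assumptions; only the 15 km/h minimum comes from the text. While occupancy stays above the set point, every update lowers the limit until the lower bound is reached, and the limit then stays there.

```python
def integral_vsl_step(prev_limit, measured_occ, target_occ,
                      gain=5.0, min_limit=15.0, max_limit=100.0):
    """One integral VSL update (km/h); lowers the limit while occupancy exceeds target.

    The gain and the 100 km/h maximum are illustrative assumptions.
    """
    limit = prev_limit + gain * (target_occ - measured_occ)
    # Clamp to the allowed speed-limit range.
    return min(max(limit, min_limit), max_limit)


# Hypothetical bottleneck occupancy that stays above the 20% set point.
limit = 100.0
history = []
for occ in [24.0] * 10:
    limit = integral_vsl_step(limit, occ, target_occ=20.0)
    history.append(limit)
print(history)
```

Unlike the learnt policies in Figure 10, this rule has no notion of how many travelers a lower limit penalizes; its only input is the occupancy error.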

Reinforcement learning-variable speed limit (RL-VSL) applied speed actions: (a) hypothetical network right-lane speed action, (b) hypothetical network section speed action, (c) Winston Churchill network right-lane speed action, and (d) Winston Churchill network section speed action.
The main goal of both RM and VSL is to maximize freeway output flow at bottleneck locations by controlling the input flow to a given bottleneck. RM can directly cut the on-ramp flow to the value decided by the controller. VSL controllers, by contrast, control the mainstream flow indirectly: they set the mainstream speed limit in the application area to create a controlled congestion upstream on the freeway before the ramp flow merges. To better understand the effects and behavior of the VSL controllers, Figure 11 illustrates how different speed limits change the FD of uninterrupted traffic flow (upstream of the bottleneck, without congestion) for each network. The lower the applied speed limit, the lower the flow capacity and the higher the critical density, consistent with Papageorgiou et al. ( 22 ) and Carlson et al. ( 58 , 59 ). The changes in flow capacity and critical density are not linear in the speed limit. For example, the change in flow capacity between the 100 km/h (dark blue) and 90 km/h (orange) speed limits is significantly smaller than the change between 30 km/h (brown) and 20 km/h (gray). (Color online only.) To illustrate this nonlinearity, Figure 12 shows the change in flow capacity with speed limit for both networks; the behavior matches that reported by Müller et al. ( 30 , 32 ), and both freeway networks show the same nonlinear relationship between speed limit and output flow from the application area. Most important to highlight is the limited range of flow control that VSL can achieve. For example, in Figure 12b, for the Winston Churchill freeway network, reducing the speed limit from 100 km/h (dark blue) to 20 km/h (gray) decreases the uninterrupted flow capacity by approximately 1,600 vph. (Color online only.)
This is the maximum achievable decrease, because the flow capacity achieved while merging is lower than the uninterrupted flow capacity. Given that the network has three lanes, the maximum decrease in flow when changing the speed limit from 100 km/h to 20 km/h is about 533 vph/lane. Thus, the right-lane flow can be decreased by at most 533 vph, while the unmetered ramp flow adds 1,250 vph. The main reasons for the poor performance of VSL compared with RM are therefore:
Insufficient flow metering compared with incoming demand from the on-ramp.
The decrease in mainstream flow is distributed among all lanes while it is needed only for the right lane.
Conflicts between right-lane mainstream travelers and the high unmetered on-ramp flow, which merges in platoons.
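The first reason can be checked with simple arithmetic using the Winston Churchill figures quoted above (about 1,600 vph total reduction, three lanes, 1,250 vph unmetered ramp demand). The sketch assumes, as the text does, that the VSL-induced reduction is shared evenly across lanes; it shows that even the largest right-lane reduction offsets less than half of the ramp inflow.

```python
lanes = 3
max_section_drop_vph = 1600.0  # approx. capacity reduction, 100 -> 20 km/h (Figure 12b)
ramp_flow_vph = 1250.0         # unmetered peak on-ramp demand

# Assumption: the VSL-induced reduction is shared evenly across all lanes.
right_lane_drop_vph = max_section_drop_vph / lanes
# Ramp inflow that the right-lane reduction cannot offset.
uncovered_ramp_vph = ramp_flow_vph - right_lane_drop_vph
share_offset = right_lane_drop_vph / ramp_flow_vph

print(f"right-lane reduction: {right_lane_drop_vph:.0f} vph")
print(f"uncovered ramp inflow: {uncovered_ramp_vph:.0f} vph")
print(f"share of ramp inflow offset: {share_offset:.0%}")
```

RM, in contrast, acts on the ramp flow itself, so the same gap can in principle be closed directly by lowering the metering rate.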

Flow/density (FD) for different speed limits: (a) hypothetical and (b) Winston Churchill freeway network.

Flow capacity for different speed limits: (a) hypothetical and (b) Winston Churchill freeway network.
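The qualitative trends in Figures 11 and 12 (lower capacity, higher critical density, and a nonlinear capacity loss as the speed limit drops) can be reproduced with an idealized triangular FD. This is an analytical sketch, not the Aimsun-derived diagrams: it assumes the free-flow branch q = v·k with slope equal to the speed limit v (below the free-flow speed) and a congested branch q = w·(k_jam - k), with a hypothetical wave speed w = 20 km/h and jam density k_jam = 150 veh/km/lane. The two branches meet at k_c = w·k_jam / (v + w), so capacity v·k_c falls faster at low speed limits.

```python
def triangular_fd(speed_limit_kmh, wave_speed_kmh=20.0, jam_density=150.0):
    """Capacity (veh/h/lane) and critical density (veh/km/lane) of a triangular FD.

    Free-flow branch: q = v * k, with v the speed limit (assumed below free-flow speed).
    Congested branch: q = w * (k_jam - k). Capacity is where the two branches meet.
    Wave speed and jam density are illustrative assumptions.
    """
    critical_density = wave_speed_kmh * jam_density / (speed_limit_kmh + wave_speed_kmh)
    capacity = speed_limit_kmh * critical_density
    return capacity, critical_density


for v in (100, 90, 30, 20):
    q, kc = triangular_fd(v)
    print(f"{v:>3} km/h: capacity {q:7.1f} veh/h/lane, critical density {kc:5.1f} veh/km/lane")
```

With these parameters, lowering the limit from 100 to 90 km/h costs under 50 veh/h/lane of capacity, whereas lowering it from 30 to 20 km/h costs 300 veh/h/lane, mirroring the nonlinearity discussed for Figure 12.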
Concurrent RM + VSL Results
In this section, we present the most interesting experiment and results, in which we apply RM and VSL concurrently. As discussed below, it is worth noting how the RL agent, which learns with access to both RM and VSL actions, discovers when to favor one over the other, exhibiting interesting control dynamics; this is the primary contribution of the paper.
Hypothetical Freeway Network: Table 6 summarizes the performance of the regulator- and RL-based RMVSL controllers for the hypothetical network. As Figure 13 also illustrates, applying any control is better than the no-control case. RL-based controllers achieve lower TTT than the integral controller. In Figure 13, the bottleneck flow under the RL-based controllers drops earlier than under the integral controller because all the demand has already been served, which shows that RL-based controllers can serve all the demand in a shorter time. RL-RMVSL controllers always achieve better mainstream TTT than the regulator. In addition, the on-ramp TTT under the right-lane RL-RMVSL controller (664.3 h) is significantly lower than under the integral controller (856.2 h). Although the integral controller is designed to prioritize RM and to apply VSL only when RM alone fails to achieve the goal, as in Carlson et al. ( 34 ), the RL-based controllers learn to give higher priority to the mainstream demand and to reduce the use of VSL.
Hypothetical Network Integrated Ramp Metering and/or Variable Speed Limit (RMVSL) Controllers Total Travel Time (TTT)
Note: RL = reinforcement learning; na = not applicable.
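The RM-first priority of the integral RMVSL controller can be sketched as a simple coordination rule: RM follows an integral (ALINEA-type) update, and VSL is lowered only once the metering rate is saturated at its minimum while occupancy still exceeds the set point. This is a simplified, hypothetical interpretation of the logic attributed to Carlson et al. ( 34 ), not their exact algorithm; the gains, the bounds, the 10 km/h relaxation step, and the occupancy trace are all assumptions.

```python
def rm_first_then_vsl(rate, limit, occ, target_occ,
                      k_rm=70.0, k_vsl=5.0, relax_step=10.0,
                      rate_bounds=(200.0, 1800.0),
                      limit_bounds=(15.0, 100.0)):
    """One step of a hypothetical RM-first coordination rule.

    Rates are in veh/h, speed limits in km/h, occupancies in percent.
    All gains and bounds are illustrative assumptions.
    """
    error = target_occ - occ
    rate_min, rate_max = rate_bounds
    lim_min, lim_max = limit_bounds
    # Ramp metering: integral (ALINEA-type) update, clamped to its range.
    new_rate = min(max(rate + k_rm * error, rate_min), rate_max)
    if new_rate == rate_min and error < 0:
        # RM is saturated and the bottleneck is still too occupied: engage VSL.
        new_limit = limit + k_vsl * error
    else:
        # RM still has authority: relax the speed limit toward its maximum.
        new_limit = limit + relax_step
    return new_rate, min(max(new_limit, lim_min), lim_max)


rate, limit = 800.0, 100.0
trace = []
for occ in [26.0, 26.0, 26.0, 26.0, 18.0, 18.0]:
    rate, limit = rm_first_then_vsl(rate, limit, occ, target_occ=20.0)
    trace.append((rate, limit))
print(trace)
```

In this rule the priority between RM and VSL is fixed by design; the RL-based controllers instead learn from experience how much weight to give each action.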

Hypothetical network integrated ramp metering and/or variable speed limit (RMVSL) bottleneck flow.
Winston Churchill Freeway Network: Table 7 summarizes the performance of the regulator- and RL-based RMVSL controllers for the Winston Churchill freeway network. As shown in Figure 14 and Table 7, the RL-based controllers achieve higher bottleneck flows (gray and yellow lines) and lower overall TTT than the integral controller (orange line). (Color online only.)
Winston Churchill Network Integrated Ramp Metering and/or Variable Speed Limit (RMVSL) Controllers Total Travel Time (TTT)
Note: RL = reinforcement learning; na = not applicable.

Winston Churchill network integrated ramp metering and/or variable speed limit (RMVSL) bottleneck flow.
Most interestingly, Figure 15 displays the actions applied by each agent. Generally, the RL-RMVSL controllers learn to restrict on-ramp flow more than mainstream flow whenever both mainstream and on-ramp demands peak. However, when demand decreases (after approximately 90 min), the agent learns to decrease the speed limit and allow more flow from the on-ramp. The RL agent learns when and how to switch between RM and VSL depending on the state of the bottleneck and the incoming demand from both the upstream mainstream and the on-ramp. This dynamic switching behavior is effective and cannot be reproduced by the regulators, whose priorities are fixed by design. The learnt policies, together with the results of all controllers, show that RM is more effective than VSL in highly congested scenarios, whereas VSL is effective in moderately and lightly congested scenarios. The RL agent automatically learns which method to emphasize and when, depending on the prevailing traffic conditions.

Integrated ramp metering and/or variable speed limit (RMVSL) actions: (a, b) hypothetical network right-lane control, (c, d) hypothetical network section control, (e, f) Winston Churchill network right-lane control, and (g, h) Winston Churchill network section control.
Although Carlson et al. ( 1 ) report that VSL can be less effective than RM solely because of the blocking of upstream off-ramps, our analysis of both controllers shows that VSL can be ineffective in heavy congestion for reasons unrelated to off-ramp blocking. In fact, there are no off-ramps in our experiments. In heavy congestion, we observe that metering the mainstream is not advisable, as the learnt policy of the RL agent clearly demonstrates.
Conclusion
In this paper, DRL-based RM, VSL, and RMVSL controllers are designed and compared with regulator-based RM, VSL, and RMVSL controllers, as well as with the no-control case, to analyze the performance and effects of each controller on mainline and on-ramp TTT. The analysis of the results shows that:
Any kind of control improves substantially over the no-control case, which is expected and consistent with the literature.
All RL-based controllers perform better than the corresponding regulators as RL-based controllers are optimizing controllers and do not rely on manually determined set points as in the regulator approach.
RL-based controllers display a wider range of behaviors than regulators.
The RM controller is more effective than VSL, and clearly preferable to it, in highly congested scenarios.
VSL effectiveness in decreasing TTT appears in moderately and lightly congested scenarios.
RL-based RMVSL learns how to combine RM and VSL effectively, and when and how to switch between them, based on traffic conditions and the effectiveness of each controller, which we find both interesting and useful.
In this study, we focused on simple freeway networks with hypothetical demands and isolated on-ramps to thoroughly analyze and evaluate the application of RL as well as regulator-based RM and VSL controllers. Applying these control strategies to larger-scale scenarios, such as longer congested urban freeways with consecutive on-ramp bottlenecks, may present additional challenges. For instance, resolving an upstream bottleneck might increase traffic flow toward downstream bottlenecks, potentially resulting in more severe and difficult-to-manage congestion. These complexities will require more sophisticated control strategies. Future research will build on the insights gained in this study to develop and implement effective controllers for managing traffic on large, heavily congested urban freeways with multiple on-ramps and calibrated, realistic demand scenarios, with the goal of more practical and scalable solutions.
Footnotes
Author Contributions
The authors confirm contribution to the paper as follows: study conception and design: O. ElSamadisy; data collection: O. ElSamadisy; analysis and interpretation of results: O. ElSamadisy, I. Smirnov, B. Abdulhai; draft manuscript preparation: O. ElSamadisy, I. Smirnov, X. Wang. All authors reviewed the results and approved the final version of the manuscript.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
