Replicating the behaviour of electric vehicle drivers using an agent-based reinforcement learning model

Abstract

Despite the rapid expansion of electric vehicle (EV) charging networks, questions remain about their efficiency in meeting the growing needs of EV drivers. Previous rule-based agent-based models (ABMs) have struggled to capture the adaptive behaviours of human drivers. Although reinforcement learning has been applied in EV simulation studies, its application has primarily focused on optimising fleet operations rather than modelling private drivers who make independent charging decisions. To address this gap, we propose a multi-stage reinforcement learning framework that simulates the charging demand of private EV drivers across a national-scale road network. We validate the model against real-world data and identify the training stage that most closely reflects actual driver behaviour. Based on the simulation results, we identify critical ‘charging deserts’ where EV drivers face high risks of battery depletion. Our findings further support recent policy shifts toward expanding rapid charging hubs along motorway corridors and urban boundaries to meet growing demand from long-distance trips.

Keywords

electric vehicles, charging demand, driver behaviours, reinforcement learning

Introduction

With global net-zero carbon transition targets, rapid technological advancements, and declining battery costs, many countries have witnessed a significant rise in EV adoption (Mahmud et al., 2023). In the early stages of EV adoption, households adopting EVs typically own garages or off-street parking suitable for installing home chargers (LaMonaca and Ryan, 2022). However, as EV ownership continues to grow, the market has shifted from early adopters to the mass market (IEA, 2024). This transition has led to increased demand for public chargers, which is particularly high among drivers without access to private chargers.

Recent research has questioned whether the emission reduction benefits from EVs can be sustained due to variability in consumer behaviour (Nunes et al., 2022). Concerns have also been raised about whether the charging network expansion is misallocating resources and failing to adapt to user needs (Metais et al., 2022). In the UK, geographical disparities in public chargers have created significant barriers to EV adoption. Some areas have lagged behind others in the deployment process (Peng et al., 2024). A survey by the Department for Transport (2022a, 2022b) indicates that existing EV drivers aim to adjust their behaviours to integrate charging into their trip schedules and daily habits. Potential EV adopters also prioritise proximity, reliability and dependability of chargers as key factors in their decision to switch to EVs. It is therefore important to understand the complex behaviours of diverse EV drivers and accurately estimate charging demand to support a more effective and sustainable charging network.

Previous studies have explored EV driver behaviours and estimated their charging demand using either data-driven or simulation-based approaches. Many studies are conducted within grid cells (van der Kam et al., 2019) or intra-city scale networks (Pagani et al., 2019; Yi et al., 2023), while other studies also extend to large geographical contexts over regional or country levels where long-distance journeys bring critical challenges to EV drivers (Anjos et al., 2020; Liao et al., 2023). However, despite their contributions, both approaches have limitations. Data-driven methods often face limited access to EV-specific datasets due to privacy concerns (Park and Joe, 2024). As a result, researchers often rely on alternative datasets, such as socio-demographic data (Crozier et al., 2021), GPS data from conventional vehicles (Kontou et al., 2019) or commercial EV fleets (Hu et al., 2022). While these datasets offer high granularity or rich attributes, they may introduce biased representations of private EV drivers. Meanwhile, simulation-based approaches, such as Agent-based Models (ABMs), have proven effective in capturing the behavioural features of EV drivers (Yi et al., 2023). However, these models often rely on static behavioural rules, which fail to reflect the adaptive nature of human drivers in response to the changing environment. For example, as EV drivers gain experience, they tend to charge less frequently and become more comfortable operating with lower battery levels.

The objective of this research is to develop and validate an agent-based reinforcement learning (RL) model using a Deep Q-Network (DQN) to (1) simulate the adaptive charging behaviours of heterogeneous EV drivers in Great Britain (GB); and (2) estimate the spatial distribution of public charging demand and identify locations with high risks of battery depletion. Our approach first categorises drivers based on travel distance, battery status, trip purpose, and environmental factors. We then implement a novel multi-episode training-simulation framework where representative agents from each cluster learn charging behaviours through RL. These learnt behaviours are then used to simulate the broader population of drivers within their respective groups, creating an updated charging environment that feeds back into the next training episode. This iterative process captures both the adaptive charging behaviours of EV drivers and the evolving interactions among drivers at charging stations. Unlike previous studies that use RL with rational agents to derive optimal routing and charging strategies (Dastpak et al., 2024; Jin and Xu, 2022; Yu et al., 2023), we use RL to replicate the adaptive learning process of real-world private drivers operating under bounded rationality when making charging decisions. Rather than identifying the training episode that achieves optimal performance, we validate against real-world charging session data to identify the episode that most closely reflects observed behaviour. The model provides a validated modelling framework to estimate charging demand distribution and inform future charging infrastructure planning.

Background

To date, extensive studies have explored charging demand estimation and EV driver behaviour through data-driven or simulation-based approaches. Data-driven studies often rely on socio-demographic data, such as population density, average travel distance, and travel flow volumes, to estimate charging demand distribution (Dong et al., 2019; Hardinghaus et al., 2019). More recent research has also used vehicle trajectories to infer EV charging demand, including data from conventional vehicles (Kontou et al., 2019) and EV fleets such as taxis (Hu et al., 2022). Despite the high granularity of these trajectory datasets, they may introduce bias when representing charging demand for private EV drivers. Meanwhile, limited data availability for private EV drivers remains a challenge, partly due to the early adoption stage of EVs and privacy concerns about connected vehicle data. Some studies use driver surveys to collect data on private EV drivers (Hasan and Simsekoglu, 2020). While such surveys provide valuable insights into driver behaviours, their high costs often restrict sample sizes and limit generalisability to broader spatial contexts.

Given the lack of data for private EV drivers, simulation-based approaches, such as ABMs, have been developed to simulate EV charging demand with predefined behavioural rules. These models consider multiple aspects of driver behaviours, including psychological features, financial preferences, and vehicle attributes (Adenaw and Lienkamp, 2021; Liu et al., 2022). However, despite these detailed behavioural rules, questions remain about their ability to represent EV drivers’ diversity and adaptability. Most ABMs simulating EV driver behaviour rely on static behavioural rules, such as initiating charging when the state of charge (SOC) falls below a fixed threshold. Even when psychological factors such as range anxiety are incorporated, they are typically modelled as static attributes that vary across individuals but remain unchanged within individuals over time (Lin et al., 2022; van der Kam et al., 2019).

One critique of ABMs for simulating human behaviour concerns their limited capacity to represent adaptive human behaviours. Rule-based ABMs commonly follow an ‘if-then’ structure (DeAngelis and Diaz, 2019), in which deterministic rules assign a single behavioural response to a given situation (Grimm and Railsback, 2005). Agents are inherently limited in their ability to learn and remember previous decisions (Brearcliffe and Crooks, 2021), which can limit the explanatory power of simulations (Macal, 2016). In addition, ABMs can face substantial computational challenges when the number and details of agents, interactions, decision rules, scheduling, and environment increase (Sun et al., 2016). Incorporating heterogeneous adaptive and cognitive processes intensifies these challenges and therefore must be addressed through appropriate methodological choices.

Machine learning (ML) techniques provide great potential to bring higher levels of agent intelligence into ABMs. The integration of ML with ABMs can take various forms depending on the modelling objective and the stage of the modelling process. Although the first use of ML in ABMs is difficult to pinpoint, an early and notable contribution is Rand’s (2006) systematic framework for incorporating ML into ABMs to inform agent decision-making. Later studies have extended this framework to encompass the broader modelling workflow. Zhang et al. (2023) categorise ML applications across data preprocessing, agent behaviour specification, and model output analysis. Ale Ebrahim Dehkordi et al. (2023) further classify ML-ABM integration based on three dimensions: the ML technique employed, the ABM challenge addressed, and the modelling purpose.

This study focuses specifically on ML’s contribution to addressing the behavioural modelling challenge of ABMs – namely, the development of more realistic and adaptive representations of agent behaviour by integrating RL algorithms. RL is guided by rewards and penalties and enables agents to learn through interactions with other agents and the environment (Sutton and Barto, 1988). Through repeated exposure to the environment and interactions with other agents, RL-trained agents navigate spaces as if they were human and gradually learn behavioural strategies that maximise cumulative reward (Malleson et al., 2022). Additionally, RL does not require a dataset of correct answers (Dahlke et al., 2020), making it especially valuable for studying private EV drivers whose trajectory data is limited.

In transportation research, RL has been widely applied to optimise vehicle routing and charging decisions (Jebessa et al., 2022; Koh et al., 2020). One notable feature of these studies is their emphasis on optimising routing and charging strategies for EVs (Yan et al., 2021). One group of studies assumes agents share information with each other in a timely manner and uses RL to optimise the coordination of EV fleets (Lin et al., 2022; Sultanuddin et al., 2023). Another group focuses on individual drivers, where RL is used to identify optimal routing and charging strategies to minimise travel time, energy consumption, or charging costs (Dastpak et al., 2024; Jin and Xu, 2022; Yu et al., 2023). However, the assumptions of complete information exchange and perfect rationality do not apply to private EV drivers, who act independently and have limited, random communication with one another. Recent research on modelling cognitive decision-makers has increasingly emphasised the importance of incorporating bounded rationality into human decision-making models (Pappalardo et al., 2023). Despite RL’s ability to reflect adaptive learning, how to integrate bounded rationality into RL frameworks to better represent private EV drivers remains underexplored.

A further limitation of previous studies modelling RL-trained EV drivers is scalability. Simulating EV driver behaviours across large geographical areas is important because long-distance journeys in these areas pose particular challenges to EV drivers. Prior studies have used ABMs to simulate EV driver behaviours at large geographical scales (Anjos et al., 2020; Liao et al., 2023). However, incorporating RL-trained agents into large-scale environments can introduce further computational challenges due to the expanded road and charging networks, which increase the number of charging choices available to agents. While some efforts have been made to improve cooperative multi-agent RL, simulating large populations of RL-trained agents operating independently across geographically large-scale scenarios needs further exploration.

In summary, despite significant progress in estimating charging demand through both data-driven and simulation-based approaches, several critical gaps remain. Data-driven methods face challenges related to biased representations and limited data availability for private EV drivers. Simulation-based models often rely on static behavioural rules and fail to capture the adaptive nature of human decision-making. Integrating RL into ABMs offers advantages in incorporating an adaptive learning process. However, previous studies have mainly applied RL for EV fleet optimisation. It therefore remains underexplored how RL-trained ABM can be used to simulate private EV drivers with bounded rationality and adaptive behaviours, especially at large geographical scales where computational burdens increase.

Data and study area

We take GB as our study area, comprising England, Scotland, and Wales, with a focus on the trips made by England residents travelling across GB. As of April 2024, there are 59,125 publicly available chargers across the region (Department for Transport, 2024). England has a high volume of private vehicle trips, with residents travelling an average of 5373 miles in 2022 (Department for Transport, 2023a).

In the model, agents represent a combination of EV drivers and their vehicles. Travel schedules of EV drivers are established using trip attributes derived from the National Travel Survey (NTS) dataset (Department for Transport, 2023b), combined with Ordnance Survey (OS) Code-Point data (Ordnance Survey, 2023a). EV attributes are sourced from the EV Database (EV Database, 2024). The simulation environment integrates road networks from the OS Open Roads dataset (Ordnance Survey, 2023b) and public chargers from the ChargePoint dataset (ChargePoint, 2023). Charger availability status, scraped from the ChargePoint website, is used for model validation (ChargePoint, 2023). Further details are provided in the Data and Study Area section in the Supplemental Materials.

Methods

Multi-stage training-simulation framework

To address the limitations discussed in the Background section, we propose a scalable multi-stage framework in which, at each training episode, representative agents from each cluster are trained. Their current policies are then used to simulate all agents within their respective clusters. Through continuous validation against empirical data at each episode, the framework identifies the charger usage patterns that most closely reflect real-world conditions. This approach provides a practical solution for modelling large-scale systems. It allows for the inclusion of larger agent populations and application across diverse scenarios while maintaining computational efficiency.

The details of the methodology framework are shown in Figure 1. We first cluster all EV drivers from the NTS dataset using k-means clustering based on their trip distance, initial battery level, trip co-occurrence density (TCD), and charger density (CD). The initial battery level, TCD and CD are not directly provided by the dataset, and the methods of calculating them are detailed in the Data and Study Area section in the Supplemental Materials. Trip purpose (work, leisure) was not directly included in the cluster analysis due to both the limited sample size of leisure trips and the categorical nature of this variable, which could bias the clustering results. Instead, each group is further subdivided by trip purpose following the clustering analysis to account for behavioural differences associated with work and leisure trips.
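The clustering step can be illustrated with a minimal sketch. It assumes the four features are standardised before clustering and uses a plain Lloyd's-algorithm k-means written with the standard library. The function names, the standardisation step, the random initialisation and the toy values are illustrative assumptions and are not taken from the released model.

```python
import random
from statistics import mean, pstdev

def standardise(rows):
    """Z-score each column so that no single feature dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm with random initial centroids (an assumption)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centroids.append(
                [mean(col) for col in zip(*members)] if members else centroids[c]
            )
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical drivers: [trip distance, initial battery level, TCD, CD]
rows = [[5, 0.2, 1, 1], [6, 0.25, 1.1, 1.2],
        [200, 0.9, 9, 8], [210, 0.95, 9.5, 8.4]]
labels, centroids = kmeans(standardise(rows), k=2)
```

In this toy example the two short-trip drivers fall into one cluster and the two long-trip drivers into the other. Each resulting cluster would then be subdivided by trip purpose, as described above.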

Figure 1.

Methodology framework.

Following the clustering analysis, representative agents are selected to initialise the training. For each cluster, the two agents closest to the cluster centroid are first identified, as these agents best represent the typical characteristics of their respective clusters. These two agents are then assigned to two separate training sets, ensuring that each training set contains one representative agent per cluster. Separately, simulation agents are independently sampled from the full agent population. 10 simulation agent sets are generated, each containing a different random sample of agents. To ensure the robustness of the model and to minimise data overfitting, the model is run for all combinations of the two training sets and the 10 simulation sets, resulting in 20 model runs in total. This number of runs was chosen to balance statistical robustness with computational resource constraints.
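As a rough illustration of this selection step, the sketch below picks the two agents nearest to each centroid and enumerates the 2 × 10 = 20 run combinations. Which of the two agents goes into which training set is not stated in the text, so the assignment here (nearest to set 0, second nearest to set 1) is an assumption. The function name and toy data are also hypothetical.

```python
from itertools import product

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_nearest_to_centroid(features, labels, centroids):
    """Return {cluster: (nearest agent index, second-nearest agent index)}."""
    reps = {}
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        ranked = sorted(members, key=lambda i: sq_dist(features[i], centroid))
        reps[c] = tuple(ranked[:2])
    return reps

# Toy feature vectors, cluster labels and centroids (illustrative values only)
features = [[0.0, 0.0], [0.1, 0.0], [0.5, 0.5],
            [4.0, 4.0], [4.2, 4.0], [6.0, 6.0]]
labels = [0, 0, 0, 1, 1, 1]
centroids = [[0.2, 0.17], [4.73, 4.67]]

reps = two_nearest_to_centroid(features, labels, centroids)

# One representative per cluster in each training set (assignment is assumed).
training_sets = [[reps[c][0] for c in reps], [reps[c][1] for c in reps]]

# Placeholders standing in for the 10 independently sampled simulation sets.
simulation_sets = list(range(10))

# Every (training set, simulation set) pair is one model run.
runs = list(product(range(len(training_sets)), simulation_sets))
```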

The RL model is specified in the next section. The model is available at https://github.com/fengzixin0617/ReplicatingthebehaviourofEVdriverswithagentbasedRLmodel. The trained policies from each episode guide the behaviour of all agents within their respective groups during the simulation phase, where agents share a common environment in which their charging decisions affect charger availability for others. During the simulation process, agents interact with each other in two ways: (1) All drivers are assumed to have access to the real-time charger status updates via online platforms. When a driver occupies a charger, the status of the charger changes from ‘available’ to ‘in use’ and is updated for all other drivers; and (2) an occupied charger does not necessarily prevent other drivers from heading to it, as limited charger availability in certain areas may lead drivers to queue behind other vehicles. As charging apps and websites do not display queue lengths, drivers must estimate waiting time based on personal experience. After each episode, the charger usage time windows are incorporated into the state space for the next training episode. This iterative feedback loop enables agents to learn from system-wide charging patterns based on past experiences. The model is executed on AWS using 64 vCPUs and 128 GB of RAM. Key parameters, including the learning rate, discount factor, and exploration rate, are calibrated to enhance performance.

At the end of each episode, we validate the model by comparing simulated charger usage patterns against real-world charging session data from the ChargePoint dataset, covering both spatial distribution and temporal patterns of charging sessions. This comparison uses spatial and temporal correlation analysis detailed in the Supplemental Materials. After all episodes are complete, the episodes with the highest correlation scores are selected to represent the stage where simulated behaviour most closely aligns with observed patterns. Notably, the best-matching episode does not correspond to the final episode where agents have fully converged to optimal behaviour. Instead, it typically falls between the early exploratory stage and the final convergence stage. This intermediate stage captures a realistic mix of driver experience levels across the population. The episodes that best match real-world patterns therefore represent a population-level blend of different experience stages, rather than a uniformly optimal outcome.
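The episode selection can be sketched as follows. The exact correlation measures are given in the Supplemental Materials and are not reproduced here. This sketch assumes a Pearson correlation on a single usage vector per episode, and the function names and data are illustrative only.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def best_episode(simulated_by_episode, observed):
    """Score every episode against observed usage and return the best match."""
    scores = {ep: pearson(sim, observed)
              for ep, sim in simulated_by_episode.items()}
    return max(scores, key=scores.get), scores

# Hypothetical charger usage counts per area (or per hour) for three episodes
observed = [1, 2, 3, 4]
simulated = {1: [4, 3, 2, 1], 2: [1, 2, 3, 5], 3: [2, 2, 2, 3]}
episode, scores = best_episode(simulated, observed)
```

Note that the selected episode is the one that best matches the data, which need not be the last or the highest-reward episode.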

RL model

Many different RL frameworks have been applied to train and simulate EV drivers, spanning both on-policy and off-policy models. These include Policy Gradient (Acquarone et al., 2022), State-Action-Reward-State-Action (SARSA) (Aljohani and Mohammed, 2022), Q-learning (Jiang et al., 2018; Lee and Brown, 2021), and Soft Actor-Critic (Chu et al., 2022). While these methods have been effective for optimisation tasks, not all are suitable for modelling the adaptability of decentralised private EV drivers. In on-policy models such as SARSA, the agent learns by following and updating the same policy. In contrast, off-policy models such as Q-learning enable agents to learn the optimal policy independently of the actions currently being taken (van Seijen et al., 2009).

In this paper, we use Q-learning, an off-policy RL method, so that simulated drivers can go beyond the current policy and learn from diverse past experiences. While value-based Q-learning performs well in many scenarios with discrete states and actions, its reliance on a tabular approach limits its scalability. Training becomes inefficient when dealing with large state and action spaces, such as those in our study. To address this challenge, we employ the Deep Q-Network (DQN) framework introduced by Mnih et al. (2015). By using deep neural networks to approximate the Q-value function, DQN is a more practical choice for our large-scale EV simulation.

Throughout this process, the agent’s objective is to complete all planned trips while avoiding battery depletion. Specifically, we formulate the charging decision-making process of EV drivers as an RL problem, where agents observe a state, take an action, receive a reward, and undergo a transition based on their chosen action. Each agent is initialised with a State of Charge (SOC) and a daily trip schedule comprising one or more destinations. The agent navigates through road networks using Dijkstra's shortest path algorithm, moving from its current node at time $t$ ($P_{t}$) to the next node at time $t + 1$ ($P_{t + 1}$) while continuously evaluating charging decisions. At any time step $t$, the agent’s state $S_{t}$ is represented as a tuple of five variables:

$$S_{t} = (\mathit{SOC}_{t}, \mathit{Dist}_{t}, \mathit{Time}_{t}, \mathit{Stations}_{t}, \mathit{Trip}_{t}) \tag{1}$$

where $\mathit{SOC}_{t}$ represents the SOC at time $t$, indicating the vehicle’s battery status. $\mathit{Dist}_{t}$ is the shortest distance from the current position to the destination. $\mathit{Time}_{t}$ indicates the time in minutes past midnight on the given day at time step $t$. $\mathit{Trip}_{t}$ is the number of remaining trips in the driver’s daily trip plan. $\mathit{Stations}_{t}$ is the percentage of available stations relative to all reachable stations within the vehicle’s remaining driving range, based on its $\mathit{SOC}_{t}$ and energy consumption rate at location $P_{t}$ at time $t$. At each updated time step ($t + 1$), the agent re-evaluates the reachable stations, and recalculates $\mathit{Stations}_{t + 1}$ based on its updated location ($P_{t + 1}$) and updated remaining SOC ($\mathit{SOC}_{t + 1}$).
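The state tuple and the station-availability term can be sketched as below. The field names, the units, and the simple range formula (SOC × battery capacity ÷ consumption rate) are assumptions made for illustration. Returning 0 when no station is reachable is also an assumption, since the text does not cover that case.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    soc: float        # SOC_t, as a fraction between 0 and 1
    dist_km: float    # Dist_t, shortest distance to the destination
    time_min: int     # Time_t, minutes past midnight
    stations: float   # Stations_t, % of reachable stations that are available
    trips_left: int   # Trip_t, remaining trips in the daily plan

def stations_share(soc, battery_kwh, kwh_per_km, station_dists_km, available):
    """Percentage of available stations among those within driving range."""
    range_km = soc * battery_kwh / kwh_per_km  # assumed range model
    reachable = [i for i, d in enumerate(station_dists_km) if d <= range_km]
    if not reachable:
        return 0.0  # assumed convention when nothing is reachable
    return 100.0 * sum(1 for i in reachable if available[i]) / len(reachable)

# Hypothetical vehicle: 60 kWh battery at 50% SOC, 0.2 kWh/km consumption
share = stations_share(0.5, 60.0, 0.2, [10.0, 100.0, 200.0], [True, False, True])
state = State(soc=0.5, dist_km=42.0, time_min=8 * 60, stations=share, trips_left=2)
```

Here two stations lie within the roughly 150 km range and one of them is available, so the share is 50%.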

At each time step $t$, the agent selects an action $a_{t, i}$ from the action space $A_{t}$ using an $\epsilon$-greedy strategy. The action space $A_{t}$ is defined as follows:

$$A_{t} = \{a_{t, 0}\} \cup \{a_{t, n, k} \mid n \in \{10, 20, 30, \dots, 100\},\ k \in K\} \tag{2}$$

where:

• $a_{t, 0}$ indicates moving from the current node $P_{t}$ to $P_{t + 1}$ without charging.

• $a_{t, n, k}$ includes rerouting to one of the reachable and available charging stations $k$ $(k \in K)$ and charging the battery by $n$%.
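A small sketch makes the size of this action space concrete: one no-charge action plus ten charge levels for each reachable and available station, giving $1 + 10|K|$ actions. The tuple encoding and station identifiers are illustrative assumptions.

```python
def build_action_space(reachable_available_stations):
    """Enumerate A_t: keep driving, or charge by n% at station k."""
    actions = [("drive", None, 0)]             # a_{t,0}: continue without charging
    for k in reachable_available_stations:     # k in K
        for n in range(10, 101, 10):           # n in {10, 20, ..., 100}
            actions.append(("charge", k, n))   # a_{t,n,k}
    return actions

# Hypothetical station identifiers
actions = build_action_space(["station_A", "station_B"])
```

Because $K$ depends on the agent's location and remaining range, the number of valid actions changes from step to step, which is one reason the action space grows quickly on a national network.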

We implement the learning process using a DQN, which maps state-action pairs to Q-values through hidden layers. The DQN takes the state variables $S_{t}$ as input and outputs Q-values for each possible action $a_{t, i}$. The Q-value, $Q(s, a)$, represents the agent’s expected cumulative reward from taking action $a$ in state $s$:

$$Q(s, a) = r + \gamma \max_{a'} Q(s', a') \tag{3}$$

where $r$ is the immediate reward for the action, $\gamma$ is the discount factor, and $\max_{a'} Q(s', a')$ represents the maximum estimated reward achievable from the next state $s'$. The agent follows an $\epsilon$-greedy strategy for action selection to balance exploration and exploitation. With a probability of $\epsilon$, the agent explores by randomly selecting an action from the available actions. The exploration rate $\epsilon$ is initialised at 0.99 to encourage exploration during the early stages of training. As training progresses, $\epsilon$ gradually decays from 0.99 and reaches a minimum value of 0.01, so that the agent exploits its learnt Q-values with probability $(1 - \epsilon)$, by the end of training (2000 episodes). The decay ensures that agents transition from predominantly exploratory behaviour to predominantly exploitative behaviour, learning from diverse experiences while refining their decision-making over time.
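The exploration schedule can be sketched as follows. The text gives only the start value (0.99), the floor (0.01) and the horizon (2000 episodes). The exponential form of the decay used here is an assumption, as are the function names.

```python
import random

EPS_START, EPS_MIN, N_EPISODES = 0.99, 0.01, 2000
# Per-episode multiplier chosen so that epsilon hits EPS_MIN at episode 2000
# (exponential decay is an assumed schedule, not stated in the text).
DECAY = (EPS_MIN / EPS_START) ** (1.0 / N_EPISODES)

def epsilon_at(episode):
    """Exploration rate for a given episode index (0-based)."""
    return max(EPS_MIN, EPS_START * DECAY ** episode)

def select_action(q_values, epsilon, rng):
    """Epsilon-greedy: random action with prob. epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                     # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

rng = random.Random(0)
greedy = select_action([0.1, 0.9, 0.3], epsilon=0.0, rng=rng)
explored = select_action([0.1, 0.9, 0.3], epsilon=1.0, rng=rng)
```

Under this schedule an intermediate episode, such as the one later selected by validation, still mixes random and greedy choices, which is consistent with the bounded-rationality interpretation given earlier.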

After performing the selected action $a_{t, i}$ , the agent receives a reward $R_{t}$ , and transitions to a new state $S_{t + 1}$ . These transitions are stored as experiences in an experience replay buffer (equations (4) and (5)). Random mini-batches of experiences are sampled from the replay buffer $D_{t}$ during the training. This approach helps break temporal correlations and improves learning efficiency by allowing the agent to revisit rare transitions multiple times, thereby enhancing stability in the learning process (Mnih et al., 2015; Schaul et al., 2015).

$$e_{t} = \{S_{t}, a_{t, i}, R_{t}, S_{t + 1}\} \tag{4}$$

$$D_{t} = \{e_{1}, e_{2}, \dots, e_{t}\} \tag{5}$$
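A minimal replay buffer corresponding to equations (4) and (5) might look like the sketch below. Equation (5) describes an unbounded set of experiences. The fixed capacity used here is a common practical assumption and is not taken from the text, and the class and method names are hypothetical.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores experiences e_t = (S_t, a_t, R_t, S_{t+1}) and samples mini-batches."""

    def __init__(self, capacity, seed=0):
        # Oldest experiences are dropped once capacity is reached (assumption).
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks temporal correlation.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.push(f"s{step}", 0, 1.0, f"s{step + 1}")
batch = buffer.sample(2)
```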

For each mini-batch, the network computes the predicted Q-values and the target Q-values using the Bellman Equation. The network parameters are then updated by minimising the loss function $L (θ)$ , which is the mean squared error of the target Q-value and predicted Q-value:

$$L(\theta) = E_{s, a, s', a'} \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta') - Q(s, a; \theta) \right)^{2} \right] \tag{6}$$
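Equation (6) can be checked by hand on a tiny batch. In the sketch below, `q_online` and `q_target` are stand-in functions that return a list of Q-values for a state. They are hypothetical placeholders for the networks with parameters $\theta$ and $\theta'$. Terminal-state handling is omitted for brevity.

```python
def td_targets(batch, q_target, gamma):
    """Bellman targets r + gamma * max_a' Q(s', a'; theta') for each experience."""
    return [r + gamma * max(q_target(s_next)) for (_, _, r, s_next) in batch]

def mse_loss(batch, q_online, q_target, gamma):
    """Mean squared error between targets and predicted Q(s, a; theta)."""
    targets = td_targets(batch, q_target, gamma)
    preds = [q_online(s)[a] for (s, a, _, _) in batch]
    return sum((y - q) ** 2 for y, q in zip(targets, preds)) / len(batch)

# Toy Q-functions over two actions; they ignore the state for simplicity.
q_online = lambda s: [1.0, 2.0]
q_target = lambda s: [0.5, 3.0]

# Experiences as (state, action index, reward, next state)
batch = [("s0", 1, 1.0, "s1"), ("s0", 0, 0.0, "s1")]
loss = mse_loss(batch, q_online, q_target, gamma=0.9)
```

With these values the targets are 3.7 and 2.7, the predictions are 2.0 and 1.0, and the loss is $1.7^{2} = 2.89$.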

Another important term to define the agent’s experience is the reward $R_{t}$. It is the immediate reward received when taking the action $a_{t, i}$. The reward considers financial costs, charging urgency, trip status, and SOC threshold of drivers (equations (7)–(11)).

$$D_{\mathit{SOC}} = \mathit{SOC}_{t} - \mathit{SOC}_{\mathit{threshold}} \tag{7}$$

$$D_{\mathit{charge}} = I_{\mathit{charge}, t} \cdot \varepsilon \cdot \ln \left( \mathit{timing}_{\mathit{charge}} \cdot \mathit{chance}_{\mathit{charge}} \cdot \mathit{status}_{\mathit{battery}} + 1 \right) \tag{8}$$

$$I_{\mathit{charge}, t} = \begin{cases} 1, & \text{if charge at } t \\ 0, & \text{if not charge at } t \end{cases} \tag{9}$$

$$D_{\mathit{cost}} = \left( \alpha \cdot \log (1 + \mathit{Payment} \cdot I_{\mathit{charge}, t}) + \beta \cdot \log (1 + T_{\mathit{travel}} + T_{\mathit{charge}} \cdot I_{\mathit{charge}, t}) + 1 \right)^{\frac{n}{m}} \tag{10}$$

$$R_{t} = \frac{\gamma \cdot D_{\mathit{SOC}}}{D_{\mathit{charge}} \cdot D_{\mathit{cost}}} + \rho \cdot \mathit{Status} \tag{11}$$

where:

• $D_{\mathit{SOC}}$ is the SOC threshold impact. It is the difference between the current SOC level at $t$ and the driver’s SOC threshold $\mathit{SOC}_{\mathit{threshold}}$. It measures how well the driver is maintaining a sufficient battery level relative to their psychological threshold.

• $D_{\mathit{charge}}$ is the charging urgency factor. It evaluates the urgency of a charging decision. This variable is only considered if a charging event occurs.

• $\mathit{timing}_{\mathit{charge}}$ is the length of time between the start of the trip and the start of the charging activity, indicating whether charging occurs early or late in the trip.

• $\mathit{chance}_{\mathit{charge}}$ is the charging opportunity ratio. It is defined as the ratio of CD to TCD, as described in the Data and Study Area section in the Supplemental Materials.

• $\mathit{status}_{\mathit{battery}}$ is the starting battery level of a driver.

• $D_{\mathit{cost}}$ is the cost impact. It consists of the financial payment for charging, the time spent rerouting to the charging station, and the actual charging duration.

• $n$ is the number of times the agent charges. It introduces an exponential penalty as the number of charging events increases.

• $m$ is a categorical variable of the trip distance class. Compared with longer-distance trips, shorter-distance trips will incur a greater penalty for repetitive charging.

• $\mathit{Status}$ is a categorical variable with values 0, 1, or −1. It represents the driver’s trip completion status. A value of −1 results in a penalty and indicates that the driver fails to complete the trip. A value of 0 means the driver continues the trip without additional rewards or penalties. A value of 1 means the driver earns an additional reward by successfully reaching a destination.

• $\rho$ is the weight coefficient for the trip completion status. It determines the magnitude of the reward associated with trip completion outcomes.
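Equations (7)–(11) can be combined into a single function, sketched below under several stated assumptions. The coefficients $\varepsilon$, $\alpha$, $\beta$ and the $\gamma$ in equation (11) are not given numerical values in the text, so they default to 1 here. The logarithm in equation (10) is taken as natural, and $m$ is treated as a positive numeric class code. When no charging occurs, $D_{\mathit{charge}}$ is dropped from the denominator, which is one reading of the statement that it is only considered if charging happens. All parameter names are hypothetical.

```python
from math import log

def reward(soc, soc_threshold, charging, timing, chance, status_battery,
           payment, t_travel, t_charge, n_charges, m_class, status,
           urgency_w=1.0, alpha=1.0, beta=1.0, gamma_w=1.0, rho=1.0):
    i = 1 if charging else 0                                       # eq. (9)
    d_soc = soc - soc_threshold                                    # eq. (7)
    d_charge = i * urgency_w * log(timing * chance * status_battery + 1)  # eq. (8)
    d_cost = (alpha * log(1 + payment * i)
              + beta * log(1 + t_travel + t_charge * i)
              + 1) ** (n_charges / m_class)                        # eq. (10)
    # Assumption: D_charge only enters the denominator when charging occurs.
    # A charging step with d_charge == 0 would still divide by zero and
    # would need separate handling.
    denom = d_cost * (d_charge if charging else 1.0)
    return gamma_w * d_soc / denom + rho * status                  # eq. (11)

# Driving on without charging, SOC comfortably above the threshold
r_drive = reward(soc=0.5, soc_threshold=0.2, charging=False, timing=0, chance=0,
                 status_battery=0, payment=0, t_travel=0, t_charge=0,
                 n_charges=0, m_class=1, status=0)

# First charge on a trip in distance class 2 (all values illustrative)
r_charge = reward(soc=0.8, soc_threshold=0.2, charging=True, timing=10,
                  chance=0.5, status_battery=0.4, payment=5, t_travel=5,
                  t_charge=20, n_charges=1, m_class=2, status=0)

# Failing the trip is penalised through the Status term
r_fail = reward(soc=0.5, soc_threshold=0.2, charging=False, timing=0, chance=0,
                status_battery=0, payment=0, t_travel=0, t_charge=0,
                n_charges=0, m_class=1, status=-1, rho=10.0)
```

The exponent $n/m$ means that repeated charging on a short trip (small $m$) inflates $D_{\mathit{cost}}$ faster than on a long trip, which shrinks the first term of the reward.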

Results

Cluster analysis, reward change, model calibration, and model validation

K-means clustering was applied to all agents based on four continuous features: total trip distance, initial battery level, trip co-occurrence density, and public charger density. Cluster analysis results and the determination of the optimal number of clusters are provided in the Supplemental Materials. The cumulative reward change for each agent type across all episodes is provided in Figure S9 in Supplemental Materials.

Prior to model validation, three key parameters were calibrated: the learning rate, discount factor, and exploration rate. Details of the calibration process and results are included in the Model Calibration section in the Supplemental Materials.

The model was validated using 21 days of real-world charging session data through correlation analysis comparing simulated charger usage patterns with observed data, both spatially and temporally. Details of the model validation results are included in the Supplemental Materials. To further illustrate the behavioural validity of the results, we analysed the action selection probabilities of agents throughout the training process and evaluated if the emergent behaviours of different types of agents align with established and intuitive understanding of EV driver behaviour. Details are provided in the Action Selection Probability of Agents section in Supplemental Materials.

Charging demand distribution

After model validation, the SOC distribution along road networks was analysed using data from the 703rd episode, which achieved one of the highest spatial and temporal correlations with real-world data. To represent the average battery status at a location, mean SOC values were calculated at each network node based on all passing EV drivers.

Due to the dense distribution of network nodes, the H3 hexagonal spatial indexing system was used to aggregate and visualise SOC patterns. H3 is an open-source geospatial grid system that divides the Earth’s surface into hexagonal cells at varying resolutions (H3, 2024). Lower H3 resolution levels correspond to larger hexagons and broader spatial aggregation, while higher levels offer finer spatial detail. Mean SOC values were calculated for each hexagonal cell and visualised at resolutions 7 and 6 (Figure 2(a) and (b)), with average cell areas of approximately 5.2 km² and 36.1 km², respectively.

Figure 2.

Consecutive SOC change distribution along road networks at the 703rd episode.

To identify areas where EV drivers face a higher risk of battery depletion, the resolution 6 SOC map was classified into five categories using natural breaks. Areas where SOC falls into the lowest category (0 – 0.22) are shown in Figure 2(c). However, these patterns alone do not definitively indicate charging risk, as they may reflect drivers’ willingness to operate at lower battery levels rather than insufficient charging infrastructure. To accurately identify true risk areas, it is necessary to consider both SOC patterns and public charger availability together.

To this end, buffer zones of 500 m, 1000 m, and 1500 m were created around each node and the number of chargers within each buffer was calculated. Applying k-means clustering to the charger counts and average SOC values, three distinct clusters were identified for each buffer distance (Figure 3(a)–(c)). The selection of the optimal cluster number is detailed in the Supplemental Materials. Across all buffer distances, Cluster 0 consistently represented locations with both low average SOC values and limited nearby charging infrastructure, which indicates higher risks of battery depletion for EV drivers. The spatial distribution of these high-risk locations across buffer sizes is shown in Figure 3(d)–(f).
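The buffer counts that feed this clustering can be sketched as below. The sketch assumes great-circle (haversine) distances on latitude and longitude pairs. The buffers in the study may instead have been built in a projected coordinate system, and the coordinates used here are made up for illustration.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a sphere of radius 6,371 km."""
    r = 6371000.0
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * r * asin(sqrt(a))

def charger_counts(node, chargers, radii_m=(500, 1000, 1500)):
    """Number of chargers within each buffer radius around a network node."""
    dists = [haversine_m(node[0], node[1], c[0], c[1]) for c in chargers]
    return {radius: sum(1 for d in dists if d <= radius) for radius in radii_m}

# Hypothetical node and chargers at roughly 445 m, 890 m, 1334 m and 2224 m north
node = (51.5, -0.12)
chargers = [(51.504, -0.12), (51.508, -0.12), (51.512, -0.12), (51.52, -0.12)]
counts = charger_counts(node, chargers)
```

Each node's count at a given radius, paired with its mean SOC, would form one input row for the k-means step described above.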

Figure 3.

Areas with high risk of battery depletion.

The 500 m analysis identifies widely distributed high-risk areas, but it likely overestimates infrastructure needs, as drivers typically accept detours of more than 500 m for charging. The 1000 m and 1500 m analyses reveal more realistic risk patterns, both highlighting areas such as Outer London, the eastern M4 corridor, Greater Manchester, and south Newcastle. The key difference between the two buffer distances lies in Greater London: the 1500 m result primarily highlights high-risk areas along London’s boundaries, while the 1000 m result identifies more high-risk areas toward Inner London. Notably, fewer areas along the M4 and M1 corridors are classified as high-risk compared to the SOC-only analysis shown in Figure 2(c). This suggests that, despite frequent low SOC observations, these corridors are relatively well served by public chargers, which reduces the actual risk of battery depletion.

Discussion

This paper presents a multi-stage agent-based RL model to train and simulate EV drivers’ charging and driving behaviours in GB. Unlike previous studies that apply RL to optimise EV fleet operation strategies, we employ RL to capture the adaptive learning process of private EV drivers. By validating each episode against real-world charging data, we identify the simulation stage that best reflects reality, which also represents a balance between adaptive behaviour and bounded rationality. The validity of the model is further supported by the emergent behaviours of trained agents, which align with established and intuitive understanding of EV driver behaviour.

The model generates a high-resolution aggregated SOC distribution along road networks and identifies locations with higher risks of battery depletion. Many areas with low average SOC are situated along inter-city motorways, likely due to the prevalence of long-distance travel in these corridors. By exploring charging risks across multiple search distances, we identified key ‘charging deserts’ in GB, which face both low SOC levels and limited charger provision. Battery depletion in these areas poses greater risks for long-distance travellers. Accordingly, while initial charger deployment focused on destination and residential charging (Department for Transport, 2022a, 2022b), our findings support recent policy shifts toward developing rapid charging hubs along motorway corridors and urban peripheries in these areas (Department for Transport, 2022a, 2022b; Mohammed et al., 2024).

The simulated charging demand distribution can also serve as a key input for infrastructure planning models. As noted by Kchaou-Boujelben (2021), one of the major challenges in public charger location models lies in accurately estimating the spatial and temporal distribution of charging demand. For example, Yi et al. (2023) used rule-based ABMs to generate high-resolution driving profiles and predict where charging requests are likely to arise before optimising charger placement. The agent-based RL model proposed in this study extends this direction of research by integrating RL-trained agents with adaptive behaviours and providing a more behaviourally realistic representation of charging demand distribution. These estimates can subsequently be integrated into public charger optimisation frameworks, such as those by Anjos et al. (2020) and Ljósheim et al. (2026), to support more robust infrastructure planning decisions.

This study has several limitations that warrant further research. First, although the RL framework offers an avenue for modelling agent behaviours in unseen scenarios, questions remain about the trade-offs between predictability and model tractability: while drivers’ final decisions in each episode can be observed, the exact reasons behind each choice cannot be specified. Second, due to computational constraints, this study simulates 1000 agents simultaneously. Future research could use high-performance computing infrastructure to simulate larger driver populations. Third, the NTS data covers residents of England but lacks trips by residents of Scotland and Wales, which future work could address by incorporating additional regional data. The NTS dataset also does not provide information on EV ownership, and we assume that drivers maintain their existing travel patterns upon switching to EVs. Although travel survey data combined with appropriately specified EV attributes has been shown to effectively support EV driver behaviour modelling (Pareschi et al., 2020), future research could integrate EV adoption inference models that account for socio-demographic factors, residential property types, and attitudes towards EVs to generate more representative synthetic populations. Fourth, initial SOC levels are generated using a truncated normal distribution, which does not explicitly account for access to home and workplace charging. Such access can directly influence initial SOC, yet information on charger ownership is absent from the NTS dataset, which also lacks the residential property types and workplace locations that could be used to infer charger access. Future research could address this by integrating home and workplace charger availability data, once available, to generate more realistic initial SOC distributions.
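For reference, drawing initial SOC from a truncated normal distribution can be sketched with simple rejection sampling. The mean, standard deviation and bounds below are placeholder assumptions chosen for illustration; the values used in the model are not given in this section.

```python
import random


def sample_initial_soc(rng, mean=0.6, sd=0.2, low=0.1, high=1.0):
    """Draw one SOC value from a normal distribution truncated to [low, high].

    All four parameters are illustrative assumptions, not model settings.
    """
    while True:
        value = rng.gauss(mean, sd)
        if low <= value <= high:
            return value


rng = random.Random(42)  # fixed seed so the toy run is repeatable
initial_soc = [sample_initial_soc(rng) for _ in range(1000)]
```

Conditioning `mean` on whether an agent has home or workplace charging would be one way to implement the refinement suggested above, once such access data become available.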

Moreover, while the current framework captures heterogeneity in learning progress across agent types, future research could more explicitly model the mix of experienced and novice drivers in the EV driver population. This could be achieved by introducing rule-based agents with simple decision heuristics representing novice drivers alongside RL-trained agents representing experienced drivers. Such an approach can create a more explicitly heterogeneous agent population and allow for investigation of how the proportion of novice versus experienced drivers affects aggregated charging patterns. Additionally, while this study focuses on charging demand from private EV drivers, future studies could incorporate other user groups such as EV fleets to provide a more comprehensive representation of public charger usage. The model also assumes that drivers follow the shortest path as determined by Dijkstra’s algorithm. While this ensures computational efficiency, real-world drivers may deviate from optimal routes due to personal preferences, habitual behaviour, or incomplete information. Future research should incorporate bounded rationality in route choices through approaches such as stochastic shortest path formulations to better reflect real-world driving behaviour. Finally, the charger location data may not comprehensively cover all public chargers across GB, which may affect the accuracy of identifying ‘charging deserts’ and the overall representation of charging accessibility. A more complete, nationwide dataset of public chargers would enable more detailed and reliable identification of charging deserts.
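The shortest-path assumption mentioned above can be illustrated with a compact version of Dijkstra’s algorithm. This is a generic textbook sketch, not the routing code used in the model; the adjacency-list format, node names and link lengths are hypothetical.

```python
import heapq


def dijkstra(graph, source, target):
    """Shortest path on a graph with non-negative link lengths.

    graph: {node: [(neighbour, length), ...]}
    Returns (total_length, [source, ..., target]); the path is empty
    and the length infinite if the target cannot be reached.
    """
    dist = {source: 0.0}
    prev = {}
    queue = [(0.0, source)]
    visited = set()
    while queue:
        d, node = heapq.heappop(queue)
        if node in visited:
            continue  # stale queue entry for an already settled node
        visited.add(node)
        if node == target:
            break
        for neighbour, length in graph.get(node, []):
            candidate = d + length
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                prev[neighbour] = node
                heapq.heappush(queue, (candidate, neighbour))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Toy road network with link lengths in kilometres.
roads = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 5)],
    "C": [("B", 2), ("D", 8)],
    "D": [],
}
length, route = dijkstra(roads, "A", "D")
```

A stochastic variant, as suggested above, could perturb the link lengths before each call so that agents do not always choose the same deterministic route.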

Conclusion

To address the limitations of both data-driven and simulation-based approaches, we developed an agent-based RL model using DQN to capture the charging behaviours of heterogeneous EV drivers in GB. The modelling framework captures both the adaptive charging behaviours of EV drivers and the emergent patterns arising from interactions at charging stations. Rather than seeking optimal charging strategies, which is atypical of private EV drivers, we validated the model against real-world charging session data and identified episodes that closely align with observed charging behaviours. The validity of the model is further supported by the emergent behaviours of trained agents, which align with established expectations of EV driver behaviour. Our results also provide a validated modelling framework for estimating the spatial distribution of charging demand and identifying areas at high risk of battery depletion, and highlight the importance of establishing more rapid charging hubs along motorway corridors in the next phase of public charger deployment.

Supplemental material

Supplemental Material - Replicating the behaviour of electric vehicle drivers using an agent-based reinforcement learning model

Supplemental Material for Replicating the behaviour of electric vehicle drivers using an agent-based reinforcement learning model by Zixin Feng, Qunshan Zhao and Alison Heppenstall in Environment and Planning B: Urban Analytics and City Science.

Footnotes

Acknowledgements

The authors would like to thank the anonymous reviewers for their insightful comments and suggestions on an earlier version of this manuscript.

ORCID iDs

Zixin Feng

Qunshan Zhao

Alison Heppenstall

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work was made possible by ESRC’s ongoing support for the Urban Big Data Centre (ES/L011921/1 and ES/S007105/1). Qunshan Zhao has also received support from the Royal Society International Exchange Scheme (IEC/NSFC/223042). Alison Heppenstall is funded by the SIPHER Consortium (MR/S037578/2), the Systems Science Research in Public Health (MC_UU_00022/5), and the ExAMPLER (EP/Y008839/1).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The ChargePoint charging session data were scraped from ChargePoint’s charging station map website (https://driver.chargepoint.com/). While using web scraping to gather the availability status and other details of public chargers, we ensured that our activities were limited to publicly accessible information and conducted solely for non-commercial academic research purposes. Our scraping process strictly adhered to the web scraping policy of the Office for National Statistics and complied with UK copyright law, which permits lawful access to material for non-commercial academic research. The data are available at https://researchdata.gla.ac.uk/1990/. The Ordnance Survey Open Roads data are publicly available at https://www.ordnancesurvey.co.uk/products/os-open-roads. The Ordnance Survey Code-Point Open data are publicly available at https://www.ordnancesurvey.co.uk/products/code-point. The UK EV dataset is publicly available at https://ev-database.org/uk. The 2022 UK National Travel Survey (NTS) data are special licence data and can be accessed upon application through the UK Data Service (Department for Transport, 2023b).

Author biographies

Zixin Feng is a PhD student at the Urban Big Data Centre in the School of Social and Political Sciences, University of Glasgow.

Qunshan Zhao is a Professor in Urban Analytics at the Urban Big Data Centre in the School of Social and Political Sciences, University of Glasgow.

Alison Heppenstall is a Professor of Geocomputation at the Urban Big Data Centre in the School of Social and Political Sciences, University of Glasgow.

References

Acquarone, Borneo and Misul (2022) Acceleration control strategy for battery electric vehicle based on deep reinforcement learning in V2V driving. In: 2022 IEEE transportation electrification conference & expo (ITEC), Anaheim, CA, 15 June 2022, pp. 202–207. IEEE.

Adenaw and Lienkamp (2021) Multi-Criteria, Co-Evolutionary Charging Behavior: An Agent-Based Simulation of Urban Electromobility. World Electric Vehicle Journal 12(1): 18. Available at: https://doi.org/10.3390/wevj12010018

Ale Ebrahim Dehkordi, Lechner, Ghorbani, et al. (2023) Using machine learning for agent specifications in agent-based models and simulations: a critical review and guidelines. The Journal of Artificial Societies and Social Simulation 26(1): 1–9.

Aljohani and Mohammed (2022) A real-time energy consumption minimization framework for electric vehicles routing optimization based on SARSA reinforcement learning. Vehicles 4(4): 1176–1194.

Anjos, Gendron and Joyce-Moniz (2020) Increasing electric vehicle adoption through the optimal deployment of fast-charging stations for local and long-distance travel. European Journal of Operational Research 285(1): 263–278.

Brearcliffe and Crooks (2021) Creating intelligent agents: combining agent-based modeling with machine learning. In: Proceedings of the 2020 conference of the computational social science society of the Americas. Springer, pp. 31–58.

ChargePoint (2023) ChargePoint charging map. https://driver.chargepoint.com/.

Chu, Wei, Fang, et al. (2022) A multiagent federated reinforcement learning approach for Plug-In electric vehicle fleet charging coordination in a residential community. IEEE Access 10: 98535–98548.

Crozier, Morstyn and McCulloch (2021) Capturing diversity in electric vehicle charging behaviour for network capacity estimation. Transportation Research Part D: Transport and Environment 93: 102762.

Dahlke, Bogner, Mueller, et al. (2020) Is the juice worth the squeeze? Machine learning (ML) in and for agent-based modelling (ABM). ArXiv Preprint arXiv:2003.11985. Epub ahead of print 2020.

Dastpak, Errico, Jabali, et al. (2024) Dynamic routing for the electric vehicle shortest path problem with charging station occupancy information. Transportation Research Part C: Emerging Technologies 158: 104411.

DeAngelis and Diaz (2019) Decision-making in agent-based modeling: a current review and future prospectus. Frontiers in Ecology and Evolution 6: 1–16.

Department for Transport (2022a) Public Electric Vehicle Charging Infrastructure. Department for Transport. https://assets.publishing.service.gov.uk/media/6234a51f8fa8f540edba36f1/public-ev-charging-infrastructure-research-report.pdf

Department for Transport (2022b) Electric Vehicle Charging Research: Survey with Electric Vehicle Drivers. Department for Transport. https://assets.publishing.service.gov.uk/media/628f5603d3bf7f037097bd73/dft-ev-driver-survey-summary-report.pdf

Department for Transport (2023a) National Travel Survey 2022: Introduction and Main Findings. Department for Transport. https://www.gov.uk/government/statistics/national-travel-survey-2022/national-travel-survey-2022-introduction-and-main-findings

Department for Transport (2023b) National Travel Survey Data. [Data Series]. UK Data Service. SN: 2000037. 8th Release. Department for Transport. https://doi.org/10.5255/UKDA-Series-2000037

Department for Transport (2024) Electric Vehicle Public Charging Infrastructure Statistics: April 2024. Department for Transport. https://www.gov.uk/government/statistics/electric-vehicle-public-charging-infrastructure-statistics-april-2024

Dong, Wei, et al. (2019) Electric vehicle charging point placement optimisation by exploiting spatial statistics and maximal coverage location models. Transportation Research Part D: Transport and Environment 67: 77–88.

EV Database (2024) EV database. https://ev-database.org/uk.

Grimm and Railsback (2005) Individual-Based Modeling and Ecology. Princeton University Press.

H3 (2024) H3. https://h3geo.org/docs

Hardinghaus, Seidel and Anderson (2019) Estimating public charging demand of electric vehicles. Sustainability 11(21): 5925.

Hasan and Simsekoglu (2020) The role of psychological factors on vehicle kilometer travelled (VKT) for battery electric vehicle (BEV) users. Research in Transportation Economics 82: 100880.

Huang, Liu, et al. (2022) Data driven optimization for electric vehicle charging station locating and sizing with charging satisfaction consideration in urban areas. IET Renewable Power Generation 16(12): 2630–2643.

International Energy Agency (2024) Global EV Outlook 2024. International Energy Agency. https://www.iea.org/reports/global-ev-outlook-2024

Jebessa, Olana, Getachew, et al. (2022) Analysis of reinforcement learning in autonomous vehicles. In: 2022 IEEE 12th annual computing and communication workshop and conference (CCWC), Las Vegas, NV, 26 January 2022, pp. 0087–0091. IEEE.

Jiang, Jing, et al. (2018) Optimal location of PEVCSs using MAS and ER approach. IET Generation, Transmission and Distribution 12(20): 4377–4387.

Jin (2022) Shortest-path-based deep reinforcement learning for EV charging routing under stochastic traffic condition and electricity prices. IEEE Internet of Things Journal 9(22): 22571–22581.

Kchaou-Boujelben (2021) Charging station location problem: a comprehensive review on models and solution approaches. Transportation Research Part C: Emerging Technologies 132: 103376.

Koh, Zhou, Fang, et al. (2020) Real-time deep reinforcement learning based vehicle navigation. Applied Soft Computing 96: 106694.

Kontou, Liu, Xie, et al. (2019) Understanding the linkage between electric vehicle charging network coverage and charging opportunity using GPS travel data. Transportation Research Part C: Emerging Technologies 98: 1–13.

LaMonaca and Ryan (2022) The state of play in electric vehicle charging services – a review of infrastructure provision, players, and policies. Renewable and Sustainable Energy Reviews 154: 111733.

Lee and Brown (2021) Social & locational impacts on electric vehicle ownership and charging profiles. Energy Reports 7: 42–48.

Liao, Tozluoğlu, Sprei, et al. (2023) Impacts of charging behavior on BEV charging infrastructure needs and energy use. Transportation Research Part D: Transport and Environment 116: 103645.

Lin, Ghaddar and Nathwani (2022) Deep reinforcement learning for the electric vehicle routing problem with time windows. IEEE Transactions on Intelligent Transportation Systems 23(8): 11528–11538.

Liu, Tayarani and Gao (2022) An activity-based travel and charging behavior model for simulating battery electric vehicle charging demand. Energy 258: 124938.

Ljósheim, Jenkins, Searle, et al. (2026) Optimal placement of electric vehicle slow-charging stations: a continuous facility location problem under uncertainty. Computers & Operations Research 185: 107289.

Macal (2016) Everything you need to know about agent-based modelling and simulation. Journal of Simulation 10(2): 144–156.

Mahmud, Medha and Hasanuzzaman (2023) Global challenges of electric vehicle charging systems and its future prospects: a review. Research in Transportation Business & Management 49: 101011.

Malleson, Birkin, Birks, et al. (2022) Agent-based modelling for urban analytics: state of the art and challenges. AI Communications 35(4): 393–406.

Metais, Jouini, Perez, et al. (2022) Too much or not enough? Planning electric vehicle charging infrastructure: a review of modeling options. Renewable and Sustainable Energy Reviews 153: 111719.

Mnih, Kavukcuoglu, Silver, et al. (2015) Human-level control through deep reinforcement learning. Nature 518(7540): 529–533.

Mohammed, Saif, Abo-Adma, et al. (2024) Strategies and sustainability in fast charging station deployment for electric vehicles. Scientific Reports 14(1): 283.

Nunes, Woodley and Rossetti (2022) Re-thinking procurement incentives for electric vehicles to achieve net-zero emissions. Nature Sustainability 5(6): 527–532.

Ordnance Survey (2023a) OS code-point. https://www.ordnancesurvey.co.uk/products/code-point.

Ordnance Survey (2023b) OS open roads. https://www.ordnancesurvey.co.uk/products/os-open-roads.

Pagani, Korosec, Chokani, et al. (2019) User behaviour and electric vehicle charging infrastructure: an agent-based model assessment. Applied Energy 254: 113680.

Pappalardo, Manley, Sekara, et al. (2023) Future directions in human mobility science. Nature Computational Science 3(7): 588–600.

Pareschi, Küng, Georges, et al. (2020) Are travel surveys a good basis for EV models? Validation of simulated charging profiles against empirical data. Applied Energy 275: 115318.

Park J-H and Joe I-W (2024) Federated learning-based prediction of energy consumption from blockchain-based black box data for electric vehicles. Applied Sciences 14(13): 5494.

Peng, Wang MWH, Yang, et al. (2024) An analytical framework for assessing equitable access to public electric vehicle chargers. Transportation Research Part D: Transport and Environment 126: 103990.

Rand (2006) Machine learning meets agent-based modeling: when not to go to a bar. In: Conference on social agents: results and prospects, Chicago, IL, 21–23 September 2006, pp. 51–58. Citeseer.

Schaul, Quan, Antonoglou, et al. (2015) Prioritized experience replay. Epub ahead of print 18 November 2015.

Sultanuddin, Vibin, Rajesh Kumar, et al. (2023) Development of improved reinforcement learning smart charging strategy for electric vehicle fleet. Journal of Energy Storage 64: 106987.

Sun, Lorscheid, Millington, et al. (2016) Simple or complicated agent-based models? A complicated issue. Environmental Modelling & Software 86: 56–67.

Sutton and Barto (1988) Reinforcement Learning: An Introduction. MIT Press.

van der Kam, Peters, van Sark, et al. (2019) Agent-based modelling of charging behaviour of electric vehicle drivers. The Journal of Artificial Societies and Social Simulation 22(4): 1–17.

van Seijen, van Hasselt, Whiteson, et al. (2009) A theoretical and empirical analysis of expected sarsa. In: 2009 IEEE symposium on adaptive dynamic programming and reinforcement learning, Nashville, TN, 30 March 2009–2 April 2009, pp. 177–184. IEEE.

Yan, Chen, Zhou, et al. (2021) Deep reinforcement learning for continuous electric vehicles charging control with dynamic user behaviors. IEEE Transactions on Smart Grid 12(6): 5124–5134.

Yi, Chen, Liu, et al. (2023) An agent-based modeling approach for public charging demand estimation and charging station location optimization at urban scale. Computers, Environment and Urban Systems 101: 101949.

Wang, Zhang, et al. (2023) A three-step heuristic approach to the electric vehicle path planning problem considering charging. Journal of Advanced Transportation 2023: 1–16.

Zhang, Valencia and Chang N-B (2023) Synergistic integration between machine learning and agent-based modeling: a multidisciplinary review. IEEE Transactions on Neural Networks and Learning Systems 34(5): 2170–2190.
