Abstract
Despite the rapid expansion of electric vehicle (EV) charging networks, questions remain about their efficiency in meeting the growing needs of EV drivers. Previous rule-based agent-based models (ABMs) have struggled to capture the adaptive behaviours of human drivers. Although reinforcement learning has been applied in EV simulation studies, its application has primarily focused on optimising fleet operations rather than modelling private drivers who make independent charging decisions. To address this gap, we propose a multi-stage reinforcement learning framework that simulates the charging demand of private EV drivers across a national-scale road network. We validate the model against real-world data and identify the training stage that most closely reflects actual driver behaviour. Based on the simulation results, we identify critical ‘charging deserts’ where EV drivers face high risks of battery depletion. Our findings further support recent policy shifts toward expanding rapid charging hubs along motorway corridors and urban boundaries to meet growing demand from long-distance trips.
Introduction
With global net-zero carbon transition targets, rapid technological advancements, and declining battery costs, many countries have witnessed a significant rise in EV adoption (Mahmud et al., 2023). In the early stages of EV adoption, households adopting EVs typically own garages or off-street parking suitable for installing home chargers (LaMonaca and Ryan, 2022). However, as EV ownership continues to grow, the market has shifted from early adopters to the mass market (IEA, 2024). This transition has led to increased demand for public chargers, which is particularly high among drivers without access to private chargers.
Recent research has questioned whether the emission reduction benefits from EVs can be sustained due to variability in consumer behaviour (Nunes et al., 2022). Concerns have also been raised about whether the charging network expansion is misallocating resources and failing to adapt to user needs (Metais et al., 2022). In the UK, geographical disparities in public chargers have created significant barriers to EV adoption. Some areas have lagged behind others in the deployment process (Peng et al., 2024). A survey by the Department for Transport (2022a, 2022b) indicates that existing EV drivers aim to adjust their behaviours to integrate charging into their trip schedules and daily habits. Potential EV adopters also prioritise proximity, reliability and dependability of chargers as key factors in their decision to switch to EVs. It is therefore important to understand the complex behaviours of diverse EV drivers and accurately estimate charging demand to support a more effective and sustainable charging network.
Previous studies have explored EV driver behaviours and estimated their charging demand using either data-driven or simulation-based approaches. Many studies are conducted within grid cells (van der Kam et al., 2019) or intra-city scale networks (Pagani et al., 2019; Yi et al., 2023), while other studies also extend to large geographical contexts over regional or country levels where long-distance journeys bring critical challenges to EV drivers (Anjos et al., 2020; Liao et al., 2023). However, despite their contributions, both approaches have limitations. Data-driven methods often face limited access to EV-specific datasets due to privacy concerns (Park and Joe, 2024). As a result, researchers often rely on alternative datasets, such as socio-demographic data (Crozier et al., 2021), GPS data from conventional vehicles (Kontou et al., 2019) or commercial EV fleets (Hu et al., 2022). While these datasets offer high granularity or rich attributes, they may introduce biased representations of private EV drivers. Meanwhile, simulation-based approaches, such as Agent-based Models (ABMs), have proven effective in capturing the behavioural features of EV drivers (Yi et al., 2023). However, these models often rely on static behavioural rules, which fail to reflect the adaptive nature of human drivers in response to the changing environment. For example, as EV drivers gain experience, they tend to charge less frequently and become more comfortable operating with lower battery levels.
The objective of this research is to develop and validate an agent-based reinforcement learning (RL) model using a Deep Q-Network (DQN) to (1) simulate the adaptive charging behaviours of heterogeneous EV drivers in Great Britain (GB); and (2) estimate the spatial distribution of public charging demand and identify locations with high risks of battery depletion. Our approach first categorises drivers based on travel distance, battery status, trip purpose, and environmental factors. We then implement a novel multi-episode training-simulation framework where representative agents from each cluster learn charging behaviours through RL. These learnt behaviours are then used to simulate the broader population of drivers within their respective groups, creating an updated charging environment that feeds back into the next training episode. This iterative process captures both the adaptive charging behaviours of EV drivers and the evolving interactions among drivers at charging stations. Unlike previous studies that use RL with rational agents to derive optimal routing and charging strategies (Dastpak et al., 2024; Jin and Xu, 2022; Yu et al., 2023), we use RL to replicate the adaptive learning process of real-world private drivers operating under bounded rationality when making charging decisions. Rather than identifying the training episode that achieves optimal performance, we validate against real-world charging session data to identify the episode that most closely reflects observed behaviour. The model provides a validated modelling framework to estimate charging demand distribution and inform future charging infrastructure planning.
Background
To date, extensive studies have explored charging demand estimation and EV driver behaviour through data-driven or simulation-based approaches. Data-driven studies often rely on socio-demographic data, such as population density, average travel distance, and travel flow volumes, to estimate charging demand distribution (Dong et al., 2019; Hardinghaus et al., 2019). More recent research has also used vehicle trajectories to infer EV charging demand, including data from conventional vehicles (Kontou et al., 2019) and EV fleets such as taxis (Hu et al., 2022). Despite the high granularity of these trajectory datasets, they may introduce bias when representing charging demand for private EV drivers. Meanwhile, limited data availability for private EV drivers remains a challenge, partly due to the early adoption stage of EVs and privacy concerns about connected vehicle data. Some studies use driver surveys to collect data on private EV drivers (Hasan and Simsekoglu, 2020). While such surveys provide valuable insights into driver behaviours, their high costs often restrict sample sizes and limit generalisability to broader spatial contexts.
Given the lack of data for private EV drivers, simulation-based approaches, such as ABMs, have been developed to simulate EV charging demand with predefined behavioural rules. These models consider multiple aspects of driver behaviours, including psychological features, financial preferences, and vehicle attributes (Adenaw and Lienkamp, 2021; Liu et al., 2022). However, despite these detailed behavioural rules, questions remain about their ability to represent EV drivers’ diversity and adaptability. Most ABMs simulating EV driver behaviour rely on static behavioural rules, such as initiating charging when SOC falls below a fixed threshold. Even when psychological factors such as range anxiety are incorporated, they are typically modelled as static attributes that vary across individuals but remain unchanged within individuals over time (Lin et al., 2022; van der Kam et al., 2019).
One critique of ABMs for simulating human behaviour concerns their limited capacity to represent adaptive human behaviours. Rule-based ABMs commonly follow an ‘if-then’ structure (DeAngelis and Diaz, 2019), in which deterministic rules assign a single behavioural response to a given situation (Grimm and Railsback, 2005). Agents are inherently limited in their ability to learn and remember previous decisions (Brearcliffe and Crooks, 2021), which can limit the explanatory power of simulations (Macal, 2016). In addition, ABMs can face substantial computational challenges when the number and details of agents, interactions, decision rules, scheduling, and environment increase (Sun et al., 2016). Incorporating heterogeneous adaptive and cognitive processes intensifies these challenges and therefore must be addressed through appropriate methodological choices.
Machine learning (ML) techniques provide great potential to bring higher levels of agent intelligence into ABMs. The integration of ML with ABMs can take various forms depending on the modelling objective and the stage of the modelling process. Although the first use of ML in ABMs is difficult to pinpoint, an early and notable contribution is Rand’s (2006) systematic framework for incorporating ML into ABMs to inform agent decision-making. Later studies have extended this framework to encompass the broader modelling workflow. Zhang et al. (2023) categorise ML applications across data preprocessing, agent behaviour specification, and model output analysis. Ale Ebrahim Dehkordi et al. (2023) further classify ML-ABM integration based on three dimensions: the ML technique employed, the ABM challenge addressed, and the modelling purpose.
This study focuses specifically on ML’s contribution to addressing the behavioural modelling challenge of ABMs – namely, the development of more realistic and adaptive representations of agent behaviour by integrating RL algorithms. RL is guided by rewards and penalties and enables agents to learn through interactions with other agents and the environment (Sutton and Barto, 1988). Through repeated exposure to the environment and interactions with other agents, RL-trained agents navigate spaces as if they were human and gradually learn behavioural strategies that maximise cumulative reward (Malleson et al., 2022). Additionally, RL does not require a dataset of correct answers (Dahlke et al., 2020), making it especially valuable for studying private EV drivers whose trajectory data is limited.
In transportation research, RL has been widely applied to optimise vehicle routing and charging decisions (Jebessa et al., 2022; Koh et al., 2020). One notable feature of these studies is their emphasis on optimising routing and charging strategies for EVs (Yan et al., 2021). One group of studies assumes agents share information with each other in a timely manner and uses RL to optimise the coordination of EV fleets (Lin et al., 2022; Sultanuddin et al., 2023). Another group focuses on individual drivers, where RL is used to identify optimal routing and charging strategies to minimise travel time, energy consumption, or charging costs (Dastpak et al., 2024; Jin and Xu, 2022; Yu et al., 2023). However, the assumptions of complete information exchange and perfect rationality do not apply to private EV drivers, who act independently and have limited, random communication with one another. Recent research on modelling cognitive decision-makers has increasingly emphasised the importance of incorporating bounded rationality into human decision-making models (Pappalardo et al., 2023). Despite RL’s ability to reflect adaptive learning, how to integrate bounded rationality into RL frameworks to better represent private EV drivers remains underexplored.
A further limitation of previous studies modelling RL-trained EV drivers is the challenge of scalability. Simulating EV driver behaviours across large geographical areas is important, as these are where long-distance journeys pose critical challenges to EV drivers. Prior studies have used ABMs to simulate EV driver behaviours at large geographical scales (Anjos et al., 2020; Liao et al., 2023). However, incorporating RL-trained agents into large-scale environments can introduce further computational challenges due to the expanded road and charging networks, which increase the number of charging choices available to agents. While some efforts have been made to improve cooperative multi-agent RL, simulating large populations of RL-trained agents operating independently across geographically large-scale scenarios needs further exploration.
In summary, despite significant progress in estimating charging demand through both data-driven and simulation-based approaches, several critical gaps remain. Data-driven methods face challenges related to biased representations and limited data availability for private EV drivers. Simulation-based models often rely on static behavioural rules and fail to capture the adaptive nature of human decision-making. Integrating RL into ABMs offers advantages in incorporating an adaptive learning process. However, previous studies have mainly applied RL for EV fleet optimisation. It therefore remains underexplored how RL-trained ABMs can be used to simulate private EV drivers with bounded rationality and adaptive behaviours, especially at large geographical scales where computational burdens increase.
Data and study area
We take GB as our study area, comprising England, Scotland, and Wales, with a focus on the trips made by England residents travelling across GB. As of April 2024, there are 59,125 publicly available chargers across the region (Department for Transport, 2024). England has a high volume of private vehicle trips, with residents travelling an average of 5373 miles in 2022 (Department for Transport, 2023a).
In the model, agents represent a combination of EV drivers and their vehicles. Travel schedules of EV drivers are established using trip attributes derived from the National Travel Survey (NTS) dataset (Department for Transport, 2023b), combined with Ordnance Survey (OS) Code-Point data (Ordnance Survey, 2023a). EV attributes are sourced from the EV Database (EV Database, 2024). The simulation environment integrates road networks from the OS Open Roads dataset (Ordnance Survey, 2023b) and public chargers from the ChargePoint dataset (ChargePoint, 2023). Charger availability status, scraped from the ChargePoint website, is used for model validation (ChargePoint, 2023). Further details are provided in the Data and Study Area section in the Supplemental Materials.
Methods
Multi-stage training-simulation framework
To address the limitations discussed in the Background section, we propose a scalable multi-stage framework in which, at each training episode, representative agents from each cluster are trained. Their current policies are then used to simulate all agents within their respective clusters. Through continuous validation against empirical data at each episode, the framework identifies the charger usage patterns that most closely reflect real-world conditions. This approach provides a practical solution for modelling large-scale systems. It allows for the inclusion of larger agent populations and application across diverse scenarios while maintaining computational efficiency.
The details of the methodology framework are shown in Figure 1. We first cluster all EV drivers from the NTS dataset using k-means clustering based on their trip distance, initial battery level, trip co-occurrence density (TCD), and charger density (CD). The initial battery level, TCD and CD are not directly provided by the dataset, and the methods of calculating them are detailed in the Data and Study Area section in the Supplemental Materials. Trip purpose (work, leisure) was not directly included in the cluster analysis due to both the limited sample size of leisure trips and the categorical nature of this variable, which could bias the clustering results. Instead, each group is further subdivided by trip purpose following the clustering analysis to account for behavioural differences associated with work and leisure trips.
Figure 1. Methodology framework.
Following the clustering analysis, representative agents are selected to initialise the training. For each cluster, the two agents closest to the cluster centroid are first identified, as these agents best represent the typical characteristics of their respective clusters. These two agents are then assigned to two separate training sets, ensuring that each training set contains one representative agent per cluster. Separately, simulation agents are independently sampled from the full agent population. Ten simulation agent sets are generated, each containing a different random sample of agents. To ensure the robustness of the model and to minimise overfitting, the model is run for all combinations of the two training sets and the ten simulation sets, resulting in 20 model runs in total. This number of runs was chosen to balance statistical robustness with computational resource constraints.
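The selection step described above can be illustrated with a minimal sketch. It assumes the clustering has already produced labels and centroids; the function name `pick_representatives`, the toy feature values, and the placeholder simulation sets are hypothetical and are not taken from the published model code.

```python
from itertools import product
from math import dist

def pick_representatives(features, labels, centroids, n_per_cluster=2):
    """Return, for each cluster, the indices of the agents nearest its centroid."""
    reps = {}
    for cluster_id, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == cluster_id]
        # Sort cluster members by Euclidean distance to the centroid.
        members.sort(key=lambda i: dist(features[i], centroid))
        reps[cluster_id] = members[:n_per_cluster]
    return reps

# Hypothetical standardised features: (trip distance, initial SOC, TCD, CD).
features = [
    (0.1, 0.2, 0.1, 0.3), (0.0, 0.1, 0.0, 0.1), (0.4, 0.5, 0.3, 0.2),
    (2.0, 1.9, 2.1, 2.0), (2.2, 2.1, 1.8, 2.0), (3.0, 2.5, 2.9, 3.1),
]
labels = [0, 0, 0, 1, 1, 1]
centroids = [(0.0, 0.1, 0.0, 0.1), (2.0, 2.0, 2.0, 2.0)]

reps = pick_representatives(features, labels, centroids)
# Training set k takes the k-th closest agent from every cluster.
training_sets = [[reps[c][k] for c in sorted(reps)] for k in range(2)]
simulation_sets = list(range(10))  # placeholders for the ten sampled agent sets
# Every training set is paired with every simulation set: 2 x 10 = 20 runs.
runs = list(product(range(len(training_sets)), simulation_sets))
```

Pairing every training set with every simulation set is what yields the 20 runs; the sketch leaves out the random sampling of simulation agents.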
The RL model is specified in the next section. The model is available at https://github.com/fengzixin0617/ReplicatingthebehaviourofEVdriverswithagentbasedRLmodel. The trained policies from each episode guide the behaviour of all agents within their respective groups during the simulation phase, where agents share a common environment in which their charging decisions affect charger availability for others. During the simulation process, agents interact with each other in two ways: (1) All drivers are assumed to have access to the real-time charger status updates via online platforms. When a driver occupies a charger, the status of the charger changes from ‘available’ to ‘in use’ and is updated for all other drivers; and (2) an occupied charger does not necessarily prevent other drivers from heading to it, as limited charger availability in certain areas may lead drivers to queue behind other vehicles. As charging apps and websites do not display queue lengths, drivers must estimate waiting time based on personal experience. After each episode, the charger usage time windows are incorporated into the state space for the next training episode. This iterative feedback loop enables agents to learn from system-wide charging patterns based on past experiences. The model is executed on AWS using 64 vCPUs and 128 GB of RAM. Key parameters, including the learning rate, discount factor, and exploration rate, are calibrated to enhance performance.
At the end of each episode, we validate the model by comparing simulated charger usage patterns against real-world charging session data from the ChargePoint dataset, covering both spatial distribution and temporal patterns of charging sessions. This comparison uses spatial and temporal correlation analysis detailed in the Supplemental Materials. After all episodes are complete, the episodes with the highest correlation scores are selected to represent the stage where simulated behaviour most closely aligns with observed patterns. Notably, the best-matching episode does not correspond to the final episode where agents have fully converged to optimal behaviour. Instead, it typically falls between the early exploratory stage and the final convergence stage. This intermediate stage captures a realistic mix of driver experience levels across the population. The episodes that best match real-world patterns therefore represent a population-level blend of different experience stages rather than a uniformly optimal outcome.
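As an illustration of the episode-selection idea, the sketch below scores each episode by correlating simulated and observed charger usage and keeps the best match. The exact correlation measure is specified in the Supplemental Materials; Pearson correlation is used here only as an assumption, and the function names and toy counts are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def best_episode(simulated_by_episode, observed):
    """Return (episode, r) for the episode whose usage best matches the data."""
    scores = {ep: pearson(sim, observed) for ep, sim in simulated_by_episode.items()}
    episode = max(scores, key=scores.get)
    return episode, scores[episode]

# Hypothetical hourly charging-session counts at one site.
observed = [2, 4, 9, 7, 3]
simulated_by_episode = {
    1: [5, 5, 4, 6, 5],    # early exploration: almost unrelated to the data
    2: [2, 5, 8, 7, 3],    # intermediate stage: close to the observed profile
    3: [1, 1, 12, 12, 1],  # late stage: sharper peaks than observed
}
episode, r = best_episode(simulated_by_episode, observed)
```

In this toy case the intermediate episode scores highest, mirroring the observation that the best-matching episode need not be the final one.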
RL model
Many different RL frameworks have been applied to train and simulate EV drivers, spanning both on-policy and off-policy models. These include Policy Gradient (Acquarone et al., 2022), State-Action-Reward-State-Action (SARSA) (Aljohani and Mohammed, 2022), Q-learning (Jiang et al., 2018; Lee and Brown, 2021), and Soft Actor-Critic (Chu et al., 2022). While these methods have been effective for optimisation tasks, not all are suitable for modelling the adaptability of decentralised private EV drivers. In on-policy models such as SARSA, the agent learns by following and updating the same policy. In contrast, off-policy models such as Q-learning enable agents to learn the optimal policy independently of the actions currently being taken (van Seijen et al., 2009).
In this paper, we use an off-policy RL method, Q-learning, so that simulated drivers can go beyond the current policy and learn from diverse past experiences. While value-based Q-learning performs well in many scenarios with discrete states and actions, its reliance on a tabular approach limits its scalability. Training becomes inefficient when dealing with large state and action spaces, such as those in our study. To address this challenge, we employ the Deep Q-Network (DQN) framework introduced by Mnih et al. (2015). Because it uses deep neural networks to approximate the Q-value function, DQN is a more practical choice for our large-scale EV simulation.
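To make the scalability point concrete, the following sketch shows a single tabular Q-learning update. The state labels, action names, and parameter values are hypothetical and chosen only for illustration; the model in this paper replaces the table with a neural network.

```python
from collections import defaultdict

def q_learning_update(q_table, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one off-policy Q-learning update and return the new Q(state, action)."""
    # The target uses the greedy value of the next state, whatever action
    # the agent actually takes next; this is what makes the update off-policy.
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return q_table[(state, action)]

actions = ["drive", "charge"]
q_table = defaultdict(float)  # one entry per (state, action) pair
q_table[("soc_mid", "drive")] = 1.0
new_q = q_learning_update(q_table, "soc_low", "charge", reward=2.0,
                          next_state="soc_mid", actions=actions)
```

Because the table needs one entry per state-action pair, it grows quickly once states combine battery level, location, time, and charger status, which is the motivation for function approximation.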
Throughout this process, the agent’s objective is to complete all planned trips while avoiding battery depletion. Specifically, we formulate the charging decision-making process of EV drivers as a RL problem, where agents observe a state, take an action, receive a reward, and undergo a transition based on their chosen action. Each agent is initialised with a State of Charge (SOC) and a daily trip schedule comprising one or more destinations. The agent navigates through road networks using Dijkstra's shortest path algorithm, moving from its current node at time
At each time step t, the agent selects an action
We implement the learning process using a Deep Q-Network (DQN), which maps state-action pairs to Q-values through hidden layers. The DQN takes the state variables
After performing the selected action
For each mini-batch, the network computes the predicted Q-values and the target Q-values using the Bellman Equation. The network parameters are then updated by minimising the loss function
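For reference, the standard DQN formulation of Mnih et al. (2015) can be written as below. The symbols are the conventional ones and are assumed here rather than taken from this paper's own notation: \theta denotes the online network parameters, \theta^{-} the target network parameters, \gamma the discount factor, and \mathcal{D} the replay buffer from which mini-batches are drawn.

```latex
y = r_t + \gamma \max_{a'} Q(s_{t+1}, a'; \theta^{-}),
\qquad
L(\theta) = \mathbb{E}_{(s_t, a_t, r_t, s_{t+1}) \sim \mathcal{D}}
\left[ \big( y - Q(s_t, a_t; \theta) \big)^{2} \right]
```

The first expression is the Bellman target and the second is the mean squared error between that target and the predicted Q-value.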
Another important term to define the agent’s experience is the reward
Results
Cluster analysis, reward change, model calibration, and model validation
K-means clustering was applied to all agents based on four continuous features: total trip distance, initial battery level, trip co-occurrence density, and public charger density. Cluster analysis results and the determination of the optimal number of clusters are provided in the Supplemental Materials. The cumulative reward change for each agent type across all episodes is provided in Figure S9 in Supplemental Materials.
Prior to model validation, three key parameters were calibrated: the learning rate, discount factor, and exploration rate. Details of the calibration process and results are included in the Model Calibration section in the Supplemental Materials.
The model was validated using 21 days of real-world charging session data through correlation analysis comparing simulated charger usage patterns with observed data, both spatially and temporally. Details of the model validation results are included in the Supplemental Materials. To further illustrate the behavioural validity of the results, we analysed the action selection probabilities of agents throughout the training process and evaluated if the emergent behaviours of different types of agents align with established and intuitive understanding of EV driver behaviour. Details are provided in the Action Selection Probability of Agents section in Supplemental Materials.
Charging demand distribution
After model validation, the SOC distribution along road networks was analysed using data from the 703rd episode, which achieved one of the highest spatial and temporal correlations with real-world data. To represent the average battery status at a location, mean SOC values were calculated at each network node based on all passing EV drivers.
Due to the dense distribution of network nodes, the H3 hexagonal spatial indexing system was used to aggregate and visualise SOC patterns. H3 is an open-source geospatial grid system that divides the Earth’s surface into hexagonal cells at varying resolutions (H3, 2024). Lower H3 resolution levels correspond to larger hexagons and broader spatial aggregation, while higher levels offer finer spatial detail. Mean SOC values were calculated for each hexagonal cell and visualised at resolutions 7 and 6 (Figure 2(a) and (b)), with average cell areas of approximately 5.2 km² and 36.1 km², respectively.
Figure 2. Consecutive SOC change distribution along road networks at the 703rd episode.
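The two-level averaging can be sketched as follows. The cell identifiers are hypothetical strings; in practice they would come from an H3 binding that maps each node's coordinates to a cell at the chosen resolution. Averaging node means within each cell, rather than pooling all observations, is an assumption of this sketch.

```python
from collections import defaultdict
from statistics import mean

def mean_soc_by_cell(observations, node_to_cell):
    """Average SOC per node, then average the node means within each cell."""
    soc_by_node = defaultdict(list)
    for node_id, soc in observations:
        soc_by_node[node_id].append(soc)
    node_means = {node: mean(values) for node, values in soc_by_node.items()}

    cell_values = defaultdict(list)
    for node, value in node_means.items():
        cell_values[node_to_cell[node]].append(value)
    return {cell: mean(values) for cell, values in cell_values.items()}

# Hypothetical (node, SOC) records from drivers passing each node.
observations = [("n1", 0.8), ("n1", 0.6), ("n2", 0.3), ("n3", 0.2), ("n3", 0.4)]
node_to_cell = {"n1": "cell_a", "n2": "cell_a", "n3": "cell_b"}
cell_soc = mean_soc_by_cell(observations, node_to_cell)
```

Moving to a coarser resolution only changes the node-to-cell mapping, so the same function serves both maps.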
To identify areas where EV drivers face a higher risk of battery depletion, the resolution 6 SOC map was classified into five categories using natural breaks. Areas where SOC falls into the lowest category (0 – 0.22) are shown in Figure 2(c). However, these patterns alone do not definitively indicate charging risk, as they may reflect drivers’ willingness to operate at lower battery levels rather than insufficient charging infrastructure. To accurately identify true risk areas, it is necessary to consider both SOC patterns and public charger availability together.
To this end, buffer zones of 500 m, 1000 m, and 1500 m were created around each node and the number of chargers within each buffer was calculated. Applying k-means clustering to the charger counts and average SOC values, three distinct clusters were identified for each buffer distance (Figure 3(a)–(c)). The selection of the optimal cluster number is detailed in the Supplemental Materials. Across all buffer distances, Cluster 0 consistently represented locations with both low average SOC values and limited nearby charging infrastructure, which indicates higher risks of battery depletion for EV drivers. The spatial distribution of these high-risk locations across buffer sizes is shown in Figure 3(d)–(f).
Figure 3. Areas with high risk of battery depletion.
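The buffer counts can be approximated with a short sketch. It uses great-circle (haversine) distance on latitude and longitude as an assumption; the original analysis may have used projected coordinates in a GIS. The coordinates and function names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def chargers_within(node, chargers, radii_m=(500, 1000, 1500)):
    """Count chargers inside each circular buffer around a node."""
    distances = [haversine_m(*node, *c) for c in chargers]
    return {r: sum(d <= r for d in distances) for r in radii_m}

node = (51.5000, -0.1000)
chargers = [
    (51.5030, -0.1000),  # about 334 m north
    (51.5080, -0.1000),  # about 890 m north
    (51.5120, -0.1000),  # about 1334 m north
    (51.5200, -0.1000),  # about 2224 m north
]
counts = chargers_within(node, chargers)
```

The resulting count per buffer, paired with the node's mean SOC, forms the two-dimensional input to the clustering step.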
The 500 m analysis identified widely distributed high-risk areas, but likely overestimates infrastructure needs, as drivers typically accept detours of more than 500 m for charging. The 1000 m and 1500 m analyses revealed more realistic risk patterns, both highlighting areas such as Outer London, the eastern M4 corridor, Greater Manchester, and south Newcastle. The key difference between the two buffer distances lies in Greater London. The 1500 m result primarily highlights high-risk areas along London’s boundaries, while the 1000 m result identified more high-risk areas toward Inner London. Notably, fewer areas along the M4 and M1 corridors were classified as high-risk compared to the SOC-only analysis shown in Figure 2(c). This suggests that, despite frequent low SOC observations, these corridors have relatively sufficient public chargers which reduce the actual risk of battery depletion.
Discussion
This paper presents a multi-stage agent-based RL model to train and simulate EV drivers’ charging and driving behaviours in GB. Unlike previous studies that apply RL to optimise EV fleet operation strategies, we employ RL to capture the adaptive learning process of private EV drivers. By validating each episode against real-world charging data, we identify the simulation stage that best reflects reality, which also represents a balance between adaptive behaviour and bounded rationality. The validity of the model is further supported by the emergent behaviours of trained agents, which align with established and intuitive understanding of EV driver behaviour.
The model generates a high-resolution aggregated SOC distribution along road networks and identifies locations with higher risks of battery depletion. Many areas with low average SOC are situated along inter-city motorways, likely due to the prevalence of long-distance travel in these corridors. By exploring charging risks across multiple search distances, we identified key ‘charging deserts’ in GB, which face both low SOC levels and limited charger provision. Battery depletion in these areas poses greater risks for long-distance travellers. Accordingly, while initial charger deployment focused on destination and residential charging (Department for Transport, 2022a, 2022b), our findings support recent policy shifts toward developing rapid charging hubs along motorway corridors and urban peripheries in these areas (Department for Transport, 2022a, 2022b; Mohammed et al., 2024).
The simulated charging demand distribution can also serve as a key input for infrastructure planning models. As noted by Kchaou-Boujelben (2021), one of the major challenges in public charger location models lies in accurately estimating the spatial and temporal distribution of charging demand. For example, Yi et al. (2023) used rule-based ABMs to generate high-resolution driving profiles and predict where charging requests are likely to arise before optimising charger placement. The agent-based RL model proposed in this study extends this direction of research by integrating RL-trained agents with adaptive behaviours and providing a more behaviourally realistic representation of charging demand distribution. These estimates can subsequently be integrated into public charger optimisation frameworks, such as those by Anjos et al. (2020) and Ljósheim et al. (2026), to support more robust infrastructure planning decisions.
This study has several limitations that warrant further research. First, although the RL framework offers an avenue for modelling agent behaviours in unseen scenarios, questions remain about the trade-offs between predictability and model tractability: while drivers’ final decisions in each episode can be observed, their exact reasons behind each choice cannot be specified. Second, due to computational constraints, this study simulates 1000 agents simultaneously. Future research could use high performance computing infrastructure to simulate larger driver populations. Third, the NTS data covers England residents but lacks trips by Scotland and Wales residents, which future work could address by incorporating additional regional data. The NTS dataset also does not provide information on EV ownership, and we assume that drivers maintain their existing travel patterns upon switching to EVs. Although travel survey data combined with appropriately specified EV attributes has been shown to effectively support EV driver behaviour modelling (Pareschi et al., 2020), future research could integrate EV adoption inference models that account for socio-demographic factors, residential property types, and attitudes towards EVs to generate more representative synthetic populations. Fourth, initial SOC levels are generated using a truncated normal distribution, which does not explicitly account for access to home and workplace charging. Such access can directly influence initial SOC, yet their ownership information is absent from the NTS dataset, which also lacks residential property types or workplace locations that could be used to infer charger access. Future research could address this by integrating home and workplace charger availability data, once available, to generate more realistic initial SOC distributions.
Moreover, while the current framework captures heterogeneity in learning progress across agent types, future research could more explicitly model the mix of experienced and novice drivers in the EV driver population. This could be achieved by introducing rule-based agents with simple decision heuristics representing novice drivers alongside RL-trained agents representing experienced drivers. Such an approach can create a more explicitly heterogeneous agent population and allow for investigation of how the proportion of novice versus experienced drivers affects aggregated charging patterns. Additionally, while this study focuses on charging demand from private EV drivers, future studies could incorporate other user groups such as EV fleets to provide a more comprehensive representation of public charger usage. The model also assumes that drivers follow the shortest path as determined by Dijkstra’s algorithm. While this ensures computational efficiency, real-world drivers may deviate from optimal routes due to personal preferences, habitual behaviour, or incomplete information. Future research should incorporate bounded rationality in route choices through approaches such as stochastic shortest path formulations to better reflect real-world driving behaviour. Finally, the charger location data may not comprehensively cover all public chargers across GB, which may affect the accuracy of identifying ‘charging deserts’ and the overall representation of charging accessibility. A more complete, nationwide dataset on public charger data would enable more detailed and reliable identification of charging deserts.
Conclusion
To address the limitations of both data-driven and simulation-based approaches, we developed an agent-based RL model using DQN to capture the charging behaviours of heterogeneous EV drivers in GB. The modelling framework captures both the adaptive charging behaviours of EV drivers and the emergent patterns arising from interactions at charging stations. Rather than seeking optimal charging strategies, which are atypical of private EV drivers, we validated the model against real-world charging session data and identified episodes that closely align with observed charging behaviours. The validity of the model is further supported by the emergent behaviours of trained agents, which align with established expectations of EV driver behaviour. Our results provide a validated modelling framework for estimating the spatial distribution of charging demand and identifying areas at high risk of battery depletion, and they highlight the importance of establishing more rapid charging hubs along motorway corridors in the next phase of public charger deployment.
Supplemental material
Supplemental Material for Replicating the behaviour of electric vehicle drivers using an agent-based reinforcement learning model by Zixin Feng, Qunshan Zhao and Alison Heppenstall in Environment and Planning B: Urban Analytics and City Science.
Footnotes
Acknowledgements
The authors would like to thank the anonymous reviewers for their insightful comments and suggestions on an earlier version of this manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work was made possible by ESRC’s ongoing support for the Urban Big Data Centre (ES/L011921/1 and ES/S007105/1). Qunshan Zhao has also received support from the Royal Society International Exchange Scheme (IEC/NSFC/223042). Alison Heppenstall is funded by the SIPHER Consortium (MR/S037578/2), the Systems Science Research in Public Health (MC_UU_00022/5), and the ExAMPLER (EP/Y008839/1).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The ChargePoint charging session data were scraped from ChargePoint’s charging station map website (https://driver.chargepoint.com/). While using web scraping to gather the availability status and other details of public chargers, we ensured that our activities were limited to publicly accessible information and conducted solely for non-commercial academic research purposes. Our scraping process strictly adhered to the web scraping policy of the Office for National Statistics and complied with UK copyright law, which permits lawful access to material for non-commercial academic research. The data is available at https://researchdata.gla.ac.uk/1990/. The Ordnance Survey Open Roads data is publicly available at https://www.ordnancesurvey.co.uk/products/os-open-roads. The Ordnance Survey Code-Point Open data is publicly available at https://www.ordnancesurvey.co.uk/products/code-point. The UK EV dataset is publicly available at https://ev-database.org/uk. The 2022 UK National Travel Survey (NTS) data is special licence data and can be accessed upon application through
.