Factors Influencing Electric Vehicle Adoption: Coupling Machine Learning Models and Open-Source Data

Abstract

Electric vehicles (EVs) are increasingly promoted in many countries to reduce transportation-related greenhouse gas emissions, offering numerous environmental and economic benefits, such as lower tailpipe emissions and operating costs. However, predicting EV adoption entails access to comprehensive data sets. This study introduces a cost-effective alternative by leveraging open-source socioeconomic and demographic (SED) data from census records and machine learning (ML) models. Focusing on high-resolution geography at the dissemination area level, this study examines the association of SED characteristics, urbanization, annual vehicle kilometer traveled (VKT), and charging infrastructure to predict EV adoption. Ensemble ML models, particularly eXtreme Gradient Boosting, achieve superior predictive accuracy (up to 95%), with forward sequential feature selection identifying 18 key features that enhance model performance. Furthermore, Shapley Additive exPlanation analysis indicates that higher education, income, urbanization, and charging infrastructure availability are strong drivers of EV adoption. In contrast, high population density, longer VKT, and extended commuting durations pose barriers. This approach validates the existing determinants of EV uptake and introduces a scalable, reproducible framework for policymakers. This study demonstrates the feasibility of high-resolution spatial forecasting by leveraging publicly available data. In addition, it provides actionable insights to support targeted policies and infrastructure development to accelerate EV adoption.

Keywords

electric vehicle, prediction, machine learning models, open-source data

Electric vehicles (EVs) have emerged as a viable solution to environmental concerns and climate change in transportation ( 1 ). Motivated by technological advancements and the need to reduce transportation-related greenhouse gas emissions, the adoption of EVs has steadily increased ( 2 – 4 ). This increase is supported by various factors, including government policies promoting clean energy ( 5 , 6 ), improvements in battery technology, cost, and driving range ( 7 , 8 ), and growing consumer interest in reducing carbon emissions ( 4 , 9 , 10 ).

Understanding the factors influencing EV adoption is crucial for policymakers, urban planners, and the automotive industry ( 3 , 6 , 11 ). Accurate predictive models are critical for forecasting future trends ( 12 ) and guiding infrastructure deployment ( 13 , 14 ). Identifying the determinants of EV adoption also enables stakeholders to make informed decisions that support the transition to electric mobility.

The current literature on EV adoption provides a comprehensive overview of various factors influencing this transition ( 3 , 5 – 7 , 9 , 11 , 15 , 16 ). Various studies highlighted the significance of policies, socioeconomic and demographic (SED) characteristics, and national commitments in shaping EV demand ( 6 , 15 , 17 – 19 ). Policies, demographic trends, consumer behavior, and purchasing choices are pivotal in EV adoption ( 6 , 15 – 18 ). These findings emphasize that aligning EV policies with consumer preferences is essential for encouraging broader EV deployment.

Comparative analyses across different regions highlight commonalities in factors influencing EV adoption. Research on the European Union and the US indicates that demographic characteristics, including age, education levels, population density, income levels, subsidies, and taxes, significantly impact EV market penetration ( 5 ). Moreover, exploring EV adoption in a mature market in Norway highlights the effects of SED characteristics, climate conditions (e.g., wind speed, temperature, and precipitation), and charging station availability on EV adoption patterns ( 3 ). These studies suggested that urban areas with high-income and well-educated individuals and more available charging stations tend to support EV adoption. In contrast, high-density areas with lower minimum temperatures and older populations (over 50 years) often have lower EV adoption rates. Similarly, a recent longitudinal study in Canada from 2013 to 2023 found increased consumer awareness of zero-emission vehicles (ZEVs), particularly in understanding charging methods and public charger availability, alongside higher valuations for home and fast charging in 2023. However, persistent confusion about hybrids and unchanged preferences for ZEV drivetrains suggest that awareness and infrastructure improvements alone may not drive sales growth without regulatory support ( 20 ).

Despite these insights, most previous studies on EV adoption relied on primary data collected specifically to predict the likelihood of EV adoption, predominantly through stated and revealed preference approaches ( 4 , 10 , 21 – 23 ). However, such data collection entails substantial cost and effort, and its scope is limited by sampling. Therefore, there is a growing need to utilize publicly available data sets (e.g., census data) to predict EV adoption. Doing so offers a more cost-effective and comprehensive alternative to primary data collection, one that can broaden the understanding of EV adoption and support the forecasting of adoption trends.

Machine learning (ML) models are emerging as powerful tools with promising prediction capabilities. ML models have been employed to predict EV adoption by addressing factors such as customer attitudes, SED characteristics, and vehicle-related attributes ( 24 – 26 ). These studies demonstrated that ML models can explain interconnected relationships and offer valuable insights into the determinants of EV adoption by integrating attitudinal factors, including pro-EV attitudes, environmental concerns, and interest in EV technology. Techniques such as the gradient boosting machine (GBM) and K-means clustering have been used to examine user preferences and forecast market uptake based on socioeconomic characteristics and vehicle preferences ( 24 , 25 , 27 ). These approaches identified market segments and tailored strategies to different customer groups to understand potential EV adopters.

In addition, ML techniques have been applied to identify EV-related factors, including vehicle range, charging duration, costs, tax incentives, and existing charging infrastructure ( 9 , 11 , 16 , 28 ). These studies reported that ML models can predict EV deployment in the automotive market, supporting strategic planning and market trend forecasting ( 9 , 28 ). Their findings indicated that ML models provide valuable insights into the key parameters influencing EV adoption ( 10 , 11 ), enabling informed decision-making for policymakers and industry stakeholders ( 4 ).

Contributions to the Literature

Previous studies have significantly contributed to identifying the determinants for EV adoption, emphasizing the importance of factors such as income, education, infrastructure, and environmental concerns. However, the reliance on expensive surveys in most studies often limits the scope and applicability of their findings, as outlined in Table 1.

Table 1.

Comparative Analysis of Electric Vehicle Adoption in the Current Literature

Feature	Existing relevant literature ( 3 , 5 , 9 , 11 , 24 – 26 , 28 )	This study
Data source	Primarily relies on primary data (i.e., surveys and tailored datasets) collected specifically for EV studies	Publicly available data sets (i.e., census data) at the dissemination area level
Key factors considered	Policies, SED characteristics, consumer attitudes, vehicle attributes (e.g., range and cost), infrastructure, climate conditions	SED characteristics, urbanization score, annual VKT, and charging infrastructure availability
Geographic scope	Broad regional comparisons (e.g., EU, US, and Norway) and national-level analyses	Spatially focused (DA-level analysis)
Methodology	Statistical models, comparative analysis, and ML models	ML models (optimal model selection and feature importance analysis)
Cost and accessibility	Often expensive because of primary data collection, limited by sampling scope and effort	Cost-effective and reproducible using open-source data
Unique contributions	Highlights regional SED differences; integrates attitudinal factors (e.g., pro-EV attitudes); focuses on mature markets	Uses open-source census data for spatial prediction; incorporates VKT, urbanization score, and existing charging infrastructure; identifies feature contributions using ML models
Prediction focus	Predicting the likelihood of adoption, market segmentation, and user preferences	Accurate EV adoption forecasting using publicly available data
Limitations addressed	Limited by the scope of primary data, there is less emphasis on cost-effective alternatives	Overcomes cost, effort, and sampling limitations of primary data collection
Policy implications	Guides policy through detailed but region-specific or survey-dependent findings	Supports informed decision-making with a generalizable framework

Note: EV = electric vehicle; ML = machine learning; SED = socioeconomic and demographic; VKT = vehicle kilometer traveled; DA = dissemination area.

The comparative analysis in Table 1 highlights that previous research on EV adoption relies heavily on resource-intensive primary data collection and focuses on regional comparisons or attitudinal factors. There is therefore a clear need for a framework that predicts EV adoption using ML models and publicly available data. Although such a framework does not cover all aspects shaping EV adoption decisions, it provides accurate predictions and valuable insights into the factors influencing EV adoption through readily accessible data, enabling generalization and reproducibility.

Accordingly, this study develops a comprehensive framework employing ML techniques to examine the impact of various SED characteristics, urbanization score, annual vehicle kilometer traveled (VKT), and availability of charging infrastructure on EV adoption.

The main contributions of this study include:

Utilizing open-source census data to spatially predict EV adoption at the dissemination area (DA) geographical aggregation.

Addressing the impact of SED, annual VKT, urbanization level, and existing EV charging infrastructure on EV adoption.

Identifying the optimal ML model and determining the contribution of each feature (explanatory variables) and their overall association to predict EVs.

Data and Methods

Data

This study aims to identify the factors influencing the adoption of EVs in Ontario, Canada, using ML models and publicly available census data. According to the 2021 Canadian census data ( 29 ), Ontario spans approximately 0.89 million km² and has a population of 14.22 million inhabitants residing in over 5.49 million dwellings. The population is distributed across more than 20,000 DAs.

The SED characteristics for each DA in Ontario were obtained from the 2021 Statistics Canada census ( 29 ), Canada’s most recent census data. Utilizing the SED characteristics for this analysis provides a comprehensive set of explanatory variables at the DA level to understand the factors influencing EV adoption. For instance, Figure 1 shows the spatial distribution of the average household size at the DA level across Ontario. Of note, the Greater Toronto and Hamilton Area (GTHA) refers to the contiguous urban region that includes some of the largest cities and metropolitan areas by population in Ontario. This region includes the Greater Toronto Area and the City of Hamilton ( 30 ).

Figure 1.

Spatial distribution of average household size across Ontario at the dissemination area level: (a) shows the GTHA, while (b) represents Ontario.

Table 2 summarizes the descriptive statistics for the dependent and independent variables used in this study. The explanatory variables include population density, age, dwelling characteristics, employment rate, journey-to-work attributes, marital status, total income, dwelling value, and education levels. The maximum population density is 10,696.8 persons/km², with a mean of 3,384.3 persons/km². Some DAs have no recorded population density. The average age per DA is 42.6 years, and the mean employment rate is 54.5%.

Table 2.

Descriptive Statistics of Study Parameters

Parameter	Description	Minimum	Mean	Maximum	Standard deviation	Reference**
Dependent variable (Target)
EVCnt	Registered EV counts	0.0	4.9	16.0	4.6	( 11 )
Independent variables (Features)
PopDens	Population density per square kilometer	0.0	3,384.3	10,696.8	3,031.8	( 3 , 20 )
Age	Average age of the population (years old)	30.6	42.6	54.2	4.9	( 5 , 15 , 18 )
HouseDwell	Percentage of occupied private dwellings (house type)	10.0	77.6	100.0	28.2	( 9 )
ApartDwell*	Percentage of occupied private dwellings (apartment type)	0.0	22.4	100.0	28.1	( 11 )
HhldSize	Average household size	1.3	2.6	4.1	0.5	( 9 )
EmpRate	Labor employment rate	29.2	54.5	80.4	9.7	( 10 )
CommDur	Average commuting duration (minutes)	9.5	28.0	46.4	6.4	( 11 )
Car	Percentage of the population using a car for commuting to the workplace	48.9	86.2	100.0	14.6	( 10 )
Transit	Percentage of the population using public transit for commuting to the workplace	0.0	7.4	31.3	9.9	( 9 )
Active	Percentage of population using active modes for commuting to the workplace	0.0	5.8	23.3	7.0	( 10 )
CSDRes	Percentage of population that commutes within their census subdivision (CSD)	0.0	61.4	100.0	26.8	( 10 )
DiffCSDRes	Percentage of the population that commutes to different CSDs	0.0	16.1	69.4	20.0	( 10 )
Married*	Percentage of the population that is married and living common-law	50.5	72.0	93.3	8.3	( 15 , 18 )
NotMarried	Percentage of the population that is not married and not living common-law	6.7	28.0	49.5	8.3	( 15 , 18 )
SqrtIncome	The square root of average total income in 2020 (Canadian dollar)	$185.4	$337.1	$482.1	$58.0	( 17 )
SqrtDwellVal	The square root of the average value of dwellings (Canadian dollars)	$282.8	$857.5	$1,414.2	$216.7	( 6 )
LowEdu*	Percentage of the population that has no certificate, diploma, or degree	50.4	64.6	78.3	5.1	( 5 , 15 )
MedEdu*	Percentage of the population that has a high/secondary school diploma or equivalency certificate	0.0	17.1	35.0	6.0	( 5 , 15 )
HighEdu	Percentage of the population that has a postsecondary certificate, diploma, or degree	0.0	18.3	45.3	9.1	( 5 , 15 )
AnnVKT	Average annual vehicle kilometer traveled (1,000s)	7.1	13.5	19.9	2.4	( 24 , 25 )
UrbScore	A score derived that is 0–10; zero is the most rural/remote, and 10 is the densest urban	1.9	6.2	9.8	1.6	( 9 , 10 )
ChStNum	Number of existing charging infrastructure	0.0	0.2	46.0	0.9	( 15 )

Note: EV = electric vehicle.

*These variables were removed from the analysis to reduce pairwise linear correlation (threshold = 0.8). In addition, all variables are measured at the dissemination area level throughout Ontario, and the education levels are for all age groups. **The Reference field relates the utilized predictors to previous studies.

In addition, this study examines the impact of the annual VKT, urbanization score, and the number of existing EV charging infrastructures on EV adoption. Figure 2 shows the distribution of the existing EV charging stations, including Level 2 and DC fast charging technologies, across Ontario ( 31 ). For home charging, this study assumes that home charging can be indirectly measured by considering individuals who dwell in houses (i.e., the HouseDwell variable in Table 2), as they typically have a dedicated garage and access to home charging. This is because home charging data is not publicly available.

Figure 2.

Spatial allocation of existing public electric vehicle charging stations across Ontario: (a) shows the GTHA, while (b) represents Ontario.

The maximum numbers of existing charging stations and registered EVs per DA are 46 and 16, respectively. This indicates that EV adoption in Ontario is in its early stages. EV adoption was measured using EV registration counts at the DA level. Of note, the urbanization score (0–10) was estimated from multiple census indicators, including population density, housing density, employment rate, commuting patterns, and education levels, that reflect the degree of urbanization in each region ( 10 ).

A correlation analysis is conducted to identify and remove the highly correlated variables (i.e., correlation coefficient > 0.8). Of note, the Transit variable is retained to examine the impact of different transit modes on EV adoption. Figure 3 shows that the correlation values (upper triangle) are not excessively high, allowing for further relationship-based analysis. In addition, the pair plot (lower triangle) in Figure 3 shows nonlinear relationships between the model variables, suggesting that ML models, which can capture such nonlinearity, are well suited to this analysis. The diagonal presents the distribution of the model variables.

Figure 3.

Correlation matrix of model variables.
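The correlation-based filtering described above can be illustrated with a minimal sketch. This is not the authors' code: the greedy scan order, the `keep` argument (mirroring how Transit is retained), and the toy numbers are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def drop_correlated(columns, threshold=0.8, keep=()):
    """Greedy filter (assumed procedure): scan columns in order and drop any
    column whose |r| with an already retained column exceeds the threshold,
    unless the column is explicitly listed in `keep`."""
    retained = []
    for name in columns:
        too_close = any(
            abs(pearson(columns[name], columns[other])) > threshold
            for other in retained
        )
        if name in keep or not too_close:
            retained.append(name)
    return retained

# Toy DA-level columns (made-up numbers, not the study data).
toy = {
    "HouseDwell": [90.0, 75.0, 40.0, 20.0, 60.0],
    "ApartDwell": [10.0, 25.0, 60.0, 80.0, 40.0],  # equals 100 - HouseDwell
    "CommDur": [22.0, 35.0, 25.0, 31.0, 19.0],
}
selected = drop_correlated(toy)
```

In this toy case ApartDwell is perfectly (negatively) correlated with HouseDwell and is dropped, which matches the starred status of ApartDwell in Table 2.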

Methods

This study develops a comprehensive framework for predicting EV adoption using ML models, as shown in Figure 4. The overall aim is to determine the association of several factors, including various SED characteristics, urbanization, annual VKT, and the existing charging infrastructure, to predict EV uptake. The analysis begins with raw data from the 2021 Statistics Canada Census ( 29 ).

Figure 4.

Analytical framework for the proposed methodology.

Several preliminary steps were conducted to prepare the data for modeling. First, the variables were coded, and the raw data were converted into meaningful variables. Then, the data were cleaned by addressing the missing values to ensure reliability. The missing values associated with a variable were replaced with the median of the sample. Outliers were handled using the interquartile range (IQR) method to limit their impact on model performance. The IQR method starts by calculating $Q_1$ (25th percentile) and $Q_3$ (75th percentile) of the data to estimate the IQR (i.e., $Q_3 - Q_1$). Outliers are then defined as data points below $Q_1 - 1.5 \times IQR$ or above $Q_3 + 1.5 \times IQR$. A correlation analysis was conducted to identify the relationships between variables and to remove highly correlated variables that could negatively impact the model results.
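The imputation and outlier steps can be sketched as follows. The text does not state whether outliers were removed or capped, so this sketch assumes they are clipped to the IQR fences; the function names and the sample values are hypothetical.

```python
import statistics

def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def iqr_fences(values):
    """Return (lower, upper) fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def clip_outliers(values):
    """ASSUMPTION: outliers are clipped to the fences rather than dropped."""
    lower, upper = iqr_fences(values)
    return [min(max(v, lower), upper) for v in values]

# Made-up sample with one missing value and one extreme value.
raw = [2.0, 3.0, None, 4.0, 5.0, 3.0, 40.0, 4.0]
filled = impute_median(raw)
cleaned = clip_outliers(filled)
```

Different quartile conventions (here the "inclusive" method of `statistics.quantiles`) give slightly different fences on small samples, so the choice should be reported alongside the results.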

Once the data was cleaned, various SED variables (Table 2) were normalized using a Min-Max scaler, which transforms features by scaling each feature between zero and one to ensure comparability between all variables. The data set was then split into training (70%) and testing (30%) sets to facilitate the model-building and evaluation.
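As a minimal sketch of the scaling and splitting steps, assuming a simple shuffled split with a fixed seed (the seed, helper names, and sample values are illustrative and not taken from the study):

```python
import random

def min_max_scale(column):
    """Rescale a numeric column to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def train_test_split_indices(n_rows, test_fraction=0.3, seed=42):
    """Shuffle row indices reproducibly and split them into train/test."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]

ages = [30.6, 42.6, 54.2, 38.0, 47.5]  # illustrative values only
scaled_age = min_max_scale(ages)
train_idx, test_idx = train_test_split_indices(n_rows=10)
```

A common practice is to compute the scaling bounds on the training rows only and reuse them for the test rows to avoid information leakage; the text does not specify which variant was used.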

In addition, five ensemble ML models (i.e., Random Forest [RF], eXtreme Gradient Boosting [XGBoost], Gradient Boosting, CatBoost, and Adaptive Boosting [AdaBoost]) were evaluated to identify the most suitable model for predicting EV adoption. Hyperparameter tuning was performed to optimize the parameters of the selected models, enhancing their performance. The tuned models were then compared on predictive accuracy and error metrics to select the best-performing one.

Once the optimal model was identified, a forward sequential feature selection (SFS) was employed to refine the model further by selecting the features based on their importance and contribution to the model’s performance. Finally, Shapley values were estimated using Shapley Additive exPlanations (SHAP) to interpret the significance of each feature, providing insights into how different variables predict EV adoption.

The developed framework utilizes publicly available data to gain insights into EV adoption determinants without relying on expensive surveys. By following this structured approach, the proposed framework ensures a robust and interpretable model that can assist policymakers and stakeholders in understanding and promoting EV adoption.

In this study, the dependent variable (i.e., EV counts at the DA level) was derived from detailed vehicle registration data. The data was aggregated at the DA level to mask personal identifier data. This involves integrating detailed six-digit postal code information with constraints imposed by broader three-digit postal code data, known as Forward Sortation Areas. The estimated EV counts were then mapped to their corresponding DAs using a postal code conversion lookup table, as shown in Figure 5. Of note, EV counts encompass battery electric vehicles and plug-in hybrid electric vehicles. In addition, the registered EV data is based on the same years as the census data.

Figure 5.

Spatial distribution of electric vehicle counts across Ontario at the dissemination area level: (a) shows the GTHA, while (b) represents Ontario.

Machine Learning Models

This study utilizes and compares ensemble ML methods, including RF, gradient boosting, and extreme gradient boosting, to evaluate EV adoption. These ensemble methods combine predictions from multiple models to enhance accuracy, reduce the risk of overfitting, and improve generalization. Ensemble methods can be categorized into two main types: bagging and boosting. Bagging involves training multiple models in parallel on different subsets of the training data and then averaging their predictions to achieve a final result. In contrast, boosting trains models sequentially, with each new model addressing the errors made by previous models.

The RF method, introduced by Breiman ( 32 ), employs a bagging approach to perform classification and regression tasks. The RF algorithm starts at the root node with the entire data set, divided into in-bag (training) and out-of-bag (validation) samples. This division allows the evaluation of each predictor variable based on its ability to separate the nodes. The algorithm uses a tree-based approach with pruning and cross-validation to minimize overfitting. Each decision tree in the RF produces a prediction for a given input $x$ . The final prediction for RF regression is the average of the predictions from all individual trees, as described in Equation 1.

\hat{y} = \frac{1}{T} \sum_{t = 1}^{T} f_{t} (x)

(1)

where

$\hat{y}$ = final predicted value for the input $x$ ,

$f_{t}$ = prediction of the $t$ -th tree, and

$T$ = total number of trees in the forest.
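Equation 1 can be illustrated with a deliberately simplified sketch. Each "tree" here is a stand-in that predicts the mean target of its bootstrap sample; a real RF would fit full decision trees with random feature subsets. All names and numbers are hypothetical.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (the in-bag sample for one tree)."""
    return [rng.choice(rows) for _ in rows]

def fit_constant_tree(sample):
    """Stand-in for a fitted tree f_t: always predicts the mean target
    of its bootstrap sample (a depth-zero tree)."""
    mean_y = sum(y for _, y in sample) / len(sample)
    return lambda x: mean_y

def forest_predict(trees, x):
    """Equation 1: average the T individual tree predictions."""
    return sum(tree(x) for tree in trees) / len(trees)

rng = random.Random(0)
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # toy (x, y) pairs
trees = [fit_constant_tree(bootstrap_sample(rows, rng)) for _ in range(25)]
y_hat = forest_predict(trees, x=2.5)
```

Averaging over bootstrap-trained learners is what reduces the variance of the ensemble relative to any single tree.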

GBMs are powerful techniques for supervised ML applications ( 33 – 36 ). GBMs construct an ensemble of weak prediction models, typically decision trees, added sequentially to minimize a loss function ( 36 ). Each new model is fitted to the negative gradient of the loss function with respect to the current predictions, thereby addressing the errors made by previous models. By adjusting key parameters, such as the number of trees, tree depth, and learning rate, GBMs can achieve high predictive performance.

The GBM algorithm starts with a base model, usually the average of the target values for regression problems, as detailed in Equation 2, where $L$ represents the loss function (e.g., mean square error [MSE] for regression models) and $y_{i}$ represents the actual values. In addition, $γ$ represents the value that minimizes the loss function $L$ over all the training samples. Essentially, it is the optimal constant prediction that minimizes the loss function for the initial model $F_{0} (x)$ . For regression problems, this is often the average of the target values (i.e., the number of EVs).

F_{0} (x) = \arg\min_{γ} \sum_{i = 1}^{n} L (y_{i}, γ)

(2)

In each iteration $m$ , the pseudo-residuals are calculated by estimating the negative gradients of the loss function for the current model predictions, as outlined in Equation 3. A weak learner (e.g., a decision tree), illustrated in Equation 4, is then fitted to these pseudo-residuals.

r_{i}^{m} = - {[\frac{\partial L (y_{i}, F (x_{i}))}{\partial F (x_{i})}]}_{F (x) = F_{m - 1} (x)}

(3)

f_{m} (x) = \arg\min_{f} \sum_{i = 1}^{n} {(r_{i}^{m} - f (x_{i}))}^{2}

(4)

The model is then updated by adding the new weak learner scaled by the learning rate $η$ , as detailed in Equation 5, where $F_{m} (x)$ represents the prediction model at iteration $m$ . This process is repeated for $M$ iterations, where $M$ represents the total number of trees. The final prediction of the GBM regression model is the sum of the initial model and all the subsequent weak learners added at each iteration.

F_{m} (x) = F_{m - 1} (x) + η f_{m} (x)

(5)
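Equations 2 to 5 can be traced in a small from-scratch sketch. It assumes squared-error loss, a single input feature, and one-split trees (stumps) as weak learners; these choices and the toy data are illustrative assumptions, not the configuration used in the study.

```python
def fit_stump(xs, residuals):
    """Equation 4 with a one-split tree: choose the threshold that minimizes
    the squared error against the pseudo-residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def fit_gbm(xs, ys, n_trees=50, learning_rate=0.1):
    f0 = sum(ys) / len(ys)  # Equation 2 under squared error: the target mean
    learners = []
    preds = [f0] * len(ys)
    for _ in range(n_trees):
        # Equation 3: for L = 0.5 * (y - F)^2 the negative gradient is y - F
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        learners.append(stump)
        # Equation 5: add the new learner scaled by the learning rate
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]

    def predict(x):
        return f0 + learning_rate * sum(h(x) for h in learners)

    return f0, predict

xs = [1, 2, 3, 4, 5, 6]              # toy feature values
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]  # toy targets
f0, predict = fit_gbm(xs, ys)
baseline_mse = sum((y - f0) ** 2 for y in ys) / len(ys)
boosted_mse = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

Comparing `boosted_mse` with `baseline_mse` shows how the sequentially added learners reduce the training error left by the constant initial model.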

The eXtreme Gradient Boosting (XGBoost) is an advanced and optimized GBM version known for its exceptional performance and various practical applications ( 35 , 37 , 38 ). XGBoost excels in scalability and processing speed, making it suitable for large datasets and complex problems ( 11 , 35 , 36 , 39 ). XGBoost addresses overfitting and manages the bias–variance trade-off by incorporating bagging–bootstrap aggregation and feature randomness.

XGBoost uses an additive function to predict the target (dependent) variable. Let $D = {(x_{i}, y_{i})}$ be a data set with $m$ features and $n$ observations, where $| D | = n$ , $x_{i} \in R^{m}$ , and $y_{i} \in R$ . The predicted ${\hat{y}}_{i}$ is outlined in Equation 6, where $F$ represents the space of additive trees, as defined in Equation 7. Of note, $r$ represents the structure of an individual tree and $T$ represents the number of leaves in the tree.

{\hat{y}}_{i} = H (x_{i}) = \sum_{a = 1}^{A} f_{a} (x_{i}), f_{a} \in F

(6)

F = \{ f (x) = w_{r (x)} \mid r : R^{m} \to T, w \in R^{T} \}

(7)

XGBoost uses the objective function in Equation 8 to optimize the ensemble of trees and minimize errors, where $l$ represents a loss function measuring the error between the actual $y_{i}$ and predicted ${\hat{y}}_{i}$ values, $k$ represents the iteration number, and $Ω$ represents the regularization term, as defined in Equation 9. Of note, $w_{j}$ represents the weights of the leaves, and $γ$ and $λ$ represent regularization parameters.

L (k) = \sum_{i = 1}^{n} l (y_{i}, {\hat{y}}_{i}^{(k - 1)} + f_{k} (x_{i})) + Ω (f_{k})

(8)

Ω (f_{k}) = γ T + \frac{1}{2} λ \sum_{j = 1}^{T} w_{j}^{2}

(9)
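To make Equations 8 and 9 concrete, the following sketch evaluates the regularized objective for one candidate tree. It assumes squared-error loss; the function names and the numeric inputs are hypothetical.

```python
def regularization(leaf_weights, gamma, lam):
    """Equation 9: gamma * T + 0.5 * lambda * sum of squared leaf weights,
    where T is the number of leaves."""
    n_leaves = len(leaf_weights)
    return gamma * n_leaves + 0.5 * lam * sum(w ** 2 for w in leaf_weights)

def objective(y_true, y_prev, tree_output, leaf_weights, gamma, lam):
    """Equation 8 with the assumed loss l(y, y_hat) = (y - y_hat)^2.
    y_prev holds the predictions after iteration k-1 and tree_output
    holds f_k(x_i) for each observation."""
    loss = sum((y - (p + f)) ** 2
               for y, p, f in zip(y_true, y_prev, tree_output))
    return loss + regularization(leaf_weights, gamma, lam)

# A two-leaf tree that exactly corrects the previous predictions.
penalty = regularization([0.5, -0.5], gamma=1.0, lam=2.0)
total = objective(
    y_true=[1.0, 2.0], y_prev=[0.5, 2.5], tree_output=[0.5, -0.5],
    leaf_weights=[0.5, -0.5], gamma=1.0, lam=2.0,
)
```

Even when the new tree removes all remaining error, the objective stays positive because of the penalty, which is how $γ$ and $λ$ discourage additional leaves and large leaf weights.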

AdaBoost is a boosting algorithm that constructs a strong classifier or regressor by sequentially training weak learners and reweighting the training samples to emphasize those predicted incorrectly. Unlike gradient boosting, which fits each learner to the gradient of a differentiable loss function, AdaBoost assigns higher importance to incorrectly predicted observations.

Given a data set of $D = {(x_{i}, y_{i})}$ with $n$ observations, where $x_{i} \in R^{m}$ , and $y_{i} \in {- 1, 1}$ , AdaBoost iteratively builds a final hypothesis as a weighted sum of weak predictions, as detailed in Equation 10, where $h_{a} (x)$ represents the base learner and $α_{a}$ represents the weights assigned to each predictor.

H (x) = \sum_{a = 1}^{A} α_{a} h_{a} (x)

(10)

The model minimizes the exponential loss function at each iteration, given by Equation 11, where $w_{i}$ represents the sample weights, updated after each iteration, as detailed in Equation 12. In addition, the weight of each weak learner is computed by Equation 13, where $ϵ_{k}$ represents the weighted prediction error. By iteratively refining weak predictors and adjusting weights, AdaBoost enhances prediction accuracy.

L = \sum_{i = 1}^{n} w_{i} e^{- y_{i} H (x_{i})}

(11)

w_{i}^{k + 1} = w_{i}^{k} e^{- α_{k} y_{i} h_{k} (x_{i})}

(12)

α_{k} = \frac{1}{2} \ln (\frac{1 - ϵ_{k}}{ϵ_{k}})

(13)
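One boosting round of Equations 12 and 13 can be sketched as below for labels in {-1, +1}. The sketch assumes the sample weights sum to one and renormalizes them after the update, which is a common convention rather than something stated in the text.

```python
import math

def learner_weight(error):
    """Equation 13: alpha_k = 0.5 * ln((1 - eps_k) / eps_k)."""
    return 0.5 * math.log((1.0 - error) / error)

def adaboost_round(weights, labels, predictions):
    """One boosting round for labels and predictions in {-1, +1}."""
    # Weighted error of the current weak learner h_k (weights sum to one).
    error = sum(w for w, y, h in zip(weights, labels, predictions) if y != h)
    alpha = learner_weight(error)
    # Equation 12: grow the weights of misclassified samples, shrink the rest.
    updated = [w * math.exp(-alpha * y * h)
               for w, y, h in zip(weights, labels, predictions)]
    total = sum(updated)
    return alpha, [w / total for w in updated]  # renormalization is assumed

labels = [1, 1, -1, -1]
predictions = [1, -1, -1, -1]  # the second sample is misclassified
weights = [0.25] * 4
alpha, new_weights = adaboost_round(weights, labels, predictions)
```

After the round, the single misclassified sample carries half of the total weight, so the next weak learner is pushed to predict it correctly.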

CatBoost is a gradient boosting algorithm that enhances predictive accuracy while maintaining computational efficiency. Unlike traditional gradient boosting methods, CatBoost utilizes ordered boosting, which mitigates overfitting by ensuring that the estimate for each data point is based only on the observations that precede it in a random ordering. In addition, it employs the oblivious trees method, where the same splitting condition is used across all nodes at a given level. This approach leads to better generalization and computational efficiency.

Let $D = {(x_{i}, y_{i})}$ be a data set with $m$ features and $n$ observations, where $x_{i} \in R^{m}$ , and $y_{i} \in R$ . The model prediction follows an additive approach similar to XGBoost, represented in Equation 6, where $f_{a}$ belongs to the space of oblivious trees. CatBoost optimizes an objective function composed of a loss function $l$ and regularization term $γ$ , as outlined in Equation 14. Regularization is applied by adding a penalty term based on Ridge regularization on leaf weights. This additional penalty contributes to better model generalization.

L (k) = \sum_{i = 1}^{n} l (y_{i}, {\hat{y}}_{i}^{(k - 1)} + f_{k} (x_{i})) + γ (f_{k})

(14)
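The oblivious-tree structure can be illustrated with a short sketch. Because every node on a level shares one split, a tree of depth d is fully described by d (feature, threshold) pairs and 2^d leaf values. The splits and leaf values below are hypothetical.

```python
def oblivious_tree_predict(x, splits, leaf_values):
    """Evaluate an oblivious tree.

    splits: one (feature_index, threshold) pair per level, shared by every
            node on that level.
    leaf_values: 2 ** len(splits) outputs, indexed by the bit pattern of
                 the split outcomes.
    """
    leaf = 0
    for feature_index, threshold in splits:
        bit = 1 if x[feature_index] > threshold else 0
        leaf = (leaf << 1) | bit
    return leaf_values[leaf]

splits = [(0, 0.5), (1, 0.3)]        # level 1 tests feature 0, level 2 feature 1
leaf_values = [1.0, 2.0, 3.0, 4.0]   # one value per leaf
pred_a = oblivious_tree_predict([0.7, 0.1], splits, leaf_values)
pred_b = oblivious_tree_predict([0.2, 0.9], splits, leaf_values)
```

Since prediction reduces to computing a leaf index from a fixed sequence of comparisons, evaluation is fast, which is one reason for the computational efficiency noted above.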

Sequential Feature Selection

SFS is a method used for feature selection or dimensionality reduction in sample sets, aiming to enhance the accuracy scores or improve the performance of ML models on complex datasets. SFS operates in two modes: forward and backward SFS ( 40 ). Forward SFS constructs a feature set by initially having no features and then incrementally adding one feature at a time. The feature that contributes the most to the improvement in prediction model performance, as determined by a specified metric (e.g., R-squared [R²], MSE, or cross-validation score), is included in the set. This iterative process continues until the cross-validation results indicate no further significant enhancement in the model’s performance.

In contrast, backward SFS starts with the complete feature set and progressively removes the least essential feature at each step. Of note, forward SFS is particularly efficient for high-dimensional data because it evaluates fewer features at each iteration ( 40 ). Therefore, this study employs forward SFS to identify the most relevant features for predicting EV adoption.

Mathematically, let $S_{k}$ be the set of features selected by forward SFS after the $k$ th iteration, with $S_{0} = \emptyset$ . At the ( $k + 1$ )th iteration, the feature $x_{i}$ to be added to $S_{k}$ is determined by Equation 15, where $L$ represents the loss function evaluating the model for the defined metric, $f (S_{k} \cup {x_{i}})$ represents the trained regression model on the feature set $S_{k} \cup {x_{i}}$ , and $D$ represents the training and evaluation dataset. The selected feature $x_{i}$ is then added to the set of selected features $S_{k + 1}$ . This iterative process continues until the stopping criterion is met.

x_{i} = \arg\min_{x_{i} \notin S_{k}} L (f (S_{k} \cup \{x_{i}\}), D)

(15)
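The forward SFS loop around Equation 15 can be sketched generically. The loss function is passed in as a callable; in practice it would be a cross-validated error of the fitted model, whereas the toy loss and the gain values below are invented purely for illustration.

```python
def forward_sfs(candidates, loss_fn, min_improvement=1e-6):
    """Greedy forward selection: start from the empty set and repeatedly add
    the feature that lowers the loss the most, stopping when the best
    available improvement falls below min_improvement."""
    selected = []
    best_loss = loss_fn(selected)
    remaining = list(candidates)
    while remaining:
        # Equation 15: the candidate whose addition yields the lowest loss.
        trial = min(remaining, key=lambda feat: loss_fn(selected + [feat]))
        trial_loss = loss_fn(selected + [trial])
        if best_loss - trial_loss < min_improvement:
            break
        selected.append(trial)
        remaining.remove(trial)
        best_loss = trial_loss
    return selected, best_loss

# Hypothetical loss reductions per feature (not results from the study).
gains = {"SqrtIncome": 4.0, "HighEdu": 2.5, "ChStNum": 1.0, "NotMarried": 0.0}

def toy_loss(subset):
    return 10.0 - sum(gains[f] for f in subset)

chosen, final_loss = forward_sfs(list(gains), toy_loss)
```

The stopping threshold plays the role of the "no further significant enhancement" criterion; its value is a modeling choice.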

Shapley Additive exPlanations

The concept of Shapley values originates from cooperative game theory, introduced by Shapley ( 41 ), aiming to fairly distribute the total payoff among players based on their contributions. The Shapley value for a player $i$ is estimated by considering all possible coalitions $S$ of players that exclude player $i$ and measuring the marginal contribution of player $i$ to each of these coalitions.

In ML, Shapley values are employed to identify the contribution of each feature to the model’s prediction using SHAP, proposed by Lundberg and Lee ( 42 ). In brief, SHAP adjusts the concept of Shapley values to explain the importance of features in predictive models. Each feature is treated as a player in a cooperative game where the payoff is the target variable (i.e., the model’s prediction). The aim is to distribute the predicted value among the features fairly, reflecting their respective contributions ( 43 ).

For a given observation $x$ and a model $f$ , the SHAP value for feature $i$ can be evaluated while considering all possible subsets of features, as outlined in Equation 16, where $F$ represents the set of all features and $f_{S} (x)$ represents the model’s prediction using the features in subset $S$ . In addition, $f_{S \cup \{i\}} (x) - f_{S} (x)$ denotes the change in the prediction value when feature $i$ is added to the subset $S$ , and $| S |! \times (| F | - | S | - 1)! / | F |!$ represents the weight assigned to the marginal contribution, accounting for all possible orderings of features.

$$\phi_{i} (f, x) = \sum_{S \subseteq F \setminus \{i\}} \frac{| S |! \times (| F | - | S | - 1)!}{| F |!} \left[ f_{S \cup \{i\}} (x) - f_{S} (x) \right] \tag{16}$$

In addition, SHAP values ensure that the sum of all feature contributions equals the difference between the prediction for an observation and the expected value of the prediction (i.e., $E_{X} [f (X)]$ ) over the entire data set ( $D$ ), as expressed in Equation 17, where the sum runs over all $| F |$ features.

$$f (x) = E_{X} [f (X)] + \sum_{i = 1}^{| F |} \phi_{i} (f, x) \tag{17}$$
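The subset sum in Equation 16 and the additivity property in Equation 17 can be checked numerically on a small example. The sketch below enumerates every subset exactly, which is only practical for a handful of features. It makes two simplifying assumptions that are not taken from this study: features outside a subset are replaced by a fixed baseline value, and the expected prediction is approximated by the prediction at that baseline. The model `toy_model` and all numbers are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, Iterable, List


def shapley_values(
    model: Callable[[Dict[str, float]], float],
    x: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values by enumerating all subsets (Equation 16)."""
    features: List[str] = list(x)
    n = len(features)

    def f_subset(subset: Iterable[str]) -> float:
        # f_S(x): features in S keep their observed value, the rest use the baseline
        present = set(subset)
        point = {
            name: (x[name] if name in present else baseline[name])
            for name in features
        }
        return model(point)

    phi: Dict[str, float] = {}
    for i in features:
        others = [name for name in features if name != i]
        total = 0.0
        for size in range(n):  # subset sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                gain = f_subset(subset + (i,)) - f_subset(subset)
                total += weight * gain
        phi[i] = total
    return phi


def toy_model(point: Dict[str, float]) -> float:
    """Hypothetical model with one interaction term."""
    return (
        2.0 * point["income"]
        + 3.0 * point["chargers"]
        + point["income"] * point["chargers"]
        - point["density"]
    )


observation = {"income": 1.0, "chargers": 2.0, "density": 5.0}
baseline = {"income": 0.0, "chargers": 0.0, "density": 0.0}

phi = shapley_values(toy_model, observation, baseline)
# Equation 17 (with the baseline prediction standing in for the expected value):
# the contributions add up to the prediction minus the baseline prediction.
gap = toy_model(observation) - toy_model(baseline)
print(phi, sum(phi.values()), gap)
```

The interaction term is split equally between the two features involved, which is the fair-allocation behavior that motivates the Shapley weighting.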

Therefore, this study employs SHAP to quantify the contribution of each feature (i.e., SED characteristics, urbanization, annual VKT, and existing charging infrastructure) and the direction of its association with the EV adoption predicted by the ML models.

Results

ML Models Performance

The performance of various ensemble ML models (after tuning key hyperparameters) for predicting EV deployment (i.e., EV counts) at the DA level throughout Ontario is shown in Figure 6. Each model is evaluated using four well-known performance metrics: MSE, mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R²), on both the training and testing datasets. Lower MSE, MAE, and RMSE and higher R² values indicate better performance.
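For reference, the four metrics can be computed as in the following minimal sketch. The function name `regression_metrics` and the example values are hypothetical and are not results from this study.

```python
from math import sqrt
from typing import Dict, Sequence


def regression_metrics(
    actual: Sequence[float], predicted: Sequence[float]
) -> Dict[str, float]:
    """Return MSE, MAE, RMSE, and R2 for paired actual and predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    if ss_total == 0:
        raise ValueError("R2 is undefined when all actual values are identical")
    r2 = 1.0 - (mse * n) / ss_total  # 1 - SS_residual / SS_total
    return {"MSE": mse, "MAE": mae, "RMSE": sqrt(mse), "R2": r2}


# Hypothetical EV counts for four areas
metrics = regression_metrics([2.0, 4.0, 6.0, 8.0], [3.0, 4.0, 5.0, 8.0])
print(metrics)
```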

Figure 6.

Comparison of performance metrics for all machine learning models.

The results demonstrate the effectiveness of using ensemble ML models to predict EV adoption with satisfactory accuracy. The XGBoost model achieved the best performance across all metrics, with 95% accuracy on the training data set and 89% on the testing data set. Based on these results, the XGBoost model is utilized for the subsequent analysis.

Figure 7 shows the residual distribution and probability density function for the differences between the actual and predicted EV counts at the DA level for all utilized ensemble models. The XGBoost model exhibits the narrowest range of residuals, which is consistent with its higher predictive accuracy.

Figure 7.

Residual distribution (all models).

Figure 8 shows the spatial distribution of EV counts predicted by the XGBoost model throughout Ontario at the DA level. Compared with the initial EV count representation shown in Figure 4, the findings shown in Figure 9 highlight the effectiveness of utilizing ML models with publicly available census data to predict and map EV adoption spatially. This approach demonstrates the potential of leveraging open-access data for detailed regional analysis and forecasting in EV adoption studies.

Figure 8.

Spatial distribution of predicted electric vehicle counts across Ontario at the dissemination area level: (a) shows the GTHA, while (b) represents Ontario.

Figure 9.

Actual versus predicted electric vehicle counts across Ontario at the dissemination area level: (a) actual EV counts (GTHA), (b) predicted EV counts (GTHA), (c) actual EV counts (Ontario), and (d) predicted EV counts (Ontario).

Influential Factors for EV Adoption

A forward SFS was applied to identify the most relevant features that enhance the prediction of EV adoption. Figure 10 shows the results of applying forward SFS to the final predictions of the XGBoost model, using cross-validation to optimize the MSE metric. This process results in 18 key features that significantly improve the model’s performance and are critical to EV adoption prediction.

Figure 10.

Forward sequential feature selection results—with standard error.

Figure 11 shows the mean absolute SHAP value for each feature, indicating the magnitude of each feature’s contribution to predicting EV adoption at the DA level. Features are ranked in descending order from top to bottom according to their mean absolute SHAP value, reflecting their importance to the model’s prediction. In addition, the SHAP summary plot shown in Figure 12 clarifies the impact of various features on EV adoption outcomes. The SHAP values are grouped by feature along the y-axis, with color indicating the feature value (red for high and blue for low). Positive SHAP values, plotted to the right of the vertical zero line, indicate a positive impact on predicted EV adoption, and negative values indicate the opposite. The features are also ordered by their association with predicting EV deployment at the DA level.
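The ranking used in a feature importance plot of this kind can be sketched as follows: the absolute SHAP values are averaged over observations for each feature and then sorted in descending order. The function name and the SHAP values below are hypothetical and serve only to show the calculation.

```python
from typing import Dict, List, Sequence, Tuple


def rank_by_mean_abs_shap(
    shap_rows: Sequence[Dict[str, float]],
) -> List[Tuple[str, float]]:
    """Average |SHAP| per feature over all observations, sorted high to low."""
    features = list(shap_rows[0])
    importance = {
        name: sum(abs(row[name]) for row in shap_rows) / len(shap_rows)
        for name in features
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# One dictionary of SHAP values per observation (hypothetical numbers)
shap_rows = [
    {"density": -0.1, "income": 0.4},
    {"density": 0.3, "income": -0.2},
    {"density": -0.2, "income": 0.6},
]
ranking = rank_by_mean_abs_shap(shap_rows)
print(ranking)
```

Because the absolute value is taken before averaging, a feature that pushes predictions up in some areas and down in others still ranks as important, whereas the sign information is only visible in the summary plot.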

Figure 11.

Shapley Additive exPlanations feature importance plot.

Figure 12.

Impact of each feature on the model performance.

The analysis highlights that higher education levels, income, and dwelling values positively correlate with EV adoption, suggesting that wealthy areas with well-educated individuals tend to have higher rates of EV adoption. Similarly, urban areas and better access to active transportation options (e.g., walking or cycling) are positively associated with EV adoption, highlighting the role of urban infrastructure and sustainable transport preferences. In addition, more EV charging stations significantly promote EV adoption by providing the necessary infrastructure and reducing range anxiety.

However, some features show mixed or negative impacts on EV adoption. Higher population density is generally associated with lower EV adoption rates, possibly because of the limited access to public charging infrastructure in densely populated areas (Figure 13). Longer annual VKT and commuting durations have a slightly negative impact on EV adoption, probably because of concerns about EV range limitations.

Figure 13.

Shapley Additive exPlanations dependence plot—population density and existing charging infrastructure.

Other significant factors include demographics and commuting patterns. Younger populations are more inclined toward EV adoption, possibly because of their willingness to try new technologies (Figure 14). Meanwhile, areas with higher public transit usage or complex commuting patterns are less likely to adopt EVs. In addition, areas with lower percentages of single or non-traditionally married individuals show higher EV adoption rates (Figure 15).

Figure 14.

Shapley Additive exPlanations dependence plot—age and average income.

Figure 15.

Shapley Additive exPlanations dependence plot—marital status and average income.

The SHAP plots highlight the importance of a multifaceted approach using various SED factors to understand EV adoption. Higher education, income, urbanization, and charging infrastructure availability are vital parameters; other demographic and commuting factors also play crucial roles. The interconnected relationships between these measures highlight the need for targeted policies and infrastructure development to encourage widespread EV adoption, addressing financial and logistical considerations to accommodate diverse populations.

Discussion and Practical Relevance

This study develops a comprehensive framework that leverages ML techniques and publicly available census data to predict EV adoption with high accuracy (95% on training and 89% on testing data sets using XGBoost). Despite relying on open-source data rather than resource-intensive surveys, the results align closely with the findings from previous research. As summarized in Table 1, previous studies consistently identify SED factors, such as income, education levels, and infrastructure availability, as critical determinants of EV adoption ( 5 , 6 , 15 , 17 ). The findings corroborate these insights, demonstrating that higher income, education, and dwelling values positively correlate with EV adoption. These findings are consistent with survey-based studies conducted in regions such as the EU, the US, Norway, and Canada ( 3 , 5 , 20 ). Similarly, the results highlight the importance of charging infrastructure, emphasizing its well-documented role in reducing range anxiety and increasing adoption ( 9 , 11 , 16 , 20 , 28 ).

In addition, this study highlights nuanced adoption patterns consistent with previous research. For instance, the negative correlation between high population density and EV adoption reflects challenges in urban areas with limited access to charging options ( 3 ). Similarly, the higher adoption rates among younger populations align with survey findings highlighting their openness to new technologies ( 15 , 18 ). Urbanization and access to active transportation options further support established links between sustainable mobility and EV uptake ( 9 , 10 ). The negative association of longer VKT and extended commuting durations with adoption, which likely reflects range limitations, is also consistent with vehicle-specific constraints identified in primary data research ( 7 , 11 ).

Unlike traditional studies that rely on costly surveys to assess attitudinal factors (e.g., pro-EV attitudes or environmental concerns) ( 24 , 25 ), this framework achieves comparable predictive performance using open-source data, reducing the reliance on such inputs. This approach does not capture psychological drivers, a limitation of this study; however, it effectively accounts for the core SED and infrastructural determinants highlighted in the literature. Furthermore, using SHAP to quantify feature importance enhances interpretability, linking the findings to survey-based studies by demonstrating how factors, such as education and infrastructure, influence adoption, similar to regression analyses in prior research ( 6 , 15 ). This study validates existing knowledge while demonstrating that publicly available data, combined with advanced ML techniques, can replicate and extend the explanatory power of survey-based research cost-effectively and in a scalable manner.

The proposed framework stands out for its simplicity and practicality. It offers policymakers an accessible and reproducible tool to predict and promote EV adoption without the costs and complexities associated with traditional survey-based approaches. By utilizing open-source census data and a streamlined ML pipeline, this framework enables stakeholders to make data-driven decisions with minimal technical or financial barriers. The high geographical resolution (DA level) and robust predictive accuracy associated with the developed framework make it particularly valuable for urban planners, transportation authorities, and industry leaders seeking to align policies and infrastructure investments with regional needs.

Implementing this framework involves intuitive steps that policymakers can adopt with essential support from data analysts or existing governmental statistical resources. These steps can be summarized as follows.

Data collection: Obtain publicly available census data containing SED variables (Table 2). These data sets are widely accessible, regularly updated, and do not require customized surveys.

Data preparation: Clean the data by addressing missing values and outliers, then normalize variables (e.g., using Min-Max scaler) to ensure comparability. This process can be automated using standard analytical tools, such as Python or R.

Model training: Split the data into training (70%) and testing (30%) sets, then apply an ensemble ML model such as XGBoost, which has demonstrated effectiveness in this study. Prebuilt libraries (e.g., scikit-learn and XGBoost) simplify this process, and hyperparameter tuning can be guided by default settings or minimal optimization.

Feature selection and interpretation: Use forward SFS to identify key predictors and SHAP analysis to assess their association. These methods generate actionable insights, such as prioritizing charging stations in high-income urban areas, without the need for deep statistical expertise.

Prediction and mapping: Generate spatial predictions of EV adoption and visualize them (e.g., Figure 8) to identify target areas for intervention. These outputs can be easily integrated with geographic information system tools.
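The data preparation and splitting steps listed above can be sketched with the standard library alone. The helper names `min_max_scale` and `split_70_30` are hypothetical. In practice, library equivalents such as scikit-learn's `MinMaxScaler` and `train_test_split`, together with a regressor such as `XGBRegressor` from the XGBoost package, would typically replace these helpers.

```python
import random
from typing import List, Sequence, Tuple


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale one variable to the [0, 1] range (Min-Max normalization)."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - low) / (high - low) for v in values]


def split_70_30(rows: Sequence, seed: int = 0) -> Tuple[list, list]:
    """Shuffle the rows reproducibly, then split them 70% / 30%."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = (7 * len(rows)) // 10
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test


# Hypothetical example: one census variable and ten area identifiers
scaled = min_max_scale([10.0, 20.0, 30.0])
area_ids = list(range(10))
train_ids, test_ids = split_70_30(area_ids, seed=42)
print(scaled, len(train_ids), len(test_ids))
```

One design choice worth noting is that the scaling bounds are often derived from the training rows only and then applied to the testing rows, so that no information from the test set leaks into training; whether that ordering was used here is not stated.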

These steps require no proprietary data or costly fieldwork; instead, they leverage publicly available resources accessible to most governments. For instance, policymakers aiming to expand EV infrastructure could use the developed framework to identify high-income, highly educated areas with low charging station density, ensuring that investments align with predicted adoption hotspots. In addition, the reliance on standardized census data enhances the adaptability of the developed framework across different regions and countries with similar datasets.
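As an illustration of the screening described above, the following sketch filters areas that combine high income, high education, and few charging stations. The field names (`da_id`, `avg_income`, `pct_university`, `chargers`) and all thresholds are hypothetical and would need to be replaced with the variables and cut-offs relevant to a given jurisdiction.

```python
from typing import Dict, List, Sequence


def underserved_hotspots(
    areas: Sequence[Dict[str, float]],
    income_min: float,
    education_min: float,
    chargers_max: int,
) -> List[str]:
    """Return identifiers of areas with high income and education but few chargers."""
    return [
        area["da_id"]
        for area in areas
        if area["avg_income"] >= income_min
        and area["pct_university"] >= education_min
        and area["chargers"] <= chargers_max
    ]


# Hypothetical records for three dissemination areas
areas = [
    {"da_id": "A", "avg_income": 120000, "pct_university": 0.55, "chargers": 0},
    {"da_id": "B", "avg_income": 125000, "pct_university": 0.60, "chargers": 6},
    {"da_id": "C", "avg_income": 60000, "pct_university": 0.20, "chargers": 0},
]
targets = underserved_hotspots(
    areas, income_min=100000, education_min=0.5, chargers_max=1
)
print(targets)  # ['A']
```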

The developed framework provides a simplified approach to EV adoption forecasting by transforming complex ML processes into a streamlined workflow. This approach eliminates the need for specialized survey teams or extensive data collection phases while delivering results comparable with those of more resource-intensive studies (Table 1). Policymakers can confidently use the developed framework to advance targeted strategies, whether incentivizing EV adoption in affluent urban areas or addressing infrastructural gaps in underserved communities, accelerating the transition to sustainable mobility efficiently and precisely.

An important observation from this model is the role of existing charging stations in EV adoption. This model utilizes existing charging stations as input for predictions; however, the relationship between EV adoption and charging infrastructure density is interdependent. This is a chicken-and-egg relationship: more charging stations promote EV adoption, and more EVs necessitate increased access to charging. This limitation calls for future research that develops a well-coordinated strategy to expand charging infrastructure in areas where EV adoption is likely to emerge.

Conclusion

In conclusion, this study contributes to the ongoing EV adoption research by leveraging ML models and publicly available data, offering a robust and cost-effective alternative to survey-based approaches. Previous studies identify vital determinants such as education, income, infrastructure, and environmental concerns through primary data; the developed framework achieves comparable predictive accuracy using only open-source census data, reaching up to 95% on the training data set and 89% on the testing data set with the XGBoost model. This approach validates existing findings by analyzing SED characteristics, urbanization scores, annual VKT, and charging infrastructure availability, while offering a scalable, reproducible tool for spatially predicting EV adoption across various regions and countries.

The findings highlight the effectiveness of ensemble ML models, with XGBoost emerging as the optimal choice. Forward SFS identified 18 critical predictors, and SHAP analysis indicates that higher education levels, income, and dwelling values strongly drive EV adoption, aligning with survey-based research ( 5 , 6 , 17 ). Urbanization and access to active transportation options support EV uptake, and increased charging infrastructure mitigates range anxiety, reinforcing the findings from previous studies ( 9 , 11 , 28 ). Conversely, higher population density, longer VKT, and extended commuting durations negatively impact adoption, reflecting the logistical challenges noted in urban areas and range-sensitive contexts ( 3 , 7 ). In addition, demographic factors, including younger populations and lower percentages of single or non-traditionally married individuals, play pivotal roles, consistent with behavioral trends in the literature ( 15 , 18 ).

Beyond its analytical contributions, the developed framework offers practical value because of its simplicity and accessibility for policymakers. By leveraging open-source data and a streamlined ML pipeline, including data collection, preparation, model training, feature selection, and spatial prediction, this framework eliminates the need for expensive surveys while delivering high-resolution insights at the DA level. This easy-to-implement tool enables stakeholders to identify adoption hotspots, prioritize infrastructure investments, and develop targeted policies, such as expanding charging infrastructure in affluent urban areas or addressing adoption barriers in dense regions. In addition, the reproducibility of the developed framework across different regions amplifies its adaptability to diverse geographical and governmental contexts.

The developed framework does not account for attitudinal and psychological factors, which are significant drivers in survey-based research ( 24 , 25 ); however, its high accuracy and focus on tangible socioeconomic and infrastructural determinants compensate for this limitation. This study bridges the gap between resource-intensive research and actionable policymaking, demonstrating that publicly available data, combined with advanced ML techniques, can extend traditional findings. Furthermore, this study offers a scalable and cost-effective practical solution to support the transition to sustainable transportation by addressing financial, logistical, and demographic considerations that facilitate widespread EV adoption. Moreover, although the proposed approach highlights a strong association between charging station availability and EV adoption, it cannot establish whether charger availability encourages adoption or higher adoption prompts increased deployment of charging stations. Therefore, future longitudinal studies or randomized interventions could clarify whether the deployment of charging stations drives EV adoption or vice versa.

Footnotes

Author Contributions

The authors confirm contributions to the paper as follows: study conception and design: Shehabeldeen A., Mohamed M.; data collection: Shehabeldeen A.; analysis and interpretation of results: Shehabeldeen A.; draft manuscript preparation: Shehabeldeen A., Mohamed M. All authors reviewed the results and approved the final version of the manuscript.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors would like to acknowledge support from the Natural Sciences and Engineering Research Council of Canada, Grant No: RGPIN-2025-05957.

ORCID iDs

Ali Shehabeldeen

Moataz Mohamed

Code Availability

The developed code will be made available on request.

References

1. Vidhi, Shrivastava. A Review of Electric Vehicle Lifecycle Emissions and Policy Recommendations to Increase EV Penetration in India. Energies, Vol. 11, No. 3, 2018, p. 483. https://doi.org/10.3390/en11030483.

2. Soukhov, Foda, Mohamed. Electric Mobility Emission Reduction Policies: A Multi-Objective Optimization Assessment Approach. Energies, Vol. 15, No. 19, 2022, p. 6905. https://doi.org/10.3390/en15196905.

3. Yang, A. N., Liu, C. H., Yang, C. R. Electric Vehicle Adoption in a Mature Market: A Case Study of Norway. Journal of Transport Geography, Vol. 109, 2023, p. 103489. https://doi.org/10.1016/j.jtrangeo.2023.103602.

4. Mohamed, Higgins, Ferguson, Kanaroglou. Identifying and Characterizing Potential Electric Vehicle Adopters in Canada: A Two-Stage Modelling Approach. Transport Policy, Vol. 52, 2016, pp. 100–112. https://doi.org/10.1016/j.tranpol.2016.07.006.

5. Peng, Tang, J. H. C. G., Yang, Meng, Zhang, Zhuge. Investigating the Factors Influencing the Electric Vehicle Market Share: A Comparative Study of the European Union and United States. Applied Energy, Vol. 355, 2024, p. 122327. https://doi.org/10.1016/j.apenergy.2023.122327.

6. Chandra. Investigating the Impact of Policies, Socio-Demography and National Commitments on Electric-Vehicle Demand: Cross-Country Study. Journal of Transport Geography, Vol. 103, 2022, p. 103410. https://doi.org/10.1016/j.jtrangeo.2022.103410.

7. Madheshiya, Maurya, A. K., Rai, A. K. Enhancing Electric Vehicle Performance: A Co-Relation Study of Key Performance Parameters. IOP Conference Series: Earth and Environmental Science, Vol. 1285, No. 1, 2024, p. 012012. https://doi.org/10.1088/1755-1315/1285/1/012012.

8. Chen, Yang, H. K., Song, Y. X., Lai, W. W., Zheng, L. L. Thermal Management System for Stable EV Battery Operation with Composite Phase Change Materials. Physica Scripta, Vol. 99, No. 6, 2024, p. 065922. https://doi.org/10.1088/1402-4896/ad42df.

9. Bas, Zou, Cirillo. An Interpretable Machine Learning Approach to Understanding the Impacts of Attitudinal and Ridesourcing Factors on Electric Vehicle Adoption. Transportation Letters, Vol. 15, No. 1, 2023, pp. 30–41.

10. Ferguson, Mohamed, Higgins, C. D., Abotalebi, Kanaroglou. How Open are Canadian Households to Electric Vehicles? A National Latent Class Choice Analysis with Willingness-to-Pay and Metropolitan Characterization. Transportation Research Part D: Transport and Environment, Vol. 58, 2018, pp. 208–224. https://doi.org/10.1016/j.trd.2017.12.006.

11. Zahid, Hussain, S. N., Paudel, Sadiq, Hewage. Identification of Key Factors that Influence Electric Vehicle Adoption: A Case Study of Okanagan (Canada). SSRN 4809031, 2024.

12. Naseri, Waygood, E. O. D., Wang, B. B., Patterson. Interpretable Machine Learning Approach to Predicting Electric Vehicle Buying Decisions. Transportation Research Record: Journal of the Transportation Research Board, 2023. 2677: 704–717.

13. Tungom, C. E., Niu, Wang. NEO: Neural Demand Prediction and Evolutionary Optimization of EV Network Charging Infrastructure. In Fuzzy Systems and Data Mining IX (A. J. Tallón-Ballesteros, and R. Beltrán-Barba, eds.), IOS Press, 2023, pp. 36–46.

14. Deb, Tammi, Kalita, Mahanta. Review of Recent Trends in Charging Infrastructure Planning for Electric Vehicles. Wires Energy Environ, Vol. 7, No. 6, 2018, p. e306. https://doi.org/10.1002/wene.306.

15. Gautam, Bolia. Understanding Consumer Choices and Attitudes Toward Electric Vehicles: A Study of Purchasing Behavior and Policy Implications. Sustainable Development, Vol. 32, No. 5, 2024, pp. 4895–4915. https://doi.org/10.1002/sd.2939.

16. Wang, Eldeeb, Mohammed. Profiling Electric Vehicles Potential Markets Through a Stated Adaptation Design Space Game. Transportation Research Part D: Transport and Environment, Vol. 113, 2022, p. 103507. https://doi.org/10.1016/j.trd.2022.103507.

17. Esteves, Alonso-Martínez, de Haro. Profiling Spanish Prospective Buyers of Electric Vehicles Based on Demographics. Sustainability, Vol. 13, No. 16, 2021, p. 9223. https://doi.org/10.3390/su13169223.

18. Wolbertus, Kroesen, van den Hoed, Chorus, C. G. Policy Effects on Charging Behaviour of Electric Vehicle Owners and on Purchase Intentions of Prospective Owners: Natural and Stated Choice Experiments. Transportation Research Part D: Transport and Environment, Vol. 62, 2018, pp. 283–297. https://doi.org/10.1016/j.trd.2018.03.012.

19. Xue, Zhou. Impact of Incentive Policies and Other Socio-Economic Factors on Electric Vehicle Market Share: A Panel Data Analysis from the 20 Countries. Sustainability, Vol. 13, No. 5, 2021, p. 2928. https://doi.org/10.3390/su13052928.

20. Long, Axsen, Gauer, V. H., Niet. Differences in Canadian Consumers’ Awareness and Preferences for Zero-Emissions Vehicles from 2013 to 2023. Transportation Research Part D: Transport and Environment, Vol. 141, 2025, p. 104633. https://doi.org/10.1016/j.trd.2025.104633.

21. Requia, W. J., Mohamed, Higgins, C. D., Arain, Ferguson. How Clean are Electric Vehicles? Evidence-Based Review of the Effects of Electric Mobility on Air Pollutants, Greenhouse Gas Emissions and Human Health. Atmospheric Environment, Vol. 185, 2018, pp. 64–77. https://doi.org/10.1016/j.atmosenv.2018.04.040.

22. Higgins, C. D., Mohamed, Ferguson, M. R. Size Matters: How Vehicle Body Type Affects Consumer Preferences for Electric Vehicles. Transportation Research Part A: Policy and Practice, Vol. 100, 2017, pp. 182–201. https://doi.org/10.1016/j.tra.2017.04.014.

23. Ferguson, Mohamed, Maoh. On the Electrification of Canada’s Vehicular Fleets: National-Scale Analysis Shows that Mindsets Matter. IEEE Electrification Magazine, Vol. 7, No. 3, 2019, pp. 55–65. https://doi.org/10.1109/MELE.2019.2925763.

24. Zarazua de Rubens. Who Will Buy Electric Vehicles After Early Adopters? Using Machine Learning to Identify the Electric Vehicle Mainstream Market. Energy, Vol. 172, 2019, pp. 243–254. https://doi.org/10.1016/j.energy.2019.01.114.

25. Chen, C. F., de Rubens, G. Z., Noel, Kester, Sovacool, B. K. Assessing the Socio-Demographic, Technical, Economic and Behavioral Factors of Nordic Electric Vehicle Adoption and the Influence of Vehicle-to-Grid Preferences. Renewable & Sustainable Energy Reviews, Vol. 121, 2020, p. 109692. https://doi.org/10.1016/j.rser.2019.109692.

26. Liao, Molin, van Wee. Consumer Preferences for Electric Vehicles: A Literature Review. Transport Reviews, Vol. 37, No. 3, 2017, pp. 252–275. https://doi.org/10.1080/01441647.2016.1230794.

27. Lee, Mulrow, Haboucha, C. J., Derrible, Shiftan. Attitudes on Autonomous Vehicle Adoption using Interpretable Gradient Boosting Machine. Transportation Research Record: Journal of the Transportation Research Board, 2019. 2673: 865–878.

28. Afandizadeh, Sharifi, Kalantari, Mirzahossein. Using Machine Learning Methods to Predict Electric Vehicles Penetration in the Automotive Market. Scientific Reports, Vol. 13, No. 1, 2023, p. 8345. https://doi.org/10.1038/s41598-023-35366-3.

29. Statistics Canada. Census Profile, 2021 Census of Population [Online]. https://www12.statcan.gc.ca/census-recensement/2021/dp-pd/prof/index.cfm?Lang=E.

30. Wikipedia. Greater Toronto and Hamilton Area. Wikipedia. https://en.wikipedia.org/wiki/Greater_Toronto_and_Hamilton_Area.

31. Government of Canada. Electric Charging and Alternative Fuelling Stations Locator. April, 2024. https://natural-resources.canada.ca/energy-efficiency/transportation-energy-efficiency/electric-charging-alternative-fuelling-stationslocator-map#/find/nearest?country=CA.

32. Breiman. Random Forests. Machine Learning, Vol. 45, No. 1, 2001, pp. 5–32. https://doi.org/10.1023/A:1010933404324.

33. Huynh, X.-P., Park, S.-M., Kim, Y.-G. Detection of Driver Drowsiness Using 3D Deep Neural Network and Semi-Supervised Gradient Boosting Machine. In Computer Vision – ACCV 2016 Workshops (Chen, C.-S., K.-K., eds.), Springer International Publishing, Cham, 2017, pp. 134–145.

34. Kuhn, Johnson. Applied Predictive Modeling. Springer, New York, NY, 2013.

35. Kavzoglu, Teke. Predictive Performances of Ensemble Machine Learning Algorithms in Landslide Susceptibility Mapping Using Random Forest, Extreme Gradient Boosting (XGBoost) and Natural Gradient Boosting (NGBoost). Arabian Journal for Science and Engineering, Vol. 47, No. 6, 2022, pp. 7367–7385. https://doi.org/10.1007/s13369-022-06560-8.

36. Sahin, E. K. Assessing the Predictive Capability of Ensemble Tree Methods for Landslide Susceptibility Mapping Using XGBoost, Gradient Boosting Machine, and Random Forest. SN Applied Sciences, Vol. 2, No. 7, 2020, p. 1308. https://doi.org/10.1007/s42452-020-3060-1.

37. Cui, Cai, Stanley, H. E. Comparative Analysis and Classification of Cassette Exons and Constitutive Exons. BioMed Research International, Vol. 2017, 2017, p. 7323508. https://doi.org/10.1155/2017/7323508.

38. Chen, Guestrin. XGBoost: A Scalable Tree Boosting System. Proc., 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, 2016. https://doi.org/10.1145/2939672.2939785.

39. Chen, Benesty, Khotilovich, Tang, Cho, Chen, Mitchell, Cano, Zhou, Xie, Lin, Geng, Yuan, Cortes. Xgboost: Extreme Gradient Boosting. R package version 0.4-2, Vol. 1, No. 4, 2015, pp. 1–4.

40. Pudil, Novovicova, Kittler. Floating Search Methods in Feature-Selection. Pattern Recognition Letters, Vol. 15, No. 11, 1994, pp. 1119–1125. https://doi.org/10.1016/0167-8655(94)90127-9.

41. Shapley, L. S. “17. A Value for n-Person Games”. Contributions to the Theory of Games, Volume II, edited by Harold William Kuhn and Albert William Tucker, Princeton: Princeton University Press, 1953, pp. 307–318. https://doi.org/10.1515/9781400881970-018.

42. Lundberg, S. M., Lee, S. I. A Unified Approach to Interpreting Model Predictions. Proc., 31st International Conference on Neural Information Processing Systems, Long Beach, CA, 2017, pp. 4768–4777.

43. Štrumbelj, Kononenko. Explaining Prediction Models and Individual Predictions with Feature Contributions. Knowledge and Information Systems, Vol. 41, 2014, pp. 647–665.