Urban Air Pollution Data Collection, Mapping, and Prediction Using Mobile Sensors Installed on Courier Trucks

Abstract

Particulate matter with a diameter of 2.5 microns or less (PM_2.5) has significant impacts on human health, making it essential to understand its spatial and temporal variations. This study develops land use regression (LUR) models and seeks to improve their performance in predicting PM_2.5 concentrations in an urban setting. Air quality data were collected using a sensor installed on a courier truck in downtown Toronto. Extreme gradient boosting (XGBoost), a machine learning algorithm, was employed to address limitations of traditional linear-regression-based LUR models, incorporating predictors such as land use, meteorology, and emissions. A total of 27 models were trained with varying road segment lengths, predictors, and outlier treatment thresholds. Three models tested the impact of road segment length on model predictions. Eight models examined the effect of removing outliers with different thresholds, revealing that appropriate thresholds can slightly improve accuracy. Ten models assessed the addition of emission and traffic data, which did not enhance performance, likely because of overlapping effects with other predictors. Six models included time-variant predictors such as time of day, month, humidity, wind speed, temperature, and pollutant concentrations from stationary stations. Adding these predictors significantly improved model performance, highlighting the complex relationships in LUR models for PM_2.5 predictions and offering valuable insights for air quality assessment.

Keywords

sustainability and resilience, transportation and sustainability, air quality and greenhouse gas mitigation, air quality modeling, emissions and air quality management

Air quality is a critical environmental concern affecting millions of people worldwide. Among air pollutants, particulate matter with a diameter of 2.5 microns or less (PM_2.5) severely affects human health ( 1 ). Understanding spatial or spatiotemporal pollutant concentrations plays a vital role in implementing effective mitigation strategies and in developing accurate exposure surfaces to support the analysis of long-term health effects ( 2 ). Reference air quality stations provide reliable pollutant concentration data, but only at a limited number of fixed locations ( 1 ). To overcome this limitation, researchers use models such as geostatistical interpolation, photochemical dispersion, and land use regression (LUR) to assess air pollution exposure and its associated health effects in urban areas ( 3 ). While geostatistical interpolation and dispersion models are valuable, their resolution is often coarse, making them less suitable for capturing the small-scale urban variability that is crucial in exposure assessment ( 4 ). In contrast, LUR models have become popular for quantifying intraurban variation in air pollutant concentrations, offering simplicity and the ability to predict fine-scale variations that can reduce exposure measurement errors ( 4 , 5 ). The main components of LUR models are monitoring data, geographic predictors, model development, and validation ( 5 , 6 ). Traditionally, LUR models have relied on stationary air quality monitors, but such sparse fixed networks cannot capture small-scale spatial variations in pollutant concentrations. Mobile monitoring offers an alternative with better spatial coverage, though it presents challenges in addressing temporal variability ( 6 , 7 ).

Criticisms of linear-regression-based LUR models include limited flexibility and difficulty in incorporating highly correlated predictors ( 8 ). Studies have explored machine learning models, such as neural networks, support vector machines, and tree-based models, which improve accuracy and handle multicollinearity ( 8 , 9 ). Predictors for LUR models include land use, meteorology, pollutant emissions, and built environment characteristics; common predictor variables are traffic, population density, land use, topography, and location ( 10 ). Understanding the impact of such predictors is crucial for improving models and capturing spatiotemporal variation. Studies have used various methods to assess LUR model performance. Common approaches include cross-validation techniques such as leave-one-out cross-validation (LOOCV), in which each data point is predicted once by a model trained on all the remaining points, and regional cross-validation (RCV), which evaluates model error by leaving out all data points within a specific region and predicting concentrations there ( 11 ). Some studies use external data for validation, noting that a high training R² may not correspond to a high test-set R² when estimating long-term average exposure ( 12 ).
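LOOCV and RCV differ only in how the held-out set is formed. The Python sketch below is illustrative and not taken from the cited studies; it generates index splits for both schemes, and the region labels are hypothetical.

```python
def loocv_splits(n):
    """Leave-one-out CV: every index is the test set exactly once."""
    for i in range(n):
        train = [j for j in range(n) if j != i]
        yield train, [i]


def regional_splits(regions):
    """Regional CV: hold out all points from one region at a time.

    regions: list giving a (hypothetical) region label for each data point.
    """
    for label in sorted(set(regions)):
        test = [j for j, r in enumerate(regions) if r == label]
        train = [j for j, r in enumerate(regions) if r != label]
        yield train, test


regions = ["downtown", "east", "downtown", "west"]
print(list(regional_splits(regions)))
# [([1, 3], [0, 2]), ([0, 2, 3], [1]), ([0, 1, 2], [3])]
```

RCV typically gives a more pessimistic error estimate than LOOCV, because neighbouring points that share local conditions cannot appear in both the training and test sets.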

Across the literature, different LUR models, predictors, and data postprocessing approaches have been used to improve predicted long-term or short-term exposure surfaces. Messier et al. ( 6 ) measured nitric oxide (NO) and black carbon (BC) by mobile monitoring and implemented land use regression-kriging (LUR-K) on part of the dataset to predict pollutant concentrations at unobserved locations. Kerckhoffs et al. ( 13 ) compared simple linear regression, several regularization-based algorithms, and several machine learning algorithms (random forest, boosting, and bagging) for predicting long-term ultrafine particle (UFP) concentrations and found that the machine learning models performed better on their data. In another study, Kerckhoffs et al. ( 14 ) implemented three models (supervised stepwise regression, Least Absolute Shrinkage and Selection Operator (LASSO), and random forest) to predict long-term average UFP, using a deconvolution method to separate local UFP contributions from background concentrations. Liu et al. ( 8 ) compared linear regression with gradient boosting decision tree (GBDT) models and showed that the GBDT model captured non-linear relationships and explained more of the variation in pollutant concentrations. In recent years, more complex machine learning algorithms such as artificial neural networks (ANN) have also been used for prediction. Wang et al. ( 15 ) compared two machine learning approaches, ANN and gradient boosting, with a simple linear regression LUR model using five data segmentation schemes and found that the machine learning methods outperformed the simple LUR model. Ren et al. ( 16 ) and Wong et al. ( 17 ) assessed various machine learning methods and simple LUR models, and both studies concluded that the extreme gradient boosting (XGBoost) algorithm outperformed the other approaches.

LUR models are widely used in air quality research to estimate pollutant concentrations in areas lacking direct monitoring data. By incorporating spatial predictors such as land use types, traffic density, population distribution, and meteorological data, LUR models can predict pollution levels beyond the measurement locations (whether fixed or mobile), including off-road environments. For instance, Padhi and Kumar ( 18 ) discuss the application of LUR models in air pollution assessment, highlighting their effectiveness in capturing spatial variability. Similarly, Hankey and Marshall ( 19 ) successfully modeled on-road particulate air pollution, including particle number, black carbon, and PM_2.5, using LUR techniques based on mobile monitoring data, showing that LUR models capture the spatial variability of pollutants across urban settings. Although uncertainties may arise when extrapolating to areas far from monitoring points, LUR models have been validated in diverse settings, demonstrating their robustness in capturing spatial trends across urban and suburban environments. These strengths make LUR models valuable for understanding exposure in off-road locations.

In this study, we used the XGBoost algorithm to develop multiple models for predicting PM_2.5 concentrations. To gather the data for developing the models, a sensor was installed on a courier truck operating in Toronto, Canada. The primary focus of this research is to investigate whether the predictive ability of the model can be enhanced by employing diverse predictors and by addressing outliers. By constructing distinct models for different sets of predictors, we compared the impact of these predictors and data modifications on overall model performance. Additionally, we assessed the feasibility of using delivery vehicle routes to construct an LUR model for predicting exposure surfaces. The study outcomes can be leveraged to investigate the potential of a fleet serving as a mobile monitoring platform.

Beyond contributing to the methodological development of exposure surfaces, this study offers valuable insights for urban planning, public health, and community-level air quality management. Previous research has demonstrated the value of LUR models in identifying urban air pollution hotspots and supporting interventions to mitigate health risks associated with air pollution. For instance, Hoek et al. ( 11 ) reviewed the application of LUR models in air pollution epidemiology, highlighting their effectiveness in exposure assessment and their potential to inform public health policies. Similarly, Jerrett et al. ( 20 ) used LUR models to examine the relationship between air pollution and mortality in urban areas, underscoring their role in exposure analysis and public health risk assessment. Furthermore, Lu et al. ( 21 ) discussed the application of LUR models for identifying air pollution hotspots in urban environments, emphasizing their value in supporting population health by guiding urban planning decisions. By using low-cost mobile sensors and implementing advanced LUR modeling techniques, our findings allow for a more nuanced understanding of PM_2.5 variability across urban areas, which can help inform effective strategies to reduce population exposure and promote healthier urban spaces.

Methods

Study Area

This study is set in the city of Toronto, the largest city in Canada with a population of over 2.7 million, located on the north shore of Lake Ontario in southeastern Canada. Toronto has a temperate climate with distinct seasons and a relatively flat topography, especially in the downtown core and along the lakeshore. Air quality in Toronto varies with several factors, including weather, traffic, industrial activity, seasonal influences, and proximity to major sources such as Toronto’s municipal expressways and provincial highways or its two airports: Toronto Pearson International Airport, west of the city, and Billy Bishop Toronto City Airport on the Toronto Islands. Among these sources, traffic emissions from cars, trucks, buses, and other vehicles are a major local source of air pollution in Toronto.

Data Collection and Preprocessing

We conducted a mobile monitoring campaign in Toronto by installing a sensor on a truck operated by Purolator, a courier and package delivery company in Toronto, to capture the spatiotemporal variation of ambient PM_2.5. The mobile monitoring campaign took place from June 21, 2022, to January 11, 2023.

The sensor unit, manufactured by Geotab, a telematics and fleet management company, uses a Honeywell HPMA115S0 low-cost sensor, which measures ambient particulate matter (PM) concentrations and can detect and quantify PM_2.5 and PM₁₀ in real time. The sensor is designed to capture a broad range of particulate concentrations suitable for urban air quality monitoring, responds rapidly to changes in PM levels, and operates reliably across a wide range of temperatures and humidity conditions. It was factory-calibrated by the manufacturer before installation ( 22 , 23 ). The sensor is integrated into a Geotab unit that also includes an SGX Sensortech MiCS-4514 to measure NO₂ concentrations and a Bosch Sensortec BME280 to measure temperature, humidity, and atmospheric pressure ( 9 ). In addition, Geotab reports a wide range of vehicle telematics data such as GPS location, fuel level, and other on-board diagnostics (OBD). The unit reports a new record whenever a measured property changes; for example, a new record is generated whenever the PM_2.5 concentration changes. To minimize the risk of self-pollution from the truck’s exhaust, the PM_2.5 sensor was placed with its airflow intake on top of the truck’s right window, facing outward, so that it avoided direct exposure to exhaust gases and primarily measured ambient air rather than pollutants emitted by the vehicle itself.

To assess the reliability of the sensor readings over time, we compared PM_2.5 concentrations recorded by the mobile sensor with data from a near-reference monitoring station on College Street, a busy arterial road on the University of Toronto campus, during periods when the truck operated within a 200-m radius of the station. This station uses a Thermo Scientific 5030 SHARP (Synchronized Hybrid Ambient Real-time Particulate) monitor; its location is shown in Figure 1. This comparison was used to check the consistency of the mobile measurements and the stability of the sensor throughout the study period.

Figure 1.

Map featuring the locations of monitoring stations in Toronto. The near-reference monitoring stations (Hanlan’s Point (HN) Station and University of Toronto (UofT) Station) are shown along with the four regulatory monitoring stations operated by the Ontario Ministry of the Environment, Conservation and Parks (Toronto Downtown Station, Toronto East Station, Toronto North Station, and Toronto West Station).

The PM_2.5 concentration was recorded for a total of 56,577 data points while the truck was in operation. The truck operated mostly in the downtown area and mostly between 7 a.m. and 7 p.m.: 20% of the data points were recorded in the morning (7 to 11 a.m.), 41% around midday (11 a.m. to 3 p.m.), 31% in the afternoon (3 to 7 p.m.), and the remaining 8% outside those hours. During the campaign, the truck was operational for approximately 300 h over the 6-month period; its daily operating hours varied and did not consistently follow an 8-h schedule. However, the PM_2.5 sensor was active for only 157 h, because drivers sometimes forgot to turn on the sensor, which reduced the data collection time over the 6-month period. Moreover, records exist for only 5,767 distinct minutes (about 96 h), indicating that the sensor did not report data continuously during its 157 h of active time. These gaps correspond to periods when the sensor did not report new values because the concentration did not change significantly.
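Because the unit logs a record only when a value changes, the number of distinct minutes with data can be much smaller than the sensor's active time. The minimal Python sketch below shows one way to count distinct minutes and records per minute from record timestamps. The timestamps are hypothetical, and the study's own processing code is not described.

```python
from collections import Counter
from datetime import datetime


def minute_coverage(timestamps):
    """Count distinct minutes and records per minute for change-triggered data.

    timestamps: iterable of datetime objects, one per reported record.
    Returns (number of distinct minutes, Counter mapping minute -> record count).
    """
    per_minute = Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)
    return len(per_minute), per_minute


# Hypothetical records: nothing is reported at 9:02 or 9:03 because, in this
# illustration, the PM2.5 value did not change during those minutes.
records = [
    datetime(2022, 7, 4, 9, 0, 5),
    datetime(2022, 7, 4, 9, 0, 41),
    datetime(2022, 7, 4, 9, 1, 12),
    datetime(2022, 7, 4, 9, 4, 30),
]
n_minutes, per_minute = minute_coverage(records)
print(n_minutes)             # 3
print(round(5767 / 60, 1))   # the study's 5,767 minutes in hours: 96.1
```

The per-minute counts are the same quantity used to colour the sensor comparison in Figure 3.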

Each data point consists of a pair of x, y coordinates associated with a PM_2.5 concentration. To obtain a representative concentration for each location on the road network, we averaged all observations associated with each road segment, defining segments in three ways. In the first method, observations were aggregated for each road segment (from one intersection to the next) in a road network shapefile for Toronto; the midpoint of each segment was then identified, and buffers were created around each midpoint to compute model predictors. This assigned all data points to approximately 1,050 road segments. In the second and third methods, segments were defined at every 50-m and 100-m interval, and the average concentration was computed for all readings associated with each segment, yielding 1,673 segments for the 50-m aggregation and 705 for the 100-m aggregation. These three methods, which have also been employed by other studies ( 4 , 6 , 24 ), were used to evaluate the impact of spatial aggregation. We used the 50-m segments as the base case against which all other models are compared, since they provide the highest spatial resolution.
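The fixed-interval aggregation can be sketched as follows, assuming each reading has already been matched to a route polyline in a projected coordinate system with metre units: each point is projected onto the polyline, its distance along the line selects a 50-m or 100-m bin, and readings in the same bin are averaged. The function names and the toy road are hypothetical, and the study's GIS workflow may differ (for example, in how readings far from any road are handled).

```python
import math
from collections import defaultdict


def distance_along(polyline, point):
    """Distance (m) along a polyline to the location nearest to `point`.

    polyline: list of (x, y) vertices in a projected CRS with metre units.
    """
    best_d2, best_along, walked = math.inf, 0.0, 0.0
    px, py = point
    for (x1, y1), (x2, y2) in zip(polyline, polyline[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len = math.hypot(dx, dy)
        if seg_len == 0:
            continue
        # Orthogonal projection parameter, clamped to the current segment.
        t = max(0.0, min(1.0, ((px - x1) * dx + (py - y1) * dy) / seg_len ** 2))
        qx, qy = x1 + t * dx, y1 + t * dy
        d2 = (px - qx) ** 2 + (py - qy) ** 2
        if d2 < best_d2:
            best_d2, best_along = d2, walked + t * seg_len
        walked += seg_len
    return best_along


def aggregate_by_segment(polyline, observations, seg_length=50.0):
    """Average PM2.5 readings within consecutive fixed-length segments.

    observations: iterable of (x, y, pm25). Returns {segment_index: mean}.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y, pm25 in observations:
        idx = int(distance_along(polyline, (x, y)) // seg_length)
        sums[idx] += pm25
        counts[idx] += 1
    return {idx: sums[idx] / counts[idx] for idx in sorted(sums)}


# Hypothetical straight 120-m road and four readings (x, y, PM2.5 in ug/m3).
road = [(0.0, 0.0), (120.0, 0.0)]
obs = [(10.0, 3.0, 8.0), (40.0, -2.0, 12.0), (75.0, 1.0, 20.0), (110.0, 0.0, 5.0)]
print(aggregate_by_segment(road, obs, 50.0))   # {0: 10.0, 1: 20.0, 2: 5.0}
print(aggregate_by_segment(road, obs, 100.0))  # {0: 13.333333333333334, 1: 5.0}
```

The same readings produce fewer, smoother averages at 100 m, which is the trade-off between resolution and noise that the three aggregation methods are meant to test.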

Generation of Predictor Variables

Several LUR models were developed using different sets of predictors to evaluate how the choice of predictors affects LUR model performance. We divided the predictors into three categories: location-specific (land use) predictors, traffic-related predictors, and temporally varying weather and air quality predictors. To calculate the predictors, buffers of varying size (50, 100, 200, 250, 300, 500, 1000, and 2000 m) were created around each segment midpoint (for each of the three aggregation methods). Predictor calculation involved computing the area, length, or number of land use and traffic-related features within each buffer, the distances from the buffer centers to important places in the city, and the temporally varying variables (weather and regional air quality).
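For the count and distance predictors, the buffer logic reduces to comparing distances with each buffer radius. The Python sketch below is a minimal illustration that assumes projected coordinates in metres; it produces keys following the Table 1 pattern (buffer size followed by the predictor name). Area-based predictors, such as the share of a buffer covered by a land use type, require polygon intersection with a GIS library and are not shown.

```python
import math

# Buffer radii (m) listed in the text.
BUFFER_SIZES = (50, 100, 200, 250, 300, 500, 1000, 2000)


def count_predictors(midpoint, features, name, sizes=BUFFER_SIZES):
    """Count point features (e.g., intersections) inside circular buffers.

    Keys follow the Table 1 pattern (buffer size + name), e.g. '50N_inter'.
    Coordinates are assumed to be in a projected CRS with metre units.
    """
    dists = [math.dist(midpoint, p) for p in features]
    return {f"{r}{name}": sum(d <= r for d in dists) for r in sizes}


def nearest_distance(midpoint, features):
    """Straight-line distance (m) from a segment midpoint to the closest feature point."""
    return min(math.dist(midpoint, p) for p in features)


# Hypothetical intersections around a segment midpoint at the origin.
intersections = [(30.0, 30.0), (0.0, 150.0), (600.0, 0.0)]
counts = count_predictors((0.0, 0.0), intersections, "N_inter")
print(counts["50N_inter"], counts["200N_inter"], counts["2000N_inter"])  # 1 2 3
```

Computing every radius from one list of distances keeps the eight buffer sizes consistent with each other, so larger buffers always contain the features of smaller ones.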

Predictors that accurately represent emissions or traffic can affect the effectiveness of LUR models in estimating pollution concentrations ( 8 ). In this study, LUR models were created to compare two data sources for traffic and emissions. In one model, traffic volume and nitrogen oxides (NO_x) emissions from the GTAModel, an activity-based model for the Greater Toronto Area that uses the Equilibrium Multi-Modal Environment (EMME) traffic assignment package, were included as predictors ( 25 , 26 ). Four additional models used annual average daily traffic (AADT) data instead of the emission data, each with AADT from a different year (2017, 2018, 2019, and 2020). The AADT data used in this study were developed by Ganji et al. ( 27 ) from traffic count data rather than from a travel demand model, and were subsequently improved by Ganji et al. ( 28 ) using aerial imagery.

To address the temporal variability of PM_2.5 concentrations, several models were trained with additional predictors. For this purpose, we used data from two near-reference monitoring stations, one at the University of Toronto and the other near Lake Ontario at Hanlan’s Point, south of the city. These stations provided environmental variables including humidity, wind speed, temperature, regional PM_2.5, and gaseous pollutants such as NO_x, nitric oxide (NO), and nitrogen dioxide (NO₂). We also added time of day as three categories (morning, midday, and afternoon), as well as day of week and month, to examine whether daily, weekly, and seasonal trends in PM_2.5 could be captured. All predictors, their categories, and their type of measurement are detailed in Table 1.
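A minimal sketch of how the calendar and station predictors could be attached to a timestamped observation follows. The period boundaries are those reported for the campaign; the station variable keys (for example, 'RH1_HN') are hypothetical instances of the Table 1 naming pattern, and matching each observation to the station average for the same clock hour is an assumption.

```python
from datetime import datetime

DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def time_of_day(ts):
    """Map a timestamp to the campaign's three operating periods.

    Morning 7-11 a.m., midday 11 a.m.-3 p.m., afternoon 3-7 p.m. Records outside
    these hours return None; how the study treated them is not stated.
    """
    if 7 <= ts.hour < 11:
        return "morning"
    if 11 <= ts.hour < 15:
        return "midday"
    if 15 <= ts.hour < 19:
        return "afternoon"
    return None


def add_temporal_predictors(ts, hourly_station_data):
    """Return calendar predictors plus the station averages for the same clock hour.

    hourly_station_data maps an hour-floored datetime to a dict of station
    variables (hypothetical keys such as 'RH1_HN' or 'WS1_UofT').
    """
    hour = ts.replace(minute=0, second=0, microsecond=0)
    row = {"time_of_day": time_of_day(ts), "weekday": DAYS[ts.weekday()], "month": ts.month}
    row.update(hourly_station_data.get(hour, {}))
    return row


# Hypothetical hourly averages for one station.
station = {datetime(2022, 7, 4, 9): {"RH1_HN": 61.0, "ATEM1_HN": 24.5}}
row = add_temporal_predictors(datetime(2022, 7, 4, 9, 30), station)
print(row)
```

Using the weekday index rather than locale-dependent date formatting keeps the day-of-week labels stable across systems.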

Table 1.

Summary of Predictors Categorized into Land Use-Related Variables (e.g., Area, Population, Proximity), Traffic-Related Variables (e.g., Road Lengths, Traffic Counts, Emissions), and Temporally Changing Variables (e.g., Time of Day, Month, Weather Conditions)

Attribute (Name: ### is buffer size and && is name of stationary station)	Type of measurement
Land use related variables
Parking lots areas (###Parking)	Area within buffer per area of buffer
Commercial areas (###COMMERC)	Area within buffer per area of buffer
Governmental areas (###GOVERNM)	Area within buffer per area of buffer
Industrial areas (###_indust)	Area within buffer per area of buffer
Open area (###OPEN)	Area within buffer per area of buffer
Residential area (###RESIDEN)	Area within buffer per area of buffer
Waterbody area (###WATERBO)	Area within buffer per area of buffer
Parks (###PARKS)	Area within buffer per area of buffer
Road area (###RoadAre)	Area within buffer per area of buffer
Distance to the lake (lak_dis)	Closest distance from Lake Ontario in meters
Population (###populat)	Population in each buffer per 100000 m²
Distance to Billy Bishop Toronto City Airport (ISairport_)	Closest distance from Billy Bishop Toronto City Airport
Accommodation points (###N_Accom)	Number of points in buffers
Chimney (###N_Chimn)	Number of points in buffers
Gas station (###N_Gstat)	Number of points in buffers
Restaurant (###Restaur)	Number of points in buffers
Traffic related variables
Length of highway in a buffer (###L_Hway)	Length in a buffer in meters
Length of railway in a buffer (###L_Rway)	Length in a buffer in meters
Length of major roads in a buffer (###L_Mway)	Length in a buffer in meters
Length of bus lines (###L_Bus)	Length in a buffer in meters
Length of all roads in a buffer (###Length)	Length in a buffer in meters
Distance to the closest highway (Hway_dis)	Closest distance from a point to the closest highway
Distance to the closest major road (Mway_dis)	Closest distance from a point to the closest major road
Distance to the closest rail line (Rline_dist)	Closest distance from a point to the closest rail line
Number of intersections (###N_inter)	Number of points in buffers
Number of traffic signals (###N_light)	Number of points in buffers
Number of bus stops (###N_Busst)	Number of points in buffers
Daily and yearly NOx (###AV_NOX, and ###TOT_NOX) from EMME data	Emission per area of buffer
Daily and yearly traffic (###AV_VOL, ###TOT_VO) from EMME data	Traffic count per area of buffer
AADT for four different years (###Y2017, ###Y2018, ###Y2019, and ###Y2020)	Traffic count per area of buffer
Temporally changing variables
Time (time_of_day)	Three categories (morning, midday, afternoon)
Month (month)	Month of year in which data were collected
Weekday	Day of week (Monday to Sunday)
Pollutants from stations (NO_&&, NO2_&&, NOx_&&, PM2.5_SHARP_&&)	Pollutant concentrations
Wind speed, relative humidity, and temperature (WS1_&&, RH1_&&, ATEM1_&&)	Weather conditions from two stationary stations

Note: NO_x = nitrogen oxides; EMME = Equilibrium Multi-Modal Environment; AADT = annual average daily traffic.

Land-Use Regression Modelling

In this study, the XGBoost model, a tree-based machine learning method, was trained using different sets of predictors to predict long-term PM_2.5 exposure surfaces and to study how the choice of predictors affects model performance. The XGBoost model can capture non-linear relationships between the predictors and the response ( 15 , 16 ), and the importance of predictors can be extracted from it using different feature importance measures. We used the Python libraries scikit-learn and XGBoost ( 29 ) and tuned the models with grid search. After tuning and selecting the best model, we used a feature importance method to assess the impact of different features on model output and predictive performance.
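Grid search evaluates every combination of candidate hyperparameter values and keeps the best-scoring one; scikit-learn's `GridSearchCV` (for example with `cv=10`) wraps this enumeration together with cross-validation. The plain-Python sketch below spells out the enumeration only. The keys are real `XGBRegressor` hyperparameter names, but the candidate values and the toy scoring function are hypothetical; the study does not report its grid.

```python
from itertools import product

# Hypothetical search space; not the grid used in the study.
PARAM_GRID = {
    "max_depth": [3, 6, 9],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [200, 500],
}


def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid and keep the best-scoring one.

    score_fn(params) is assumed to return a cross-validated score such as the
    mean 10-fold R²; higher is better.
    """
    keys = list(param_grid)
    best_params, best_score, n_evaluated = None, float("-inf"), 0
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = score_fn(params)
        n_evaluated += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_evaluated


# Toy scoring function standing in for "train XGBoost with params, return mean CV R²".
def toy_score(p):
    return -abs(p["max_depth"] - 6) + p["learning_rate"] + p["n_estimators"] / 1000


best, score, n = grid_search(PARAM_GRID, toy_score)
print(n, best)  # 12 {'max_depth': 6, 'learning_rate': 0.1, 'n_estimators': 500}
```

The number of fits grows multiplicatively with the grid (here 3 × 2 × 2 = 12 combinations, each evaluated with 10 folds), which is why grids are usually kept small.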

To assess model accuracy, the dataset was partitioned into training and testing sets, with 20% allocated for testing. The models were tuned and evaluated on the training set using 10-fold cross-validation (CV) combined with grid search for hyperparameter tuning: the training set was divided into 10 subsets, and each candidate model was trained and evaluated 10 times, each time using one fold for evaluation and the other nine for training. Performance was averaged over these 10 runs to provide an overall assessment of accuracy and generalization. After the best models and their hyperparameters were identified, they were evaluated on the 20% test set, which had not been used during training or tuning. The best coefficient of determination (R²) values for the test set and the 10-fold CV were compared across the different models.
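The partitioning logic can be made concrete with a short standard-library Python sketch: shuffle the indices, hold out 20% as a test set, split the remaining 80% into 10 folds, and average R² across folds. A simple least-squares line stands in for XGBoost purely to exercise the loop; the `fit`/`predict` callables and the toy data are hypothetical.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def holdout_and_kfold(n, test_frac=0.2, k=10, seed=0):
    """Shuffle indices, hold out test_frac of them, and split the rest into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    folds = [train[i::k] for i in range(k)]
    return train, test, folds


def cross_validate(x, y, folds, fit, predict):
    """Mean R² over folds; each fold is scored by a model fit on the other folds."""
    scores = []
    for i, held_out in enumerate(folds):
        fit_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        model = fit([x[j] for j in fit_idx], [y[j] for j in fit_idx])
        preds = predict(model, [x[j] for j in held_out])
        scores.append(r_squared([y[j] for j in held_out], preds))
    return sum(scores) / len(scores)


# Stand-in model (least-squares line), NOT XGBoost; used only to exercise the loop.
def fit_line(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    return slope, my - slope * mx


def predict_line(model, xs):
    slope, intercept = model
    return [slope * a + intercept for a in xs]


x = [float(i) for i in range(100)]
y = [2.0 * v + 1.0 for v in x]  # noise-free toy data, so R² should be ~1
train, test, folds = holdout_and_kfold(len(x))
cv_r2 = cross_validate(x, y, folds, fit_line, predict_line)
final = fit_line([x[j] for j in train], [y[j] for j in train])
test_r2 = r_squared([y[j] for j in test], predict_line(final, [x[j] for j in test]))
print(len(train), len(test), round(cv_r2, 6), round(test_r2, 6))  # 80 20 1.0 1.0
```

The test indices never enter `cross_validate`, mirroring the requirement that the 20% test set stays unseen during tuning.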

Finally, we conducted a validation analysis by comparing the models’ predicted PM_2.5 concentrations with the average measurements during the sampling period at four regulatory monitoring stations in Toronto operated by the Ontario Ministry of the Environment, Conservation and Parks (excluding the near-reference station used for sensor comparison); the locations of these stations are shown in Figure 1. The accuracy of the model predictions was calculated as $\text{Accuracy}\ (\%) = 100 \times \left(1 - \frac{|\text{Measured value} - \text{Modeled value}|}{\text{Measured value}}\right)$.
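The accuracy metric can be computed directly; the values below are hypothetical rather than the study's station data. The metric treats over- and under-prediction symmetrically and becomes negative when the absolute error exceeds the measured value.

```python
def prediction_accuracy(measured, modeled):
    """Accuracy (%) = 100 * (1 - |measured - modeled| / measured)."""
    if measured <= 0:
        raise ValueError("measured value must be positive")
    return 100.0 * (1.0 - abs(measured - modeled) / measured)


# Hypothetical station averages and model predictions (ug/m3).
print(prediction_accuracy(8.0, 7.0))   # 87.5
print(prediction_accuracy(8.0, 9.0))   # 87.5 (over- and under-prediction score alike)
print(prediction_accuracy(4.0, 10.0))  # -50.0 (error larger than the measured value)
```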

We also evaluated the impact of removing outliers on XGBoost performance by comparing models in which PM_2.5 concentrations above or below specific percentiles were deleted. The upper thresholds were the 99.7th, 99th, and 95th percentiles, and the lower thresholds were the 5th (all zero records), 10th, and 20th percentiles. We trained eight models using various combinations of these thresholds; some models used only an upper or only a lower threshold. Three models removed data greater than the 99th, 99.7th, and 95th percentiles (referred to as Outlier_1, Outlier_2, and Outlier_3, respectively). Another three models removed data greater than the 95th percentile together with data less than the 5th, 10th, or 20th percentile (Outlier_4, Outlier_5, and Outlier_6, respectively). Finally, one model removed only data below the 5th percentile (all zeros) (Outlier_7), and another removed data greater than the 99.7th and less than the 5th percentiles (Outlier_8).
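The percentile-based filtering can be sketched as follows. The percentile function uses linear interpolation between order statistics (NumPy's default method); whether the study used the same interpolation, and whether the lower cut was applied inclusively so that zeros are dropped when the 5th percentile equals zero, are assumptions. The readings are hypothetical.

```python
import math


def percentile(values, q):
    """q-th percentile with linear interpolation between closest ranks."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


# (lower, upper) percentile cuts for the eight models described above; None = no cut.
OUTLIER_CONFIGS = {
    "Outlier_1": (None, 99), "Outlier_2": (None, 99.7), "Outlier_3": (None, 95),
    "Outlier_4": (5, 95), "Outlier_5": (10, 95), "Outlier_6": (20, 95),
    "Outlier_7": (5, None), "Outlier_8": (5, 99.7),
}


def remove_outliers(values, lower=None, upper=None):
    """Drop readings above the upper cut or at/below the lower cut.

    The lower cut is inclusive (assumption) so that, when the 5th percentile
    is zero, the zero readings are removed.
    """
    lo = percentile(values, lower) if lower is not None else -math.inf
    hi = percentile(values, upper) if upper is not None else math.inf
    return [v for v in values if lo < v <= hi]


readings = [0.0, 0.0, 3.0, 5.0, 7.0, 8.0, 9.0, 12.0, 15.0, 40.0]  # hypothetical
print(remove_outliers(readings, *OUTLIER_CONFIGS["Outlier_4"]))  # [3.0, 5.0, 7.0, 8.0, 9.0, 12.0, 15.0]
print(remove_outliers(readings, *OUTLIER_CONFIGS["Outlier_7"]))  # zeros removed, 40.0 kept
```

Because percentiles are computed over the pooled dataset, a high-percentile cut can remove readings from an unusually polluted day even when they are typical for that day.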

To assess the influence of emission and traffic count predictors on model performance, we extended the analysis. In one model, we incorporated both emission and traffic count data from the GTAModel, and in four separate models we included AADT traffic counts for four different years (2017, 2018, 2019, and 2020). These predictor sets were added both to the base case model and to the Outlier_4 configuration, in which data points below the 5th percentile (all zeros) and above the 95th percentile were removed.

To capture temporal variation in pollutant concentrations, we assessed several models with different predictors. In one model (model_time), time of day was represented as a categorical variable with three categories (morning, midday, afternoon). In another model (day_month), day of week (Monday to Sunday) and month (1–12) were added as predictors. In addition, four models (Station_1 to Station_4) were developed by adding data from stationary stations as predictors. In the Station_1 model, hourly average relative humidity, wind speed, and ambient temperature from the two stations (HN Station and UofT Station) were added to the base case; in the Station_2 model, hourly average relative humidity, wind speed, and ambient temperature, along with NO, NO₂, NO_x, and PM_2.5 concentrations from the two stations, were added; the Station_3 model used all Station_2 predictors with data restricted to values above the 5th and below the 95th percentiles; and the Station_4 model added time of day (three categories) to the Station_3 configuration.

Combining the different methods of data aggregation, predictor sets, and outlier treatments yields 27 models, summarized in Figure 2.
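As a quick check on the experimental design, the model groups described in the Methods add up to 27. Model labels other than those named in the text (Outlier_1 to Outlier_8, model_time, day_month, Station_1 to Station_4) are hypothetical.

```python
# Model groups as described in the Methods; labels not named in the text are hypothetical.
TRAFFIC_SETS = ["EMME", "AADT2017", "AADT2018", "AADT2019", "AADT2020"]
MODEL_GROUPS = {
    "segment aggregation": ["Base (50 m)", "100 m", "Original segments"],
    "outlier treatment": [f"Outlier_{i}" for i in range(1, 9)],
    "emission/traffic": TRAFFIC_SETS,
    "emission/traffic + Outlier_4": TRAFFIC_SETS,
    "temporal": ["model_time", "day_month", "Station_1", "Station_2", "Station_3", "Station_4"],
}
total = sum(len(models) for models in MODEL_GROUPS.values())
print(total)  # 27
```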

Figure 2.

Overview of the 27 models categorized by key factors and represented in distinct colors: models evaluating (1) road segment length (orange), (2) outlier treatment (green), (3) emission and traffic data incorporation (blue), (4) combined emission/traffic data with outlier treatment (red), and (5) temporally changing predictors (yellow).

Results and Discussion

Sensor Performance

The comparison between the PM_2.5 values recorded by the mobile sensor and the near-reference University of Toronto station yielded an R² value of 0.63, suggesting reasonable alignment between the mobile and stationary measurements, although variability remains. Two factors influencing this agreement are the number of records per minute and the proximity of the truck to the station (only records collected within a 200-m radius of the station were included). Minutes with a higher density of second-by-second records tend to show averaged readings closer to the station's data, as more frequent measurements reduce random fluctuations (Figure 3). Conversely, minutes with fewer records or greater distances may be more affected by transient conditions, leading to less consistent alignment with the stationary measurements.
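The co-location step can be sketched as follows: keep only mobile records within 200 m of the station, average them by minute, and compute R² between the minute means and the station's per-minute values. The station coordinates are hypothetical placeholders, minute keys are simplified to strings, and computing R² as the squared Pearson correlation (the R² of a simple linear fit) is an assumption about how the reported value was obtained.

```python
import math
from collections import defaultdict

STATION = (43.6596, -79.3977)  # hypothetical (lat, lon) placeholder for the UofT station


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points."""
    r_earth = 6_371_000.0
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * r_earth * math.asin(math.sqrt(h))


def colocated_minute_means(records, station, radius_m=200.0):
    """Average mobile PM2.5 per minute, keeping only records within radius_m.

    records: iterable of (minute_key, lat, lon, pm25).
    Returns {minute_key: (mean_pm25, n_records)}.
    """
    groups = defaultdict(list)
    for minute, lat, lon, pm in records:
        if haversine_m((lat, lon), station) <= radius_m:
            groups[minute].append(pm)
    return {m: (sum(v) / len(v), len(v)) for m, v in groups.items()}


def squared_pearson(x, y):
    """R² of a simple linear fit, i.e., the squared Pearson correlation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


records = [
    ("09:00", 43.6596, -79.3977, 10.0),
    ("09:00", 43.6597, -79.3977, 12.0),  # about 11 m from the station
    ("09:01", 43.6700, -79.3977, 50.0),  # about 1.2 km away, excluded
]
print(colocated_minute_means(records, STATION))  # {'09:00': (11.0, 2)}
```

Returning the record count alongside each minute mean makes it straightforward to colour the comparison by records per minute, as in Figure 3.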

Figure 3.

Scatter plot of PM_2.5 concentrations recorded by the mobile sensor versus the University of Toronto monitoring station, with color indicating the number of measurements per minute for the sensor installed on the truck. Station records represent average PM_2.5 concentration per minute. The dashed line (y = x) represents perfect agreement.

Descriptive Statistics

The average PM_2.5 concentration was 9.56 µg/m³ with a standard deviation of 14.34 µg/m³, ranging from zero to a maximum of 1096 µg/m³; the highest values suggest potential measurement errors. The high standard deviation reflects the variability in air quality within downtown Toronto and underscores the need for methods that can capture it. Figure 4a shows box plots of PM_2.5 concentrations for each sampling day. The data reveal substantial variation both between and within days, without a clear trend between summer and fall.

Figure 4.

(a) Distribution of PM_2.5 measurements. (b) Average PM_2.5 concentration per road segment.

Figure 4b displays the average PM_2.5 concentrations on road segments. It shows the complex spatial variability of PM_2.5, which may reflect both spatial and temporal factors, since not all segments were sampled at the same time. Segments closer to the downtown core exhibit elevated concentrations compared with areas farther from the city center, such as the eastern part of the city. Figures 4a and 4b together indicate that pollution levels change over short distances and time periods, affected by weather, local emissions, and urban form. Such small-scale urban variability can be captured using LUR models ( 5 ).

Base Model

The base case uses 50-m road segment aggregation and serves as the benchmark against which the other aggregation methods are evaluated; this fine resolution increases the number of data points available for training and captures spatial variability. The base-case model yielded an R² of 0.249 for the test dataset and 0.251 for the 10-fold CV. The SHapley Additive exPlanations (SHAP) summary plot for the base model (Figure 5) highlights key predictors influencing PM_2.5 concentrations, such as road area in small buffers, proximity to major roads, and intersection density. These predictors reflect the sensitivity of concentrations to localized factors and indicate that the model captures the impacts of urban infrastructure and traffic emissions on air quality.

Figure 5.

SHapley Additive exPlanations (SHAP) summary plot for top 30 features of the base model ranked by their importance. Every point represents a SHAP value, reflecting the influence of a feature on the model output for a specific observation. The color of each dot indicates the value of the corresponding independent feature, ranging from low (blue) to high (red). Higher SHAP values for features suggest elevated levels of PM_2.5.
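The ranking shown in a SHAP summary plot can be reproduced from a matrix of SHAP values. To our understanding, the shap library orders features by the sum of absolute SHAP values across observations, which gives the same order as the mean absolute value used in this sketch. The SHAP values and feature names below are hypothetical.

```python
def rank_features_by_mean_abs_shap(shap_values, feature_names, top=30):
    """Order features by mean absolute SHAP value across observations.

    shap_values: list of rows (one per observation), each a list of per-feature
    SHAP values in the same order as feature_names.
    """
    n = len(shap_values)
    importance = {
        name: sum(abs(row[j]) for row in shap_values) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Hypothetical SHAP values for two observations and three Table 1-style features.
shap_matrix = [[0.5, -0.1, 0.0], [-0.7, 0.2, 0.05]]
names = ["50RoadAre", "Mway_dis", "100N_inter"]
ranked = rank_features_by_mean_abs_shap(shap_matrix, names)
print([name for name, _ in ranked])  # ['50RoadAre', 'Mway_dis', '100N_inter']
```

Taking absolute values means a feature that pushes predictions strongly down (negative SHAP values) ranks as high as one that pushes them up; the sign information is what the colour coding in the plot adds.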

Variations on Base Model

Figure 6 presents the results of all 27 models tested: three data aggregation methods, eight outlier treatment methods, five models adding local traffic and emissions data sources, five models combining those data sources with an outlier treatment method, and six models with temporal variables. The R² values for the test set and 10-fold CV are reported for each model.

Figure 6.

Overview of the 27 models tested for PM_2.5 prediction, categorized by data aggregation methods (orange), outlier treatments (green), traffic and emission data impact on models (blue), traffic and emission data impact with outlier treatment (red), and temporally varying predictors (yellow). R² values are reported for the test set and 10-fold cross-validation (CV).

The two alternative segment aggregation methods (100-m and original road segments) were compared with the base case to determine their impact on model performance. The 100-m aggregation performed slightly better than both the 50-m (base case) and the original road segment aggregation. However, the original road segment aggregation, while not as accurate as the 100-m approach, provided a more realistic representation of the spatial variability inherent to road networks. These results highlight the importance of testing various spatial aggregation resolutions, as each method balances accuracy against the ability to represent spatial structures and patterns effectively.
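The three aggregation schemes differ only in the length of the road bins over which measurements are averaged. A minimal sketch, assuming each reading has already been map-matched to a road and a distance along it (a hypothetical input format; the study's GIS workflow is not described in this section):

```python
from collections import defaultdict
from statistics import fmean

def aggregate_along_roads(measurements, bin_length_m=50.0):
    """Average PM2.5 readings per road bin.

    measurements: iterable of (road_id, distance_along_road_m, pm25) tuples.
    bin_length_m: bin size in metres, or None to keep each original road segment whole.
    """
    bins = defaultdict(list)
    for road_id, dist_m, pm25 in measurements:
        bin_index = 0 if bin_length_m is None else int(dist_m // bin_length_m)
        bins[(road_id, bin_index)].append(pm25)
    return {key: fmean(values) for key, values in bins.items()}

readings = [("A", 10, 4.0), ("A", 40, 6.0), ("A", 60, 8.0), ("A", 120, 10.0), ("B", 5, 2.0)]
for size in (50.0, 100.0, None):
    print(size, aggregate_along_roads(readings, size))
```

Shorter bins yield more training points but fewer readings per point, so each averaged value is noisier; this is one way to see the accuracy trade-off described above.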

We tested the impact of different outlier removal thresholds on the performance of our XGBoost models for predicting PM_2.5 concentrations. The threshold values are highlighted in Figure 4a to illustrate the range of data points affected by each outlier removal method. In the first three models (Outlier_1, Outlier_2, and Outlier_3), which removed data points greater than the 99th, 99.7th, and 95th percentiles, respectively, model performance improved slightly compared with the base case. However, for the third model, which used the 95th percentile threshold, the R² of the 10-fold CV changed less. This may be because a dataset-wide threshold removes data points from high-concentration days that are not outliers relative to the other measurements on those days. For the models that removed data points below the 10th or 20th percentile (Outlier_5 and Outlier_6), R² decreased, possibly because correctly recorded data points (not true outliers) were removed. As shown in Figure 4, these two methods may remove most of the records for some days. In our data, the 5th percentile is equal to zero, so three models used the 5th percentile as a lower threshold, which amounts to removing zero readings. When only this lower threshold was applied (Outlier_7), R² decreased slightly. The Outlier_4 and Outlier_8 models, which combined upper thresholds at the 95th and 99.7th percentiles, respectively, with the removal of zeros, achieved slightly higher R² values than the other models. Overall, however, removing outliers did not significantly improve performance in predicting PM_2.5 concentrations.
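The percentile rules above can be expressed as simple filters. The sketch below computes dataset-wide thresholds with linear interpolation (NumPy's default convention) and, with `share_removed_by_day`, shows how a single global lower cut can remove a large share of a low-concentration day's records, the effect discussed for Outlier_5 and Outlier_6. Function names and the toy data are hypothetical, and the exact percentile convention used in the study is not stated.

```python
import math
from collections import defaultdict

def percentile(values, p):
    """p-th percentile, linearly interpolating between sorted values."""
    s = sorted(values)
    rank = (p / 100.0) * (len(s) - 1)
    lo, hi = math.floor(rank), math.ceil(rank)
    return s[lo] + (s[hi] - s[lo]) * (rank - lo)

def filter_outliers(values, upper_pct=None, lower_pct=None, drop_zeros=False):
    """Keep values inside dataset-wide percentile thresholds, optionally dropping zeros."""
    upper = percentile(values, upper_pct) if upper_pct is not None else math.inf
    lower = percentile(values, lower_pct) if lower_pct is not None else -math.inf
    return [v for v in values if lower <= v <= upper and not (drop_zeros and v == 0)]

def share_removed_by_day(records, lower_cut):
    """Fraction of each day's (day, value) records that fall below a global lower cut."""
    totals, removed = defaultdict(int), defaultdict(int)
    for day, value in records:
        totals[day] += 1
        removed[day] += int(value < lower_cut)
    return {day: removed[day] / totals[day] for day in totals}

values = [0, 0, 1, 2, 3, 4, 5, 6, 7, 100]
# Analogous to Outlier_4: upper cut at the 95th percentile plus removal of zeros.
print(filter_outliers(values, upper_pct=95, drop_zeros=True))
```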

Adding local traffic and emission predictors did not improve model performance, possibly because the impact of traffic was already captured by predictors in the base case model, such as road density and road typology.
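One way to probe the explanation offered above, that traffic effects were already captured by predictors such as road density, is to check how strongly a candidate traffic or emission predictor correlates with predictors already in the model. This is a hedged diagnostic sketch, not an analysis reported in the study; the predictor names, values, and 0.8 threshold are hypothetical. Correlation only measures linear overlap, and tree ensembles such as XGBoost may still use correlated predictors differently.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def overlapping_predictors(candidate, existing, threshold=0.8):
    """Names of existing predictors whose |r| with the candidate reaches the threshold."""
    return sorted(name for name, column in existing.items()
                  if abs(pearson_r(candidate, column)) >= threshold)

aadt_100m = [1200.0, 3400.0, 800.0, 5600.0, 2500.0]  # hypothetical traffic predictor
existing = {
    "road_density_100m": [0.8, 2.1, 0.5, 3.9, 1.6],  # hypothetical, tracks traffic closely
    "park_area_500m": [3.0, 1.0, 2.0, 5.0, 4.0],     # hypothetical, only moderately related
}
print(overlapping_predictors(aadt_100m, existing))
```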

The results of the models with temporally varying predictors show that adding day of week and month as predictors significantly improved model performance. Adding humidity, temperature, and wind speed increased the R² from 0.26 to 0.69, which shows the importance of these variables in capturing the temporal variability of PM_2.5, and adding pollutant concentrations from stationary stations increased the R² further to 0.753. Moreover, the results for the Station_3 and Station_4 models demonstrate that treating outliers, in addition to including the weather and regional air quality predictors of the Station_2 model, can substantially improve the performance of LUR models for PM_2.5.
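Day of week and month are calendar predictors that can be derived directly from each measurement's timestamp. The sketch below also assigns a time-of-day period; only the afternoon window (3 to 7 p.m.) comes from the text, and the morning and midday boundaries are hypothetical placeholders. A categorical field such as `period` would still need an encoding (for example one-hot) before it is passed to XGBoost.

```python
from datetime import datetime

def period_of_day(hour):
    """Map an hour (0-23) to a period; only the 15:00-19:00 window is from the text."""
    if 6 <= hour < 10:      # hypothetical morning window
        return "morning"
    if 10 <= hour < 15:     # hypothetical midday window
        return "midday"
    if 15 <= hour < 19:     # afternoon, 3 to 7 p.m.
        return "afternoon"
    return "other"

def temporal_features(ts):
    """Calendar predictors for one timestamped measurement."""
    return {
        "day_of_week": ts.weekday(),     # Monday = 0 ... Sunday = 6
        "is_weekend": ts.weekday() >= 5,
        "month": ts.month,
        "period": period_of_day(ts.hour),
    }

# First day of the campaign (June 21, 2022) at 4:30 p.m.
print(temporal_features(datetime(2022, 6, 21, 16, 30)))
```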

The best-performing model, Station_4, includes additional features with larger buffer sizes of 500 m, 1000 m, and 2000 m to capture regional impacts. These larger buffers provide a broader spatial context that complements the finer-scale variability captured by smaller buffers, giving a balanced representation of both local and regional influences on PM_2.5 concentrations. This approach may also mitigate concerns about spatial autocorrelation, because the larger buffers capture regional trends and spatial dependencies in the data. Figure 7 shows the resulting PM_2.5 concentration surface for afternoon hours (3 to 7 p.m.), where the inclusion of larger buffer sizes highlights regional pollution patterns across the study area.
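Buffer predictors summarize land use or road features within circles of increasing radius around each location. A minimal sketch, assuming roads have been split into short pieces in a projected coordinate system in metres and that a piece counts when its midpoint lies inside the buffer; a GIS clip of road lines to buffer polygons would be more exact. The 500-, 1000-, and 2000-m radii come from the text, while 50 and 100 m are hypothetical small buffers.

```python
from math import hypot

# 500, 1000, and 2000 m come from the text; 50 and 100 m are hypothetical.
BUFFER_RADII_M = (50, 100, 500, 1000, 2000)

def road_length_in_buffers(x, y, road_pieces, radii=BUFFER_RADII_M):
    """Sum road length whose piece midpoints fall within each buffer around (x, y).

    road_pieces: (x_mid, y_mid, length_m) tuples in projected metres.
    """
    return {
        f"road_length_{r}m": sum(
            length for px, py, length in road_pieces if hypot(px - x, py - y) <= r
        )
        for r in radii
    }

pieces = [(10.0, 0.0, 20.0), (300.0, 0.0, 50.0), (0.0, 1500.0, 100.0), (5000.0, 0.0, 999.0)]
print(road_length_in_buffers(0.0, 0.0, pieces))
```

Because the large buffers change little between neighbouring locations, they act as smooth regional covariates alongside the small, locally varying ones.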

Figure 7.

Predicted PM_2.5 concentration for the City of Toronto using the Station_4 model during afternoon hours (3 to 7 p.m.).

In Figure 7, the Station_4 model is used to predict PM_2.5 for the entire city, including regions beyond the downtown area where the data were originally collected. As expected, higher concentrations are predicted along major roads and highways because of increased emissions. This figure suggests that the model can predict pollutant concentrations at locations outside the original data collection area.

Validation Against Observations at Reference Stations

The comparison of Station_4 model predictions against observations at the four reference stations operated by the Ministry of the Environment, Conservation and Parks is presented in Figure 8. The results demonstrate reasonable consistency between predicted and observed values across different times of day and locations. For instance, in the afternoon, the accuracy of the model’s predictions ranged from 84.6% in Toronto Downtown to 95.5% in Toronto North. Similar patterns were observed during the morning and midday periods, with accuracies exceeding 88% for most comparisons. This consistency highlights the model’s ability to extend its predictions beyond the immediate road network, leveraging broad spatial predictors such as land use and meteorology to capture regional trends. Although limitations remain in areas far from roadways, the larger buffer sizes (500 m, 1000 m, and 2000 m) included in the model enhance its ability to estimate off-road concentrations. These findings support the robustness of our approach for predicting spatial pollution patterns across diverse urban areas.
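This section does not spell out how the accuracy (%) in Figure 8 is computed. A common choice, assumed here and not confirmed by the study, is one minus the absolute relative error, expressed as a percentage: an observed 10 µg/m³ with a predicted 9 µg/m³ gives 90%.

```python
def prediction_accuracy(observed, predicted):
    """Accuracy (%) under an ASSUMED definition: 100 * (1 - |predicted - observed| / observed)."""
    if observed <= 0:
        raise ValueError("observed concentration must be positive")
    return 100.0 * (1.0 - abs(predicted - observed) / observed)

# Over- and under-prediction by the same amount score the same under this definition.
print(prediction_accuracy(10.0, 9.0), prediction_accuracy(10.0, 11.0))
```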

Figure 8.

Validation of the Station_4 model against measurements at four regulatory monitoring stations in Toronto. Panel (a) shows the observed (measured) and predicted (modeled) PM_2.5 concentrations (in µg/m³) across stations for each time of day (morning, midday, afternoon). Panel (b) presents the calculated accuracy (%) of the model predictions.

Discussion and Conclusion

A mobile monitoring campaign, using a sensor-equipped truck operated by a courier company, demonstrates a novel approach to capturing spatiotemporal variations in PM_2.5 concentrations. The study period, from June 21, 2022, to January 11, 2023, allows for an analysis of seasonal influences on air quality. This study investigates factors influencing PM_2.5 concentration predictions by developing several LUR models with different sets of predictors. The findings highlight the complex interplay of factors influencing PM_2.5 levels, including meteorological conditions, traffic emissions, and land use variables. Considering temporal factors significantly enhanced model performance, as evident from the comparison of the day_month model with the base case. Incorporating day of the week and month as predictors substantially improved the R² for both the test set and the 10-fold CV. This suggests that day-of-week patterns and seasonal differences carry valuable information, possibly reflecting variations in human activity, traffic density, and meteorological conditions. For instance, weekdays might exhibit different pollution patterns from weekends, and seasonal variations could be crucial in understanding the impact of weather conditions on PM_2.5 concentrations. This finding aligns with previous research on the impact of temporal variations on air quality and underscores the need for comprehensive spatiotemporal models ( 30 , 31 ).

The study also investigated the impact of different spatial aggregation methods on model prediction. Of the three aggregation methods compared (the road segment shapefile and points spaced 50 m and 100 m apart), the 100-m resolution performed slightly better than the other two. The results imply that choosing an appropriate spatial aggregation method is crucial for accurately representing local air quality patterns. Testing several resolutions when developing an LUR model may therefore improve its accuracy, in line with the findings of Hankey and Marshall ( 19 ).

The study also evaluated the impact of outlier removal on model performance, revealing trade-offs associated with different thresholds. Models that removed outliers (recording errors) while retaining valid records performed better than models with extensive data removal. For instance, the Outlier_4 model, which removed data above the 95th percentile and zero readings while retaining other low concentrations, achieved a slightly better R² for both the test set and 10-fold CV. This indicates that careful outlier handling that preserves valid data points contributes to a more robust and accurate LUR model, as shown in previous studies ( 32 , 33 ). However, a balance is essential, as overly aggressive outlier removal may lead to the loss of crucial information and compromise model performance.

The inclusion of predictors related to emissions and traffic counts did not yield a clear benefit. Although predictors derived from EMME and AADT data did not significantly improve the base case model, that model already included relevant predictors related to land use, traffic, and location. The lack of substantial improvement suggests that the existing set of predictors adequately captured the spatial variability in PM_2.5 concentrations caused by traffic and emissions. This finding underscores the importance of carefully selecting predictors based on the study context and existing knowledge of local pollution sources.

The models incorporating data from stationary monitoring stations as predictors exhibited a remarkable increase in performance. The introduction of temporal predictors, such as time of day and month, as well as data from stationary monitoring stations, enhances the ability of models to capture temporal variation. The marked improvement in R² values for models incorporating temporal variables highlights the significance of considering time-related factors in predicting PM_2.5 concentrations.
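Using stationary-station and weather data as predictors requires matching each mobile measurement to the reference record for the same hour. A minimal sketch, assuming hourly reference data keyed by the start of each hour and timestamps in the same time zone; the column names and values are hypothetical, and the study's exact matching rule (for example nearest station versus a citywide average) is not described here.

```python
import math
from datetime import datetime

def hour_key(ts):
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)

def attach_hourly(records, hourly_table, columns, prefix="station_"):
    """Add hourly station or weather values to each mobile record.

    records: dicts with a 'time' datetime; hourly_table: {hour_start: {column: value}}.
    Hours missing from the table get NaN, which XGBoost treats as missing by default.
    """
    joined = []
    for rec in records:
        hourly = hourly_table.get(hour_key(rec["time"]), {})
        row = dict(rec)
        for col in columns:
            row[prefix + col] = hourly.get(col, math.nan)
        joined.append(row)
    return joined

station = {datetime(2022, 6, 21, 16): {"pm25": 7.4, "no2": 12.0}}  # hypothetical values
mobile = [
    {"time": datetime(2022, 6, 21, 16, 30), "pm25": 9.1},
    {"time": datetime(2022, 6, 21, 17, 5), "pm25": 6.8},
]
rows = attach_hourly(mobile, station, columns=["pm25", "no2"])
print(rows)
```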

In this study, we relied on cross-validation to evaluate model performance because of the limited availability of independent validation data. While this approach provides a useful measure of model accuracy, we acknowledge that random cross-validation can overestimate performance and mask overfitting, especially for datasets with limited spatial diversity, because nearby or same-day observations may appear in both the training and test folds. Future research should include independent datasets or temporal validation to further test the robustness of the model and its transferability across space.
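A blocked alternative to random cross-validation keeps every observation from the same group (for example a sampling day or a road segment) in a single fold, so the model is always tested on groups it has not seen. The sketch below is one hypothetical way to implement the temporal or spatial validation suggested above and is not part of the reported analysis; scikit-learn offers library versions of the idea in `sklearn.model_selection.GroupKFold` and `TimeSeriesSplit`.

```python
import random

def group_kfold(groups, k=5, seed=0):
    """Split row indices into k folds so each group falls entirely within one fold."""
    unique_groups = sorted(set(groups))
    random.Random(seed).shuffle(unique_groups)
    fold_of = {g: i % k for i, g in enumerate(unique_groups)}
    folds = [[] for _ in range(k)]
    for idx, g in enumerate(groups):
        folds[fold_of[g]].append(idx)
    return folds

# Hypothetical sampling days, one entry per measurement row.
days = ["2022-06-21", "2022-06-21", "2022-06-22", "2022-07-04", "2022-07-04", "2022-07-04"]
for test_fold in group_kfold(days, k=3):
    print(sorted({days[i] for i in test_fold}))
```

Grouping by day guards against same-day leakage; grouping by road segment or spatial block instead tests transferability across space.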

In conclusion, this analysis shows that taking temporal, spatial, and outlier-related factors into account can help develop accurate LUR models for predicting PM_2.5 concentrations. The results show that the integration of innovative monitoring techniques, thoughtful predictor selection, and exploration of outlier impact could be used to improve LUR model accuracy. Our findings underscore the need for a multidimensional approach that considers both spatial and temporal factors to enhance the accuracy of models for PM_2.5 concentrations.

Footnotes

Author Contributions

The authors confirm contribution to the paper as follows: study conception and design: Junshi Xu, Marianne Hatzopoulou, Matthew Roorda; data collection: Milad Saeedi, Junshi Xu, Usman Ahmed; analysis and interpretation of results: Milad Saeedi, Marianne Hatzopoulou; draft manuscript preparation: Milad Saeedi, Marianne Hatzopoulou, Matthew Roorda. All authors reviewed the results and approved the final version of the manuscript.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Funding for this project was provided by the Natural Sciences and Engineering Research Council of Canada (NSERC) under grant ALLRP 555620 - 20 (“City Logistics for the Urban Economy”), Purolator Inc., Geotab, and the City of Toronto.

ORCID iDs

Milad Saeedi

Junshi Xu

Usman Ahmed

Matthew Roorda

Marianne Hatzopoulou

References

1. Simon, M. C., Hudda, E. N. Naumova, J. I. Levy, Brugge, and J. L. Durant. Comparisons of Traffic-Related Ultrafine Particle Number Concentrations Measured in Two Urban Areas by Central, Residential, and Mobile Monitoring. Atmospheric Environment, Vol. 169, 2017, pp. 113–127. https://doi.org/10.1016/j.atmosenv.2017.09.003

2. Van De Beek, Kerckhoffs, Hoek, Sterk, Meliefste, Gehring, and Vermeulen. Spatial and Spatiotemporal Variability of Regional Background Ultrafine Particle Concentrations in the Netherlands. Environmental Science and Technology, Vol. 55, No. 2, 2021, pp. 1067–1075. https://doi.org/10.1021/acs.est.0c06806

3. Shah, R. U., E. S. Robinson, J. S. Apte, J. D. Marshall, A. L. Robinson, and A. A. Presto. Socio-Economic Disparities in Exposure to Urban Restaurant Emissions Are Larger than for Traffic. Environmental Research Letters, Vol. 15, No. 11, 2020, p. 114039. https://doi.org/10.1088/1748-9326/abbc92

4. Apte, J. S., K. P. Messier, Gani, Brauer, T. W. Kirchstetter, M. M. Lunden, J. D. Marshall, C. J. Portier, R. C. H. Vermeulen, and S. P. Hamburg. High-Resolution Air Pollution Mapping with Google Street View Cars: Exploiting Big Data. Environmental Science and Technology, Vol. 51, No. 12, 2017, pp. 6999–7008. https://doi.org/10.1021/acs.est.7b00891

5. Yuan, Kerckhoffs, Shen, de Hoogh, Hoek, and Vermeulen. Integrating Large-Scale Stationary and Local Mobile Measurements to Estimate Hyperlocal Long-Term Air Pollution Using Transfer Learning Methods. Environmental Research, Vol. 228, 2023, p. 115836.

6. Messier, K. P., S. E. Chambliss, Gani, Alvarez, Brauer, J. J. Choi, S. P. Hamburg, et al. Mapping Air Pollution with Google Street View Cars: Efficient Approaches with Mobile Monitoring and Land Use Regression. Environmental Science and Technology, Vol. 52, No. 21, 2018, pp. 12563–12572. https://doi.org/10.1021/acs.est.8b03395

7. Yuan, Kerckhoffs, Hoek, and Vermeulen. A Knowledge Transfer Approach to Map Long-Term Concentrations of Hyperlocal Air Pollution from Short-Term Mobile Measurements. Environmental Science and Technology, Vol. 56, No. 19, 2022, pp. 13820–13828. https://doi.org/10.1021/acs.est.2c05036

8. Liu, Chen, and Wei. Nonlinear Relationship between Urban Form and Street-Level PM2.5 and CO Based on Mobile Measurements and Gradient Boosting Decision Tree Models. Building and Environment, Vol. 205, 2021, p. 108265. https://doi.org/10.1016/j.buildenv.2021.108265

9. Horn (2019, A. 9). Mobile AirQuality Measurement Project Studies Hyper-Local Air Quality in Aachen with IoT. GeoTab Blog. https://www.Geotab.Com/Blog/Air-Quality/.

10. Huang, Duan, Chen, Wang, Zhou, and Rao. Effect of Urban Morphology on Air Pollution Distribution in High-Density Urban Blocks Based on Mobile Monitoring and Machine Learning. Building and Environment, Vol. 219, 2022, p. 109173. https://doi.org/10.1016/j.buildenv.2022.109173

11. Hoek, Beelen, de Hoogh, Vienneau, Gulliver, Fischer, and Briggs. A Review of Land-Use Regression Models to Assess Spatial Variation of Outdoor Air Pollution. Atmospheric Environment, Vol. 42, No. 33, 2008, pp. 7561–7578.

12. Kashima, Yorifuji, Sawada, Nakaya, and Eboshida. Comparison of Land Use Regression Models for NO2 Based on Routine and Campaign Monitoring Data from an Urban Area of Japan. Science of the Total Environment, Vol. 631–632, 2018, pp. 1029–1037. https://doi.org/10.1016/j.scitotenv.2018.02.334

13. Kerckhoffs, Hoek, Portengen, Brunekreef, and R. C. H. Vermeulen. Performance of Prediction Algorithms for Modeling Outdoor Air Pollution Spatial Surfaces. Environmental Science and Technology, Vol. 53, No. 3, 2019, pp. 1413–1421. https://doi.org/10.1021/acs.est.8b06038

14. Kerckhoffs, Hoek, Gehring, and Vermeulen. Modelling Nationwide Spatial Variation of Ultrafine Particles Based on Mobile Monitoring. Environment International, Vol. 154, 2021, p. 106569. https://doi.org/10.1016/j.envint.2021.106569

15. Wang, Saleh, and Hatzopoulou. Potential of Machine Learning for Prediction of Traffic Related Air Pollution. Transportation Research Part D: Transport and Environment, Vol. 88, 2020, p. 102599. https://doi.org/10.1016/j.trd.2020.102599

16. Ren and P. G. Georgopoulos. Comparison of Machine Learning and Land Use Regression for Fine Scale Spatiotemporal Estimation of Ambient Air Pollution: Modeling Ozone Concentrations across the Contiguous United States. Environment International, Vol. 142, 2020, p. 105827. https://doi.org/10.1016/j.envint.2020.105827

17. Wong, P. Y., H. Y. Lee, Y. T. Zeng, Y. R. Chern, N. T. Chen, S. C. Candice Lung, H. J., and Da Wu. Using a Land Use Regression Model with Machine Learning to Estimate Ground Level PM2.5. Environmental Pollution, Vol. 277, 2021, p. 116846. https://doi.org/10.1016/j.envpol.2021.116846

18. Padhi, B. K., and Kumar. Application of Land Use Regression (LUR) Models in Air Pollution Assessment. In Air Quality and Human Health (P. K. Padhy, Niyogi, P. K. Patra, and Hecker, eds.), Springer Nature Singapore, Singapore, 2024, pp. 79–86.

19. Hankey and J. D. Marshall. Land Use Regression Models of On-Road Particulate Air Pollution (Particle Number, Black Carbon, PM2.5, Particle Size) Using Mobile Monitoring. Environmental Science and Technology, Vol. 49, No. 15, 2015, pp. 9194–9202. https://doi.org/10.1021/acs.est.5b01209

20. Jerrett, R. T. Burnett, Arden Pope, Krewski, K. B. Newbold, Thurston, et al. Spatial Analysis of Air Pollution and Mortality in Los Angeles. Epidemiology, Vol. 16, No. 6, 2005, pp. 727–736. https://doi.org/10.1097/01.ede.0000181630.15826.7d

21. Bechle, M. J., Wan, A. A. Presto, and Hankey. Using Crowd-Sourced Low-Cost Sensors in a Land Use Regression of PM2.5 in 6 US Cities. Air Quality, Atmosphere and Health, Vol. 15, No. 4, 2022, pp. 667–678. https://doi.org/10.1007/s11869-022-01162-7

22. Báthory, M. L. Kiss, and A. B. Palotas. Reliability of Particulate Matter Sensor Operation during Uncomfortable Weather Conditions. In 9th European Combustion Meeting 2019, 2019.

23. Honeywell International Inc. Honeywell HPM Series Particulate Matter Sensors. Honeywell International Inc., Richardson, TX, 2021.

24. Adams, M. D., Massey, Chastko, and Cupini. Spatial Modelling of Particulate Matter Air Pollution Sensor Measurements Collected by Community Scientists While Cycling, Land Use Regression with Spatial Cross-Validation, and Applications of Machine Learning for Data Correction. Atmospheric Environment, Vol. 230, 2020, p. 117479. https://doi.org/10.1016/j.atmosenv.2020.117479

25. Chatzis. Managing Traffic Complexity. Canadian Transport Planning Software Package Emme, 1970s–2010s. Journal of Transport History, Vol. 42, No. 3, 2021, pp. 444–466. https://doi.org/10.1177/00225266211005753

26. Miller, E. J., Vaughan, King, and Austin. Implementation of a “Next Generation,” Activity-Based Travel Demand Model: The Toronto Case. In Presentation at the Travel Demand Modelling and Traffic Simulation Session of the 2015 Conference of the Transportation Association of Canada, 2015.

27. Ganji, Shekarrizfard, Harpalani, Coleman, and Hatzopoulou. Methodology for Spatio-Temporal Predictions of Traffic Counts across an Urban Road Network and Generation of an on-Road Greenhouse Gas Emission Inventory. Computer-Aided Civil and Infrastructure Engineering, Vol. 35, No. 10, 2020, pp. 1063–1084. https://doi.org/10.1111/mice.12508

28. Ganji, Zhang, and Hatzopoulou. Traffic Volume Prediction Using Aerial Imagery and Sparse Data from Road Counts. Transportation Research Part C: Emerging Technologies, Vol. 141, 2022, p. 103739. https://doi.org/10.1016/j.trc.2022.103739

29. Pedregosa, Varoquaux, Gramfort, Michel, Thirion, Grisel, Blondel, et al. Scikit-Learn: Machine Learning in Python. Journal of Machine Learning Research, Vol. 12, 2011, pp. 2825–2830.

30. Qin, Yang, Liu, Cao, Zou, Jin, Zhang, and Duan. Potential for Developing Independent Daytime/Nighttime LUR Models Based on Short-Term Mobile Monitoring to Improve Model Performance. Environmental Pollution, Vol. 268, 2021, p. 115951.

31. Mölter, Lindley, de Vocht, Simpson, and Agius. Modelling Air Pollution for Epidemiologic Research - Part II: Predicting Temporal Variation through Land Use Regression. Science of the Total Environment, Vol. 409, No. 1, 2010, pp. 211–217. https://doi.org/10.1016/j.scitotenv.2010.10.005

32. Basu, M. S. Alam, Ghosh, Gill, and McNabola. Augmenting Limited Background Monitoring Data for Improved Performance in Land Use Regression Modelling: Using Support Vector Regression and Mobile Monitoring. Atmospheric Environment, Vol. 201, 2019, pp. 310–322. https://doi.org/10.1016/j.atmosenv.2018.12.048

33. Araki, Shimadera, Yamamoto, and Kondo. Effect of Spatial Outliers on the Regression Modelling of Air Pollutant Concentrations: A Case Study in Japan. Atmospheric Environment, Vol. 153, 2017, pp. 83–93. https://doi.org/10.1016/j.atmosenv.2016.12.057