Abstract
The current literature rarely addresses the statistical relationship between collision occurrence (objective safety) and reports of perceived unsafety at road sites (subjective safety). This study aims to analyze this relationship using social media data collected and classified from Twitter (now X), applying a Bayesian seemingly unrelated regression (SUR) framework to jointly model both safety dimensions. By analyzing collision data and road-safety-related tweets (RSTs) from Vancouver between 2017 and 2019, macro-level prediction models were developed within the SUR framework and relative risks (RRs) were derived from model parameters to identify the key risk factors. Alignment between the RRs for objective safety and those for subjective safety was taken to indicate that subjective safety could serve as a reliable predictor of objective safety. Five collision types and nine distinct categories of classified RSTs were employed to develop 45 Bayesian SUR models, where four groups of explanatory variables were used to test different scenarios. While there was a general alignment between subjective and objective safety overall, notable discrepancies were also found when different categories were analyzed. Near-miss observations and safety reports related to active modes were particularly aligned with collision occurrence, especially with property-damage-only collisions. Conversely, design and maintenance-related issues showed minimal alignment with collisions. Overall, the study provided a valuable resource for future risk reduction studies by showing how specific subjective safety observations can or cannot reflect real-world collision risk. This approach suggests the potential for integrating subjective reports into comprehensive safety assessments, offering key insights for policymakers to develop safety strategies more proactively.
Keywords
Two distinct terms are used to define road safety: objective and subjective. Objective safety denotes the frequency and severity of collisions occurring on the road network, while subjective safety, or perceived security, refers to how safe individuals feel while using various transportation modes and road infrastructures ( 1 ). Subjective safety is, therefore, affected not only by roadway, traffic, and environmental conditions but also by personal experiences and social factors ( 2 ). A variety of collision-based and, more recently, non-collision-based measures have been proposed to objectively measure safety. While these objective metrics remain the primary source for assessments, subjective safety should also be monitored and assessed. Research on subjective road safety is limited because road safety surveys are rarely readily available, often have a small number of respondents, and can be costly to conduct. Furthermore, the literature has not extensively explored subjective safety measures, models, and their interrelationship with objective measures ( 3 ).
Subjective safety can supplement objective safety studies and requires thorough monitoring, as it affects road users’ mobility, mode choice, travel frequency, and physical activity levels. Road agencies and traffic law enforcement rely on these perceptions to formulate policies that influence collisions. For instance, people are more inclined to walk or cycle in areas where they feel safe, promoting healthier lifestyles and reducing traffic congestion ( 4 ). Conversely, a lack of perceived safety can discourage the use of certain transportation modes, leading to a higher dependence on cars, which affects traffic operations and environmental sustainability ( 5 ). Addressing subjective safety can contribute to more inclusive and equitable transportation systems. By giving equal priority to subjective safety alongside objective measures, policymakers can develop comprehensive strategies to improve overall road and traffic safety.
The use of crowdsourcing on social media presents an opportunity to expand the study of subjective road safety assessments. Crowdsourcing can be more efficient than traditional survey methods and enables the aggregation of large-scale datasets ( 3 ). Crowdsourcing big data related to subjective road safety can be conducted on social media platforms like Twitter (now known as X). This platform’s archived and streaming data is available for research and business purposes through its developer account and application programming interface (API) ( 6 ). However, X/Twitter data can be unstructured and noisy, limiting its utility unless individuals’ sentiments related to safety can be accurately extracted, classified, and analyzed ( 7 ). This necessitates the use of appropriate tools for data gathering, filtering, processing, classification, and analysis. Recently, the authors developed a machine learning tool to extract, classify, and analyze subjective road safety data from Twitter, which has been employed in the current study and is further detailed in the method section ( 8 ).
To utilize subjective safety in road evaluations, it is crucial to thoroughly explore the statistical relationship between subjective and objective safety. To date, literature that attempts to do this is very scarce, in particular literature exploring the correlation between the count of collisions at a road site and the frequency of mentions reporting the site as unsafe ( 9 , 10 ). There remains a lack of rigorous studies to statistically evaluate this relationship. Most existing studies rely on small sample surveys that are difficult to validate or scale. Meanwhile, the emergence of crowdsourced data offers a new opportunity to capture road users’ perceptions and reports for use in large-scale investigations. However, these data have yet to be systematically examined alongside objective safety measures using robust statistical frameworks. This study addresses that gap by employing a methodology to quantify the alignment between subjective and objective safety across urban areas, contributing to both the development of data-driven safety tools and the broader understanding of how perceived and actual risk interact in road environments. Therefore, this study aims to conduct a rigorous statistical analysis of the relationship between road-safety-related tweets (subjective safety) collected and classified from social media and collisions (objective safety).
In addition to addressing this gap, the study also contributes methodologically by applying a Bayesian seemingly unrelated regression (SUR) framework ( 11 ), which has been previously employed to test the association between collision and non-collision metrics for safety analysis ( 12 ). Bayesian SUR offers an advantage over traditional univariate response models because, in this context, it treats objective and subjective safety metrics as bivariate response variables. This approach was chosen because it enables the identification of potential quantitative relationships between subgroups within these two safety dimensions. Combined with a structured machine learning pipeline for tweet classification, the methodology provides a scalable framework for integrating crowdsourced data into rigorous statistical safety analysis.
For this analysis, public road-safety-related tweets and collision data were collected for the city of Vancouver between 2017 and 2019 across different census tracts (CTs) within the same region. Macro-level prediction models for collision and non-collision metrics were developed within the SUR framework (Lovegrove and Sayed [ 42 ]; Wang et al. [ 43 ]). Relative risks (RRs) were derived from model parameters to identify the key risk factors. Where RRs for objective safety aligned with those for subjective safety, it suggested that subjective safety was a reliable indicator for predicting objective safety.
It is important to note that the intent of this study is not to replace established collision-based safety metrics but to explore how subjective safety data from social media may complement traditional data sources in contexts where perceptions of risk and road users’ safety reports offer added value.
Literature Review
Objective and Subjective Road Safety
Subjective safety encompasses the perceived feelings of safety or anxiety experienced by road users ( 11 ). While objective safety data, particularly collision data, forms the core information in road safety, the collection of subjective safety data is deemed highly important as well ( 12 ). Incorporating subjective data into collision analysis can complement traditional analysis, especially in situations where obtaining accurate information is challenging ( 13 ). Griffith et al. ( 14 ) underscored the significance of understanding and incorporating subjective safety data into policy development for the improvement of safety performance. Wang et al. ( 15 ) highlighted the importance of assessing these perceptions, particularly in diverse road environments, to gain a better understanding of their implications.
The value of subjective safety data lies in its ability to capture the human factor in road safety, reflecting behaviors, attitudes, and risk perceptions that directly affect safety outcomes. However, the objective and subjective effects of a safety countermeasure might differ; in some instances, countermeasures perceived as safe by users may not significantly reduce collision frequency, while in other cases, perceived safety might lead to a substantial reduction in collisions ( 1 , 12 ). Lawson et al. ( 16 ) found that dedicated bike lanes and pedestrian pathways, known as potential safety countermeasures, can also significantly enhance cyclists’ and pedestrians’ feelings of safety. Similarly, Teschke et al. ( 17 ) reported that well-maintained sidewalks, adequate street lighting, and visible crosswalks contribute not only to reducing collisions but also to heightening the sense of safety among pedestrians. Kerr et al. ( 4 ) reported that people are more inclined to walk or cycle in areas where they feel safe, but Osama et al. ( 18 ) reported that increasing walkability and bikeability increases the risk of active mode collisions.
A few studies have explored the statistical relationship between objective and subjective safety and the findings are inconclusive and primarily based on small samples ( 19 ). Hvoslef ( 9 ) compared the number of times a location was mentioned as dangerous by people to its collision frequency and showed that there may be a positive and direct relationship between subjective and objective safety but the strength of the relationship was weak and did not account for exposure. de Leur and Sayed ( 20 ) developed a road safety risk index based on subjective safety assessments to identify and diagnose problematic areas. The comparison between the risk index results and road safety measures derived objectively showed a statistically significant agreement between the two ( 20 ). Stülpnagel et al. ( 21 ) investigated the relationship between cyclist-vehicle crashes and subjective risk perception for different types of cycling infrastructure and speed limits. Their results showed that although objective and subjective risk indicators are generally well aligned, specific scenarios exist where cyclists underestimate the actual crash risk ( 21 ). Fuest et al. ( 22 ) developed two scoring methods to predict cyclists’ subjective safety from objective environmental information. They validated these scores by comparing them to questionnaire ratings evaluating cyclists’ subjective safety. Finally, they compared the two scores and the questionnaire ratings with objective safety measures. Interestingly, neither the scores nor the questionnaire ratings sufficiently predicted crash occurrences at the locations. However, their findings underscore the importance of subjective safety as a construct independent of objective safety ( 22 ).
Overall, this topic requires further investigation as a comprehensive study focusing on the statistical relationship between subjective and objective road safety, involving a large sample and different collision types, is still missing in the literature.
X/Twitter Data Use in Transportation
The use of social media for transportation applications is well-documented in the literature. The exploration of social media data for public safety, particularly in roadway safety, has gained increasing interest ( 23 ), presenting the potential for more comprehensive and effective data collection and analysis in this area ( 24 ). X/Twitter, in particular, has shown significant potential across various transportation research areas. Each geotagged tweet is associated with temporal and spatial data, including location, latitude and longitude coordinates, or X/Twitter place, which can be extracted using the X/Twitter application programming interface (API). By utilizing appropriate tools to gather, filter, process, classify, and analyze these spatial-temporal attributes, significant information about transportation-related topics such as traffic jams, accidents, and road conditions in specific regions can be identified ( 25 ). However, it is essential to acknowledge the challenges associated with using social media data, including issues of data reliability, the large volume of information, and the need to accurately interpret user-generated content.
X/Twitter has been utilized for identifying and classifying activity patterns ( 26 ), predicting traffic flow ( 27 ), detecting travel modes ( 28 ), identifying traffic patterns ( 29 ), managing transport information ( 30 ), detecting traffic collisions in real time ( 31 , 32 ), extracting public sentiments about transportation services ( 33 ), studying traffic safety culture ( 3 ), and assessing user perceptions toward active mobility ( 7 ). Abedi and Sacchi ( 8 ) developed a machine learning algorithm to extract, categorize, and analyze subjective data on road safety from Twitter, which was used in this study and is described in more detail in the method section.
Overall, X/Twitter has over 8 million active users in Canada and ranks as the fourth most popular social media site in the country, with around 40% of online adults having an account and 36.8% visiting monthly ( 34 ). Although there are some demographic variations, research suggests that X/Twitter users are generally representative of the broader population, particularly in gender and race ( 35 ). However, it has been observed that X/Twitter users, despite forming a random sample, tend to skew toward a male demographic, with higher activity levels among young adults and a greater concentration in urban areas ( 36 ). This demographic skew may influence the types of road safety concerns that are publicly expressed, potentially underrepresenting perceptions from older, rural, or lower-income individuals. As such, the interpretation of subjective safety data derived from social media should account for this potential bias in user representation ( 37 ). Consequently, while X/Twitter users may not perfectly reflect the entire population, they provide a reasonable approximation of the North American demographic context.
Method
Subjective Safety Data Classification
The collection and classification of public road-safety-related tweets from X/Twitter was performed using the machine learning tool described in Abedi and Sacchi ( 8 ). This tool filters road-safety-related tweets from the broader dataset and classifies them into distinct subgroups based on contextual relevance. The hierarchical structure of tweet classification, along with the allocation of different tags for each subgroup, is described in Figure 1.

Figure 1. Classification framework for road-safety-related tweets ( 10 ). Reprinted with permission. Abedi, M. M. & E. Sacchi. Can Social Media Data Be Useful for Assessing Road Safety? An Investigation Using X/Twitter. Proceedings of the Canadian Society for Civil Engineering Annual Conference 2024, Volume 15. In: Zaki, M., Tang, Y., Robinson, C. (eds) Proceedings of the Canadian Society for Civil Engineering Annual Conference 2024, Volume 15. CSCE 2024. Lecture Notes in Civil Engineering, vol 710. Springer Nature, 2025. https://doi.org/10.1007/978-3-031-95111-4_5
In summary, the initial steps include two binary classifiers to filter tweets. Step 1 retained road-safety-related tweets, including collisions, from all tweets, while step 2 excluded tweets reporting collision events. These two steps produced what is defined as road-safety-related tweets (RSTs) excluding collision events. This distinction allows the collection and classification of subjective road safety data only. In steps 3, 4, and 5, binary classifiers were introduced to determine whether a tweet was related to a specific location and time, and whether the tweet was related to education/enforcement issues (e.g., a driver violating a traffic law) or unrelated to them (i.e., related to an infrastructure/engineering aspect). In steps 6 to 9, each tweet received up to three tags based on context. Step 6 classified the type of road safety issue, categorizing tweets as near miss tweets (NMT), complaint or report tweets (CRT), or others. Step 7 identified the transportation modes mentioned in the tweet, such as vehicle mode (VMT), active mode (AMT), or others. Step 8 assigned tags related to road safety contributing factors: road user behavior (UBT), roadway features (RFT), or other factors. Finally, step 9 further classified RFT-tagged tweets into design or maintenance tweets (DMT), traffic control tweets (TCT), and environment tweets (ENT).
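The cascade of filtering and tagging steps can be illustrated with a minimal sketch. The keyword lists and the `classify` function below are hypothetical stand-ins for the trained classifiers of the actual tool, which is not reproduced here; steps 3 to 5 are omitted for brevity, and checking active modes before vehicle modes is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword lists standing in for the trained binary classifiers.
ROAD_SAFETY = ("road", "street", "crosswalk", "driver", "cyclist", "pedestrian")
COLLISION = ("crash", "collision", "accident")
NEAR_MISS = ("almost hit", "nearly hit", "near miss", "close call")
COMPLAINT = ("unsafe", "dangerous", "please fix")
ACTIVE_MODE = ("cyclist", "bike", "pedestrian", "crosswalk")
VEHICLE_MODE = ("car", "truck", "bus")
USER_BEHAVIOR = ("speeding", "texting", "ran a red")
ROADWAY = {
    "DMT": ("pothole", "faded", "narrow"),
    "TCT": ("signal", "stop sign", "traffic light"),
    "ENT": ("snow", "fog", "black ice"),
}


def has_any(text: str, keywords) -> bool:
    """Naive substring match; a real classifier would replace this."""
    lowered = text.lower()
    return any(k in lowered for k in keywords)


@dataclass
class Tags:
    issue: str = "other"            # step 6: NMT, CRT, or other
    mode: str = "other"             # step 7: VMT, AMT, or other
    factor: str = "other"           # step 8: UBT, RFT, or other
    feature: Optional[str] = None   # step 9: DMT, TCT, or ENT (RFT only)


def classify(text: str) -> Optional[Tags]:
    # Steps 1 and 2: keep road-safety-related tweets, drop collision reports.
    if not has_any(text, ROAD_SAFETY) or has_any(text, COLLISION):
        return None
    tags = Tags()
    # Step 6: type of road safety issue.
    if has_any(text, NEAR_MISS):
        tags.issue = "NMT"
    elif has_any(text, COMPLAINT):
        tags.issue = "CRT"
    # Step 7: transportation mode.
    if has_any(text, ACTIVE_MODE):
        tags.mode = "AMT"
    elif has_any(text, VEHICLE_MODE):
        tags.mode = "VMT"
    # Steps 8 and 9: contributing factor, then roadway feature subgroup.
    if has_any(text, USER_BEHAVIOR):
        tags.factor = "UBT"
    else:
        for feature, keywords in ROADWAY.items():
            if has_any(text, keywords):
                tags.factor, tags.feature = "RFT", feature
                break
    return tags


near_miss = classify("A speeding driver almost hit a cyclist at the crosswalk")
pothole = classify("Huge pothole on this street, unsafe for bikes")
collision = classify("Two-car crash on the main road")
```

In this sketch a tweet that fails either filter returns `None`, so only non-collision road-safety-related tweets reach the tagging stage, mirroring the order of the steps described above.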
Also, sentiment analysis can be performed to understand users’ attitudes (positive, neutral, or negative) toward road safety across various classification groups, excluding NMT and CRT, which are inherently negative. The valence aware dictionary and sentiment reasoner (VADER) model, a rule-based tool for classifying text sentiment as positive, negative, or neutral, was utilized in this tool. Only tweets classified as negative (with scores below −0.05) were included in this study. By focusing on these negative sentiments, the study aims to identify tweets where users are experiencing dissatisfaction or danger, that is, the number of mentions reporting a site as unsafe ( 8 ).
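The negative-sentiment filter can be sketched as a simple threshold on a compound sentiment score. The function names and the example scores below are illustrative assumptions; the scores are made up rather than produced by VADER, and the symmetric positive cut-off at +0.05 is the commonly used convention rather than something stated in this study.

```python
def sentiment_label(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score in [-1, 1] to a sentiment class."""
    if compound < -threshold:   # negative, as used in this study
        return "negative"
    if compound > threshold:    # assumed symmetric cut-off for positive
        return "positive"
    return "neutral"


def keep_negative(scored_tweets):
    """Return the texts of tweets whose score is classified as negative."""
    return [text for text, score in scored_tweets
            if sentiment_label(score) == "negative"]


# Made-up (text, compound score) pairs for illustration only.
scored = [
    ("This intersection is terrifying", -0.61),
    ("New bike lane looks fine", 0.20),
    ("Road work on Main St", 0.00),
    ("Crosswalk timing is a bit short", -0.05),
]
negative_only = keep_negative(scored)
```

Because the cut-off is strict, a score of exactly −0.05 is labeled neutral here, which follows the "below −0.05" wording above.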
Statistical Analysis
The statistical analysis in this study was adapted from Gordon et al. ( 38 ). A unified approach (SUR model) to the analysis of rates at the macro-level scale for collisions and RSTs (mentions reporting a site as unsafe) was employed. The SUR model provides a robust framework for analyzing systems of equations where dependent variables may be interrelated, which is critical for this study. Unlike univariate models, SUR allows for the incorporation of shared error structures between equations, improving estimation efficiency and accounting for potential interdependencies between subjective and objective safety metrics. To employ rates, exposure was incorporated into the response variable definition. Instead of fitting independent models for collision rates (Y1) and RST rates (Y2), SUR extends the usual univariate response model for collisions to a bivariate response model that includes both response variables. In a normal theory framework, SUR can be described as a system that incorporates a correlation structure between Y1 and Y2 as:
$$Y_1 = X_1\beta_1 + \varepsilon_1, \qquad Y_2 = X_2\beta_2 + \varepsilon_2 \tag{1}$$
where β1 and β2 are sets of regression parameters and X1 and X2 are sets of explanatory variables. Typical explanatory variables at the macro-scale level are exposure, sociodemographic, transportation demand management, and network variables ( 39 ). As in Gordon et al. ( 38 ), exposure was also part of the explanatory variables, so converting between rates and counts is merely a matter of adjusting the model coefficients. Finally, ε1 and ε2 are normally distributed error terms belonging to vector ε:
$$\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \end{pmatrix} \sim N\left(0,\ \Sigma \otimes I\right) \tag{2}$$
with
$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix} \tag{3}$$
where I is the identity matrix, and σ11 and σ22 denote the variances in the collision and RST regressions, respectively. The parameter σ12 = σ21 represents the covariance between Y1 and Y2. Typically, the weighted least squares (WLS) method can be used to estimate model parameters.
As reported in Gordon et al. ( 38 ), a square root transformation of the data was employed to estimate parameters in a log-linear modelling framework. This transformation aligns the model with the standard approach for analyzing rates, such as the Poisson log-linear model. Unlike the Poisson model, where the mean must equal the variance, the SUR model does not have this restriction, as it includes two parameters for variance and one for covariance. The only requirement is that the data should not be too sparse: because the WLS solution relies on asymptotic theory, the number of collisions or RSTs should not be zero for many observations. Equations 4 and 5 describe the required transformation for both the dependent and independent variables.
Consistent with Gordon et al. ( 38 ), Bayesian estimation of model parameters was used in place of the WLS regression. The likelihood function employed and prior distributions for the parameters are reported in Equations 6 to 9:
where the matrix Σ is assumed to be fixed in the likelihood function and the regression model equation is incorporated into the first prior as the mean of a normal distribution (μ) denoted by λ. Here, λ represents a linear combination of the regression parameters βj and the explanatory variables.
The posterior distributions of model parameters can be sampled using Markov chain Monte Carlo (MCMC) techniques. In this study, the code was implemented in the RStudio IDE ( 40 ). The code produced draws from the posterior distribution of the parameters and, given those draws, the posterior mean of each parameter was approximated from 30,000 iterations, of which the first 3,000 were discarded as burn-in.
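The role of the burn-in period can be illustrated with a toy sampler. The sketch below is not the SUR model of this study: it assumes a single unknown mean with a flat prior and a normal likelihood with known variance, uses a random-walk Metropolis sampler, and runs on made-up data. Only the iteration count (30,000) and burn-in length (3,000) are taken from the text.

```python
import math
import random
from statistics import fmean


def log_posterior(mu: float, data, sigma: float = 1.0) -> float:
    # Flat prior on mu and normal likelihood with known sigma (toy assumption).
    return -sum((y - mu) ** 2 for y in data) / (2.0 * sigma ** 2)


def metropolis(data, n_iter=30_000, burn_in=3_000, step=0.5, seed=1):
    """Random-walk Metropolis; returns the draws kept after burn-in."""
    rng = random.Random(seed)
    mu = 0.0
    current = log_posterior(mu, data)
    draws = []
    for _ in range(n_iter):
        proposal = mu + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        draws.append(mu)
    return draws[burn_in:]


# Made-up observations with sample mean 1.1.
data = [0.9, 1.4, 1.1, 0.7, 1.3, 1.0, 1.2, 0.8, 1.5, 1.1]
kept = metropolis(data)
posterior_mean = fmean(kept)
```

Under these toy assumptions the posterior mean should sit close to the sample mean of the data, and the 27,000 retained draws can be summarized further, for example with percentile intervals.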
Once regression models are established and parameter estimates are obtained, relative risks (RRs) for collisions and RSTs can be computed to identify which risk factors are most associated with collision occurrence. In this study, the logarithm of the relative risk (log RR) was employed because the sampling distribution of RRs on the log scale approximates normality. If the log RRs of collisions and RSTs are similar, it is inferred that RSTs could serve as a correlated measure for collisions. Therefore, the primary focus of this analysis is on the differences in log RR: whether 0 is contained within the central 95% of the distribution of this difference was tested to assess if the RRs of different variables in the collision and RST models are comparable under specified conditions. Equation 10 was used to compute a posterior sample for the log rates:
Subsequently, the log RR difference can be computed by subtracting log rates based on combinations of specific explanatory variables.
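The subtraction described above can be written out explicitly. The notation below is an assumption introduced for illustration: $\log r_k(x)$ denotes a posterior draw of the log rate for response $k$ ($k=1$ for collisions, $k=2$ for RSTs) at covariate setting $x$, and $x_a$, $x_b$ are two settings that differ only in the variable of interest. The sign convention of the difference is also assumed; it does not affect whether 0 falls inside the interval.

```latex
\begin{align*}
\log \mathrm{RR}_k &= \log r_k(x_a) - \log r_k(x_b), \qquad k = 1, 2 \\
\Delta &= \log \mathrm{RR}_2 - \log \mathrm{RR}_1 \\
\text{comparable RRs} &\iff 0 \in \left[\, q_{0.025}(\Delta),\ q_{0.975}(\Delta) \,\right]
\end{align*}
```

Here $q_{p}(\Delta)$ is the $p$-th quantile of the posterior sample of $\Delta$, so the last line restates the central 95% criterion used in the hypothesis test.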
Data
Following a macro-level approach, the database was compiled by census tract areas. Census tracts (CTs) are geographic units with populations typically ranging from 2,500 to 8,000 people ( 41 ) utilized in various transportation planning-level studies ( 42 , 43 ). For this study, data was collected from the city of Vancouver, British Columbia, covering the years 2017, 2018, and 2019. Vancouver comprises 120 CTs, each identified by a unique code. After excluding areas with missing data, 114 CTs were selected for analysis. Figure 2 illustrates both the geographic area boundaries and the census tract boundaries.

Figure 2. Geographic study area and census tract boundaries.
The process of collecting publicly available RSTs from Twitter was accomplished with the tool described in the method section. This tool enabled the extraction of tweets (collected using the free Twitter developer account for academics, which was discontinued in 2023) across the entire study area for the selected time period, processing each CT separately and removing duplicates by cross-referencing tweet IDs. After eliminating duplicates and tweets with geographic tags covering areas larger than a census tract (CT), the total tweet count was reduced to 181,416. In the subsequent phase, 16,746 tweets were classified as road safety related, including tweets reporting collisions. Because the study focused on subjective road safety, the 2,072 tweets related to collisions were excluded and, finally, 14,674 tweets were designated as RSTs, covering all 114 CTs within the city of Vancouver. Only geo-located tweets were used, where the location selected by the user is assumed to reflect the area being discussed. This approach ensures spatial relevance, as tagging a location typically implies the content refers to that specific place.
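Removing duplicates by tweet ID across per-tract batches can be sketched as follows. The field names (`id`, `text`, `ct`), the tract codes, and the rule that a duplicated tweet stays with the first tract in which it was seen are all assumptions made for this illustration, not details of the actual tool.

```python
def merge_unique(tweets_by_ct):
    """Merge per-tract batches, keeping one record per tweet ID."""
    seen = {}
    for ct_code, tweets in tweets_by_ct.items():
        for tweet in tweets:
            # setdefault keeps the first occurrence of each tweet ID.
            seen.setdefault(tweet["id"], {**tweet, "ct": ct_code})
    return list(seen.values())


# Made-up batches: tweet 2 was returned for two neighbouring tracts.
batches = {
    "CT-A": [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}],
    "CT-B": [{"id": 2, "text": "b"}, {"id": 3, "text": "c"}],
}
unique = merge_unique(batches)
```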
Collision data by severity and type was obtained from the Insurance Corporation of British Columbia (ICBC) website ( 44 ) for the same time frame (2017–2019). In detail, total collisions (TOT), fatal and injury collisions (FI), and property damage only collisions (PDO) were considered as severity groups. In addition, single and multiple vehicle collisions (VEH) and active transportation collisions (ACM) were considered as collision types.
For each group of classified tweets, only those with an overall negative sentiment were considered. The prefix “NS” before each group’s name indicates negative sentiment. However, as mentioned earlier, for near miss tweets (NMT) and complaint or report tweets (CRT), the total count was used to calculate the tweet rate because of the inherently negative nature of these tweets. Table 1 provides a summary of the statistics for these variables.
Table 1. Summary Statistics of Collision and X/Twitter Data at the Census Tract Level (2017–2019)
Note: SD = standard deviation; Min. = minimum; Max. = maximum.
Macro-level variables were extracted and compiled from two primary sources. First, Census Canada was used to extract sociodemographic, employment, and mode split data for each zone for the 2016 census year (closest year to the evaluation framework and not affected by the COVID-19 pandemic unlike the subsequent 2021 census). Second, City of Vancouver data was used to extract geocoded files of land use, road network, and zone-census tract boundaries. ArcGIS Pro was used to map the data and extract the Excel sheets for further analysis.
These variables were categorized into four distinct groups: exposure, socio-demographics, transportation demand management, and network variables. The process of selecting and dropping variables was based on two main criteria: first, their significance as macro-level modeling variables for the same area in the literature ( 39 ), and second, the availability and accessibility of the data. Based on these criteria, total lane kilometers (TLKM) and total area of each zone (AREA) were collected as potential exposure variables. In the group of sociodemographic variables, zonal family size (FS), home density (HD), zonal residents (POP), population density (POPD), average age of zonal residents (AGE), average household size (HS), and employment rate (ER) were collected. For transportation demand management, total commuters from each zone (TCM), total commuter density (TCD), commuters within the census subdivision (CWCS), and the percentage of active mode travelers (PAM) were chosen. In the last group, intersection and signal variables, including the number of signals (SIG), signal density (SIGD), number of intersections (INT), and intersection density (INTD), were listed and calculated as network variables. Table 2 lists all the selected explanatory variables along with their descriptive statistics.
Table 2. Summary Statistics of Macro-Level Variables at the Census Tract Level
Note: SD = standard deviation; Min. = minimum; Max. = maximum; Ha = hectare.
Results
The results are presented using a standardized format that includes collision types, classified tweets, and the comparison across scenarios. Given the number of collision types and tweet categories analyzed, readers are encouraged to refer to Table 1 when interpreting acronyms. Five collision types (TOT, FI, PDO, VEH, and ACM) and nine distinct categories of classified RSTs were employed to develop 45 (5 × 9) Bayesian SUR models. The nine categories were negative sentiment road-safety-related tweets (NS-RST), NMT, CRT, negative sentiment vehicle modes related tweets (NS-VMT), negative sentiment active modes related tweets (NS-AMT), negative sentiment road user behavior tweets (NS-RBT), negative sentiment design or maintenance tweets (NS-DMT), negative sentiment traffic control tweets (NS-TCT), and negative sentiment environment tweets (NS-ENT). Three model types with explanatory variables between collisions and RSTs were considered, and five separate scenarios were tested (total of 45 × 5 = 225 tests) to identify similarities in the two safety dimensions considered (objective and subjective).
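The count of model pairings and tests follows directly from the combinations listed above, as the short enumeration below shows. The acronyms are those used in the text; the variable names and the representation of scenarios as the integers 1 to 5 are illustrative choices.

```python
from itertools import product

collision_types = ["TOT", "FI", "PDO", "VEH", "ACM"]
tweet_categories = ["NS-RST", "NMT", "CRT", "NS-VMT", "NS-AMT",
                    "NS-RBT", "NS-DMT", "NS-TCT", "NS-ENT"]
scenarios = range(1, 6)

# Every collision type is paired with every tweet category: 5 x 9 pairings.
model_pairs = list(product(collision_types, tweet_categories))

# Each pairing is tested under each scenario: 45 x 5 tests.
tests = [(collision, tweet, scenario)
         for (collision, tweet), scenario in product(model_pairs, scenarios)]
```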
Table 3 summarizes the different scenarios and models developed. After testing various explanatory variables, only those statistically significant in a model at the 95% confidence level were selected. Consequently, a subset of the variables listed in Table 2 was used. Between the two exposure measures collected (TLKM and AREA), TLKM produced significant results. In the group of sociodemographic variables, population density (POPD), family size (FS), and home density (HD) were found to be significant. Additionally, among the various transportation demand management variables and network variables, total commuter density (TCD) and signal density (SIGD) produced significant results, respectively.
Table 3. List of Scenarios and Models Tested
By way of example, Table 4 presents the regression estimates, standard errors, and p-values for the sociodemographic model using Bayesian SUR. This model is one of the 225 Bayesian SUR models developed for this study and illustrates the structure and interpretation of the regression outputs. The overall goodness of fit of the SUR model was evaluated using the McElroy-R2 statistic, where values nearing 1 typically suggest a satisfactory fit ( 45 ). The remaining models were developed and analyzed in a similar manner and were not reported for brevity. It is worth noting that moderate correlation between FS and HD was considered acceptable as dropping one would significantly reduce the model’s explanatory value: family size (FS) represents demographic composition (i.e., who resides in the household), while home density (HD) reflects living conditions, specifically the degree of crowding.
Table 4. Regression Estimates for the Sociodemographic Model
Note: SD = standard deviation; TLKM = total lane kilometers; POPD = population density; FS = family size; HD = home density.
For each model, a sample from the posterior distribution of the log RR difference was generated using MCMC simulation in RStudio. This analysis tested the hypothesis introduced in the method section with results reported as either “Pass” or “Fail”. A “Pass” indicates that 0 was contained within the central 95% of the log RR difference distribution of the SUR model, signifying that the subjective safety metric accurately mimics RR in collisions. Conversely, a “Fail” indicates that 0 was not contained within the central 95% of the log RR difference distribution of the SUR model, signifying that the subjective safety metric does not mimic RR in collisions.
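The Pass/Fail rule can be sketched on posterior draws as a percentile-interval check. The draws below are synthetic normal samples generated only for illustration, not results from this study, and the function names are hypothetical.

```python
import random


def central_interval(draws, level=0.95):
    """Empirical central interval from a sample of posterior draws."""
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * n)]
    upper = ordered[min(n - 1, int((1.0 - tail) * n))]
    return lower, upper


def alignment_test(log_rr_collision, log_rr_rst):
    """Return 'Pass' if 0 lies in the central 95% of the log RR difference."""
    diff = [r - c for c, r in zip(log_rr_collision, log_rr_rst)]
    lower, upper = central_interval(diff)
    return "Pass" if lower <= 0.0 <= upper else "Fail"


rng = random.Random(42)
n_draws = 5_000
# Synthetic log RR draws for a collision type and two tweet categories.
collision_draws = [rng.gauss(0.20, 0.06) for _ in range(n_draws)]
similar_rst_draws = [rng.gauss(0.30, 0.06) for _ in range(n_draws)]
distant_rst_draws = [rng.gauss(1.00, 0.06) for _ in range(n_draws)]

similar_result = alignment_test(collision_draws, similar_rst_draws)
distant_result = alignment_test(collision_draws, distant_rst_draws)
```

With these synthetic inputs the first pairing has overlapping log RR distributions and the second does not, which is the contrast the Pass and Fail labels are meant to capture.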
By way of example, Figure 3 presents the posterior distributions of the log relative risks (log RRs) for the sociodemographic model in Table 4 using population density (POPD) as the control variable (scenario 1) and holding other variables constant. Figure 3 shows the posterior distribution of log RR values for the ACM model (3a) and for the NS-AMT model (3b), respectively. The estimated mean log RR for ACM associated with POPD was positive and equal to 0.168, with a 95% credible interval between 0.044 and 0.294. Similarly, the estimated mean log RR for NS-AMT associated with POPD was positive and equal to 0.262, with a 95% credible interval between 0.136 and 0.389.

Figure 3. Posterior distribution of log RR for the sociodemographic model scenario 1 for (a) ACM and (b) NS-AMT.
The resulting posterior distribution of the log RR difference between ACM and NS-AMT is displayed in Figure 4. The mean of this distribution is 0.095, with a 95% credible interval between −0.082 and 0.266. Since the interval includes 0, the test result is “Pass”, which suggests that NS-AMT accurately mimics ACM in this modeling scenario.

Posterior distribution of log RR difference for the sociodemographic model scenario 1.
The same process and test were conducted for each of the 225 Bayesian SUR models, encompassing all possible combinations considered in this study. The results of these tests are presented in Tables 5 to 9, corresponding to scenarios 1 to 5, respectively. In these tables, the collision groups are represented as columns, while the groups of classified tweets are shown as rows. For models with a “Pass”, the McElroy R² is reported in parentheses as a measure of the overall goodness of fit of the Bayesian SUR model.
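For readers unfamiliar with the system-wide fit measure mentioned above, the following is a minimal sketch of the McElroy R² in its commonly cited form, 1 − e′(Σ⁻¹ ⊗ I)e / y′(Σ⁻¹ ⊗ (I − ιι′/T))y, where e stacks the residuals of all equations and Σ is the cross-equation residual covariance. The function name `mcelroy_r2`, the default covariance estimate, and the synthetic data are assumptions for illustration; the paper does not state how Σ was obtained from the posterior.

```python
import numpy as np


def mcelroy_r2(y, resid, sigma=None):
    """McElroy system R-squared for a set of jointly estimated equations.

    y, resid : arrays of shape (T, M), one column per equation.
    sigma    : (M, M) cross-equation residual covariance. If omitted, it is
               estimated as resid' resid / T (an assumption of this sketch).
    """
    y = np.asarray(y, dtype=float)
    resid = np.asarray(resid, dtype=float)
    n_obs = y.shape[0]
    if sigma is None:
        sigma = resid.T @ resid / n_obs
    sigma_inv = np.linalg.inv(sigma)
    y_centered = y - y.mean(axis=0)
    # e'(Sigma^-1 kron I)e equals the sum over i, j of sigma_inv[i, j] * e_i'e_j,
    # so the Kronecker product never has to be formed explicitly.
    weighted_rss = np.sum(sigma_inv * (resid.T @ resid))
    weighted_tss = np.sum(sigma_inv * (y_centered.T @ y_centered))
    return 1.0 - weighted_rss / weighted_tss


# Synthetic two-equation system for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y_demo = np.hstack([1.0 + 2.0 * x, -0.5 + 1.5 * x]) + rng.normal(scale=0.5, size=(200, 2))
design = np.hstack([np.ones((200, 1)), x])
# Equation-by-equation least squares stands in for the fitted SUR model here.
coef, *_ = np.linalg.lstsq(design, y_demo, rcond=None)
resid_demo = y_demo - design @ coef
print(round(mcelroy_r2(y_demo, resid_demo), 3))
```

Unlike per-equation R² values, this statistic weights residuals by the inverse cross-equation covariance, so it summarizes the fit of the collision and tweet equations as one system.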
Test Results for Scenario 1
Note: RST = road-safety-related tweets; TOT = total collisions; FI = fatal and injury collisions; PDO = property damage only collisions; VEH = single and multiple vehicle collisions; ACM = active mode vehicle collisions; NS = negative sentiment; NMT = near miss tweets; CRT = complaint or report tweets; VMT = vehicle modes related tweets; AMT = active modes related tweets; RBT = road users behavior tweets; DMT = design and maintenance tweets; TCT = traffic control tweets; ENT = environment tweets.
Test Results for Scenario 2
Test Results for Scenario 3
Test Results for Scenario 4
Test Results for Scenario 5
Figure 5 illustrates the cumulative number of tests with “Pass” for each category of classified RSTs using a stacked bar chart. For instance, the first stacked bar shows the 12 successful tests for NS-RST: 5 with TOT, 5 with PDO, 1 with VEH, and 1 with ACM. Notably, no successful tests were observed between NS-RST and FI. The remaining stacked bars can be interpreted similarly.

Stacked bar chart of the total number of tests with “Pass” for each RST group by collision type.
Overall, NMT achieved the highest number of tests with “Pass”, totaling 15, followed by NS-RST with 12 successful tests. NS-AMT, NS-ENT, and NS-TCT had 10, 9, and 8 successful tests, respectively. Both NS-VMT and NS-RBT recorded 7 successful tests. Finally, NS-DMT did not show any successful tests with any collision type.
In the subsequent analysis, each collision type was compared to determine which category exhibited the highest number of successful tests. The findings are depicted in Figure 6, represented as a pie chart. According to this chart, PDO collisions rank first with 26 successful tests (34% of the total successful tests). ACM followed closely with 24 successful tests (32%), TOT with 14 successful tests (18%), VEH with 8 successful tests (11%), and FI with 4 successful tests (5%).

Distribution of tests with “Pass” among different collision types.
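As a quick arithmetic check, the percentages in Figure 6 can be reproduced from the pass counts reported above. The dictionary name `passes` is just an illustrative label; the counts are the ones stated in the text.

```python
# Pass counts by collision type, as reported in the text.
passes = {"PDO": 26, "ACM": 24, "TOT": 14, "VEH": 8, "FI": 4}

total_passes = sum(passes.values())  # 76 successful tests overall
shares = {k: round(100 * v / total_passes) for k, v in passes.items()}

print(total_passes)  # 76
print(shares)        # {'PDO': 34, 'ACM': 32, 'TOT': 18, 'VEH': 11, 'FI': 5}
```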
Discussion
The results from the previous section highlighted the RST groups that can accurately mimic RR in collisions. In general, the NS-RST model showed moderate fidelity with the different collision prediction models, achieving a “Pass” in 12 out of 25 tests. The highest fidelity was recorded for TOT and PDO collisions, with a “Pass” in 5 out of 5 tests for each, even though PDO collisions are not typically the primary focus of urban safety policies. In contrast, the model exhibited the lowest fidelity with FI collisions, as it did not achieve any “Pass” results. Overall, this suggests a moderate link between subjective safety, as observed on X/Twitter, and objective safety.
The NMT model, which represents near-miss events reported by people on X/Twitter, showed the highest fidelity with collision models, achieving a total of 15 successful tests out of 25 across various collision types. Given that traffic conflicts are the most recognizable and objectively observable events in this study, it was expected that this group would closely align with actual collisions; moreover, the strong correlation between different types of conflicts, measured through various approaches, and actual collisions is well known in the field ( 46 ). The strongest fidelity of the NMT model was observed with ACM, with 5 “Pass” results out of 5 tests. Good fidelity was also observed with PDO collisions (3 successful tests out of 5), TOT, and VEH, indicating that NMT may serve as an indicator of potential risk for several collision types.
In the context of transportation mode, NS-AMT demonstrated a strong association with ACM, achieving 5 out of 5 successful tests. In contrast, NS-VMT showed only 2 successful tests with VEH. Objective safety therefore appears to be better captured by individuals for active modes than for vehicle modes. This finding aligns with similar studies that used survey methods to assess people's perceptions of safety ( 47 , 48 ). It suggests that individuals are more likely to detect or report road safety issues when they are walking or biking, possibly because safety concerns are more readily perceived and shared. The reduced cognitive load of active modes compared with driving may allow for greater focus on road safety matters.
NS-DMT did not show significant fidelity with any collision type, indicating limited utility as a risk measure in this context. This suggests that people’s perception and understanding of safety issues related to road and street design or maintenance are not well aligned with objective safety. The results align with similar studies showing that effective road safety countermeasures, such as speed cameras and traffic calming, are sometimes met with public skepticism because of misconceptions about their benefits ( 49 , 50 ).
The subsequent analysis, which focused on individual collision groups, revealed that PDO collisions had the strongest overall connection with classified tweets, accounting for 34% of the total passed tests. ACM collisions followed closely at 32%, suggesting that these types of collisions are particularly well captured by people and reflected in the sentiment expressed in RSTs. In contrast, FI collisions demonstrated the weakest fidelity with RSTs, accounting for only 5% of the passed tests. This aligns with a similar study that found people have a weak understanding of actual safety in situations that increase the risk of FI collisions ( 51 ).
Conclusions
This study aimed to address a gap in the literature by providing a comprehensive statistical evaluation of the relationship between subjective and objective safety using crowdsourced (social media) data. The scope of the study was limited to the city of Vancouver and to the period between 2017 and 2019. The results revealed varying fidelity between RST groups and different types of collisions. Notably, near-miss observations (the NMT group in this study; see Table 1) showed robust fidelity with collisions, particularly PDO collisions. Safety reports related to active modes were also particularly aligned with collision occurrence. This underscores the potential of using crowdsourced data to identify areas at higher risk of collisions. Furthermore, PDO collisions exhibited a robust connection with classified tweets, indicating that these incidents are well captured by individuals and reflected in social media. However, this result might be less impactful for urban transportation agencies because of the lower severity and public health implications of PDO collisions.
The study revealed a notable discrepancy between road design/maintenance-related safety reports and collision occurrence. Certain safety interventions perceived as effective by the public did not correspond to significant reductions in collision occurrence, and vice versa. These varying degrees of fidelity underscore the complex nature of the relationship between subjective and objective safety. The findings offer a valuable resource for guiding future risk reduction studies by demonstrating when a specific subjective safety metric closely mimics real-world collision risk, and can therefore indicate the effectiveness of interventions and their potential impact on collision likelihood, and when it cannot. In addition, integrating subjective safety metrics into road safety analyses enables policymakers to identify areas where road users feel unsafe, even if collision frequencies are low. Policymakers could leverage this approach to prioritize safety improvements in areas identified through subjective metrics, fostering greater equity and inclusivity in transportation planning by addressing the concerns of diverse road user groups.
Moving forward, future studies could explore the temporal dynamics of this fidelity, investigate the impact of geographic variability, and assess the approach alongside traditional road safety metrics to fully evaluate its practical applications and limitations. Additionally, while Twitter users broadly reflect the general population, there are possible demographic biases, such as a higher representation of younger, urban males, which need to be accounted for in analysis and interpretation. These further research efforts will help determine the full extent to which social media data can be used to improve road safety and develop more effective safety interventions.
In conclusion, this study contributed to the growing body of literature on the use of social media for traffic safety research by providing evidence of the relationship between subjective safety perceptions and objective safety outcomes. The findings offer practical value for cities aiming to assess the subjective safety perceptions and reports of road users. Urban transportation agencies often conduct community engagement activities to gather public feedback on safety concerns and potential countermeasures in specific neighborhoods. Assessing the relationship between classified social media data and objective safety (collision frequency) can help clarify when perceived risk mirrors real (objective) risk. This integration might allow safety initiatives to be both data-driven and aligned with community sentiment, helping to prioritize interventions even before collisions happen.
Footnotes
Author Contributions
The authors confirm contribution to the paper as follows: study conception and design: Emanuele Sacchi; data collection: Mohammad Majid Abedi; analysis and interpretation of results: Mohammad Majid Abedi, Emanuele Sacchi; draft manuscript preparation: Mohammad Majid Abedi, Emanuele Sacchi. All authors reviewed the results and approved the final version of the manuscript.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by a discovery grant from the Natural Sciences and Engineering Research Council of Canada (NSERC).
