Sage Journals: Discover world-class research

Abstract

This study proposes a three-stage framework for real-time crash likelihood and severity prediction. First, a real-time crash likelihood prediction model was developed. Second, a real-time crash severity clustering model was proposed to cluster crashes into different severity levels. Third, a severity clustering validation model was developed to assess the performance of the proposed severity clustering model. Extensive data processing techniques were employed to collect real-time features from State Road 408 in Orlando, Florida, yielding a total of 6,750,072 events (625 crash events and 6,749,447 non-crash events) described by 24 real-time features. To develop the crash likelihood prediction model, nine machine-learning techniques were tested, and the convolutional neural network model provided the best results with respect to sensitivity (0.916), false alarm rate (0.111), and area under the receiver operating characteristic curve (0.967). The Davies–Bouldin index was used to identify the detector location whose traffic information best supported clustering crashes into severity levels, and k-means clustering was applied to this traffic information to develop the severity clustering model. Finally, nine machine-learning techniques were investigated to build a severity clustering validation model, and the decision tree model provided the best results based on the sensitivity and specificity values for the three severity levels. The developed framework could help traffic management centers warn road users or deploy transportation systems management and operations strategies in real time to avoid crashes or reduce their severity, and thus contribute significantly to improving road safety.

Keywords

safety, transportation safety management systems, safe systems

Road crashes have long been a global problem, causing approximately 1.35 million deaths and 50 million injuries each year and placing a heavy socio-economic burden on health systems and economies around the world ( 1 – 5 ). These alarming statistics reveal the limitations of reactive safety research and urge researchers to explore proactive safety approaches. Accurately and reliably predicting a crash before it occurs is the most challenging task in proactive safety research ( 6 ). Advances in intelligent transportation system technologies allow researchers to obtain real-time traffic, event, weather, and other crash-related information from different sources ( 7 , 8 ), which eases this challenge and paves the way for further progress in this research direction.

The basic methodology of crash likelihood prediction research is to collect real-time information from different sources, fuse it, assess its viability for crash likelihood prediction, and investigate the relationships among different contributing factors ( 9 – 12 ). Most of the literature on proactive road safety has focused on real-time crash prediction using statistical or machine-learning methods ( 6 , 13 , 14 ), on identifying contributing factors, or on assessing the relationships among these factors. However, traffic management centers also need to inform road users about the possible crash likelihood and to apply transportation systems management and operations (TSM&O) strategies based on the real-time crash severity level. Such a warning system also needs a validation technique to verify that the predicted crash severity level is sufficiently accurate; otherwise, the TSM&O strategies might reduce the mobility, throughput, and overall efficiency of the network ( 15 , 16 ). These research gaps motivated this study. To address them, the authors developed a real-time crash likelihood prediction and severity clustering framework that predicts crash likelihood, classifies crash events into different severity levels, and validates the classified severity levels. The proposed methodology relies on extensive data processing and focuses on methods that are applicable in real time.

This research first developed a model to predict crash likelihood using suitable machine-learning algorithms. It then developed a severity clustering model that defines different severity levels and assigns each likely crash event to one of these predefined levels. Finally, a severity clustering validation model was developed to assess the performance of the proposed severity clustering model. This framework has the potential to guide traffic management centers and associated stakeholders in taking appropriate actions in real time to avoid crashes or reduce their severity, and thus to improve road safety.

Literature Review

Crash Likelihood

Numerous studies around the world have predicted the likelihood of crash occurrence and identified crash events using real-time parameters, on both freeways and arterials. Table 1 summarizes these studies, including the country/region, year of the study, time period of the data, road type, and key findings. As Table 1 shows, these studies developed and suggested models for identifying crash and non-crash events (crash prediction models) under different scenarios, assessed when safety warnings should be issued, and proposed different variable selection methods. They also identified the key contributors or features in real-time crash prediction, investigated the relationships between real-time parameters and crash occurrence, and compared the performance of different classification models. Other studies identified traffic states during crash conditions, characterized real-time crashes in different time periods (that is, weekday and weekend), and developed crash prediction models that account for time series dependency. All of these efforts focused on developing a real-time crash prediction model. Few attempts have been made to develop a complete crash likelihood framework that simultaneously predicts crashes in real time with high accuracy, classifies crashes into different severity levels, and validates the performance and accuracy of the resulting crash severity model. Cheng et al. ( 17 ) proposed a crash risk evaluation workflow consisting of crash risk prediction, crash risk quantification, and crash risk classification. However, that study had several limitations, namely, only 2 weeks of data from six segments of an unspecified Chinese road, 95 crashes, and a limited number of real-time parameters ( 17 ). The current study therefore attempts to address this gap by proposing a complete proactive crash likelihood prediction and crash severity clustering framework.

Table 1.

Key Details of Real-Time Crash Likelihood Studies

Reference	Country/Region	Year	Time	Road Type	Key Findings/Contributions
Ahmed and Abdel-Aty ( 6 )	Colorado, U.S.A.	2010–2011	13 months	Freeway (I-70)	Proposed crash prediction framework for 4 different scenarios, that is, 1. AVI, RTMS, weather stations, and roadway geometry; 2. RTMS and roadway geometry; 3. AVI and roadway geometry; 4. weather stations and roadway geometry
Shi and Abdel-Aty ( 13 )	Florida, U.S.A.	2013	8 months	State road	Viability of evaluating operation and safety was explored. Determined the appropriate conditions to trigger safety warnings. Proposed real-time monitoring of congestion and safety.
Lin et al. ( 7 )	Virginia, U.S.A.	2005	1 year	Freeway (I-64)	Proposed frequent pattern tree-based variable selection method. Frequent pattern (FP) tree-based Bayesian network model is recommended for best prediction.
Wang et al. ( 18 )	Florida, U.S.A.	2013–2015	21 months	State roads	Proposed a multilevel Bayesian logistic regression model for crashes at expressway weaving segments. Identified factors responsible for weaving segment crashes.
Basso et al. ( 19 )	Santiago, Chile	2014–2016	18 months	Freeway	Proposed real-time prediction models using disaggregate data. Conducted 300 repetitions of randomly selected partitions. Validated the models on the original unbalanced data set. Vehicle composition does not play a first-order role.
Yuan et al. ( 20 )	Florida, U.S.A.	2017	10 months	Arterial	Investigated the relationships between crash occurrence and real-time traffic and signal timing characteristics. Model based on 5–10 min interval dataset performed the best. Identified significant factors in crash occurrences in arterials. Bayesian random parameters conditional logistic model outperformed Bayesian random parameters logistic and Bayesian conditional logistic models.
Yu et al. ( 21 )	Shanghai, China	2014	1 month	Freeway	Convolutional neural network model has better classification performance compared to the traditional multi-layer perceptron (MLP) model with the tensor-based structure data.
Yuan and Abdel-Aty ( 14 )	Florida, U.S.A.	2017–2018	13 months	Arterial	Investigated the relationship between crash occurrence at signalized intersections and real-time traffic, signal timing, and weather characteristics.
Huang et al. ( 22 )	Iowa, U.S.A.	-	2 years	Freeway (I-235)	Applied deep learning algorithms including several model variants for crash detection and crash risk estimation. CNN with drop-out operation performed better. Difficult to predict the crash risk using traffic condition 10 min before a crash.
Cheng et al. ( 17 )	Shanghai, China	2018	2 weeks	Freeway	Proposed crash risk evaluation workflow through crash risk prediction, crash risk quantification, and crash risk classification. Major weakness is very small sample size.
Zheng et al. ( 23 )	California, U.S.A.	2008–2009	2 years	Freeway (I-880)	Identified optimal crash precursors for different freeway sections. Proposed threshold selection method for real-time crash risk. Downstream average speed was recommended as the best crash precursor variable.
Xu et al. ( 24 )	California, U.S.A.	2010	9 months	Freeway (I-880)	Divided freeway traffic flow into different states. Evaluated the safety performance associated with each state.
Yu and Abdel-Aty ( 25 )	Colorado, U.S.A.	2010–2011	13 months	Freeway (I-70)	Suggested SVM model with radial-basis kernel for the crash prediction using real-time parameters. Variable selection procedure is needed before model estimation. Explanatory variables have identical effects on crash occurrence for the SVM models and logistic regression models.
Theofilatos et al. ( 26 )	Greece	2008–2011	4 years	Tollway	Explored crash occurrence using real-time parameters. Suggested to utilize appropriate logistic regression models specifically designed for rare events. A negative relationship between crash occurrence and speed in crash locations.
Xu et al. ( 27 )	California, U.S.A.	2008	1 year	Freeway (I-880)	Predicted the crash likelihood at different levels of severity with particular focus on severe crashes. Traffic flow characteristics contributing to crash likelihood were quite different at different levels of severity. PDO crashes were more likely to occur under congested traffic conditions with highly variable speed and frequent lane changes. KA and BC crashes were more likely to occur under less congested traffic flow conditions.
Kwak and Kho ( 28 )	South Korea	2008–2010	3 years	Freeway	Proposed prediction models for different segment types and traffic flow states. Traffic flow characteristics leading to crashes differ by segment type and traffic flow state.
Yang et al. ( 29 )	China	2013	1 month	Freeway	Proposed a theoretical method for identifying the optimal threshold of crash risk. The threshold identified by the maximum entropy method achieved the highest crash prediction accuracy.
Li et al. ( 30 )	Florida, U.S.A.	2017–2018	13 months	Arterial	Proposed a real-time crash risk prediction model on arterials.
Parsa et al. ( 31 )	Chicago, U.S.A.	2016–2017	13 months	Freeway	Conducted feature dependency analysis: interpreted upstream speed and volume, analyzed distance to the central business district and residential areas, and evaluated upstream and downstream speed.
Yu and Abdel-Aty ( 32 )	Colorado, U.S.A.	2006–2011	5 years and 4 months	Freeway (I-70)	Investigated different characteristics of weekday and weekend crashes. Identified the confounding factors of weekday and weekend crashes.
Zhou et al. ( 33 )	North Dakota, U.S.A.	1996–2014	18 years	Railway crossing	Compared the crash prediction accuracy between random forest and decision tree models in highway-rail grade crossings. Concluded that random forest is superior to decision tree.
Yuan et al. ( 34 )	Florida, U.S.A.	2017–2018	16 months	Arterial	Predicted real-time crash risk by considering time series dependency.
Abdel-Aty and Pemmanaboina ( 35 )	Florida, U.S.A.	1999–2002	4 years	Freeway (I-4)	Proposed a crash likelihood prediction model using real-time traffic-flow variables and rain data. Calibrated the model using archived loop detector and rain data and historical crash data.
Li and Abdel-Aty ( 36 )	Florida, U.S.A.	2019–2020	7 months	Arterial	Investigated the application of trajectory fusion to crash likelihood prediction. Temporal attention and trajectory fusion mechanism improved the prediction accuracy.

Note: AVI = automatic vehicle identification; RTMS = remote traffic microwave sensors; CNN = convolutional neural network; SVM = support vector machine; PDO = property damage only; K = fatal crash; A = incapacitating injury; B = non-incapacitating injury; C = possible injury.

Non-Crash Sample Selection

Because crash-related traffic conditions are scarce compared with normal traffic conditions, crash-related traffic data suffer from a class imbalance problem. To address this issue, previous researchers employed various resampling techniques. Most earlier real-time crash prediction studies selected non-crash samples using the well-accepted 1:4 crash:non-crash sample ratio ( 6 ), and a few used other ratios, such as 1:10 ( 18 ) and 1:5 ( 37 ). More recently, studies have balanced crash and non-crash samples using the synthetic minority over-sampling technique (SMOTE) ( 19 , 30 , 38 ). Because SMOTE has the potential to improve the prediction accuracy of real-time crash prediction models compared with earlier techniques ( 39 ), the current study used SMOTE to balance the crash and non-crash samples.
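As a minimal sketch of how SMOTE balances the classes, the Python function below creates synthetic minority-class (crash) samples by interpolating between a randomly chosen crash sample and one of its k nearest crash neighbours. It is an illustration under simplifying assumptions (Euclidean distance, k = 5 by default, features already scaled), not the implementation used in this study; the function name `smote` is hypothetical, and in practice a library implementation such as `SMOTE` from the imbalanced-learn package would typically be used.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority-class samples (hypothetical helper).

    minority: list of feature vectors (lists of floats) for the minority class.
    Each synthetic sample lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [s for j, s in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda s: math.dist(base, s))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Usage: oversample crashes until they match the non-crash count.
crashes = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
n_non_crash = 10
new_crashes = smote(crashes, n_non_crash - len(crashes), k=2)
print(len(crashes) + len(new_crashes))  # 10
```

Because the synthetic points are generated only from training crashes, resampling of this kind is applied to the training split alone, leaving the test set with its original class distribution.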

Real-Time Traffic Parameter Duration Selection

Many earlier studies suggested that the traffic state 5–10 min before a crash is the most suitable real-time crash precursor ( 23 , 34 , 36 ). However, several studies considered longer periods, such as 20 min ( 20 ) and 30 min ( 17 ), to explore which prior traffic state performed best as a real-time crash indicator. Moreover, most of these studies collected traffic information at roughly 30 s intervals and aggregated it into 5 min intervals to describe the traffic state before a crash. Based on this previous work, this study uses traffic data from 5–10 min before a crash occurrence for the proposed crash likelihood framework.
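The window selection described above can be sketched as follows. The helper below, a hypothetical name introduced for illustration, averages the 30 s detector records that fall between 10 and 5 min before a crash; treating the window as half-open [crash time − 10 min, crash time − 5 min) and using a simple mean are assumptions, since the exact aggregation rule is not specified.

```python
from statistics import mean


def aggregate_window(records, crash_time_s, start_min=5, end_min=10):
    """Average 30 s detector records inside a pre-crash window.

    records: iterable of (timestamp_in_seconds, value) pairs.
    Returns the mean of values with timestamps in
    [crash_time_s - end_min*60, crash_time_s - start_min*60), or None if empty.
    """
    window_start = crash_time_s - end_min * 60
    window_end = crash_time_s - start_min * 60
    values = [v for t, v in records if window_start <= t < window_end]
    return mean(values) if values else None


# Fifteen minutes of 30 s speed records; a 5 min window holds 10 of them.
records = [(t, 60.0 - t / 60) for t in range(0, 900, 30)]
print(aggregate_window(records, crash_time_s=900))  # 52.75
```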

Data Preprocessing and Feature Selection

Few earlier proactive safety studies used extensive data preprocessing and feature selection techniques. Common preprocessing steps include removing unrealistic values, such as occupancy > 100, speed = 0 or speed > 100, flow > 25/30 s, and flow = 0 with speed > 0 ( 35 ). Some recent studies used random forest ( 13 ), frequent pattern tree ( 7 ), k-nearest neighbor ( 7 ), graphical analysis ( 19 ), Pearson correlation ( 30 ), variable importance ( 6 ), classification and regression tree ( 25 ), Shapley additive explanation ( 31 ), principal component analysis ( 35 ), scree plot ( 35 ), conditional logistic regression (LR) ( 28 ), and extra-tree classifier ( 30 ) methods to preprocess the real-time features and select the final set of important features for their models. This study used systematic and extensive data processing techniques to achieve its objectives.
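The screening rules listed above can be expressed as a simple record filter. In this sketch the function name is hypothetical, the flow limit of 25 vehicles per 30 s is one reading of the "flow > 25/30 s" rule and is exposed as a parameter, and speed and occupancy are assumed to be in mph and percent.

```python
def is_plausible(speed, flow, occupancy, max_flow_per_30s=25):
    """Return True if a 30 s detector record passes the screening rules."""
    if occupancy > 100:            # occupancy above 100% is impossible
        return False
    if speed == 0 or speed > 100:  # zero or implausibly high speed
        return False
    if flow > max_flow_per_30s:    # too many vehicles in one 30 s interval
        return False
    if flow == 0 and speed > 0:    # speed reported without any vehicles
        return False
    return True


# Records are (speed, flow, occupancy) tuples.
records = [(62.0, 8, 9.5), (0.0, 3, 12.0), (58.0, 0, 0.0), (55.0, 31, 20.0)]
clean = [r for r in records if is_plausible(*r)]
print(clean)  # [(62.0, 8, 9.5)]
```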

Crash Prediction Models

Different statistical and machine-learning models have been used to predict real-time crash likelihood, investigate the relationships among features, and separate crash events from non-crash events. Table 2 summarizes the models previously used for real-time crash prediction and suggests that machine-learning techniques have generally achieved higher accuracy than statistical techniques. This study tested nine machine-learning techniques that are popular for classification problems and were recommended in earlier literature for real-time crash likelihood prediction ( 10 , 21 , 25 , 31 , 40 – 49 ), and proposed the most suitable one based on the evaluation measures used in the current study.

Table 2.

Crash Prediction Models

Models used	Accuracy	Reference
Bayesian logistic regression; multilevel logistic regression; extended logistic regression; conditional logistic regression; matched case-control logit model; fixed effects logit model; random parameters logit model; binary logit model; random effects Bayesian logistic regression model	62%–76%	( 13 , 14 , 17 – 20 , 23 – 25 , 27 , 32 )
Random forest; decision tree; stochastic gradient boosting; multiple additive regression trees; TreeNet; extreme gradient boosting	79%–89% accuracy with 0.16%–22.9% false alarm rate	( 6 , 31 , 33 )
k-nearest neighbor model	61.11%	( 7 )
Support vector machine	67%–81%	( 19 , 23 , 25 )
Deep learning methods: convolutional neural network with focal loss function; long short-term memory convolutional neural network; long short-term memory recurrent neural network; temporal attention-based long short-term memory (TA-LSTM) and convolutional neural network (CNN)	60.67%–93.2% accuracy with 9.7%–39.33% false alarm rate	( 21 , 22 , 30 , 34 , 36 )
Genetic programming technique	59%–79%	( 28 )
Bias correction method; penalized maximum likelihood estimation	NA	( 26 )

Note: NA = not available.

The nine machine-learning techniques used as crash prediction models in the current study are extreme gradient boosting (XGBoost) ( 31 ), the artificial neural network ( 40 , 41 ), random forest ( 42 ), LR ( 10 , 43 ), k-nearest neighbors ( 44 ), the decision tree (DT) ( 45 ), naive Bayes ( 41 , 46 ), gradient boosting (GB) ( 47 ), and the convolutional neural network (CNN) ( 21 , 48 ). Because the current study focuses on developing a three-stage framework for real-time crash likelihood and severity prediction, readers are referred to the cited articles for the theoretical aspects of these algorithms.

Crash Severity Clustering

To date, k-means clustering appears to be the only method used in the literature to classify crash likelihood by severity ( 17 , 24 ). The current study applied this method to cluster crashes into different severity levels in real time, given that a crash is likely to occur.

Severity Clustering Validation

k-means clustering is an unsupervised machine-learning technique that groups data points using a distance function. Because the data used to train a clustering model have no ground-truth labels, the resulting clusters are difficult to validate. To the best of the authors' knowledge, this is the first attempt to develop a severity clustering validation model that predicts which of the clustered severity levels a particular crash will fall into. The study examined nine machine-learning techniques and proposed the best one based on performance measures.

Proposed Contributions

The study developed a real-time crash likelihood prediction and crash severity clustering framework. The proposed framework predicts crash likelihood in real time with high accuracy, classifies the crashes into different severity levels, and validates the generated severity classes to confirm the accuracy of the developed crash severity clustering model.

Methodology

Study Area

To achieve the objective of this study, a 21.4 mi segment of State Road 408 (SR-408) was selected in each direction. This road was chosen because it is one of the major commuter expressways serving downtown Orlando, Florida ( 38 , 50 ), and because real-time data sources are available for it. In particular, the road has 110 microwave vehicle detection systems (MVDSs) spaced about 0.5 mi apart, which provide real-time speed, volume, and lane occupancy information for each lane ( 51 ). Because SR-408 passes through major population centers and provides a reliable source of real-time information, it is well suited to the purpose of this study.

Data Preparation

Firstly, the crash data for the year 2017 was collected from Signal Four Analytics (S4A) and State Safety Office Geographic Information System (SSOGIS) on SR-408. The reason for using two crash data sources was to extract the most accurate and complete information with respect to the crashes that occurred within the study period. Then the traffic data was collected mainly from the MVDSs.

After the initial data collection, a basemap was prepared for processing the crash data. To match detector and crash data to the corresponding segments, the National Performance Management Research Data Set (NPMRDS) map was used as the primary route shapefile. The NPMRDS map follows the Traffic Message Channel (TMC) standard to identify unique segments, so the routes were initially divided into sub-segments using this standard. However, some TMC segments contained multiple detectors. Therefore, so that the traffic features of each segment could be measured using data from adjacent detectors, this study further divided each segment based on detector locations. Through this basemap processing, detector data and crash data were matched to each road segment. After the basemap was prepared, traffic features, that is, speed, volume, and occupancy, were extracted from the detector data. The deployed MVDS detectors update traffic features every 30 s. These 30 s data were aggregated into 5 min intervals, and the data from 5–10 min before each crash were collected ( 9 , 10 ) from the two upstream (u/s) and two downstream (d/s) detectors for each crash, as shown in Figure 1.
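Selecting the two upstream and two downstream detectors for a crash can be sketched with detector mileposts. The function below is hypothetical and assumes that mileposts increase in the direction of travel (so upstream detectors have smaller mileposts) and that detectors more than 2 mi from the crash are ignored, mirroring the 2 mi filter used during preprocessing; "u/s detector 1" and "d/s detector 1" then correspond to the nearest detector on each side.

```python
def nearest_detectors(crash_mp, detector_mps, n=2, max_dist_mi=2.0):
    """Return (upstream, downstream) detector mileposts, nearest first.

    Assumes mileposts increase in the direction of travel, so upstream
    detectors lie before the crash and downstream detectors at or after it.
    """
    upstream = sorted(
        (d for d in detector_mps if d < crash_mp and crash_mp - d <= max_dist_mi),
        key=lambda d: crash_mp - d,
    )[:n]
    downstream = sorted(
        (d for d in detector_mps if d >= crash_mp and d - crash_mp <= max_dist_mi),
        key=lambda d: d - crash_mp,
    )[:n]
    return upstream, downstream


# Detectors spaced about 0.5 mi apart, as on SR-408.
detectors = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
print(nearest_detectors(1.2, detectors))  # ([1.0, 0.5], [1.5, 2.0])
```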

Figure 1.

Data extraction from detectors.

To develop a classification model, crash events must be distinguished from non-crash events; therefore, non-crash samples are needed to train the prediction model, and the imbalance between the two classes must be handled. For this purpose, SMOTE was used as suggested by the literature ( 38 , 52 , 53 ). After all the required features were collected, major data preprocessing was carried out following the methodology shown in Figure 2. First, the traffic data were imported and merged for each crash and non-crash event. Next, new traffic features were generated, namely the average, standard deviation (SD), and coefficient of variation (CV) of speed and volume. The CV is the ratio of the SD to the average and indicates the relative variation of a traffic parameter ( 28 ). Moreover, the difference in speed (DS) between the inner and outer lanes at each detector position was calculated and added to the feature list. The congestion index (CI), a measure of congestion intensity ( 50 ), was also calculated and added. A detector location file was then created and used to filter out all data that had no detector within 2 mi of the crash location. Next, unreasonable detector values were removed, and further cleaning was performed: data within 6 h before and after each crash event were removed from the non-crash samples to avoid unstable traffic state information ( 8 ). Finally, all the processed data were merged and prepared for modeling.
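The derived features can be sketched for one detector and one time interval as follows. This is a hypothetical helper under stated assumptions: the SD and CV of speed are taken across lanes, DS is the absolute speed difference between the inner and outer lanes, and CI is computed as max(0, (speed limit − average speed) / speed limit), one common speed-based definition. Only the inputs of the study's CI (actual speed and speed limit) are stated, so that formula in particular is an assumption.

```python
from statistics import mean, pstdev


def detector_features(lane_speeds, lane_volumes, speed_limit):
    """Derive features for one detector and one interval (hypothetical helper).

    lane_speeds, lane_volumes: per-lane values ordered from inner to outer lane.
    """
    avg_speed = mean(lane_speeds)
    sd_speed = pstdev(lane_speeds)  # population SD across lanes (assumption)
    return {
        "avg_speed": avg_speed,
        "sd_speed": sd_speed,
        "cv_speed": sd_speed / avg_speed if avg_speed else 0.0,  # CV = SD / mean
        "avg_volume": mean(lane_volumes),
        "ds": abs(lane_speeds[0] - lane_speeds[-1]),  # inner vs outer lane
        # Assumed CI definition: relative speed drop below the speed limit
        "ci": max(0.0, (speed_limit - avg_speed) / speed_limit),
    }


print(detector_features([60, 50, 40], [10, 12, 14], speed_limit=55))
```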

Figure 2.

Major data preprocessing steps.

Framework Development

After the data processing steps, the crash likelihood prediction model, severity clustering model, and cluster validation model were developed. Figure 3 depicts the full framework flowchart.

Figure 3.

Methodological approach to developing the framework.

Crash Likelihood Prediction Model

First, all features that cannot be obtained in real time were removed based on subject matter knowledge and real-time data availability. The dataset was split into training and testing datasets using a 70%:30% split. Because the crash data were highly imbalanced, SMOTE resampling was applied to the training dataset only. Pearson correlation coefficients among the features were checked to avoid multicollinearity, and independent variables with coefficient values greater than 0.5 were excluded from the model ( 54 ). In addition, the feature importance of the variables was calculated using the random forest classifier ( 55 ). The model hyperparameters were then tuned using the grid search algorithm ( 56 ), and the models were fitted with the optimized parameters to obtain the evaluation metrics. For the CNN model, each event was represented as a two-dimensional (2D) input, giving an input tensor of shape (3780039, 4, 6, 1); as with the other models, the output is the crash likelihood. Sensitivity, specificity, and the area under the receiver operating characteristic (ROC) curve were used as the evaluation metrics to select the final model.
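A sketch of how one event's 24 features could be arranged into the 4 × 6 × 1 per-event input implied by the tensor shape above. The grouping is an assumption: four detector positions (two d/s, two u/s) by six features per detector, matching the feature list in Table 3, with a trailing single channel as in the channels-last layout that 2D convolution layers such as Keras `Conv2D` use by default. The names `DETECTORS`, `FEATURES`, and `to_tensor` are hypothetical.

```python
# Hypothetical grouping: 4 detector positions x 6 features per detector = 24.
DETECTORS = ["d/s 2", "d/s 1", "u/s 1", "u/s 2"]
FEATURES = ["volume", "speed", "sd_speed", "ds", "cv_speed", "ci"]


def to_tensor(event):
    """Arrange one event's features into a 4 x 6 x 1 nested list.

    event: dict mapping (detector, feature) -> value.
    Rows are detectors, columns are features, and the innermost list is a
    single channel, so a batch of events has shape (n_events, 4, 6, 1).
    """
    return [[[event[(det, feat)]] for feat in FEATURES] for det in DETECTORS]


# Example event whose values are simply 0..23 in detector-major order.
event = {
    (det, feat): 6 * i + j
    for i, det in enumerate(DETECTORS)
    for j, feat in enumerate(FEATURES)
}
batch = [to_tensor(event)]
print(len(batch), len(batch[0]), len(batch[0][0]), len(batch[0][0][0]))  # 1 4 6 1
```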

Crash Severity Clustering Model

After the final crash likelihood prediction model was selected, the likely severity of a crash, given that one is likely to occur, was investigated. Crash severity levels were assessed based on the speed–volume relationship. First, a k-means clustering model was developed for each detector location, with the speed and volume at that location as inputs and the severity level as the output. Then, the detector location whose speed and volume data produced the best clustering according to the Davies–Bouldin index (DBI) was chosen. Finally, the severity clustering model was developed using the speed and volume information from this best detector location. The output of this model is a set of severity levels based on the speed–volume relationship, which is further described in the Results and Discussion section. The purpose of the crash severity clustering model is not only to enable the traffic management center to inform users about crash likelihood, but also to allow it to implement TSM&O strategies in real time, based on the severity level, to avoid a crash or reduce its severity.
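The detector selection step can be sketched in plain Python: run k-means on the (speed, volume) points recorded at each candidate detector location, compute the Davies–Bouldin index for the resulting clusters, and keep the location with the lowest index (a lower DBI indicates more compact, better-separated clusters). This is a minimal sketch under assumptions: k = 3 (following the three severity levels referred to in the abstract), inputs already normalized, and a deterministic farthest-first initialization used instead of random restarts; all function names are hypothetical.

```python
import math


def farthest_first(points, k):
    """Deterministic initialization: each new centroid is the point farthest
    from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iters=100):
    """Lloyd's k-means on tuples of floats; returns (labels, centroids)."""
    centroids = farthest_first(points, k)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(tuple(sum(x) / len(members) for x in zip(*members))
                           if members else centroids[c])
        if updated == centroids:  # converged: assignments no longer change
            break
        centroids = updated
    return labels, centroids


def davies_bouldin(points, labels, centroids):
    """Mean over clusters of the worst (scatter_i + scatter_j) / distance_ij ratio."""
    k = len(centroids)
    scatter = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        scatter.append(sum(math.dist(p, centroids[c]) for p in members) / len(members)
                       if members else 0.0)
    return sum(
        max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k) if j != i)
        for i in range(k)
    ) / k


def best_detector(points_by_detector, k=3):
    """Return the detector whose (speed, volume) clusters have the lowest DBI."""
    scores = {}
    for det, points in points_by_detector.items():
        labels, centroids = kmeans(points, k)
        scores[det] = davies_bouldin(points, labels, centroids)
    return min(scores, key=scores.get), scores


compact = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
           (10.0, 0.0), (10.1, 0.0), (10.0, 0.1)]
spread = [(float(i), float(i)) for i in range(9)]
best, scores = best_detector({"detector A": compact, "detector B": spread})
print(best)  # detector A
```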

Severity Clustering Validation Model

Finally, a severity clustering validation model was developed to check the accuracy of the severity clustering model, that is, how accurately it assigns likely crash events to severity levels. Nine machine-learning models were developed following the same methodology used for the crash likelihood prediction models. Here, the input variables were the same as those of the crash likelihood prediction models, and the output was the severity level. The final severity clustering validation model was selected based on the sensitivity and specificity values for each level of the severity clustering model.
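Because the validation model is judged on sensitivity and specificity for each severity level, a one-vs-rest computation of these metrics can be sketched as below. The function name and the integer labels 0, 1, and 2 for the three severity levels are illustrative assumptions.

```python
def per_class_sens_spec(y_true, y_pred, classes):
    """One-vs-rest sensitivity and specificity for each class label."""
    pairs = list(zip(y_true, y_pred))
    results = {}
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        tn = sum(1 for t, p in pairs if t != c and p != c)
        sensitivity = tp / (tp + fn) if tp + fn else float("nan")
        specificity = tn / (tn + fp) if tn + fp else float("nan")
        results[c] = (sensitivity, specificity)
    return results


# Severity levels from the clustering model (true) vs. the validation model (predicted).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(per_class_sens_spec(y_true, y_pred, classes=[0, 1, 2]))
# {0: (0.5, 0.75), 1: (1.0, 0.75), 2: (0.5, 1.0)}
```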

Results and Discussion

Data Statistics

The final processed dataset had 6,750,072 events (625 crash events and 6,749,447 non-crash events) and 24 features before resampling. After the 70%:30% split, the training dataset had 4,725,049 events (437 crash events and 4,724,612 non-crash events), and the testing dataset had 2,025,023 events (188 crash events and 2,024,835 non-crash events). After SMOTE was applied, the training dataset had 9,449,224 events. As shown in Table 3, the resampling step largely preserved the feature means, although the synthetic data generally have narrower ranges and smaller SDs than the real data.
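The reported counts can be checked with simple arithmetic. The short script below, with constants taken from the text above, confirms that the crash and non-crash counts add up, that close to 70% of each class went to training, and that the resampled training size equals twice the number of training non-crash events. The last point is consistent with SMOTE oversampling the training crashes up to 4,724,612 events, that is, 4,724,175 synthetic crash events; this reading of the doubling is an inference from the numbers rather than a statement from the study.

```python
TOTAL_CRASH, TOTAL_NON_CRASH = 625, 6_749_447
TRAIN_CRASH, TRAIN_NON_CRASH = 437, 4_724_612
TEST_CRASH, TEST_NON_CRASH = 188, 2_024_835
RESAMPLED_TRAIN = 9_449_224

# Class totals and split sizes add up.
assert TOTAL_CRASH + TOTAL_NON_CRASH == 6_750_072
assert TRAIN_CRASH + TRAIN_NON_CRASH == 4_725_049
assert TEST_CRASH + TEST_NON_CRASH == 2_025_023
assert TRAIN_CRASH + TEST_CRASH == TOTAL_CRASH
assert TRAIN_NON_CRASH + TEST_NON_CRASH == TOTAL_NON_CRASH

# Each class is split close to 70%:30%, consistent with a stratified split.
print(round(TRAIN_CRASH / TOTAL_CRASH, 3), round(TRAIN_NON_CRASH / TOTAL_NON_CRASH, 3))  # 0.699 0.7

# After SMOTE the two training classes are the same size.
assert RESAMPLED_TRAIN == 2 * TRAIN_NON_CRASH
SYNTHETIC_CRASH = TRAIN_NON_CRASH - TRAIN_CRASH
print(SYNTHETIC_CRASH)  # 4724175
```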

Table 3.

Descriptive Statistical Features of Real and Synthetic Data

Features	Sources	Real data Min.	Real data Max.	Real data Mean	Real data SD	Synthetic data Min.	Synthetic data Max.	Synthetic data Mean	Synthetic data SD
Volume at d/s detector 2	Collected from MVDS	0.20	0.95	0.68	0.10	0.47	0.83	0.67	0.05
Speed at d/s detector 2	Collected from MVDS	0.52	0.93	0.84	0.08	0.66	0.93	0.82	0.06
SD-speed-d/s detector 2	Calculated from average speed and actual speed	0.20	0.77	0.43	0.14	0.24	0.66	0.46	0.07
DS-d/s-detector 2	Difference in speed between inner and outer lanes	0	0.76	0.44	0.15	0.19	0.72	0.45	0.10
Volume at d/s detector 1	Collected directly from MVDS	0.31	0.94	0.68	0.11	0.48	0.84	0.68	0.05
Speed at d/s detector 1	Collected directly from MVDS	0.46	0.93	0.83	0.08	0.62	0.94	0.81	0.07
SD-speed-d/s detector 1	Calculated from average speed and actual speed	0.15	0.83	0.44	0.14	0.22	0.72	0.49	0.09
DS-d/s-detector 1	Difference in speed between inner and outer lanes	0	0.82	0.50	0.17	0.14	0.89	0.54	0.16
Volume at u/s detector 1	Collected directly from MVDS	0.20	0.95	0.69	0.10	0.48	0.84	0.68	0.05
Speed at u/s detector 1	Collected directly from MVDS	0.47	0.92	0.82	0.09	0.57	0.94	0.80	0.08
SD-speed-u/s detector 1	Calculated from average speed and actual speed	0.22	0.82	0.47	0.15	0.22	0.75	0.51	0.10
DS at u/s detector 1	Difference in speed between inner and outer lanes	0	0.83	0.48	0.16	0.17	0.80	0.50	0.13
Volume at u/s detector 2	Collected directly from MVDS	0.21	0.95	0.68	0.11	0.49	0.84	0.68	0.05
Speed at u/s detector 2	Collected directly from MVDS	0.48	0.92	0.83	0.08	0.60	0.94	0.81	0.07
SD-speed-u/s detector 2	Calculated from average speed and actual speed	0.21	0.84	0.47	0.15	0.24	0.71	0.50	0.10
DS-u/s-detector 2	Difference in speed between inner and outer lanes	0	0.75	0.45	0.15	0.18	0.74	0.47	0.11
CV-speed-d/s detector 2	Ratio of the standard deviation to the average	0.003	0.12	0.02	0.02	0.008	0.07	0.03	0.01
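CV-speed-d/s detector 1	Ratio of the standard deviation to the average	0.002	0.12	0.02	0.02	0.007	0.08	0.03	0.01
CV-speed-u/s detector 1	Ratio of the standard deviation to the average	0.011	0.35	0.07	0.06	0.02	0.21	0.09	0.04
CV-speed-u/s detector 2	Ratio of the standard deviation to the average	0.009	0.35	0.07	0.06	0.01	0.21	0.08	0.04
CI at d/s detector 2	Calculated from actual speed and speed limit	0	0.85	0.25	0.23	0.03	0.73	0.32	0.19
CI at d/s detector 1	Calculated from actual speed and speed limit	0	0.88	0.26	0.23	0.02	0.81	0.34	0.21
CI at u/s detector 1	Calculated from actual speed and speed limit	0	0.84	0.26	0.23	0.01	0.83	0.32	0.22
CI at u/s detector 2	Calculated from actual speed and speed limit	0	0.84	0.23	0.22	0.01	0.80	0.28	0.20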

Note: d/s = downstream; SD = standard deviation; DS = differences in speed; u/s = upstream; CV = coefficient of variation; CI = congestion index; MVDS = microwave vehicle detection system; Min. = minimum; Max. = maximum.

Next, the Pearson correlation coefficients among all 24 features were checked, as shown in Figure 4a, and variables with coefficient values greater than 0.5 were excluded, reducing the number of features to 10. The random forest classifier was then used to assess feature importance and identify the features most important for predicting crash likelihood. Figure 4b shows that the speed difference between the inner and outer lanes at all four detector locations was the strongest predictor of crash likelihood. The speed, volume, and SD of speed at the u/s detectors were also important, and the SD of speed and the CI at the d/s detectors also contributed to crash likelihood prediction.
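The correlation screening can be sketched as a greedy filter: features are visited in a given order, and a feature is kept only if its absolute Pearson correlation with every feature already kept is at most 0.5. Which member of a correlated pair the study dropped is not stated, so the visiting order (for example, by random forest importance), the treatment of constant features, and the function and feature names are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a constant feature is treated as uncorrelated
    return cov / (sx * sy)


def drop_correlated(features, threshold=0.5):
    """Greedy filter over a dict of name -> values (insertion order = priority)."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature columns for five events.
features = {
    "speed_u1": [1, 2, 3, 4, 5],
    "speed_u2": [2, 4, 6, 8, 10],   # r = 1.0 with speed_u1
    "volume_u1": [5, 3, 4, 1, 2],   # r = -0.8 with speed_u1
    "ds_d1": [1, -1, 0, -1, 1],     # r = 0.0 with speed_u1
}
print(drop_correlated(features))  # ['speed_u1', 'ds_d1']
```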

Figure 4.

Correlation and feature importance: (a) Pearson correlation coefficients and (b) feature importance.

Crash Likelihood Prediction

Nine machine-learning classification models were compared to find the most suitable technique for predicting the crash likelihood of a particular event. The most suitable model was selected based on four evaluation criteria: sensitivity, specificity, false alarm rate, and area under the ROC curve. In this study, sensitivity represents how accurately a model predicts an actual crash event as a crash event, and specificity represents how accurately a model predicts an actual non-crash event as a non-crash event. The false alarm rate is the proportion of actual non-crash events that are wrongly predicted as crash events, and the area under the ROC curve represents how effectively a model distinguishes between crash and non-crash events. Table 4 shows the results of the models attempted. Most of the models had high specificity values, indicating that they predict non-crash events accurately. However, the aim of this paper is to predict crash events accurately, which is what sensitivity measures. LR, GB, and the CNN produced reasonable sensitivity values, and among these three, the CNN had the highest sensitivity (0.916). Although its specificity (0.889) and false alarm rate (0.111) were weaker than those of most other models, the CNN achieved the highest area under the ROC curve (0.967). Furthermore, these evaluation results are competitive with those of the earlier real-time crash likelihood prediction studies shown in Table 2. Therefore, the CNN model was selected to predict crash likelihood.

Table 4.

Results of Machine-Learning Algorithms Applied to Select the Crash Likelihood Model

Machine-learning techniques	Sensitivity	Specificity	False alarm rate	Area under the ROC curve
Extreme gradient boosting	0.393	0.999	0.00084	0.943
Artificial neural network	0.681	0.991	0.0092	0.945
Random forest	0.132	0.999	0.00008	0.677
Logistic regression	0.888	0.999	0.114	0.951
k-nearest neighbors	0.298	0.999	0.0004	0.667
Decision tree	0.117	0.999	0.00045	0.558
Naive Bayes	0.638	0.953	0.047	0.924
Gradient boosting	0.8085	0.955	0.0447	0.961
Convolutional neural network	0.916	0.889	0.1111	0.967

Note: ROC curve = receiver operating characteristic curve.
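The four criteria used to rank the models in Table 4 can be computed as below. This is a sketch under stated assumptions: crash events are coded 1 and non-crash events 0, model scores are turned into class predictions with a 0.5 cut-off (the decision threshold used in the study is not stated here), and the helper names are hypothetical. The area under the ROC curve uses its pairwise (Mann–Whitney) interpretation, which suits a toy example but is quadratic in the number of events; a rank-based or library implementation would be needed for millions of events.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, and false alarm rate, with crash = 1 as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)       # share of actual crashes flagged as crashes
    specificity = tn / (tn + fp)       # share of actual non-crashes passed as non-crashes
    false_alarm_rate = fp / (fp + tn)  # share of non-crashes wrongly flagged; equals 1 - specificity
    return sensitivity, specificity, false_alarm_rate


def roc_auc(y_true, scores):
    """Probability that a random crash scores higher than a random non-crash
    (ties count one half). O(P * N), so only for small illustrative data."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented toy example: three crash events and four non-crash events.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(confusion_metrics(y_true, y_pred))  # sensitivity 2/3, specificity 0.75, FAR 0.25
print(roc_auc(y_true, scores))            # 10 of 12 crash/non-crash pairs ranked correctly
```

Because the false alarm rate is defined over the same non-crash events as specificity, the two should sum to one for any model, which is a quick consistency check on a results table.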

Crash Severity Clustering

The widely used k-means clustering technique was used to develop the severity clustering model, given that a crash is likely to occur. Since speed and volume are directly related to crash severity ( 57 ), and the current study collected traffic information from two u/s and two d/s detectors, the speed–volume relationship was investigated at all four detectors. An initial investigation of the speed–volume relationship for the crash events revealed that almost no crashes occurred under low-speed and low-volume conditions. Therefore, the crashes were divided into three clusters on the speed–volume curve: high-speed and low-volume, high-speed and high-volume, and low-speed and high-volume. Based on this observation, the k-means algorithm was applied at each of the four detectors to assign every crash event to one of these three clusters. The DBI was then estimated for each detector location using the clustered crash events. The DBI is a well-accepted measure for evaluating clustering algorithms; it assesses how well the k-means algorithm splits the data for a given number of clusters, and a lower DBI value indicates more compact, better-separated clusters ( 57 , 58 ). Figure 5 shows the clustering results at the four detector locations; u/s detector location 2 gave the best result, with the lowest DBI value (0.6769). Therefore, speed and volume information from u/s detector location 2 was used to define distinct levels of crash severity. For clarification, the detectors in this study are spaced about 0.5 mi apart on average, and no detector information from more than 2 mi away from the crash location was considered.
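The clustering and detector-selection steps described above can be sketched in plain Python. This is a minimal illustration, not the study's implementation: it runs Lloyd's k-means with k = 3 on raw (speed, volume) pairs, uses a deterministic farthest-point initialization, and does not rescale the two variables. The paper does not state its initialization or scaling choices, so these are assumptions, as are the hypothetical helper names. The Davies–Bouldin index follows its standard definition, DBI = (1/k) Σ_i max_{j≠i} (s_i + s_j) / d(c_i, c_j), where s_i is the mean distance of cluster i's members to its centroid c_i and d is the Euclidean distance between centroids.

```python
import math


def kmeans(points, k=3, max_iter=100):
    """Lloyd's k-means on 2-D (speed, volume) tuples; returns (labels, centroids)."""
    # Farthest-point initialization (an assumption): start from the first point,
    # then repeatedly add the point farthest from all chosen centroids.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def davies_bouldin(points, labels):
    """Standard Davies-Bouldin index; lower means compact, well-separated clusters."""
    ids = sorted(set(labels))
    centre, scatter = {}, {}
    for c in ids:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centre[c] = tuple(sum(v) / len(members) for v in zip(*members))
        scatter[c] = sum(math.dist(p, centre[c]) for p in members) / len(members)
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(centre[i], centre[j]) for j in ids if j != i)
        for i in ids
    ]
    return sum(worst) / len(ids)


def best_detector(dbi_by_detector):
    """Pick the detector location whose clustering has the lowest DBI."""
    return min(dbi_by_detector, key=dbi_by_detector.get)


# Invented (speed, volume) pairs forming three well-separated groups.
crashes = [(65, 5), (66, 6), (64, 4), (65, 40), (66, 41), (64, 39), (20, 40), (21, 41), (19, 39)]
labels, centroids = kmeans(crashes, k=3)
print(labels, round(davies_bouldin(crashes, labels), 3))

# DBI values reported in Figure 5.
reported = {"u/s detector 1": 0.7254, "u/s detector 2": 0.6769,
            "d/s detector 1": 0.7244, "d/s detector 2": 0.7157}
print(best_detector(reported))  # u/s detector 2
```

Since speed and volume are measured on different scales, standardizing both before clustering would change which points are "near" each other; whether the study did so is not stated.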
Further, the study defined the three speed–volume clusters as follows: the high-speed and low-volume cluster as the high-severity level, the high-speed and high-volume cluster as the medium-severity level, and the low-speed and high-volume cluster as the low-severity level. The rationale for these definitions is that crashes are more likely to be severe and fatal under high-speed and low-volume conditions, incapacitating and non-incapacitating injuries are expected under high-speed and high-volume conditions, and property damage and minor injuries are expected under low-speed and high-volume conditions ( 3 , 59 – 61 ). Here, the crash severity levels follow the KABCO scale, where K refers to a fatal crash, A to an incapacitating injury, B to a non-incapacitating injury, C to a possible injury, and O to property damage only ( 3 , 62 ). Finally, we proposed this crash severity clustering model to identify the severity level of a crash based on the speed and volume information collected from u/s detector location 2. The developed severity clustering model has the potential to help traffic management centers apply TSM&O strategies based on the predicted severity level to avoid a crash or reduce its severity in real time.
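Once k-means has produced three clusters, the labeling rule above can be written as a small function. This is a sketch that assumes clusters are labeled from their centroids (mean speed, mean volume): the lowest-speed cluster is taken as the low-severity level, and of the two remaining high-speed clusters, the one with the lower volume is the high-severity level and the other the medium-severity level. The function name and the centroid values in the example are hypothetical.

```python
def severity_from_centroids(centroids):
    """Map exactly three cluster ids to severity levels.

    centroids: dict of cluster id -> (mean speed, mean volume).
    """
    if len(centroids) != 3:
        raise ValueError("expected exactly three clusters")
    by_speed = sorted(centroids, key=lambda c: centroids[c][0])
    low_speed = by_speed[0]  # low-speed, high-volume cluster
    # Of the two high-speed clusters, the lower-volume one is high severity.
    by_volume = sorted(by_speed[1:], key=lambda c: centroids[c][1])
    return {
        by_volume[0]: "high",    # high-speed, low-volume
        by_volume[1]: "medium",  # high-speed, high-volume
        low_speed: "low",        # low-speed, high-volume
    }


# Hypothetical centroids as (speed, volume).
example = {0: (65.0, 5.0), 1: (20.0, 40.0), 2: (65.0, 40.0)}
print(severity_from_centroids(example))  # {0: 'high', 2: 'medium', 1: 'low'}
```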

Figure 5.

k-means clustering to identify crash severity levels at different detector locations: (a) upstream detector location 1 (Davies–Bouldin index [DBI] = 0.7254), (b) upstream detector location 2 (DBI = 0.6769), (c) downstream detector location 1 (DBI = 0.7244), and (d) downstream detector location 2 (DBI = 0.7157).

Severity Clustering Validation

Finally, a severity clustering validation model was developed to check the accuracy of the proposed severity clustering model, that is, whether a given crash falls into the severity level assigned to it by the severity clustering model. To check this accuracy, we attempted nine machine-learning techniques and selected the most suitable one for the severity clustering validation model. Table 5 shows the results from the different techniques. Since the severity clustering model assigns crashes to three severity levels, we evaluated model performance at all three levels based on sensitivity and specificity. Because the priority should be to avoid crashes at the high-severity level, we first checked the sensitivity values. Based on sensitivity, XGBoost, the DT, and GB were found to be competitive: all three had a sensitivity of 1 at the high-severity level and 0.957 at the medium-severity level, but the DT had a higher sensitivity at the low-severity level (1 versus 0.974). The DT also had the highest (or joint-highest) specificity at all three levels (0.976, 1, and 1). Therefore, the DT model was selected as the most suitable model for severity clustering validation. The validation results indicate that the proposed severity clustering model assigns crashes to severity levels with high accuracy in real time, given that a crash is likely to occur.

Table 5.

Results of the Severity Clustering Validation Model

Machine-learning techniques	Sensitivity: high-severity level	Sensitivity: medium-severity level	Sensitivity: low-severity level	Specificity: high-severity level	Specificity: medium-severity level	Specificity: low-severity level
Extreme gradient boosting	1	0.957	0.974	0.965	1	1
Artificial neural network	0.903	0.978	0.921	0.953	0.977	0.952
Random forest	0.99	0.936	1	0.976	1	0.986
Logistic regression	0.951	0.936	0.974	0.953	1	0.966
k-nearest neighbors	0.835	0.863	0.816	0.852	0.951	0.915
Decision tree	1	0.957	1	0.976	1	1
Naive Bayes	0.806	0.872	0.974	0.929	0.930	0.912
Gradient boosting	1	0.957	0.974	0.965	1	1
Support vector classification	0.903	0.766	0.895	0.824	1	0.928
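Per-level sensitivity and specificity for a three-class problem are commonly computed one-versus-rest, treating each severity level in turn as the positive class and the other two levels as negatives. The sketch below assumes that convention (the study does not spell it out); the helper name and the toy labels are hypothetical, with the severity clustering labels playing the role of ground truth.

```python
def per_level_sens_spec(y_true, y_pred, levels=("high", "medium", "low")):
    """One-versus-rest sensitivity and specificity for each severity level."""
    pairs = list(zip(y_true, y_pred))
    results = {}
    for level in levels:
        tp = sum(1 for t, p in pairs if t == level and p == level)
        fn = sum(1 for t, p in pairs if t == level and p != level)
        tn = sum(1 for t, p in pairs if t != level and p != level)
        fp = sum(1 for t, p in pairs if t != level and p == level)
        results[level] = (tp / (tp + fn), tn / (tn + fp))
    return results


# Invented example: cluster labels (treated as ground truth) vs. classifier output.
y_true = ["high", "high", "medium", "medium", "low", "low"]
y_pred = ["high", "high", "medium", "low", "low", "low"]
for level, (sens, spec) in per_level_sens_spec(y_true, y_pred).items():
    print(f"{level}: sensitivity={sens:.3f}, specificity={spec:.3f}")
```

In this toy case the one misclassified medium-severity crash lowers the medium-level sensitivity and, because it was predicted as low severity, also lowers the low-level specificity.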

Conclusions and Future Research

This study attempted to narrow the research gaps in real-time safety research by proposing a framework that helps traffic management centers alert road users in real time so that crashes can be avoided or their severity minimized. In this work, data were collected from SR-408 in Orlando, Florida, and data features with real-time measurability were selected. Several important features for crash analyses were generated, and the dataset was processed systematically to remove unreasonable values and make the real-time features applicable to the models. The final processed dataset had 6,750,072 events (625 crash events and 6,749,447 non-crash events) and 24 features. SMOTE was then applied to the training dataset to mitigate the effects of class imbalance, and the final training dataset contained 9,449,224 events. To achieve the main objectives of this study, a crash likelihood prediction model was first developed to predict the likelihood of crash occurrence in real time. Nine machine-learning techniques were attempted, and the CNN was selected based on its sensitivity (0.916), specificity (0.889), area under the ROC curve (0.967), and false alarm rate (0.111). Next, a severity clustering model was developed using the k-means clustering algorithm to assign the crash events to three severity levels: high, medium, and low. All four detector locations (two u/s and two d/s) were checked, and based on the DBI criterion, speed and volume information from u/s detector location 2 was used to develop the severity clustering model. The purpose of this model is to help the traffic management authority warn road users in real time about the crash severity level they may encounter, given that a crash is likely to occur. Finally, a severity clustering validation model was developed to check the accuracy of the developed severity clustering model.
More specifically, this validation model checked how accurately crashes could be classified into the severity levels assigned to them by the severity clustering model. Nine machine-learning techniques were attempted to develop this model, and based on the sensitivity and specificity values at the three severity levels, the DT model performed best, with sensitivity values of 1, 0.957, and 1 for the high-, medium-, and low-severity levels, respectively. This indicates that the proposed severity clustering model performs with high accuracy.

The proposed framework has the potential to reduce the number of crashes on roads and improve road safety by enabling the traffic management authority to alert road users in real time about the crash likelihood and the severity level associated with a likely crash. Future studies can extend this framework and improve its accuracy by incorporating more micro-level variables and more advanced modeling techniques.

Footnotes

Author Contributions

The authors confirm contribution to the paper as follows: study conception and design: M.R. Islam, M. Abdel-Aty, Z. Islam; data collection: M.R. Islam, Z. Islam; analysis and interpretation of results: M.R. Islam, M. Abdel-Aty, Z. Islam, A. Abdelraouf; draft manuscript preparation: M.R. Islam, M. Abdel-Aty, Z. Islam, A. Abdelraouf. All authors reviewed the results and approved the final version of the manuscript.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Md Rakibul Islam

Mohamed Abdel-Aty

Zubayer Islam

Amr Abdelraouf

References

1. Centers for Disease Control and Prevention. Road Traffic Injuries and Deaths—A Global Problem. https://www.cdc.gov/injury/features/global-road-safety/index.html. Accessed August 1, 2022.

2. World Health Organization. Road Traffic Injuries. https://www.who.int/news-room/fact-sheets/detail/road-traffic-injuries. Accessed July 15, 2022.

3. Islam, M. R., Abdel-Aty, Islam, and Zhang. Risk-Compensation Trends in Road Safety During COVID-19. Sustainability, Vol. 14, No. 9, 2022, p. 5057. https://doi.org/10.3390/su14095057.

4. Bektaş. Examining the Impact on Road Safety Performance of Socioeconomic Variables in Turkey. Transportation Research Record: Journal of the Transportation Research Board, 2022. 2676: 435–445.

5. Islam, M. R., Barua, Akter, Hadiuzzaman, and Haque. Impacts of Nongeometric Attributes on Crash Prediction at Urban Signalized Intersections of Developing Countries. Journal of Transportation Safety & Security, Vol. 12, No. 5, 2020, pp. 671–696. https://doi.org/10.1080/19439962.2018.1526840.

6. Ahmed and Abdel-Aty. A Data Fusion Framework for Real-Time Risk Assessment on Freeways. Transportation Research Part C: Emerging Technologies, Vol. 26, 2013, pp. 203–213. https://doi.org/10.1016/j.trc.2012.09.002.

7. Lin, Wang, and A. W. Sadek. A Novel Variable Selection Method Based on Frequent Pattern Tree for Real-Time Traffic Accident Risk Prediction. Transportation Research Part C: Emerging Technologies, Vol. 55, 2015, pp. 444–459. https://doi.org/10.1016/j.trc.2015.03.015.

8. Islam, M. R., Hadiuzzaman, Barua, and T. H. Shimu. Alternative Approach for Vehicle Trajectory Reconstruction Under Spatiotemporal Side Friction Using Lopsided Network. IET Intelligent Transport Systems, Vol. 13, No. 2, 2019, pp. 356–366. https://doi.org/10.1049/iet-its.2018.5195.

9. Abdel-Aty and Pande. Identifying Crash Propensity Using Specific Traffic Speed Conditions. Journal of Safety Research, Vol. 36, No. 1, 2005, pp. 97–108. https://doi.org/10.1016/j.jsr.2004.11.002.

10. Abdel-Aty, Uddin, Pande, M. F. Abdalla, and Hsia. Predicting Freeway Crashes from Loop Detector Data by Matched Case-Control Logistic Regression. Transportation Research Record: Journal of the Transportation Research Board, 2004. 1897: 88–95.

11. Abdel-Aty and M. F. Abdalla. Linking Roadway Geometrics and Real-Time Traffic Characteristics to Model Daytime Freeway Crashes: Generalized Estimating Equations for Correlated Data. Transportation Research Record: Journal of the Transportation Research Board, 2004. 1897: 106–115.

12. Abdel-Aty, Uddin, and Pande. Split Models for Predicting Multivehicle Crashes During High-Speed and Low-Speed Operating Conditions on Freeways. Transportation Research Record: Journal of the Transportation Research Board, 2005. 1908: 51–58.

13. Shi and Abdel-Aty. Big Data Applications in Real-Time Traffic Operation and Safety Monitoring and Improvement on Urban Expressways. Transportation Research Part C: Emerging Technologies, Vol. 58, 2015, pp. 380–394. https://doi.org/10.1016/j.trc.2015.02.022.

14. Yuan and Abdel-Aty. Approach-Level Real-Time Crash Risk Analysis for Signalized Intersections. Accident Analysis & Prevention, Vol. 119, 2018, pp. 274–289. https://doi.org/10.1016/j.aap.2018.07.031.

15. Birriel, Mitchell, Sullivan, and Peters. Applying Transportation Systems Management and Operations (TSMO) to Rural Areas. Federal Highway Administration, Washington, D.C., 2022.

16. Amekudzi-Kennedy, Clark, Wilson, and Singh. Transportation Performance Management for System Operations: Development of Processes, Tools, Measures and Targets. Georgia Department of Transportation, 2020.

17. Cheng, Yuan, and Zhao. Crash Risks Evaluation of Urban Expressways: A Case Study in Shanghai. IEEE Transactions on Intelligent Transportation Systems, Vol. 23, No. 9, 2022, pp. 15329–15339. https://doi.org/10.1109/TITS.2022.3140345.

18. Wang, Abdel-Aty, Shi, and Park. Real-Time Crash Prediction for Expressway Weaving Segments. Transportation Research Part C: Emerging Technologies, Vol. 61, 2015, pp. 1–10. https://doi.org/10.1016/j.trc.2015.10.008.

19. Basso, L. J., Bravo, and Pezoa. Real-Time Crash Prediction in an Urban Expressway Using Disaggregated Data. Transportation Research Part C: Emerging Technologies, Vol. 86, 2018, pp. 202–219. https://doi.org/10.1016/j.trc.2017.11.014.

20. Yuan, Abdel-Aty, Wang, Lee, and Wang. Utilizing Bluetooth and Adaptive Signal Control Data for Real-Time Safety Analysis on Urban Arterials. Transportation Research Part C: Emerging Technologies, Vol. 97, 2018, pp. 114–127. https://doi.org/10.1016/j.trc.2018.10.009.

21. Wang, Zou, and Wang. Convolutional Neural Networks with Refined Loss Functions for the Real-Time Crash Risk Analysis. Transportation Research Part C: Emerging Technologies, Vol. 119, 2020, p. 102740. https://doi.org/10.1016/j.trc.2020.102740.

22. Huang, Wang, and Sharma. Highway Crash Detection and Risk Estimation Using Deep Learning. Accident Analysis & Prevention, Vol. 135, 2020, p. 105392. https://doi.org/10.1016/j.aap.2019.105392.

23. Zheng, Liu, and Wang. Investigating the Predictability of Crashes on Different Freeway Segments Using the Real-Time Crash Risk Models. Accident Analysis & Prevention, Vol. 159, 2021, p. 106213. https://doi.org/10.1016/j.aap.2021.106213.

24. Liu and Wang. Evaluation of the Impacts of Traffic States on Crash Risks on Freeways. Accident Analysis & Prevention, Vol. 47, 2012, pp. 162–171. https://doi.org/10.1016/j.aap.2012.01.020.

25. Abdel-Aty. Utilizing Support Vector Machine in Real-Time Crash Risk Evaluation. Accident Analysis & Prevention, Vol. 51, 2013, pp. 252–259. https://doi.org/10.1016/j.aap.2012.11.027.

26. Theofilatos, Yannis, Kopelias, and Papadimitriou. Impact of Real-Time Traffic Characteristics on Crash Occurrence: Preliminary Results of the Case of Rare Events. Accident Analysis & Prevention, Vol. 130, 2019, pp. 151–159. https://doi.org/10.1016/j.aap.2017.12.018.

27. Tarko, A. P., Wang, and Liu. Predicting Crash Likelihood and Severity on Freeways with Real-Time Loop Detector Data. Accident Analysis & Prevention, Vol. 57, 2013, pp. 30–39. https://doi.org/10.1016/j.aap.2013.03.035.

28. Kwak, H.-C., and Kho. Predicting Crash Risk and Identifying Crash Precursors on Korean Expressways Using Loop Detector Data. Accident Analysis & Prevention, Vol. 88, 2016, pp. 9–19. https://doi.org/10.1016/j.aap.2015.12.004.

29. Yang, Wang, Quddus, and Xue. How to Determine an Optimal Threshold to Classify Real-Time Crash-Prone Traffic Conditions? Accident Analysis & Prevention, Vol. 117, 2018, pp. 250–261. https://doi.org/10.1016/j.aap.2018.04.022.

30. Abdel-Aty and Yuan. Real-Time Crash Risk Prediction on Arterials Based on LSTM-CNN. Accident Analysis & Prevention, Vol. 135, 2020, p. 105371. https://doi.org/10.1016/j.aap.2019.105371.

31. Parsa, A. B., Movahedi, Taghipour, Derrible, and A. (Kouros) Mohammadian. Toward Safer Highways, Application of XGBoost and SHAP for Real-Time Accident Detection and Feature Analysis. Accident Analysis & Prevention, Vol. 136, 2020, p. 105405. https://doi.org/10.1016/j.aap.2019.105405.

32. Abdel-Aty. Investigating the Different Characteristics of Weekday and Weekend Crashes. Journal of Safety Research, Vol. 46, 2013, pp. 91–97. https://doi.org/10.1016/j.jsr.2013.05.002.

33. Zhou, Zheng, Tolliver, and Keramati. Accident Prediction Accuracy Assessment for Highway-Rail Grade Crossings Using Random Forest Algorithm Compared with Decision Tree. Reliability Engineering & System Safety, Vol. 200, 2020, p. 106931. https://doi.org/10.1016/j.ress.2020.106931.

34. Yuan, Abdel-Aty, Gong, and Cai. Real-Time Crash Risk Prediction Using Long Short-Term Memory Recurrent Neural Network. Transportation Research Record: Journal of the Transportation Research Board, 2019. 2673: 314–326.

35. Abdel-Aty, M. A., and Pemmanaboina. Calibrating a Real-Time Traffic Crash-Prediction Model Using Archived Weather and ITS Traffic Data. IEEE Transactions on Intelligent Transportation Systems, Vol. 7, No. 2, 2006, pp. 167–174. https://doi.org/10.1109/TITS.2006.874710.

36. Abdel-Aty. Real-Time Crash Likelihood Prediction Using Temporal Attention–Based Deep Learning and Trajectory Fusion. Journal of Transportation Engineering, Part A: Systems, Vol. 148, No. 7, 2022. https://doi.org/10.1061/JTEPBS.0000697.

37. Yin, Huang, Zhang, and Gao. Influence of Different Sampling Techniques on the Real-Time Crash Risk Prediction Model. Proc., IEEE Conference on Industrial Electronics and Applications (ICIEA), Xi’an, China, IEEE, New York, Jun 19, 2019, pp. 1795–1799.

38. Islam, Abdel-Aty, Cai, and Yuan. Crash Data Augmentation Using Variational Autoencoder. Accident Analysis & Prevention, Vol. 151, 2021, p. 105950. https://doi.org/10.1016/j.aap.2020.105950.

39. Xia and Zhang. Learning Similarity with Cosine Similarity Ensemble. Information Sciences, Vol. 307, 2015, pp. 39–52. https://doi.org/10.1016/j.ins.2015.02.024.

40. Pande and Abdel-Aty. Assessment of Freeway Traffic Parameters Leading to Lane-Change Related Collisions. Accident Analysis & Prevention, Vol. 38, No. 5, 2006, pp. 936–948. https://doi.org/10.1016/j.aap.2006.03.004.

41. Theofilatos, Chen, and Antoniou. Comparing Machine Learning and Deep Learning Methods for Real-Time Crash Prediction. Transportation Research Record: Journal of the Transportation Research Board, 2019. 2673: 169–178.

42. Mondal, A. R., M. A. E. Bhuiyan, and Yang. Advancement of Weather-Related Crash Prediction Model Using Nonparametric Machine Learning Algorithms. SN Applied Sciences, Vol. 2, No. 8, 2020, p. 1372. https://doi.org/10.1007/s42452-020-03196-x.

43. Zhang. Comparing Prediction Performance for Crash Injury Severity Among Various Machine Learning and Statistical Methods. IEEE Access, Vol. 6, 2018, pp. 60079–60087. https://doi.org/10.1109/ACCESS.2018.2874979.

44. Tang and Zhao. Real-Time Highway Traffic Accident Prediction Based on the k-Nearest Neighbor Method. Proc., International Conference on Measuring Technology and Mechatronics Automation, Vol. 3, Zhangjiajie, China, IEEE, New York, April 11–12, 2009, pp. 547–550.

45. Singh and Kaur. Evaluation and Classification of Road Accidents Using Machine Learning Techniques. In Emerging Research in Computing, Information, Communication and Applications: ERCICA 2018 (N. Shetty, Patnaik, Nagaraj, Hamsavath, and Nalini, eds.), Vol. 882, Springer, Singapore, 2019, pp. 193–204.

46. Chen, Zhang, Yang, J. C. Milton, and “Dely” Alcántara. An Explanatory Analysis of Driver Injury Severity in Rear-End Crashes Using a Decision Table/Naïve Bayes (DTNB) Hybrid Classifier. Accident Analysis & Prevention, Vol. 90, 2016, pp. 95–107. https://doi.org/10.1016/j.aap.2016.02.002.

47. Zheng, Ren, Zhou, Keramati, Tolliver, and Huang. A Gradient Boosting Crash Prediction Approach for Highway-Rail Grade Crossing Crash Analysis. Journal of Advanced Transportation, Vol. 2020, 2020, pp. 1–10. https://doi.org/10.1155/2020/6751728.

48. Wang, Song, Behan, Jie, and Shangguan. Crash Prediction for Freeway Work Zones in Real Time: A Comparison Between Convolutional Neural Network and Binary Logistic Regression Model. International Journal of Transportation Science and Technology, Vol. 11, No. 3, 2022, pp. 484–495. https://doi.org/10.1016/j.ijtst.2021.06.002.

49. Sun and Chen. Use of Support Vector Machine Models for Real-Time Prediction of Crash Risk on Urban Expressways. Transportation Research Record: Journal of the Transportation Research Board, 2014. 2432: 91–98.

50. Cai, Abdel-Aty, Yuan, and Lee. Real-Time Crash Prediction on Expressways Using Deep Generative Models. Transportation Research Part C: Emerging Technologies, Vol. 117, 2020, p. 102697. https://doi.org/10.1016/j.trc.2020.102697.

51. Abdelraouf, Abdel-Aty, and Yuan. Utilizing Attention-Based Multi-Encoder-Decoder Neural Networks for Freeway Traffic Speed Prediction. IEEE Transactions on Intelligent Transportation Systems, Vol. 23, No. 8, 2021, pp. 11960–11969. https://doi.org/10.1109/TITS.2021.3108939.

52. Yan, Zhang, Liu, Qiao, and Zhang. Single-Vehicle Crash Severity Outcome Prediction and Determinant Extraction Using Tree-Based and Other Non-Parametric Models. Accident Analysis & Prevention, Vol. 153, 2021, p. 106034. https://doi.org/10.1016/j.aap.2021.106034.

53. Abdel-Aty. A Hybrid Machine Learning Model for Predicting Real-Time Secondary Crash Likelihood. Accident Analysis & Prevention, Vol. 165, 2022, p. 106504. https://doi.org/10.1016/j.aap.2021.106504.

54. Pai, C.-W., and Saleh. Modelling Motorcyclist Injury Severity by Various Crash Types at T-Junctions in the UK. Safety Science, Vol. 46, No. 8, 2008, pp. 1234–1247. https://doi.org/10.1016/j.ssci.2007.07.005.

55. Orsini, Gecchele, Rossi, and Gastaldi. A Conflict-Based Approach for Real-Time Road Safety Analysis: Comparative Evaluation with Crash-Based Models. Accident Analysis & Prevention, Vol. 161, 2021, p. 106382. https://doi.org/10.1016/j.aap.2021.106382.

56. Shi, Y. D. Wong, and M. Z.-F. Chai. An Automated Machine Learning (AutoML) Method of Risk Prediction for Decision-Making of Autonomous Vehicles. IEEE Transactions on Intelligent Transportation Systems, Vol. 22, No. 11, 2021, pp. 7145–7154. https://doi.org/10.1109/TITS.2020.3002419.

57. Pande, Das, Abdel-Aty, and Hassan. Estimation of Real-Time Crash Risk. Transportation Research Record: Journal of the Transportation Research Board, 2011. 2237: 60–66.

58. Xiao. Davies Bouldin Index Based Hierarchical Initialization K-Means. Intelligent Data Analysis, Vol. 21, No. 6, 2017, pp. 1327–1338. https://doi.org/10.3233/IDA-163129.

59. Abdel-Aty, M. A., and H. T. Abdelwahab. Predicting Injury Severity Levels in Traffic Crashes: A Modeling Comparison. Journal of Transportation Engineering, Vol. 130, No. 2, 2004, pp. 204–210. https://doi.org/10.1061/(ASCE)0733-947X(2004)130:2(204).

60. Prato, C. G., T. K. Rasmussen, and Kaplan. Risk Factors Associated with Crash Severity on Low-Volume Rural Roads in Denmark. Journal of Transportation Safety & Security, Vol. 6, No. 1, 2014, pp. 1–20. https://doi.org/10.1080/19439962.2013.796027.

61. Lee and Abdel-Aty. Presence of Passengers: Does It Increase or Reduce Driver’s Crash Potential? Accident Analysis & Prevention, Vol. 40, No. 5, 2008, pp. 1703–1712. https://doi.org/10.1016/j.aap.2008.06.006.

62. FHWA. KABCO Injury Classification Scale and Definitions. https://safety.fhwa.dot.gov/hsip/spm/conversion_tbl/pdfs/kabco_ctable_by_state.pdf. Accessed September 30, 2022.

Real-Time Framework to Predict Crash Likelihood and Cluster Crash Severity
