Abstract
Estimating crash severity is crucial for reducing fatalities and improving road safety. Accurate estimation is challenging, however, because crash data are severely class-imbalanced and complex, which leads to parameter bias and overfitting in severity models. To address these challenges, this study employs a novel generative model for data augmentation, the Variational AutoEncoder with Bayesian Gaussian Mixture (VAE–BGM). This model integrates the strengths of Bayesian inference and autoencoder techniques to manage both data imbalance and the mixed data types found in crash severity estimation. The VAE–BGM is evaluated using traditional crash-related variables together with real-time data from adjacent vehicle detectors. The evaluation uses the area under the receiver operating characteristic curve (ROC–AUC), which measures performance independently of any classification threshold. The results show that VAE–BGM consistently improves the performance of crash severity models compared with other data augmentation methods: it achieved the highest average ROC–AUC (0.813), whereas the other augmentation methods achieved 0.707–0.784. Feature importance analysis identifies crash type, crash cause, and nearby traffic volumes as key factors, underscoring the importance of incorporating on-site vehicle detector information in crash severity models. This study advances methodological approaches in traffic safety analysis and offers an in-depth analysis of the factors influencing crash severity on highways by combining traditional crash-related variables with on-site vehicle detector data.
