
Abstract

Estimating crash severity is crucial for reducing fatalities and ensuring road safety, but accurate estimation is challenging because of the significant class imbalance and complexity in crash data, which lead to parameter bias and overfitting in the models. To address these challenges, this study employs a novel generative model for data augmentation, the Variational AutoEncoder with Bayesian Gaussian Mixture (VAE–BGM). This model integrates the strengths of Bayesian inference and autoencoder techniques to manage both data imbalance and the complexity of mixed data types in crash severity estimation. The VAE–BGM is evaluated using traditional crash-related variables and real-time data from adjacent vehicle detectors. The analysis focuses on the receiver operating characteristic–area under the curve (ROC–AUC) so that performance is evaluated independently of any classification threshold. The results demonstrate that VAE–BGM yields consistent improvements in the performance of crash severity models compared with the other data augmentation methods: the VAE–BGM achieved the highest average ROC–AUC value (0.813), whereas the other augmentation methods achieved 0.707–0.784. Feature importance analysis identifies the crash type, cause, and nearby traffic volumes as key factors, underscoring the importance of incorporating on-site vehicle detector information in the crash severity model. This study advances methodological approaches in traffic safety analysis and offers an in-depth analysis of the factors influencing crash severity on highways by combining traditional crash-related variables with on-site vehicle detector data.

Keywords

crash data augmentation, crash severity, generative artificial intelligence, variational autoencoder, Bayesian Gaussian mixture
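
The abstract's choice of ROC–AUC as a threshold-independent metric can be illustrated with a minimal sketch. The function below computes ROC–AUC from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The labels and scores are hypothetical toy values, not data or results from the study, and `roc_auc` is an illustrative helper rather than the authors' evaluation code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as P(score of random positive > score of random negative).

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (assumption): 1 = severe crash (minority class), 0 = non-severe.
labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0]
scores = [0.05, 0.10, 0.20, 0.15, 0.30, 0.25, 0.40, 0.45, 0.80, 0.12]

auc = roc_auc(labels, scores)

# Squaring is a monotonic transform on non-negative scores: it would change
# which cases fall above a fixed cutoff such as 0.5, but not their ranking.
auc_rescaled = roc_auc(labels, [s ** 2 for s in scores])

print(auc, auc_rescaled)  # 0.9375 0.9375
```

Because the metric depends only on how cases are ranked, it stays the same under any order-preserving rescaling of the scores. This is one reason it suits imbalanced problems, where a fixed cutoff would otherwise have to be tuned for each model and each augmentation method.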



Generative Artificial Intelligence for Class Imbalance in Crash Severity Estimation with Mixed Data Types
