Risk prediction and factor analysis of rider’s injury severity in passenger car and E-bike accidents based on interpretable machine learning

Abstract

Casualties and property losses caused by the passenger car and electric bicycle crash accidents increased year by year. Assessment of the relevant risk factors of injury severity in passenger car and electric bicycle crashes could help to mitigate crash severity. This study uses an emerging machine learning method to predict the relationship between the risk factors and bicyclist accident injury severity in passenger car-electric bicycle collision accidents. The model’s performance is compared and evaluated based on accuracy, precision, recall, F1-Score, area under curve (AUC), and receiver operating characteristic curve (ROC). An interpretable machine learning framework Shapley additive explanations (SHAP) is used to further analyze the relationship between risk factors and bicyclist injury severity. It is found that we can adopt the light gradient boosting machine (LightGBM) algorithm after hyper-parameter optimization to get the highest accuracy (94.85%), precision (95.2%), recall (94.9%), F1-Score (95%), and AUC (0.993) based on the accident data of electric bicycles and passenger cars in the China in-depth accident study dataset from 2014 to 2018. The model can be used to assess new accident cases based on the model learning rate. There are some new findings in the aspects of bicyclists’ physical factors and electric vehicle characteristics. The throwing distance of the bicyclist has a positive impact on the injury severity. The bicyclist is more likely to suffer more serious injuries in a crash accident when the bicyclist is male, or shorter. Electric bicycles have a smaller handlebar width. In general, the lower handlebar height is, or lower saddle height is, the more serious the bicyclist’s injury is. Safety training for drivers can help reduce the injury severity in crash accidents and improve traffic safety.

Keywords

Traffic safety electric bicycle crash severity machine learning LightGBM SHAP

Introduction

The traffic congestion and environmental pollution caused by passenger cars plague people’s lives. The government has issued policies to encourage people to travel with green transportation modes. Now, bicycles have become an important tool of transportation for Chinese people to travel short distances with the advantage of being economical, flexible, and environmentally friendly. In the early 21st century, the electric bicycle became more popular. The electric bicycle requires little riding space compared with motor vehicles, which allows them to shuttle through the congested urban road network more quickly and smoothly. Compared with traditional bicycles, electric bicycles rely on electric motors to provide power, which greatly reduces the physical exertion of the rider and allows electric bicycles to travel faster.¹ Small size, fast speed, many available scenarios and good economy make electric bicycles display great advantages in daily travel.^2,3 Motor vehicles and conventional bicycle users are changing their short-distance travel patterns and there are 59 electric bicycles per 100 households in China by 2019.⁴

Electric bicycles not only bring convenience to the riders’ lives but also bring serious pressure to the efficiency of mixed traffic and traffic safety.⁵ Electric bicycles significantly differ from conventional bicycles both in terms of traffic characteristics and micro-behavioral characteristics.^6,7 On the one hand, most of the non-motorized traffic facilities are still used to meet the bicycle traffic demand in urban areas of China, which hardly match the traffic demand for the rapidly growing electric bicycles. On the other hand, the electric bicycles’ steering flexibility make it too easy to sway and discrete in riding to conflict with motor vehicles.⁸ Sweden has classified e-bikes as a separate category of road users in accident statistics since 2011.⁹ A survey in Austria found that about 4% of electric bicyclists have experienced a crash accident and about one in six respondents have been involved in a near-collision.¹⁰ Similarly, an Australian survey find that more than half of participants have experienced security incidents related to electric bicycles.¹¹ E-bikes have higher mass and faster speed compared to traditional bicycles, coupled with the complex road traffic environment. China has the largest number of e-bikes and related accidents in the world, traffic accidents involving e-bikes account for about 36.3% of all two-wheeler accidents.¹² At the same time, similar to traditional bicyclists, e-bikes also expose rider to traffic flow, and the consequences of accidents caused by e-bike may be more serious.^13,14 A statistical analysis of e-bike accidents found that severe head injuries are the most common injury among e-bike riders and could lead to death. In order to reduce the accident rate and ensure the traffic safety of electric bicyclist, it is necessary to combine the actual traffic accident situation in China to conduct research on the factors of riders’ injuries caused by e-bike accidents.

At present, many scholars have studied e-bike traffic accidents. In terms of electric bicyclists, studies have found a high negative correlation between rider age, female gender, and rider deaths in e-bike accidents.^15,16 In terms of motor vehicle drivers, Popovich et al.⁶ found that visually difficulty to distinguish e-bikes from traditional bicycles may be a cause of collision accidents based on qualitative interviews with electric bicyclists. Similarly, many studies have analyzed the causal factors of e-bike accidents. Yang et al.¹⁷ found that the four factors of collision object, collision target speed, direction, and sight obstacle are the main factors based on the accident data of e-bikes at intersections using the principal component analysis method. Hertach et al.¹⁸ found that high riding exposure, male, and travel for work or school are high-risk factors for e-bike accidents. Li et al.¹⁹ found that the important factors affecting the severity of traffic accidents are the type of collision, the injured part of the bicyclist, the type of physical isolation on the road, etc. Raslavičius et al.²⁰ found that reducing the impact of cyclists’ head and neck on the windshield of the car in a crash could effectively reduce the proportion of cyclists suffering serious injury and death. Panicker and Ramadurai²¹ used the variable importance factor measurement method to find that the type of collision vehicle, collision type, driver’s age and road visibility have the greatest influence on collision severity of e-bikes. Based on the analysis, previous studies have analyzed the factors contributing to e-bike accidents. However, the analysis of factors is often limited to fewer characteristic factors due to the difficulty of data collection.

On the other hand, most previous studies have used statistical models, for example, the ordered logit model,²² ordered probit model,^23,24 Bayesian regression model,²⁵ and other methods to predict accident risk. However, the application of statistics to data processing imposes relatively strict assumptions, violating these assumptions will lead to biased estimates and even derive wrong conclusions. The machine learning method is not limited by this problem. The research method based on machine learning has been widely used in the field of road traffic accident research in recent years. Based on weather conditions, accidents and detector data, Parsa et al.²⁶ used the support vector machine (SVM) and probabilistic neural network (PNN) to detect traffic accidents on a highway in Chicago. Wang et al.²⁷ compared the driving risk prediction effects of four kinds of integrated learning models for drivers in the Kunshan area. Theofilatos et al.²⁸ predicted the accident risk of powered two-wheelers (PTW) and the tendency of PTW accident types on urban arterials by using the SVM model. Hadjidimitriou et al.²⁹ explored the accident severity involving e-bikes using nonlinear machine learning methods based on accident characteristics collected by sensors and other onboard devices at the time of the accident. Panicker and Ramadurai²¹ used machine learning methods to predict the injury severity of accidents based on e-bike accidents, the results show that conditional inference forest is better than random forest and ordered probit model in predicting accident injury severity. Montella et al.³⁰ used classification trees and rules to analyze e-bike accidents in Italy and explored the factors and interdependence of e-bike crashes. Guo et al.³¹ used the eXtreme Gradient Boosting (XGBoost) method to analyze the factors related to the injury severity of elderly pedestrians based on the accident data. Zhou et al.³² verified the effectiveness of the XGBoost method in predicting the injury severity of passenger car drivers and truck drivers based on the data of passenger car and truck collision accidents reported by Canadian police. Wang and Liu³³ analyzed the causes of railway traffic accidents by using the random forest method. Zhu³⁴ used the random forest to explore the important factors of drivers’ crash behavior and interdependence in e-bike accidents.

In previous studies, the causes and risk prediction factors of e-bike accidents have been analyzed in detail. However, there are few studies to explore the relationship between the risk factors and the rider’s injury severity from the perspective of the passenger car and electric bicycle crash accident. The passenger car has the characteristics of flexibility and high speed, and is one of the main tools of transportation for residents to travel. With the continuous development of social economy, the traffic conflict between passenger cars and electric two-wheelers is gradually intensified. On the other hand, there are many factors affecting the e-bike accident, and the collection of multidimensional accident data is difficult. It is very few studies to conduct a multi-dimensional analysis of the causal factors of passenger-electric bicycle accidents.

The causal factors of passenger car-electric bicycle accidents were analyzed based on the passenger car-electric bicycle accident cases in the China in-depth accident study (CIDAS) dataset from the aspect of human, vehicle, and road environment, integrating six aspects of accident characteristic factors, including accident information, road information, e-bike information, electric bicyclist information, passenger car information and passenger car driver information. The predictive performance of six machine learning models is compared comprehensively based on multiple evaluation indicators. The machine learning method with the best predictive performance was selected to further explore the factors that affect the serious injury of electric bicyclists. Combined with the interpretable machine learning framework analyzes the impact of factors and their interactions on the injury severity of electric bicyclists. It aims to provide a reference for in-depth research on passenger car-electric bicycle accidents, improve the relevant safety regulations, and optimize the safety performance of e-bikes.

Methodology

Light gradient boosting machine

Light gradient boosting machine (LightGBM) combines the gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB) to solve the efficiency and scalability of applying gradient boosting decision tree (GBDT) to high-dimensional input features and large data volumes. LightGBM is an integrated regression tree approximation of the final model f_T based on GBDT with the decision tree as base classifier, there are:

f_{T} = \sum_{t = 1}^{T} f_{t} (X) f_{t} \in θ

(1)

Where f_T is the ith decision tree. θ is the set space of all trees.

LightGBM learns the mapping function from input space X_s to gradient space G by using a decision tree. Suppose the train set is {x₁,…,x_n}, where x_i is the ith vector in the X_s. Construct the loss function L(y, H(x)), where y is the output and H(x) is the estimation function. Find the function H*(x) that minimizes the expected loss function by the equation (2):

\begin{matrix} H^{*} (x) = \arctan min_{H} E_{y, x} (L (y, H (x))) \\ = \arctan min_{H} E_{x} (E_{y} (L (y, H (x))) | x) \end{matrix}

(2)

The negative gradient output value of the current model loss function is {g₁,…,g_n} in each iteration. Where g_i is the negative gradient value of the loss function corresponding to x_i in the output model. LightGBM uses the GOSS method to split internal nodes, and the weak learner splits with the largest information gain at the split point. Let O be the dataset within a fixed node. The variance gain $v_{j | O} (d)$ of this feature at node j at split point d is defined as:

v_{j | O} (d) = \frac{1}{n_{O}} [\frac{{(\sum_{x_{i} \in O : x_{ij} \leq d} g_{i})}^{2}}{n_{l | O}^{j} (d)} + \frac{{(\sum_{x_{i} \in O : x_{ij} > d} g_{i})}^{2}}{n_{r | O}^{j} (d)}]

(3)

Where $n_{o} = \sum I (x_{i} \in O)$ . I refer to the training set and n_o is the samples in the training set for a fixed leaf node. The splitting dimension is j and d is the splitting point. The n is the number of samples. The g_i is the value of the negative gradient of the loss function corresponding to x_i at the current model output. $n_{l | O}^{j} (d)$ represents the number of samples whose median value of the jth feature is less than or equal to d. And $n_{l | o}^{j} = \sum I [x_{i} \in o : x_{ij} \leq d]$ , $n_{r | o}^{j} (d)$ represents the number of samples whose median value of the jth feature is more than d. And $n_{r | o}^{j} = \sum I [x_{i} \in o : x_{ij} > d]$ .

Secondly, the input features of many programs are multidimensional and sparse (Most of the data is missing or zero). EFB can make the data size smaller by using the idea of building a graph to determine the bundled features. And it uses the features as nodes to connect edges between non-exclusive features and find the set of all bundled features. EFB adopts an approximate greedy strategy solution to make the model get a good trade-off in accuracy and training efficiency.

In summary, the LightGBM utilizes GOSS to partition internal nodes according to variance gain and utilizes EFB to reduce the dimensionality of input variables. In addition, LightGBM can resist the multicollinearity problem among related independent variables as a decision tree-based model.

Shapley additive explanations (SHAP)

Lundberg and Lee³⁵ developed a unified framework SHAP for post-hoc interpretation of machine learning methods. The biggest advantage of SHAP is that it can reflect the influence of each feature in the sample on the prediction results, and it also could point out the positive and negative predictability of its influence degree. The SHAP value is derived from the Shapley value in game theory, which quantifies the marginal effect of a single variable and the interaction between two variables.

The SHAP model generates a predicted value and provides an interpretable prediction for each test sample. The main thought is to calculate the marginal contribution of the features added to the model, known as SHAP values. It is equivalent to the influence of the feature on the sample. The SHAP value is calculated by equation (4) in cooperative game theory:

ϕ_{m} = \sum_{L \subseteq N {m}} \frac{| L |! (M - | L | - 1)!}{M!} • [v (L \cup {m}) - v (L)]

(4)

Where φ_m is the contribution of the mth feature. L is the feature subset; N{m} is the feature set; M is the total number of input features. V(L∪{m}) is the value of the predicted model if the sample only has the eigenvalues of L∪{M}. The v(L) is the value of the model predicted when the sample has only eigenvalues in L. Based on the additive characteristic attribute method, the linear function g is defined by equation (5).

g (x) = ϕ_{0} + \sum_{m = 1}^{M} ϕ_{m} x_{m}

(5)

Where g(x) is the value of sample × predicted by the post-interpretation model. φ₀ is the average value predicted by the model. The x_m is the mth feature sample.

Data preparation

This study used passenger car-electric bicycle accidents that occurred between 2014 and 2018 collected in the CIDAS dataset. Accident participants included only one passenger car and one e-bike, excluding multi-vehicle collision accidents, focusing on the injuries of electric bicyclists. Based on the correlation analysis method, the data with a high correlation between accident feature factors are removed. The high correlation between feature factors is the correlation coefficient greater than 0.5, and the feature is highly correlated with the target variable. A total of 40 characteristic variables were obtained for research analysis after correlation analysis. The accident information includes the time of crash, the location of scene, the vehicle flux, the month, and the week. Road information includes scene within road network, class of road and characteristic of road. The description is shown in Table 1.

Table 1.

Summary description of accident information variables.

Variable	Possible values (%)
Time of crash	1–24 h
Location of scene	0 for inside built-up area (58.2); 1 for outside built-up area (41.8)
Vehicle flux	0 for several cars (16.2); 1 for fewer cars with a large distance between vehicles (52.9); 2 for many cars, moving at middle speed (29.3); 3 for great number of cars, moving slowly (1.3); 4 for checked (0.3)
Month	0 for Spring (25.3); 1 for Summer (34.4); 2 for Autumn (26); 3 for Winter (14.3)
Weekday	Monday to Sunday
Class of road	0 for freeway (0.5); 1 for federal highway (10.8); 2 for county road (20.4); 3 for rural road (15.1); 4 for city street (53.2)
Characteristic of road	0 for None (16); 1 for cross-section (61.6); 2 for T-junction (10.8); 3 for road or exit within a community factory (7); 4 for uphill or downhill grade (1.8); 5 for bend (2.8)
Scene within road network	0 for straight (17.9); 1 for bend (2.8); 2 for junction (43.5); 3 for crossing(31.3); 4 for roundabout (0.9); 5 for other (0.9); 6 for car park (0.2); 7 for property exit (1.7); 8 for ramp entrance/exist (0.8)

E-bike information includes handlebar height, handlebar width, saddle height, biker’s velocity, and pedal (1 for yes; 2 for no), handlebar design (0 for not specified; 1 for straight; 2 for touring; 3 for bullhorn; 4 for horn; 5 for multifunctional; 6 for other). The summary statistics of variables are shown in Table 2.

Table 2.

Summary statistics of variables.

Variable	Mean	STD	Min	Max
E-bike
Handlebar height (cm)	98.852	7.436	10	133
Handlebar width (cm)	63.538	4.038	43	100
Saddle height (cm)	73.693	5.127	55	125
Biker’s velocity (km/h)	24.538	8.193	0	50
Rider
Age group	4.088	–	0	7
Stature	0.994	–	0	2
Body height (cm)	166.901	5.871	145	180
Gender	0.379	–	0	1
Throwing distance (cm)	5.13	5.19	0	37.97
Driver
Age group	3.107	–	2	6
Stature	0.996	–	0	2
Body height (cm)	170.432	5.004	154	186
Gender	0.134	–	0	1

The information of the participants in the accident includes the age group, stature, body height, gender, occupation, whether braking measures were taken before the collision (brake), the mood before the accident (mind level), the purpose of the journey, the familiarity of the accident place (familiarity), and the willingness to take risks (willingness_take risks). The classification variables are shown in Table 3.

Table 3.

Classification variables of human factor.

Variable	Values
Age group	0 for 6–12 years; 1 for adolescents until 17 years; 2 for 18–30 years old; 3 for 31–40 years old; 4 for 41–50 years old; 5 for 51–60 years old; 6 for 61–70 years old; 7 for over 71 years
Stature	0 for slim; 1 for normal; 2 for fat
Gender	0 for male; 1 for female
Occupation	0 for worker; 1 for other; 2 for employed; 3 for self-employed; 4 for pensioner
Brake	0 for no; 1 for full brake; 2 for partly brake; 3 for declaration; 4 for acceleration; 5 for other
Willingness_take risks	0 for defensive; 1 for neutral; 2 for risky
Familiarity	0 for almost daily; 1 for several times a week; 2 for seldom
Mind level	0 for very good mood; 1 for good mood; 2 for normal mood; 3 for bad mood
Purpose of the journey	0 for official channel; 1 for purchase; 2 for leisure time; 3 for other; 4 for way to work

In addition, the electric bicyclist information also includes the throwing distance (shown in Table 2) and whether wear a helmet (0 for yes, 1 for no). Passenger car driver information also includes the frequency of driver safety training, the number of kilometers driven per year (Km per year), the driving time, the driver’s intention of doing before the collision, and whether wears glasses. The passenger car driver variables are shown in Table 4.

Table 4.

Summary of passenger car driver variables.

Variable	Possible values
Driver safety training	0 for never; 1 for yes, not specified; 2 for more than a year ago; 3 for last year; 4 for last half year; 5 for last 3 months; 6 for last month
Km per year	0 for 0 km; 1 for 1–1000 km; 2 for 1001–5000 km; 3 for 5001–10,000 km; 4 for 10,001–15,000 km; 5 for 15,001–20,000 km; 6 for 20,001–50,000 km; 7 for more than 50,000 km; 8 for unknown
Driving time	0 for 0–30 min; 1 for more than 30 min to 1.5 h; 2 for more than 1.5–2.5 h; 3 for more than 2.5–3.5 h; 4 for more than 3.5–4.5 h; 5 for more than 4.5 h
Intention of doing	0 for normal continuation of drive; 1 for overtaking on lane of oncoming traffic; 2 for change of lanes; 3 for turn to the right; 4 for turn to the left; 5 for stop; 6 for other; 7 for turn around; 8 for avoidance maneuver
Braking response	0 for brake; 1 for did not brake; 2 for engine brake

The information of drivers and bicyclists involved in the accident mainly comes from questionnaires and voice records. The final dataset includes 989 accident cases after removing some data missing cases and misinformation. Accidents that electric bicyclists only suffered property damage (PDO) accounted for 5.8%, accidents that electric bicyclists suffered slight injuries (SI) account for 79.1%, accidents that electric bicyclists suffered serious injuries account for 12.1%, and accidents that electric bicyclists suffered fatalities account for 3.0%. This study combined serious injuries and fatalities into one injury category, because serious injury also causes large accident losses. Modeling is performed with the dependent variables of property damage only, slight injury, serious injury and fatalities (SIAF) for the three injury severity categories of accidents for electric bicyclists.

There is an imbalance proportion of injury severity in the crash accident data, the methods of oversampling and under sampling are usually used to deal with unbalanced data. However, the under sampling technique often leads to the loss of a large part of the data, which leads to a decrease in the model accuracy. Therefore, the over-sampling method was selected in this study. The SMOTE method that developed by Chawla et al.³⁶ was adopted, which is an improved scheme based on the random oversampling algorithm. To achieve the purpose of class balance in the data, the SMOTE uses k near neighbor (KNN) technology to analyze and simulate the minority class samples and adds artificially simulated new samples to the data set. Additionally, SMOTE could generate larger and less specific decision areas³⁷ and it is widely used in research.^26,38 Python 3.8 and regular SMOTE are used to balance the data in this study.

Modeling was done in the Jupyter lab using Python code. About 80% of the total data was randomly selected as the training data of the model, and 20% as the test data of the model in this study. In addition, 10-fold cross validation was used to test the stability of model performance on the training data. Specifically, The training set data is randomly divided into 10 sub-samples, 9 sub-samples are used to train the model each time, and 1 sub-sample is used for model testing.

Model results

Evaluation indicators of the model

Confusion matrices are regularly utilized to degree the classification performance on test tests. True positive (TP) refers to the positive example that is correctly predicted and it indicates that the true value is positive and is additionally accurately predicted to be positive. True negative (TN) refers to a correctly predicted counterexample and it indicates that the true value is a negative case and is accurately predicted. False-positive (FP) implies that the true value is a negative example, and is wrong predicted to be a positive example. False-negative (FN) means that the positive example is wrong predicted to a negative example.

The accuracy refers to the ratio of correctly predicted samples to the total sample size. In general, the higher the accuracy, the better the classification performance of the model. Its calculation is equation (6).

Accuracy = \frac{TP + TN}{TP + FP + TN + FN}

(6)

Precision is appropriate when the cost of negative examples being wrongly predicted as positive examples is high. The precision refers to the proportion of the samples whose prediction results are positive examples, and the true situation is also a positive example. High precision means a low false-positive rate or false positive. Its calculation formula is equation (7).

Precision = \frac{TP}{TP + FP}

(7)

The recall is often used when positive examples are incorrectly predicted to be negative examples. The recall represents the proportion of the positive samples in the predicted results that are also positive in the real situation. High recall means low false negative or false negative. Its calculation formula is equation (8).

Re call = \frac{TP}{TP + FN}

(8)

F1-score refers to the weighted average of precision and recall. It takes false negatives and false positives into account, and it is more applicable than accuracy, especially when the class distribution of the dataset is uneven. Its calculation formula is equation (9).

F 1 - Score = \frac{2 TP}{2 TP + (FP + FN)}

(9)

Receiver operating characteristic curve (ROC) uses the parameters in the confusion matrix to gradually increase the classification threshold, and it is calculated by the true positive rate (TPR) and false positive rate (FPR), where: TPR = TP/(TP + FN), and FPR = FP/(FP + TN).

Area under curve (AUC) is defined as the area under the ROC curve, with a value not greater than 1, and its meaning is to take a positive sample and a negative sample in the data set, and the probability that the predicted positive example is in front of the negative example. In other words, AUC gives a measure of the sample quality sorting. The larger the AUC is, the better the classification performance of the model is.

Five commonly used model performance evaluation indicators were used. The value of each evaluation indicator was obtained by calculating the weighted average of each evaluation indicator in each class according to the proportion of the data set occupied by each class for the prediction of the multi-classification problem in this study.

Model comparison

The 10-fold cross validation method was used to verify the applicability of the LightGBM model to the passenger car-electric bicycle crash accident data. The performance of LightGBM was compared with that of logistic regression (LR), KNN, SVM, decision tree (DT), and XGBoost. The performance of each machine learning algorithm was preliminarily evaluated with the accuracy value as the evaluation indicator (see Figure 1). Compared with other machine learning algorithms, the LightGBM algorithm has higher accuracy in classifying and predicting the injury severity of electric bicyclists.

Figure 1.

Comparison of machine learning algorithms.

Model parameter

For the LightGBM algorithm, several main hyper-parameters of the model were optimized to maximize model performance in this study. Table 5 lists several key hyper-parameters. Larger n_estimators generally improve prediction accuracy, max depth and min_child_weight are used to prevent overfitting. A smaller learning_rate can improve the predictive performance of the model, and num_leaves is used to control the complexity of the model. This study adopts the grid search method to evaluate model performance on the test set.³⁹ The main hyper-parameters of the LightGBM model were traversed and optimized, and the hyper-parameter values of the optimal model are obtained and shown in Table 5.

Table 5.

LightGBM optimal parameters and default values.

Parameter	Default value	Optimum value	Parameter meaning
learning_rate	0.1	0.03	The step size of gradient descent
max_depth	3	8	Maximum depth of each base learner
n_estimators	10	1000	The number of training base learners
num_leaves	31	50	Maximum leaf node per base learner
min_child_weight	1	10	Determine the minimum threshold for the sum of the minimum leaf node sample weights

A total of five evaluation indicators including accuracy, precision, recall, F1-Score, and AUC were used to comprehensively evaluate the classification and prediction performance of the model. The results are shown in Table 6. It is found that the LightGBM model classification accuracy of the injury severity of electric bicyclists is 93.6%, precision is 94.1%, recall is 93.5%, F1-score is 93.8%, AUC value is 0.993. Each value of the evaluation indicator is higher than other machine learning algorithm indicator values.

Table 6.

Model performance comparison.

Model	Accuracy	Precision	Recall	F1-score	AUC
LR	0.614	0.616	0.615	0.613	0.780
KNN	0.708	0.744	0.728	0.689	0.903
SVM	0.856	0.907	0.849	0.860	0.976
DT	0.751	0.752	0.761	0.750	0.819
XGBoost	0.871	0.872	0.871	0.871	0.971
LightGBM	0.936	0.941	0.935	0.938	0.983
PLightGBM	0.949	0.952	0.949	0.95	0.990

We compared the prediction performance of the rider injury severity before and after the hyper-parameter optimization. The accuracy of the optimal LightGBM model (PLightGBM) is 94.9%, precision is 95.2%, recall is 94.9%, and F1-Score is 95.0%. It is better than the accuracy of LightGBM algorithm 93.6% under the default parameters. The AUC of PLightGBM for the injury severity of electric bicyclists is 99.3%. 0.998 AUC for PDO, 98.9% for SI and 99.3% for SIAF. Based on the comparative analysis of various evaluation indicators, the performance of the PLightGBM model is dominant in this study.

Accuracy focuses on the validity of the prediction results. Accuracy is often used as a direct evaluation indicator for the performance of the model. However, when the distribution of variables in the sample data is not balanced, the accuracy will have certain limitations when the accident of electric bicyclists with only property damage occurs less than other types of injury accidents. ROC was introduced to further analyze the classification and prediction performance of each machine learning algorithm in this study. True positive rate (TPR) and false positive rate (FPR) were used to draw ROC curves of each machine learning algorithm. If the ROC curve of one learner A is completely covered by the ROC curve of another learner B, then the performance of B is better than that of A. If the curves of A and B intersect, it is impossible to conclude which model is better, it is necessary to compare the AUC value based on the area under the ROC curve. The weighted average of three categories of ROC curves under the same machine learning algorithm was taken to obtain the final ROC curve of the algorithm. The ROC curve of each algorithm is shown in Figure 2. The ROC curve of the LightGBM model is closest to the upper left corner compared with other models, and fully wraps the ROC curves of the other algorithms used in this study. Combined with the analysis results in Table 6, it is concluded that the LightGBM algorithm has a good ability to classify and predict the injury severity of the electric bicyclist through the comprehensive evaluation and analysis of each machine learning algorithm.

Figure 2.

Performance comparison of ROC curves.

Analysis of model performance and generalization ability

To further observe the classification and prediction performance of PLightGBM model on the injury severity of electric bicyclists, the confusion matrix with optimal hyper-parameters is shown in Figure 3 based on the test set data. It can be seen that the electric bicyclist’s accident injury severity prediction model based on the LightGBM algorithm shows a good classification and prediction performance.

Figure 3.

Confusion matrix.

In terms of model performance evaluation, a good model not only needs a good performance on the training set data, but also needs good generalization ability in cross-validation. In order to verify the adaptability of the LightGBM optimal model to new samples, the 10-fold cross validation method was used, and the learning curve is drawn to analyze the convergence of the model in Figure 4, observe the adaptability of the LightGBM model to new samples. It can be seen from the fitting trend in Figure 4 that the cross-validation score of the model is relatively stable with the continuous increase of the sample size. And the model achieves a better fitting effect. It shows that the PLightGBM model has better adaptability to new accident samples.

Figure 4.

Learning curve of LightGBM.

Results and discussion

Importance of risk factors

An accurate model can capture better the relationship between rider accident injury severity and risk factors. And the interpretation of model results is another important aspect. Factor importance refers to the degree that a factor affects the performance of the model. In addition to identifying important factors, the importance of factors also shows the degree of influence on the output variables. The study conducted two analyses. On the one hand, the importance of the risk factors affecting the injury severity of electric bicyclists was analyzed based on the PLightGBM model. On the other hand, the impact of risk factors and their interactions on the injury severity of various types of riders was analyzed based on the interpretable machine learning framework SHAP. The factor importance of the LightGBM model is obtained by the number of times the training data is split across all trees according to the ith variable. The factor importance score F_i is calculated according to equation (10).

F_{i} = {s | s = w_{i} z_{i}}

(10)

Where w_i represents the weight of each feature factor. z_i represents the ith factor among all feature factors. Figure 5 shows the top 15 risk factor scores that have a greater impact on e-bike accident injury severity based on the PLightGBM model (The number 1 after the variable represents the characteristic variable belonging to the e-bike, and the number 2 after the variable represents the characteristic variable belonging to the passenger car). The results show that the throwing distance has the greatest importance to the model. However, the feature importance does not reflect the contribution of the characteristic factors to the injury severity of the electric bicyclists.

Figure 5.

Feature variable importance based on PLightGBM model.

To explain the feature importance and how it affects the injury severity of electric two-wheeler riders, this study uses the SHAP method to interpret the feature importance of the LightGBM model. Figure 6 shows the SHAP trends in the prediction process of three injury severities for riders. The first–third lines represent property damage only, slight injuries, serious injuries and fatalities, respectively. The base value is the base logit value of the injury level in the training dataset. The output value f(x) is the model predicted value. When f(x) is greater than the base value, it means that such injury will occur, otherwise, it means that the injury will not occur. The sample shown in Figure 6 indicates that this sample will result in the only property damage accident. Trends in SHAP values show the impact of factors on cyclist injury. For example, when the driving purpose of the journey is other, the throwing distance of the rider is 437 cm, and the location of scene is inside built-up area, it has a positive impact on the cyclist’s property damage only. When the cyclist does not brake, the driver’s occupation is employed, which means that there is a negative impact on the cyclist’s only property damage.

Figure 6.

SHAP model modeling prediction progress plots.

Figure 7 shows the SHAP summary plots that the importance of three types of bicyclist accident injury severity based on characteristic factors. Risk factors are ranked from top to bottom according to the importance of SHAP value. The top 15 important risk factors are plotted and displayed. The color bars in each row indicate how each risk factor contributes to the injury severity. Red dots represent high values of risk factors, and blue dots represent low values of risk factors.

Figure 7.

SHAP summary plot. The number 1 after the variable represents the characteristic variable belonging to the e-bike, the number 2 after the variable represents the characteristic variable belonging to the passenger car: (a) PDO, (b) SI, and (c) SIAF.

Figure 7(a) shows the SHAP summary plots of only property damage. The risk factors that have a positive impact on the consequences of only property damage in the accident are that the passenger car driver is more familiar with the surrounding area of the accident. The rider of the e-bike is relatively young and is familiar with the surrounding area of the accident. In terms of the factors of the e-bike, the e-bike travels at a slower speed, and it has a larger handlebar width, it is more likely to be classified as the only property damage accident. The detailed analysis of handlebar width is in section 5.2.

Young drivers can react relatively quickly and better avoid collision accidents in an emergency event. And for older drivers, the driver’s response to the danger in the risk scene is relatively delayed, which is likely to cause a crash accident. Considering the age of the rider, older riders are vulnerable to head injuries and greater physical decline.⁴⁰ People are more likely to have traffic accidents in familiar locations. The 45.36% of cases in places the rider pass by almost every day among all accident cases. About 50.76% of cases occurred at locations where riders passed several times a week. The number of cases in which a passenger car driver was involved in a collision in places they pass by almost every day accounted for 27.32%. About 57.39% of accidents occurred at locations that drivers passed several times a week. Research has shown that the driver’s familiarity with the route has an impact on driving behavior. Drivers usually travel faster on familiar routes, which could lead to distraction and greater self-confidence, and are more prone to being involved in collision accidents.^41,42 Riders are also prone to collision accidents on familiar routes according to this study.

Figure 7(b) shows the SHAP summary plots of the importance of risk factors for riders with slight injury. Passenger car drivers often make a U-turn and an evasive maneuver before the collision. The rider is a woman, who is taller. The e-bike has a relatively fast speed. When the accident occurs on the weekend and the road type is below the rural road, the rider is more likely to be classified as a slight injury accident.

When the driver is driving the vehicle in a U-turn or taking an evasive maneuver before the collision, and when the vehicle is driving on a road type of country road and below, the speed of the passenger car is generally lower, causing slight injuries to the rider in the event of a crash accident. Riders tend to take risky driving behaviors, which can easily cause injury incidents. And research has shown that there is a high correlation between mood and driving comfort and safety.⁴³ A bad mood may cause unsafe riding factors such as aggressive riding or distraction, which may easily lead to crashes.

Figure 7(c) shows the SHAP summary plots of the risk factors for riders with serious injury and fatalities. The driver is relatively young, and his driving intention before the collision is general continuous driving or borrowing the opposite lane to overtake. The rider is male. When the handlebar width of the e-bike is small, the handlebar height and the saddle height are low. Electric bikes have pedals. That makes the riders have a greater likelihood of being classified as SIAF.

The speed of passenger car drivers is generally higher in the process of general continuous driving and overtaking. In addition, the electric bicyclist lacks passive protection measures, and the high collision speed is likely to cause SIAF to the bicyclist. Studies have shown that the gender of electric bicyclists has a greater impact on their long-term overspeeding behavior, and male bicyclists are more likely to overspeed than females.⁴⁴ And the accident rate of men among electric bicyclists is higher than that of females.⁴⁵ When the height of the driver is low, the height of the handlebar is low or the saddle height is low, that results in a lower head position for the bicyclist when colliding with the passenger car. And it is easy to directly collide with the front of the passenger car, causing more head serious injuries, and increasing the probability of bicyclists being seriously injury and fatalities in an accident.

Combined with the risk factors of SHAP (Figure 7) and the LightGBM model (Figure 5) analysis of the rider injury severity types. It can be found that the throwing distance has an important impact on the injury severity of the electric bicyclist. Figure 8 shows the throwing distance distribution of the SHAP value under the different injury severity types of e-bike accidents. Specifically, the rider has a greater probability of suffering PDO or SI when throwing a short distance. Electric bicyclists are more likely to suffer serious or even fatal injuries as throwing distance increases. This is consistent with cognition. The importance of other risk factors is different in the three SHAP models. This shows that in the case of different degrees of electric bicyclist injury, the impact of risk factors on accidents is different.

Figure 8.

SHAP main effects plots of throwing distance: (a) PDO, (b) SI, and (c) SIAF.

The influence of risk factors

The SHAP summary plot in Figure 7 shows the qualitative relationship between rider injury severity and accident factors. Figure 8 shows the SHAP dependency plot of some important features to further investigate how feature factors affect the injury of the electric bicyclist. It not only captures the main effects of individual factors but also reflects the impact of the interaction effects between two factors on the injury of the electric bicyclist.

Figure 9(a) to (c) shows the interaction effect of passenger car driver occupation and rider throwing distance on injury severity. The horizontal axis represents the specific value of the factors and the value meaning is shown in Table 3. Occupation has a certain influence on rider speed behavior.⁴⁴ The results in Figure 9(b) show that when the passenger car driver is a worker or other occupation, a collision with slight injuries is less likely to occur. And when the throwing distance is long between these two types of occupations in a collision accident, the probability of serious injury is high. Based on this study, we found that when the driver’s occupation is employed, self-employed, and pensioner, the probability of rider with slight injury collision is higher in the accident.

Figure 9.

SHAP interaction effects plots: (a) PDO, (b) SI, (c) SIAF, (d) PDO, (e) SI, and (f) SIAF.

Figure 9(d) to (f) shows that the handlebar width of an e-bike is usually between 60 and 70 cm. When the handlebar width of the e-bike is <63 cm in accidents, the bicyclist is prone to serious injury and fatalities. When the handlebar width is >63 cm, the bicyclist has a higher probability of slight injury or only property damage. This shows that the larger handlebar width has a certain safety protection effect on the rider. But what needs to be considered is that when the handlebar width of the e-bike is larger, it will affect the driving it on narrow roads and narrower lanes. Therefore, it is necessary to comprehensively consider the influence of factors such as driving safety and the design of the e-bike to choose the handlebar width.

Further discussion on serious injuries and fatalities crashes

Figure 10 shows the SHAP interaction plot of the risk factors of riders in serious injuries and fatal accidents. It can be seen from Figure 10(a) that when the height of the rider is less than 165 cm, the rider is more likely to be seriously injured or fatally injured in a crash accident. When the rider’s height is higher than 170 cm, it has a negative impact on the probability of serious injury and fatal collision of the rider, but the interaction between the rider’s throwing distance and height is not obvious.

Figure 10.

SHAP interaction plot: (a) body height of bicyclist, (b) driver’s intention of doing, (c) age group of driver, and (d) gender of bicyclist.

It could be found that when the driver’s driving intention before the collision is general continuous driving or using the opposite lane to overtake and change lanes. SIAF will occur with a high probability from Figure 10(b). This may be because the driving intentions of the driver usually maintain the speed at a relatively high level. A large impact speed will be generated once a collision accident occurs, which is likely to cause a large injury to the bicyclist. When the driver’s driving intention before the collision is to turn, stop, U-turn and other low-speed operations, resulting in a lower throwing distance for the bicyclist. In this case, the probability of serious or fatal injury to the bicyclist is low.

It can be seen that the probability of serious injury and fatal injury is high during working days from Figure 10(c). Drivers aged from 18 to 40 in e-bike accidents have a higher probability of serious injury and fatal collisions with the rider during weekday days. This may be due to the impact of morning and evening peak traffic flow on weekdays.⁴⁶ The increasing age of passenger car drivers has a negative impact on ridder’s serious injuries and fatal crashes. The rider’s probability of serious or fatal injury decreases as passenger car drivers age. This may be due to the more cautious and patient driving behavior of older drivers when driving the vehicle.⁴⁷

Studies have shown that in vehicle-pedestrian collisions, male pedestrians have a higher fatality rate per accident.⁴⁸ This study found that this phenomenon also exists in riders in passenger car-electric bicycle collision accidents. It can be seen from Figure 10(d) that when the rider is male, SHAP is a positive value, which means that the probability of serious injury or even fatal collision of male bicyclists is higher. Female riders have a negative SHAP value, meaning that female riders are less likely to be involved in serious or fatal crashes and are more likely to be involved in slight injuries in crashes (as shown in Figure 7(b)).⁴⁹ This may be because female riders are more likely to maintain a regular living and good driving habits, which can effectively reduce inappropriate driving behavior.⁵⁰ Making female drivers of e-bikes less likely to suffer from serious collisions.

Conclusion

This study takes an interpretable machine learning framework to study and analyze the risk factors that affect bicyclist crash severity in passenger car-electric bicycle collision accidents. The 2014–2018 passenger car-electric bicycle collision accidents were collected through the CIDAS dataset. After data screening and deletion of missing values, 989 cases that the accidents only involved passenger cars and electric bicycles were used for this study analysis. The injury severity of electric bicyclist accidents is divided into three types: only property damage, slight injury, and serious injury and fatalities. Through the comparative analysis of six machine learning methods, accuracy, precision, recall, F1-Score, AUC evaluation indicators, and ROC curve are used to comprehensively evaluate the performance of the algorithm. Finally, the LightGBM algorithm is selected with better classification performance, and its better data mining and optimization algorithm is used to predict the injury severity of electric bicyclists with higher accuracy based on fewer accident data. The interpretability of the LightGBM algorithm is further extended by using the SHAP method. The effects of different risk factors and their interaction effects on bicyclist injury severity were analyzed by identifying important risk factors.

The risk factor analysis results of injury severity of bicyclists in passenger car-electric bicycle accidents can provide references and suggestions for traffic management departments to prevent the occurrence of electric bicyclist collision accidents and reduce accident injuries. The results found that the throwing distance of the bicyclist is an important factor to predict the injury severity of the bicyclist. The farther the throwing distance is, the greater the chance of the bicyclist being seriously injured or even killed. Human factors play an important role in accidents. Bicyclists who are male, and shorter are more likely to be seriously injured in a crash accident. E-bikes have a smaller handlebar width, lower handlebar height, and lower saddle height, which have a positive impact on the bicyclist’s injury severity.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Key R&D Program of China (Grant No. 2019YFE0108000) and Scientific Innovation Practice Project of Postgraduates of Chang’an University (Grant No. 300103722024).

ORCID iD

Tianzheng Wei

Data availability

The data used to support the findings of this study are available from the corresponding author upon request.

References

Schleinitz

Petzoldt

Franke-Bartholdt

, et al. The German naturalistic cycling study – comparing cycling speed of riders of different e-bikes and conventional bicycles. Saf Sci 2017; 92: 290–297.

Vlakveld

Twisk

Christoph

, et al. Speed choice and mental workload of elderly cyclists on e-bikes in simple and complex traffic situations: a field experiment. Accid Anal Prev 2015; 74: 97–106.

Langford

Chen

Cherry

. Risky riding: naturalistic methods comparing safety behavior from conventional bicycle riders and electric bike riders. Accid Anal Prev 2015; 82: 220–226.

Zhu

Zhang

, et al. Electric bicyclist injury severity during peak traffic periods: a random-parameters approach with heterogeneity in means and variances. Int J Environ Res Public Health 2021; 18(21): 11131.

Scaramuzza

Clausen

. Elektrofahrräder (E-bikes), 2010.

Popovich

Gordon

Shao

, et al. Experiences of electric bicycle users in the Sacramento, California area. Travel Behav Soc 2014; 1(2): 37–44.

Lin

Tan

, et al. Comparison study on operating speeds of electric bicycles and bicycles: experience from field investigation in Kunming, China. Transp Res Rec 2008; 2048(1): 52–59.

Dozza

Bianchi Piccinini

Werneke

. Using naturalistic data to assess e-cyclist behavior. Transp Res Part F Traffic Psychol Behav 2016; 41: 217–226.

Beratungsstelle für Unfallverhütung-bfu B. SINUS-report 2017. Sicherheitsniveau und Unfallgeschehen im Strassenverkehr 2016, 2017.

10.

Jellinek

Hildebrandt

Pfaffenbichler

, et al. MERKUR-Auswirkungen der Entwicklung des Marktes für E-Fahrräder auf Risiken, Konflikte und Unfälle auf Radinfrastrukturen. Forschungsarbeiten des oesterreichischen Verkehrssicherheitsfonds 2013; 19: 322.

11.

Johnson

Rose

. Safety implications of e-bikes, 2015.

12.

Chen

Dai

. Characteristics and casualty analysis of two-wheeler accidents in china, data source: the China in-depth accident study (CIDAS). Report no: 0148-7191, SAE technical paper, 2018.

13.

Schepers

Fishman

den Hertog

, et al. The safety of electrically assisted bicycles compared to classic bicycles. Accid Anal Prev 2014; 73: 174–180.

14.

Lawinger

Bastian

. Neue Formen der Zweiradmobilität. Eine empirische Tiefenanalyse von Pedelec-Unfällen in Baden-Württemberg. Zeitschrift für Verkehrssicherheit 2013; 59(2): 99–106.

15.

Haustein

Møller

. E-bike safety: individual-level factors and incident characteristics. J Transp Health 2016; 3(3): 386–394.

16.

Uhr

Ewert

Scaramuzza

, et al. Sicherheit älterer Verkehrsteilnehmer (bfu-Report nr. 14). Bern: bfu-Beratungsstelle für Unfallverhütung, 2016.

17.

Yang

Liu

, et al. Analysis of fatal factors influencing accidents involving two-wheel electric vehicle drivers at intersections. Leg Med 2020; 45: 101696.

18.

Hertach

Uhr

Niemann

, et al. Characteristics of single-vehicle crashes with e-bikes in Switzerland. Accid Anal Prev 2018; 117: 232–238.

19.

Zhang

Wang

, et al. Factors affecting electric bicycle rider injury in accident based on random forest model. J Transp Syst Eng Inf Technol 2021; 21: 196–200.

20.

Raslavičius

Bazaras

Keršys

, et al. Assessment of bicycle–car accidents under four different types of collision. Proc IMechE Part H: J Engineering in Medicine 2017; 231(3): 222–234.

21.

Panicker

Ramadurai

. Injury severity prediction model for two-wheeler crashes at mid-block road sections. Int J Crashworthiness 2022; 27: 328–336.

22.

Bogue

Paleti

Balan

. A modified rank ordered logit model to analyze injury severity of occupants in multivehicle crashes. Anal Methods Accid Res 2017; 14: 22–40.

23.

Abdel-Aty

. Analysis of driver injury severity levels at multiple locations using ordered probit models. J Saf Res 2003; 34(5): 597–603.

24.

Quddus

Noland

Chin

. An analysis of motorcycle injury and vehicle damage severity using ordered probit models. J Saf Res 2002; 33(4): 445–462.

25.

Chen

Zhang

Yang

, et al. An explanatory analysis of driver injury severity in rear-end crashes using a decision table/Naïve Bayes (DTNB) hybrid classifier. Accid Anal Prev 2016; 90: 95–107.

26.

Parsa

Taghipour

Derrible

, et al. Real-time accident detection: coping with imbalanced data. Accid Anal Prev 2019; 129: 202–210.

27.

Wang

Liu

, et al. Predicting future driving risk of crash-involved drivers based on a systematic machine learning framework. Int J Environ Res Public Health 2019; 16(3): 334.

28.

Theofilatos

Yannis

Antoniou

, et al. Time series and support vector machines to predict powered-two-wheeler accident risk and accident type propensity: a combined approach. J Transp Saf Secur 2018; 10(5): 471–490.

29.

Hadjidimitriou

Lippi

Dell’Amico

, et al. Machine learning for severity classification of accidents involving powered two wheelers. IEEE Trans Intell Transp Syst 2020; 21(10): 4308–4317.

30.

Montella

Aria

D’Ambrosio

, et al. Analysis of powered two-wheeler crashes in Italy by classification trees and rules discovery. Accid Anal Prev 2012; 49: 58–72.

31.

Guo

Yuan

Janson

, et al. Older pedestrian traffic crashes severity analysis based on an emerging machine learning XGBoost. Sustainability 2021; 13(2): 926.

32.

Zhou

Wang

Zhang

, et al. Comparing factors affecting injury severity of passenger car and truck drivers. IEEE Access 2020; 8: 153849–153861.

33.

Wang

Liu

(eds). Cause analysis of railway traffic accidents based on random forest. J Phys Conf Ser 2021; 1910(1): 012017. IOP Publishing.

34.

Zhu

. Investigation of vehicle-bicycle hit-and-run crashes. Traffic Inj Prev 2020; 21(7): 506–511.

35.

Lundberg

Lee

S-I

. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst 2017; 30: 4765–4774.

36.

Chawla

Bowyer

Hall

, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002; 16: 321–357.

37.

Han

Wang

W-Y

Mao

B-H

(eds). Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In: International conference on intelligent computing, 2005. Springer.

38.

Vanhoeyveld

Martens

. Imbalanced classification in sparse and large behaviour datasets. Data Min Knowl Discov 2018; 32(1): 25–82.

39.

Jiang

Xie

Wen

, et al. Modeling highly imbalanced crash severity data by ensemble methods and global sensitivity analysis. J Transp Saf Secur 2022; 14: 562–584.

40.

Wagner

Rai

Ituarte

, et al. Two-wheel vehicular trauma: an age-based analysis. Am Surg 2012; 78(10): 1066–1070.

41.

Intini

Colonna

Olaussen Ryeng

. Route familiarity in road safety: a literature review and an identification proposal. Transp Res Part F Traffic Psychol Behav 2019; 62: 651–671.

42.

Colonna

Intini

Berloco

, et al. The influence of memory on driving behavior: how route familiarity is related to speed choice. An on-road study. Saf Sci 2016; 82: 456–468.

43.

Çano

Coppola

Gargiulo

, et al. (eds). Mood-based on-car music recommendations. In: International conference on industrial networks and intelligent systems, 2016. Springer.

44.

Zhou

Yang

, et al. Research on the relationship between the individual characteristics of electric bike riders and illegal speeding behavior: a questionnaire-based study. Sustainability 2020; 12(3): 799.

45.

Jain

Menezes

Kanchan

, et al. Two wheeler accidents on Indian roads – a study from Mangalore, India. J Forensic Leg Med 2009; 16(3): 130–133.

46.

Abdel-Aty

. Investigating the different characteristics of weekday and weekend crashes. J Saf Res 2013; 46: 91–97.

47.

Taubman-Ben-Ari

Mikulincer

Gillath

. The multidimensional driving style inventory—scale construct and validation. Accid Anal Prev 2004; 36(3): 323–332.

48.

Zhu

Zhao

Coben

, et al. Why more male pedestrians die in vehicle-pedestrian collisions than female pedestrians: a decompositional analysis. Inj Prev 2013; 19(4): 227–231.

49.

Adanu

Hainen

Jones

. Latent class analysis of factors that influence weekday and weekend single-vehicle crash severities. Accid Anal Prev 2018; 113: 187–192.

50.

Kirkwood

. Why women live longer. Stress alone does not explain the longevity gap. Sci Am 2010; 303(5): '34–35.