Abstract
Many types of severe crashes occur on divided rural multilane highway segments. However, few research studies have focused on the severity analysis of crashes on rural multilane segments, especially divided segments. Moreover, very few past studies considered within-segment and within-crash correlations. Ignoring these correlations can bias the estimated coefficients of the factors contributing to crashes on divided rural multilane highway segments. This study therefore uses multilevel ordinal logistic models to identify the risk factors for the severity of these crashes and to estimate the effects of contributing factors more accurately. Crash data from California were used to calibrate the models, and model fit assessment and comparison were used to confirm that introducing the random effects was appropriate. The data set was divided into three levels: segment level, crash level, and occupant level. Five major crash types (head-on, sideswipe, angle, rear-end, and single-vehicle crashes) that occurred on divided rural multilane highway segments were investigated separately to identify contributing factors related to road geometrics, vehicles, and occupants. The results show that multilevel ordinal logistic models fit significantly better than traditional ordinal logistic models without random effects. The contributing factors differ across crash types. For example, head-on crashes on rural segments with street lighting are associated with a lower likelihood of severe injury. Sideswipe crashes on rural segments without curbs or with unpaved medians are associated with a higher likelihood of severe injury. Rear-end crashes on rural segments with higher design speeds are associated with a higher likelihood of severe injury.
The modeling results also show that cell phone use while driving is associated with a higher likelihood of severe injury in sideswipe, angle, and single-vehicle crashes.
Keywords
Introduction
Divided rural multilane segments are among the most common road types and account for a high proportion of severe traffic crashes; tens of thousands of severe crashes occur on this type of segment each year. 1 A number of previous studies have examined the risk factors for crash frequency on segments.2,3 Park et al. 4 explored and compared crash modification factors for multiple treatments on rural multilane roadways. Considerable effort has also been made to analyze the risk factors for different crash severity levels. Ulfarsson and Mannering 5 used a multinomial logit model to investigate the severity of different types of traffic crashes. Conroy et al. 6 applied a multinomial logit model to analyze the severity of rollover crashes. Eluru et al. 7 focused on predicting crash frequencies at different severity levels to investigate the various risk factors that affect crash severity. Brinkman and Mak 8 reported that a high percentage of all crashes occurred on bridges and that these crashes were more likely to be fatal. Retting et al. 9 investigated police-reported crashes on four-lane highways and found that more crashes occurred on bridges than on other road segments. However, few studies have focused on rural multilane highway segments, especially divided segments, even though many severe crashes occur on them each year. Divided rural multilane highways differ from other types of road segments: they usually have relatively high posted speed limits and physical medians or two-way left-turn lanes, and both their traffic volumes and their numbers of severe crashes are usually high. The risk factors contributing to crashes on these segments are different from those on other types of segments.
Therefore, it is necessary to identify the risk factors contributing to severe crashes on these specific segments in order to develop cost-effective safety countermeasures that reduce the cost of those crashes.
Risk factors for crashes on divided rural segments can be investigated in different ways. Several traffic safety researchers have argued that crash severity, as a response variable, is more properly treated as ordinal. For example, O’Donnell and Connor 10 used an ordered logit model to analyze motor vehicle crash injuries. Wang and Kockelman 11 used an ordered logit model to investigate the factors contributing to occupant injury severity, including speed limit and vehicle type. Wang and Abdel-Aty 12 examined left-turn crash injury severity using a partial proportional odds model. Eluru et al. 7 and Yamamoto et al. 13 modified the standard ordered logit model by allowing separate parameter coefficients for variables to analyze crash severity. However, the techniques used in most past studies assumed independence between observations, normally treating each crash as an independent unit. These techniques may not be adequate for modeling the severity of occupant injury and vehicle damage when there are potential correlations among those involved in the same multi-vehicle crash. Such correlation has already been identified in earlier studies; for example, Evans and Frick14,15 found that in a multiple-vehicle crash, the risk of fatality depended on the characteristics of the other vehicle. Crashes on a given segment share that segment's risk factors and are observed over the study period. Here, occupant-level observations are nested within crashes, which are in turn nested within segments, creating multiple levels of clustering. When analyzing such multilevel crash data, all levels of clustering should be accounted for where possible.
Ignoring the variability within or between levels, especially when the covariance is significant, can lead to invalid hypothesis tests and to misleading inferences and conclusions about the overall significance of the risk factors and the corresponding countermeasures.
Therefore, multilevel ordinal logistic models were used in this study to analyze the effects of risk factors. The multilevel modeling technique has already been applied in traffic safety research. Jones and Jørgensen 16 explored and discussed the potential applications of multilevel models and developed multilevel models to identify factors affecting crash severity, while Kim et al. 17 used multilevel crash prediction models for different crash types at rural intersections. Huang and colleagues18,19 used a Bayesian multilevel binomial logistic model to analyze the severity of occupant injury and vehicle damage in traffic crashes at intersections. However, very few previous studies have applied multilevel ordinal logistic models to investigate crash severity, even though crash severity is better treated as an ordered discrete variable and analyzed using ordered logistic models. 20 The multilevel ordinal model allows random effects at three or more levels of clustering.
In this analysis of occupant injury severity in crashes on divided rural multilane highway segments in California, a multilevel ordinal logistic model was estimated to identify the significant risk factors for crash severity while accounting for both within-segment and within-crash correlation. The crashes on each segment form a cluster, and each crash in turn contains subclusters, namely the occupant–vehicle units involved in it. A logistic response function is assumed, and a full maximum marginal likelihood solution is used to estimate the model parameters, with Gauss–Hermite quadrature used to evaluate the marginal log likelihood numerically. Because this approach yields the full likelihood, it permits likelihood ratio (LR) tests for comparing nested models, and by explicitly modeling the three-level data structure (segment level, crash level, and occupant level) it can reduce bias. The LR test is applied in model assessment and comparison, and it confirms that the random effects at the segment and crash levels are significant in accounting for the within-segment and within-crash correlation. The remainder of this article is organized as follows. The methodological framework and the data are described first. Model estimation and comparison are then summarized to illustrate the significant risk factors for occupant injury severity, comparing traditional ordinal logistic models with three-level ordinal logistic models for five crash types: single-vehicle, angle, head-on, sideswipe, and rear-end crashes. The summary and conclusions of this study are presented last.
Data collection and methodology
Multilevel ordinal logistic model
To describe the multilevel ordinal logistic model applied in this research, the three-level data structure is constructed as follows. It is assumed that there are k = 1,…, nij Level 1 units (occupants) nested within j = 1,…, ni Level 2 units (crashes), which are in turn nested within i = 1,…, n Level 3 units (segments). Using the threshold concept, an unobserved latent variable y is related to the observed ordinal response Y. Because the outcome is ordinal, several thresholds are needed rather than the single threshold of a binary model. There are three ordered categories in total, representing three levels of crash severity: fatal and severe injury are combined into one level; other visible injury and complaint of pain are combined into a second level; and property damage only forms the third level. The categories are ordered so that larger values correspond to more severe outcomes. The value of the response is determined by the interval into which the unobserved latent response falls: Y = c when y exceeds the threshold α(c−1) but does not exceed α(c). With A = 3 ordered categories in this study, two finite thresholds α1 < α2 separate occupants into the response categories, with α0 = −∞ and α3 = +∞ by convention. Assuming a logistic distribution for the underlying latent variable leads to an ordinal logistic regression model. In terms of the latent response yijk of occupant k in crash j on segment i, the three-level ordinal logistic model can be written as 21

$$ y_{ijk} = \mathbf{x}_{ijk}^{\top}\boldsymbol{\beta} + v_i + u_{ij} + \varepsilon_{ijk} $$

where $\mathbf{x}_{ijk}$ is the covariate vector, $\boldsymbol{\beta}$ is the vector of fixed-effect coefficients, $v_i \sim N(0, \sigma_v^2)$ is the random intercept of segment $i$, $u_{ij} \sim N(0, \sigma_u^2)$ is the random intercept of crash $j$ within segment $i$, and $\varepsilon_{ijk}$ is an occupant-level error following the standard logistic distribution (variance $\pi^2/3$), the three terms being mutually independent. The observed response is then

$$ Y_{ijk} = c \quad \text{if} \quad \alpha_{c-1} < y_{ijk} \le \alpha_c, \qquad c = 1, 2, 3, $$

where $\alpha_0 = -\infty$ and $\alpha_3 = +\infty$. Then, the multilevel ordinal logistic model can be written in terms of the cumulative logits as

$$ \log\frac{P(Y_{ijk} \le c \mid v_i, u_{ij})}{1 - P(Y_{ijk} \le c \mid v_i, u_{ij})} = \alpha_c - \left(\mathbf{x}_{ijk}^{\top}\boldsymbol{\beta} + v_i + u_{ij}\right), \qquad c = 1, 2. $$
The meologit command in Stata was used to fit the multilevel ordinal logistic model. The actual values taken by the response are irrelevant except that larger values are assumed to correspond to “higher” outcomes. The conditional distribution of the response given the random effects is assumed to be multinomial, with success probabilities determined by the logistic cumulative distribution function. The model contains both fixed effects and random effects, and meologit allows for many levels of nested clusters of random effects. In this model, occupants form the first level, crashes the second level, and segments the third level, so the model has two random-effects equations: the first is a random intercept at the segment level, and the second is a random intercept at the crash level.
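Conditional on the random effects, each occupant's response follows a multinomial distribution whose category probabilities come from the logistic CDF evaluated at the thresholds. A minimal Python sketch of that mapping, assuming three categories coded 1 (least severe) to 3 (most severe) and purely hypothetical cutpoint values, is:

```python
import math


def logistic_cdf(z: float) -> float:
    """Standard logistic CDF, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def category_probabilities(eta: float, alpha: list[float]) -> list[float]:
    """Return [P(Y = 1), ..., P(Y = C)] under a cumulative logit model.

    eta   : linear predictor x'beta + v_i + u_ij, i.e. conditional on the
            segment and crash random intercepts
    alpha : strictly increasing finite cutpoints alpha_1, ..., alpha_{C-1}
    """
    if any(a >= b for a, b in zip(alpha, alpha[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    # P(Y <= c) = F(alpha_c - eta), padded with P(Y <= 0) = 0 and P(Y <= C) = 1.
    cumulative = [0.0] + [logistic_cdf(a - eta) for a in alpha] + [1.0]
    # Each category probability is a difference of adjacent cumulative values.
    return [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]


# Hypothetical cutpoints; raising eta shifts probability toward category 3.
for eta in (0.0, 1.0):
    print(eta, [round(p, 3) for p in category_probabilities(eta, [-1.0, 2.0])])
```

Because the same linear predictor is subtracted from every cutpoint, a positive coefficient moves probability mass toward the more severe categories, which is how "positively associated with crash severity" is read in the results below.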
Description of the crash data set
The data were collected from the Highway Safety Information System (HSIS) for crashes that occurred on rural divided multilane highways in California from 2005 to 2010. The data consist of four types of files, which can be merged into a single file using linking variables. The accident file contains detailed information about each crash, including the control section number, accident type, primary collision factor, lighting, and weather condition. The road file contains the variables describing crash locations, as well as other important roadway-related variables such as Median Type, Terrain, Design Speed, and Surface Type. Some major characteristics of these segments are shown in Table 1. There are 1349 rural divided multilane highway segments comprising 709.83 miles in total.
Table 1. Descriptive statistics of road segment data.
AADT: average annual daily traffic; vpd: vehicles per day.
The vehicle file includes information about vehicle type and some occupant information, such as age, sex, fatigue, and whether a cell phone was in use. The occupant file includes further information about each occupant involved in the crash, including occupant type and seat position at the time of the crash. After a preliminary check of all the variables, including their correlations, experienced traffic safety researchers selected the possible contributing variables listed in Table 2, and all of these were included as independent variables in the preliminary severity analysis model. There are 26 independent variables in total. Observations coded as “other” or “not stated” were excluded from the severity modeling analysis, and a total of 9194 observations were included in the model.
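The merging and exclusion steps described above can be sketched in Python. The record layouts and the linking fields (`seg_id`, `case_id`, `veh_no`) are hypothetical stand-ins, not the actual HSIS variable names; the point is that each occupant record is linked to its vehicle, crash, and segment, which yields the occupant-within-crash-within-segment hierarchy used by the multilevel model.

```python
# Codes treated as missing; records containing them are dropped, as in the text.
MISSING_CODES = {"other", "not stated"}


def merge_crash_files(accidents, roads, vehicles, occupants):
    """Link each occupant record to its vehicle, crash, and road segment.

    All field names (seg_id, case_id, veh_no) are hypothetical placeholders for
    the linking variables. Occupants that cannot be linked, or whose merged
    record contains a value coded 'other' or 'not stated', are excluded.
    """
    road_by_seg = {r["seg_id"]: r for r in roads}
    crash_by_case = {a["case_id"]: a for a in accidents}
    vehicle_by_key = {(v["case_id"], v["veh_no"]): v for v in vehicles}

    merged = []
    for occ in occupants:
        crash = crash_by_case.get(occ["case_id"])
        vehicle = vehicle_by_key.get((occ["case_id"], occ["veh_no"]))
        road = road_by_seg.get(crash["seg_id"]) if crash is not None else None
        if crash is None or vehicle is None or road is None:
            continue  # cannot be placed in the segment > crash > occupant hierarchy
        row = {**road, **crash, **vehicle, **occ}
        if any(isinstance(val, str) and val.strip().lower() in MISSING_CODES
               for val in row.values()):
            continue
        merged.append(row)
    return merged
```

Each merged row carries the segment and crash identifiers, which then serve as the level-3 and level-2 cluster identifiers when the multilevel model is fitted.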
Table 2. List of possible contributing variables.
Modeling results with and without controlling for multilevel structure
As mentioned above, a total of 9194 crashes were identified on the divided rural segments from 2005 to 2010: 192 head-on crashes, 1070 sideswipe crashes, 2770 rear-end crashes, 1198 angle crashes, and 3964 single-vehicle crashes. The modeling results for each crash type are shown in Tables 3–7. The correlations between independent variables were checked before the variables were included in the final models, to ensure that the independent variables were not correlated with one another.
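One way to carry out the correlation check mentioned above is a pairwise Pearson screen over the candidate independent variables. The sketch below assumes the variables are already numerically coded (dummies for categorical levels) in a NumPy array, and the 0.7 cut-off is an assumed rule of thumb, not a value reported in this study.

```python
import itertools

import numpy as np


def flag_correlated_pairs(X: np.ndarray, names: list[str], threshold: float = 0.7):
    """Return (name_a, name_b, r) for each pair of columns of X whose absolute
    Pearson correlation exceeds `threshold` (an assumed cut-off).

    X is an (observations x variables) array of numerically coded candidate
    variables. Columns with zero variance yield NaN correlations, which never
    exceed the threshold and so are not flagged.
    """
    r = np.corrcoef(X, rowvar=False)
    flagged = []
    for a, b in itertools.combinations(range(X.shape[1]), 2):
        if abs(r[a, b]) > threshold:
            flagged.append((names[a], names[b], float(r[a, b])))
    return flagged
```

A flagged pair would then be resolved by keeping only one of the two variables in the final model.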
Table 3. Comparison of modeling results for head-on crashes.
LR: likelihood ratio.
Table 4. Comparison of modeling results for sideswipe crashes.
LR: likelihood ratio.
Table 5. Comparison of modeling results for rear-end crashes.
LR: likelihood ratio.
Table 6. Comparison of modeling results for angle crashes.
LR: likelihood ratio.
Table 7. Comparison of modeling results for single-vehicle crashes.
LR: likelihood ratio.
Modeling results of head-on crashes
To illustrate the impact of multilevel modeling, each crash model was compared with a traditional ordinal logistic model. Table 3 compares the goodness of fit with and without controlling for the multilevel structure for head-on crashes. Only 192 head-on crashes occurred on the divided rural multilane highway segments. The goodness-of-fit results show that the multilevel ordinal logistic model fits significantly better than the traditional ordinal logistic model without random effects, as can be seen from the two log likelihood values. The log likelihood of the multilevel model, labeled log likelihood1, is −514.5881, which is larger than log likelihood2 of the traditional model. The LR test further confirms that the difference is significant at the 0.05 level, which verifies the better goodness of fit of the multilevel model. The variance-component estimates are also shown in Table 3. The variance component for crashes is labeled segment > crash to emphasize that crashes are nested within segments. The estimated variance of the random intercept is 1.21 at the segment level and 1.84 at the crash level. This means that part of the unexplained variation in severity results from segment-level and crash-level variance, which confirms the usefulness of the multilevel structure. Moreover, six variables were found to be statistically significant at the 0.05 level in the multilevel ordinal logistic model.
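One way to gauge how large these variance components are is to convert them into latent-response intraclass correlations, using the fact that the standard logistic error has variance π²/3. The Python sketch below assumes the three-level random-intercept formulation used in this study; the function name is hypothetical, and the resulting values are derived from the reported variances rather than taken from Table 3.

```python
import math

# Variance of the standard logistic distribution assumed for the occupant-level error.
LOGISTIC_RESIDUAL_VAR = math.pi ** 2 / 3


def latent_iccs(var_segment: float, var_crash: float) -> tuple[float, float]:
    """Latent-response intraclass correlations for a three-level random-intercept
    ordinal logit (occupants within crashes within segments).

    Returns (icc_segment, icc_crash):
      icc_segment: correlation between two occupants on the same segment but in
                   different crashes
      icc_crash:   correlation between two occupants in the same crash
                   (who necessarily share the segment as well)
    """
    total = var_segment + var_crash + LOGISTIC_RESIDUAL_VAR
    return var_segment / total, (var_segment + var_crash) / total


# Variance components reported above for head-on crashes.
icc_segment, icc_crash = latent_iccs(1.21, 1.84)
```

For head-on crashes this gives roughly 0.19 at the segment level and 0.48 at the crash level, i.e. a sizable share of the latent variation is shared within segments and within crashes.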
Four variables (mountain, concrete surface type, clear weather, and without any access control) were found to be positively associated with crash severity, which means that head-on crashes on rural segments with mountainous terrain, a concrete rather than asphalt surface, clear weather, or no access control are more likely to be severe. The other two variables (with light and principal arterial) were found to be negatively associated with crash severity, which means that head-on crashes on segments with street lights and on segments constructed as rural principal arterials are associated with a lower likelihood of severe injury.
Modeling results of sideswipe crashes
Table 4 compares the modeling results with and without controlling for the multilevel structure for sideswipe crashes. The goodness-of-fit and LR test results again show that the multilevel ordinal logistic model fits significantly better than the traditional ordinal logistic model. Log likelihood1 is −1635, which is larger than log likelihood2 (−1659) of the traditional model. The estimated variance of the random intercept is 1.12 at the segment level and 1.53 at the crash level. Moreover, 10 variables were identified as statistically significant at the 0.05 level in the multilevel ordinal logistic model. Five variables (mountain, rolling, rural principal arterial, minor arterial, and with light) were found to be negatively associated with crash severity, which means that sideswipe crashes on rural segments with mountainous or rolling rather than flat terrain, with street lights, or on principal or minor arterials rather than collectors or local roads are less likely to be severe. The other five variables (no curb, motorcycle involved, unpaved median type, cell phone in use, and the primary collision factor failure to yield) were found to be positively associated with crash severity: sideswipe crashes on rural segments without curbs or with unpaved medians are associated with a higher likelihood of severe injury, and the likelihood of severe injury increases when a motorcycle is involved, when the primary collision factor is failure to yield, and when a cell phone is used while driving.
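The LR comparison can be reproduced from the two reported log likelihoods. The sketch below assumes that the multilevel model adds exactly two parameters (the segment and crash variances) to the traditional model and uses the naive chi-square reference with 2 degrees of freedom, whose survival function has the closed form exp(−x/2); the function name is hypothetical.

```python
import math


def lr_test_two_variances(ll_multilevel: float, ll_single_level: float) -> tuple[float, float]:
    """LR statistic and naive chi-square(2) p-value for adding the segment and
    crash random intercepts to a single-level ordinal logit.

    For 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    Because the null hypothesis puts both variances on the boundary (zero), this
    reference distribution is conservative: the true p-value is smaller.
    """
    lr = 2.0 * (ll_multilevel - ll_single_level)
    if lr < 0:
        raise ValueError("the multilevel model cannot fit worse than the nested model")
    return lr, math.exp(-lr / 2.0)


# Rounded log likelihoods reported for sideswipe crashes.
lr, p_value = lr_test_two_variances(-1635.0, -1659.0)
```

For sideswipe crashes the statistic is 48 with a p-value of about 4e-11, far below 0.05. The same calculation on the rounded values reported for rear-end and angle crashes gives statistics of 68 and 88, respectively.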
Modeling results of rear-end crashes
Table 5 compares the modeling results with and without controlling for the multilevel structure for rear-end crashes. There were 2770 rear-end crashes on the divided rural multilane highway segments, making rear-end crashes the most frequent type of multi-vehicle crash. The goodness-of-fit and LR test results show that the multilevel ordinal logistic model fits significantly better than the traditional ordinal logistic model. Log likelihood1 is −2735, which is larger than log likelihood2 (−2769) of the traditional model. The estimated variance of the random intercept is 2.12 at the segment level and 3.03 at the crash level. Moreover, the multilevel model identified five variables as statistically significant at the 0.05 level. Two variables (daylight and had not been drinking) were found to be negatively associated with crash severity, which means that rear-end crashes occurring in daylight rather than at dusk or at night, and those in which the driver had not been drinking, are less likely to be severe. The other three variables (design speed, motorcycle involved, and the primary collision factor failure to yield) were found to be positively associated with crash severity: rear-end crashes on rural segments with higher design speed are associated with a higher likelihood of severe injury, and the likelihood of severe injury increases when a motorcycle is involved or the primary collision factor is failure to yield.
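Under the proportional-odds assumption built into the ordinal logit, a positive coefficient such as the one on design speed multiplies the odds of a more severe rather than less severe outcome by the same factor at every threshold, conditional on the random effects. The sketch below checks this numerically; the coefficient of 0.03 per mph, the 10 mph increase, the linear predictor, and the cutpoints are all hypothetical values, not estimates from Table 5.

```python
import math


def logistic(z: float) -> float:
    """Standard logistic CDF (adequate for the moderate arguments used here)."""
    return 1.0 / (1.0 + math.exp(-z))


def odds_more_severe(alpha_c: float, eta: float) -> float:
    """Odds P(Y > c) / P(Y <= c) given cutpoint alpha_c and linear predictor eta
    (eta includes the segment and crash random intercepts)."""
    p_at_or_below = logistic(alpha_c - eta)
    return (1.0 - p_at_or_below) / p_at_or_below


def odds_ratios(beta: float, delta: float, eta: float, alphas: list[float]) -> list[float]:
    """Odds ratio at each cutpoint for a `delta`-unit increase in one covariate."""
    return [odds_more_severe(a, eta + beta * delta) / odds_more_severe(a, eta)
            for a in alphas]


# Hypothetical values for illustration only (not estimates from this study).
BETA_DESIGN_SPEED = 0.03  # per mph
ratios = odds_ratios(BETA_DESIGN_SPEED, 10.0, eta=0.2, alphas=[-1.0, 2.0])
```

Both ratios equal exp(0.3), about 1.35, so under these assumed values a 10 mph higher design speed would raise the odds of a more severe outcome by about 35% at either threshold.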
Modeling results of angle crashes
Table 6 compares the modeling results with and without controlling for the multilevel structure for angle crashes. There were 1198 angle crashes in total on the divided rural multilane highway segments. Log likelihood1 is −1325, which is larger than log likelihood2 (−1369) of the traditional model. The estimated variance of the random intercept is 0.63 at the segment level and 1.30 at the crash level. Moreover, seven variables were identified as statistically significant at the 0.05 level in the multilevel ordinal logistic model. Three variables (mountain, principal arterial, and had not been drinking) were found to be negatively associated with crash severity, which means that angle crashes on rural mountainous segments, on segments functioning as principal arterials, and those in which the driver had not been drinking are less likely to be severe. The other four variables (design speed, motorcycle involved, cell phone in use, and failure to yield) were found to be positively associated with crash severity: angle crashes on rural segments with higher design speed are associated with a higher likelihood of severe injury, as is cell phone use while driving. In addition, as with sideswipe and rear-end crashes, the likelihood of severe injury increases when a motorcycle is involved or the primary collision factor is failure to yield.
Modeling results of single-vehicle crashes
In contrast to the four multi-vehicle crash types above, Table 7 compares the modeling results with and without controlling for the multilevel structure for single-vehicle crashes. There were 3964 single-vehicle crashes on the divided rural multilane highway segments. The goodness-of-fit and LR test results again show that the multilevel ordinal logistic model fits significantly better for single-vehicle crashes. The estimated variance of the random intercept is 0.70 at the segment level and 1.02 at the crash level. Moreover, seven variables were identified as statistically significant at the 0.05 level in the multilevel ordinal logistic model. Three variables (median width, rolling, and had not been drinking) were found to be negatively associated with crash severity, which means that single-vehicle crashes on rural rolling segments, on segments with wider medians, and those in which the driver had not been drinking are less likely to be severe. The other four variables (unpaved median type, motorcycle involved, cell phone in use, and the primary collision factor failure to yield) were found to be positively associated with crash severity: single-vehicle crashes on rural segments with unpaved medians are associated with a higher likelihood of severe injury. In addition, as with sideswipe and rear-end crashes, the likelihood of severe injury increases when a motorcycle is involved or the primary collision factor is failure to yield.
Summary and conclusions
Few research studies have focused on the severity of crashes on rural multilane highway segments, especially divided segments. In addition, crash severity is better treated as an ordered discrete response variable and analyzed using ordered logistic models. This research identified the factors contributing to the severity of five types of traffic crashes on divided rural multilane highway segments using multilevel ordinal logistic models. Crash data from California were used to calibrate the models, with the data set divided into three levels: segment level, crash level, and occupant level. Head-on, sideswipe, angle, rear-end, and single-vehicle crashes on divided rural multilane highway segments were investigated separately to identify the contributing factors. The major conclusions of this research are as follows.
According to the goodness-of-fit and LR tests, the multilevel ordinal logistic models fit significantly better for all crash types than the traditional ordinal logistic models without random effects that were used in previous research. In addition, all the modeling results show that part of the unexplained variation in severity results from segment-level and crash-level variance, which confirms the advantage of accounting for within-segment and within-crash correlation.
The multilevel ordinal logistic models for the five crash types show that several risk factors, including cell phone use while driving, failure to yield, unpaved median type, and higher design speed, are associated with a higher likelihood of severe injury for at least two crash types. Other variables, such as concrete surface type, lack of access control, and absence of curbs, are associated with a higher likelihood of severe injury for one specific crash type.
The modeling results also show that street lighting, location on principal or minor arterials, and drivers who had not been drinking are associated with a lower likelihood of severe injury for at least two crash types. Other variables, such as wider medians, rolling terrain, and daytime occurrence, are also associated with a lower likelihood of severe injury. Sideswipe and angle crashes on segments with mountainous terrain are less likely to be severe, whereas head-on crashes on mountainous segments are more likely to be severe. Based on these conclusions, traffic managers can select more cost-effective safety countermeasures to mitigate severe injuries in crashes on divided rural multilane highway segments. In future work, the underreporting issue in crash data should also be examined.22,23
Footnotes
Handling Editor: Hong Yang
Data availability
The data used to support the findings of this study are available from the corresponding author upon request.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The study was jointly sponsored by Chinese National Science Foundation (grant no. 71601143), Jiangsu key laboratory of traffic and transportation security in Huaiyin Institute of Technology (grant no. TTS2018-04), National Key R&D Program of China (no. 2018YFB1201403) and Natural science fund for colleges and universities in Jiangsu Province (grant no. 18KJA580001). All opinions are those only of the authors.
