Abstract
Rear-end collisions are the main type of traffic accident on freeways. To extract the significant factors influencing freeway rear-end collisions, this study used freeway traffic accident data collected in China between 2010 and 2015. First, based on quasi-induced exposure theory, information on the driver, vehicle, and road environment was analyzed. Gender, age, driving age, vehicle safety, load, weather, fatigue, driving speed, road alignment, accident time, and visibility were selected as the factors that might affect rear-end collisions. Second, a logistic regression model of the influencing factors was established, with the selected candidate factors as the independent variables and accident responsibility as the dependent variable. The factors with a significant influence on rear-end collisions were then selected from the candidate independent variables by the stepwise regression method. Finally, the specific influence of driving age, load, weather, accident time, visibility, fatigue, and driving speed on freeway rear-end collisions was discussed, and the results were interpreted in terms of odds ratios. The results can provide guidance for preventing rear-end collisions on freeways and theoretical support for the development of freeway early warning systems.
Introduction
Freeways offer high running speeds, large traffic capacity, and low transportation cost. During the past 10 years, freeway mileage has increased rapidly in China, exceeding 130,000 kilometers by the end of 2016. However, according to the traffic accident data provided by the Traffic Management Bureau of the Public Security Ministry, the rear-end collision is a very common type of freeway traffic accident. Figure 1 shows the proportions of accident numbers, deaths, injuries, and direct property losses attributable to rear-end collisions in freeway accidents in China during 2010–2015. 1 As Figure 1 shows, each of these four indexes exceeds 30% in every one of the 6 years, and there is no obvious downward trend. Therefore, using the existing data samples of rear-end collisions, this article analyzed the specific influence of significant factors on freeway rear-end collisions based on the quasi-induced exposure theory and the logistic regression model. First, based on the quasi-induced exposure theory, the important factors that might affect the occurrence of rear-end collisions were preliminarily screened out. Then, the logistic regression model was used to analyze the preliminarily selected factors quantitatively, and the stepwise regression method was used to screen out the significant influencing factors from the candidate independent variables. Finally, the specific influence of these factors on freeway rear-end collisions was discussed. The analysis results can support more effective countermeasures for preventing rear-end collisions, which is of notable practical significance.

Figure 1. The proportions of four indexes of rear-end collision in freeway accidents.
Literature review
Existing research has helped us to better understand the characteristics of rear-end collisions and to reduce accident risk more effectively, and scholars have conducted many studies based on accident data. For example, to examine accident characteristics, Yan et al. 2 used the 2001 traffic accident data of Florida to investigate the accident propensity for the different vehicle roles (striking or struck) involved in the accidents and to identify the significant risk factors related to the traffic environment, the driver characteristics, and the vehicle types. Padlo and Stamatiadis 3 used the quasi-induced exposure technique with police-reported crashes between 1997 and 2001. Wang and Abdel-Aty 4 used generalized estimating equations with the negative binomial link function to model rear-end crash frequencies at signalized intersections, accounting for the temporal or spatial correlation among the data. Oh and Kim 5 proposed a methodology for estimating rear-end crash potential in real time, as a probabilistic measure, based on the analysis of vehicular movements. The methodology consists of two components. The first estimates the probability that a vehicle’s trajectory belongs to either “changing lane” or “going straight,” using a binary logistic regression (BLR) to model the lane-changing decision of the subject vehicle. The second derives the crash probability from an exponential decay function of the time-to-collision (TTC) between the subject vehicle and the front vehicle. Ouimet et al. 6 examined, per 10 million vehicle trips (VT) and vehicle-miles traveled (VMT), the relative risk of fatal crash involvement in 15–20-year-old male and female drivers as a function of their passenger’s age and gender, using solo driving as the referent.
Based on work zone traffic data in Singapore, Meng and Weng 7 developed three rear-end crash risk models to examine the relationship between rear-end crash risk at activity area and its contributing factors. The fourth rear-end crash risk model was developed to examine the effects of merging behavior on crash risk at merging area. Haque et al. 8 had calculated motorcycle crash risks in different circumstances after controlling for the exposure estimated by the induced exposure technique. Huggins 9 adapted the ideas behind the well-known induced exposure methods and used available summary data on speeding detections and fatalities for motorcycle riders and car drivers to estimate the relative risk of a fatality for motorcyclists compared to car drivers under mild assumptions. Yunteng et al. 10 formulated a generalized non-linear model (GNM)–based approach for modeling highway rear-end crash risk using Washington State traffic safety data—this model better elaborated non-monotonic relationships between the independent and dependent variables. Jiang et al. 11 conducted a comprehensive review on approximately 45 published papers relevant to quasi-induced exposure regarding four key topics of interest: applications, responsibility assignment, validation of assumptions, and methodological development. Chen et al. 12 developed a hybrid approach to combine multinomial logit models and Bayesian network methods for comprehensively analyzing driver injury severities in rear-end crashes based on state-wide crash data collected in New Mexico from 2010 to 2011. Based on a 2-year rear-end crash dataset, Chen et al. 13 applied a decision table/Naïve Bayes (DTNB) hybrid classifier to select the deterministic attributes and predict driver injury outcomes in rear-end crashes. Carney et al. 14 examined over 400 teen driver rear-end crashes captured by in-vehicle event recorders. 
A secondary data analysis was conducted, paying specific attention to driver behaviors, eyes-off-road time, and response times to lead-vehicle braking. Fleiter et al. 15 aimed to better understand perceptions of safe following distance by inviting 495 licensed Queensland drivers (42% male; mean age = 46.2 years; range = 16–81 years) to complete an online questionnaire. Han and Rho 16 constructed a total of five scenarios by using a statistical analysis based on the National Automotive Sampling System Crashworthiness Data System and by setting the gap between the cars based on the travel distance calculated from the reaction time of the driver, the speed, and/or the deceleration of the car. To investigate the contributing factors that affect the occurrence and severity of rear-end crashes in Abu Dhabi, Mohamed et al. 17 applied descriptive statistical analysis and binary logit model approaches. The results showed that seven variables significantly affected the severity of rear-end crashes. Four variables related to driver characteristics and behavior: tailgating, driving too fast, years of experience, and the issue location of the driving license. Two variables related to road characteristics: road type and the number of lanes. One variable related to vehicle type. Li et al. 18 aimed to reduce multi-vehicle rear-end (MVRE) crash risks during small-scale inclement (SSI) weather using different longitudinal driver assistance systems (LDAS). Simulation experiments were designed, and a large number of simulations were then conducted to evaluate the safety effects of different LDAS. Dimitriou et al. 19 estimated rear-end crash possibility based on the stopping distance between two consecutive vehicles and developed four rear-end crash potential cases.
Moreover, recent studies also showed that some new techniques were used in the field of safety analysis. For example, the random parameters model could account for the heterogeneity in crash modeling. 20 The traffic simulation was used to generate surrogate safety measures. 21
In summary, most existing studies based on rear-end collision data analyzed influencing factors with accident severity as the dependent variable. Such studies explain the factors affecting the severity of rear-end collisions accurately and comprehensively and help to reduce that severity. However, the occurrence of a rear-end collision may itself develop from hidden dangers, and research that takes accident liability as the dependent variable to reveal the causes of freeway rear-end collisions is insufficient. Therefore, taking real data on freeway rear-end accidents as samples, quasi-induced exposure and logistic regression analysis as the theoretical basis, and accident liability as the dependent variable, this article explored the significant factors influencing the cause of freeway rear-end collisions.
Methodology
Quasi-induced exposure
Quasi-induced exposure theory is suited to analyzing the risk exposure of traffic subjects at the micro level. It is widely used to evaluate the effectiveness of safety policy and to analyze the accident tendency, safety behavior, and accident causation of traffic subjects in a specific temporal and spatial setting.
The concept of risk exposure was first proposed by DeSilva in 1942. At present, risk exposure is commonly defined as the frequency of accident occurrence for a given traffic subject in a specific time and space. Commonly used macro-level exposure measures include vehicle mileage, trip duration, and traffic flow. However, traditional exposure indicators (such as vehicle mileage) cannot accurately reflect the differences between types of drivers and vehicles using different elements of the road system, because the large amount of data needed is difficult to collect.
Therefore, Thorpe 22 first proposed theoretical framework of risk exposure based on accident database at micro level. The next step in the use of induced exposure was taken by Carr who introduced the notion of being able to identify the driver responsible for the occurrence of a multiple-vehicle accident based on the investigating police officer’s report. Haight 23 called this approach quasi-induced exposure. Both Thorpe and Carr measured the relative involvement to exposure ratio using as the numerator the percentage of accidents for a given driver/vehicle group and as the denominator the exposure as calculated by their models. Stamatiadis and Deacon 24 presented a critical examination of an induced exposure technique, based on the non-responsible driver/vehicle of a two-vehicle accident (quasi-induced exposure). They concluded that the quasi-induced exposure was a powerful technique for measuring relative exposure of drivers or vehicles when real exposure data are missing. Deyoung et al. 25 applied the quasi-induced exposure method to fatal crash data obtained from the National Highway Traffic Safety Administration’s Fatal Accident Reporting System, generating exposure and crash rate estimates for S/R drivers in California.
The use of quasi-induced exposure can provide another way to stratify data with location and time, thus reflecting the differences among driver/vehicle characteristics of each combination. By using the actual vehicle classification data, the quasi-induced exposure theory proves that the use of non-responsible vehicles is a reasonable and accurate exposure measure, and explores the relative tendency of various drivers and vehicles related to the single-vehicle and multi-vehicle accidents.
Quasi-induced exposure rests on two hypotheses. First, the traffic accident occurs between two cars, one of which is fully responsible for the accident while the other is not. Second, the non-responsible accident participants are assumed to be typical or random representatives of all drivers across the two-car accidents. Compared with the macro-level method, the biggest advantage of this method is that it estimates exposure from the accident data alone rather than from traffic flow and driving distance. The core of the quasi-induced exposure theory is therefore the measurement of exposure for specific characteristics. The ratio of responsible two-car rear-end collision participants with a specific characteristic to non-responsible accident participants in the same cohort is defined as the relative rear-end collision involvement ratio (RCIR). This parameter indicates whether a driving population causes disproportionately more or fewer crashes relative to its presence on the road. This way of measuring relative exposure and crash propensity inherently avoids the assumption of a linear relationship between crash counts and distance traveled that VMT-based measures face.
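As an illustration of the RCIR described above, the following minimal sketch computes, for each level of one attribute, the share of responsible drivers at that level divided by the share of non-responsible drivers at the same level. This reading of the definition, the field name `responsible`, and the toy counts are assumptions made for the example; they are not taken from the study data.

```python
from collections import Counter


def rcir(drivers, attribute):
    """Return the RCIR for each level of `attribute`.

    `drivers` is a list of dicts, one per driver involved in a two-car
    rear-end collision. Each dict carries a boolean "responsible" flag
    (hypothetical field name) and the attribute of interest.
    """
    responsible = Counter(d[attribute] for d in drivers if d["responsible"])
    non_responsible = Counter(d[attribute] for d in drivers if not d["responsible"])
    n_resp = sum(responsible.values())
    n_non = sum(non_responsible.values())
    if n_resp == 0 or n_non == 0:
        raise ValueError("need at least one responsible and one non-responsible driver")

    ratios = {}
    for level in set(responsible) | set(non_responsible):
        share_resp = responsible[level] / n_resp
        share_non = non_responsible[level] / n_non
        # Without non-responsible drivers at this level there is no exposure estimate.
        ratios[level] = share_resp / share_non if share_non > 0 else float("nan")
    return ratios


# Made-up counts for illustration only, not the study data.
sample = (
    [{"responsible": True, "overload": "yes"}] * 30
    + [{"responsible": True, "overload": "no"}] * 70
    + [{"responsible": False, "overload": "yes"}] * 20
    + [{"responsible": False, "overload": "no"}] * 80
)
ratios = rcir(sample, "overload")
print(ratios)
```

Under this reading, a ratio above 1 means the group appears among responsible drivers more often than its estimated presence on the road would suggest, and a ratio below 1 means the opposite.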
Based on the quasi-induced exposure theory, 638 two-car rear-end collision accidents that occurred in China during 2010–2015 were selected from data provided by the Traffic Management Bureau of the Public Security Ministry. With these real data, this article studied the risk exposure of traffic subjects at the micro level.
In previous research, Ouimet et al. 6 used the responsibility identification made by traffic police as the basis for assigning responsibility. Accordingly, accident responsibility in this article was determined from the responsibility identification of traffic police. In addition, the application of quasi-induced exposure theory was supplemented by assuming that the non-responsible rear-end accident participants and their corresponding attributes (vehicles, roads, and environment) were typical or random representatives of all driving groups on the freeway.
In order to reduce the irregularity of the accident data as much as possible, improve the “clean” degree of samples, and increase the reliability of analysis results, the selected accident data had the following basic characteristics:
All the accident data were based on two-car-rear-end collision accidents on the freeway.
The responsibility of rear-end collision was certified by traffic police, and the responsibility was clearly divided.
The responsible driver in each rear-end accident was held liable for engaging in dangerous driving behavior.
At present, the traffic management departments of China regularly obtain traffic accident data from the road traffic accident statistics. There are 56 information items in the Traffic Accident Information Collection Item List. According to the quasi-induced exposure theory, by comparing the relative RCIRs of various information items, the factors that might have significant impacts on the occurrence of rear-end collision were preliminarily screened out: gender, age, driving age, vehicle safety, load, weather, fatigue, driving speed, road alignment, accident time, and visibility. The RCIR value of each information item was shown in Table 1.
Table 1. RCIR value of each information item.
RCIR: rear-end collision involvement ratio.
Logistic regression analysis
Because the quasi-induced exposure analysis screened the candidate factors qualitatively, without any modeling, it was not sufficient on its own: it lacked a quantitative evaluation of the direction and extent of each candidate factor's influence. It was therefore necessary to select an appropriate method to analyze these factors further and to screen out the significant influencing factors for rear-end collision quantitatively.
As a special form of logarithmic linear model, the logistic regression model is a mature theoretical model for statistical analysis of multiple variables. 26 Most scholars analyzed the influencing factors of accident severity by the logistic regression analysis and had achieved a series of research results. Choi et al. 27 applied a BLR technique to identify causal factors affecting truck crash severity under normal and adverse weather conditions. Feng et al. 28 investigated the underlying risk factors of fatal bus accident severity to different types of drivers in the United States by estimating an ordered logistic model. At present, most of the studies are based on traffic accident data and take accident severity as the dependent variable.
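According to the quasi-induced exposure theory, gender, age, driving age, vehicle safety, load, weather, fatigue, driving speed, road alignment, accident time, and visibility were taken as the candidate independent variables for the logistic regression analysis. Accident responsibility (namely, rear-end collision occurrence; $Y = 1$ for the responsible party and $Y = 0$ for the non-responsible party) was taken as the dependent variable. The probability of rear-end collision occurrence was expressed as

$$P = P(Y = 1 \mid x_1, x_2, \ldots, x_{11}) = \frac{\exp\left(\beta_0 + \sum_{i=1}^{11} \beta_i x_i\right)}{1 + \exp\left(\beta_0 + \sum_{i=1}^{11} \beta_i x_i\right)} \quad (1)$$

where $x_i$ ($i = 1, 2, \ldots, 11$) are the candidate independent variables, $\beta_0$ is the constant term, and $\beta_i$ are the regression coefficients.
Equation (1) was conducted with Logit transform to derive Equation (2)

$$\operatorname{logit}(P) = \ln\frac{P}{1 - P} = \beta_0 + \sum_{i=1}^{11} \beta_i x_i \quad (2)$$

where $P / (1 - P)$ is the odds of rear-end collision occurrence and $\exp(\beta_i)$ is the odds ratio associated with $x_i$.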
After that, the independent variables were encoded in Table 2.
Table 2. Description of candidate independent variables.
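The relationship between Equation (1) and Equation (2) can be checked numerically with a short sketch. It assumes the standard logistic form, and the coefficient and variable values are arbitrary placeholders rather than estimates from this study.

```python
import math


def linear_predictor(beta0, betas, xs):
    """beta0 + sum(beta_i * x_i), the right-hand side of the logit form."""
    return beta0 + sum(b * x for b, x in zip(betas, xs))


def logistic_probability(beta0, betas, xs):
    """Probability of the event under the logistic model."""
    z = linear_predictor(beta0, betas, xs)
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log-odds of a probability p with 0 < p < 1."""
    return math.log(p / (1.0 - p))


# Placeholder values for illustration only.
beta0 = -0.5
betas = [0.4, -0.3]
xs = [1, 0]

p = logistic_probability(beta0, betas, xs)
# Applying the logit transform to p recovers the linear predictor.
print(p, logit(p), linear_predictor(beta0, betas, xs))
```

Because the logit is linear in the coefficients, each coefficient can be read as a change in log-odds for a one-unit change in its coded variable.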
Not all influencing factors had significant effects on freeway rear-end collision. Therefore, the factors with significant effects had to be screened from the candidate independent variables. Independent variables can be screened by the forward, backward, or mixed (stepwise) regression method. In this article, the stepwise regression method was used to screen the significant influencing factors associated with rear-end collision on the freeway.
Following the idea of the forward method, the influencing factors with the highest significance were introduced into the analysis model one at a time. The introduction process continued until no remaining factor passed the significance test. The specific procedure was as follows:
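Step 1: For the 11 candidate independent variables $x_1, x_2, \ldots, x_{11}$, univariate regression models with the dependent variable $Y$ were established one at a time.
The values of the F-test statistic for the regression coefficient of each variable $x_i$ were calculated and denoted as $F_1^{(1)}, F_2^{(1)}, \ldots, F_{11}^{(1)}$, and the maximum value was taken as $F_{i_1}^{(1)} = \max\{F_1^{(1)}, F_2^{(1)}, \ldots, F_{11}^{(1)}\}$.
For a given significance level $\alpha$, the corresponding critical value was denoted as $F^{(1)}$. If $F_{i_1}^{(1)} \geq F^{(1)}$, the variable $x_{i_1}$ was introduced into the regression model.
Step 2: The binary regression models of dependent variable $Y$ and the independent variable subsets $\{x_{i_1}, x_j\}$ ($j \neq i_1$; 10 subsets in total) were established.
Similar to Step 1, the values of the F-test statistic for the regression coefficient of the newly added variable in each subset were calculated and then denoted as $F_j^{(2)}$ ($j \neq i_1$), and the maximum value was taken as $F_{i_2}^{(2)} = \max_{j \neq i_1} F_j^{(2)}$.
For a given significance level $\alpha$, the corresponding critical value was denoted as $F^{(2)}$. If $F_{i_2}^{(2)} \geq F^{(2)}$, the variable $x_{i_2}$ was introduced into the regression model; otherwise, the introduction process ended.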
Step 3: Considering the regression process of dependent variable to the subset of independent variables, step 2 was repeated.
This method was repeated, and one of the independent variables never introduced into the regression model was selected at a time until there was no variable introduced after testing.
The backward method was the opposite of the forward method. It first entered all independent variables into the regression model and then eliminated, one by one, the independent variables that contributed least to the regression, as measured by the change in the residual sum of squares.
Both the forward and backward methods had certain disadvantages, and the stepwise regression method combined the two. At each step, the sum of partial regression squares (i.e. the contribution) was calculated for every variable already in the regression equation, and the variable with the smallest sum was tested for significance at a given level. If it was significant, neither it nor any other variable in the equation needed to be removed. If it was not significant, the variable was excluded, and the remaining variables in the equation were tested in ascending order of their sums of partial regression squares until all variables without a significant effect had been excluded and only significant variables were retained. Then, the sum of partial regression squares was calculated for each variable not yet in the regression equation, and the variable with the largest sum was selected and tested for significance at a given level. If it was significant, the variable was introduced into the regression equation. This process continued until no variable in the regression equation could be eliminated and no new variable could be introduced, which ended the stepwise regression process.
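The control flow of the stepwise procedure described above can be sketched as follows. This is a simplified illustration, not the SPSS implementation: the test statistic is supplied by the caller as a hypothetical function `partial_stat(variable, others)`, the thresholds `f_in` and `f_out` stand in for the critical values at the chosen significance levels, and the toy statistic values are invented.

```python
def stepwise_select(candidates, partial_stat, f_in, f_out, max_rounds=100):
    """Select variables by alternating removal and entry passes.

    partial_stat(v, others) returns the test statistic of variable v
    given that the variables in `others` are also in the model.
    """
    if f_out > f_in:
        raise ValueError("f_out must not exceed f_in, otherwise the search can cycle")
    selected = []
    for _ in range(max_rounds):
        changed = False
        # Removal pass: drop the weakest variable while it fails the test.
        while selected:
            stats = {v: partial_stat(v, [u for u in selected if u != v])
                     for v in selected}
            weakest = min(stats, key=stats.get)
            if stats[weakest] >= f_out:
                break
            selected.remove(weakest)
            changed = True
        # Entry pass: add the strongest remaining variable if it passes.
        stats = {v: partial_stat(v, list(selected))
                 for v in candidates if v not in selected}
        if stats:
            best = max(stats, key=stats.get)
            if stats[best] >= f_in:
                selected.append(best)
                changed = True
        if not changed:
            break
    return selected


# Invented statistics: x1 looks strong alone but becomes weak once x2 is present.
base = {"x1": 10.0, "x2": 9.0, "x3": 7.0, "x4": 1.0}


def toy_stat(var, others):
    if var == "x1" and "x2" in others:
        return 2.0
    return base[var]


chosen = stepwise_select(["x1", "x2", "x3", "x4"], toy_stat, f_in=4.0, f_out=3.0)
print(chosen)
```

In the toy run, `x1` enters first, is removed after `x2` enters, and is not re-admitted, which is the behavior that distinguishes the stepwise method from a purely forward search.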
Results and discussions
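In this article, SPSS software was used for the logistic regression analysis. Under the significance levels set for the stepwise procedure, the significant influencing factors were screened out, and the results are shown in Table 3.
Table 3. Logistic regression analysis result.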
DOF: degrees of freedom.
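The percentages quoted in the following subsections are read from odds ratios. A minimal sketch of that conversion is given here, assuming the usual relation between a logistic coefficient and its odds ratio; the numeric inputs are illustrative values chosen to match percentages stated in the text, not entries copied from the results table.

```python
import math


def odds_ratio(beta):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)


def percent_change_in_odds(or_value):
    """Percentage change in the odds relative to the reference category."""
    return (or_value - 1.0) * 100.0


def probability_from_odds(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


# Illustrative odds ratios only (assumed, not taken from the results table).
print(percent_change_in_odds(0.70))   # about -30: odds 30% lower
print(percent_change_in_odds(1.55))   # about +55: odds 55% higher

# The same odds ratio changes the probability by a baseline-dependent amount.
baseline_p = 0.5
baseline_odds = baseline_p / (1.0 - baseline_p)
print(probability_from_odds(baseline_odds * 0.70))
```

Strictly, these percentages describe changes in the odds; the corresponding change in probability depends on the baseline, as the last line illustrates.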
The result showed the conclusions as follows:
Driving age
The results showed that the drivers with 6–10 years of driving experience had the highest probability of rear-end collision. The drivers with more than 16 years of experience were 60% lower than those with 1–5 years in the probability of rear-end collision. The drivers with 11–15 years of experience were 4% lower than those with 1–5 years in the probability of rear-end collision. However, the drivers with 6–10 years of experience were 8% higher than those with 1–5 years in the probability of rear-end collision.
Causal analysis
For drivers, a longer driving age means more driving experience, and driving experience affects drivers' behavioral, psychological, and physiological characteristics. The Regulations on Application and Use of Driving License issued by the Ministry of Public Security of the People’s Republic of China stipulate that, during the probationary period, a novice driving on the freeway must be accompanied by a driver who has held a license for 3 years or more. The proportion of novices on the freeway was therefore low, which may explain why the drivers with 6–10 years of experience showed the highest probability of rear-end collision.
Load
The results showed that under the significance level of 0.026, the probability of rear-end collision for the non-overload vehicles was 30% lower than that for the overload vehicles.
Causal analysis
Because of rising transportation costs, the pursuit of economic benefit, insufficient understanding of the dangers of overloading, and the difficulty of law enforcement, rear-end collisions on the freeway often involve seriously overloaded vehicles. Overloading overloads the tires, so tire bursts occur from time to time; it overloads the engine, which becomes prone to overheating; it overloads the steering system and increases the centrifugal force in turns, which can cause rollover; and it increases the inertia and braking distance of the vehicle. Prolonged braking easily overheats the brakes, which can lead to rear-end collision. In addition, overloading damages road infrastructure and thereby affects road capacity.
Weather
The results showed that under the significance level of 0.001, the probability of rear-end collision under the non-fine weather was 1.53 times higher than that under the fine weather.
Causal analysis
As weather and pavement conditions deteriorate, the possibility of rear-end collision on the freeway increases significantly. Bad weather (snow, rain, etc.) is often the direct cause of freeway rear-end collisions. Snow and rain make the road slippery and reduce the road adhesion coefficient, thus increasing the braking distance. When the vehicle in front brakes suddenly or a vehicle ahead is violating traffic rules, a rear-end collision can easily occur.
Accident time
The results showed that under the significance level of 0.001, the probability of rear-end collision at night was 2.14 times higher than that of daytime.
Causal analysis
When driving at night, drivers often suffer from poor vision and fatigue because of insufficient light, which can easily lead to rear-end collision.
Visibility
The results showed that the probability of rear-end collision was highest when the visibility was less than 50 miles. Compared with visibility of less than 50 miles, the probability of rear-end collision was 46% lower with visibility of 50–100 miles, 52% lower with visibility of 100–200 miles, and 65% lower with visibility greater than 200 miles.
Causal analysis
Foggy weather is the main cause of low visibility. Reduced visibility shortens the driver’s sight distance, makes the vehicle harder to control, and easily leads to rear-end collision. Low visibility can also cause the driver to misjudge the distance to surrounding vehicles. Under certain temperature and humidity conditions, fog forms readily over the road surface, and vehicle skidding can then cause rear-end collision. In winter, water vapor condenses on the inside of the windows and obstructs the driver’s sight.
Fatigue
The results showed that under the significance level of 0.017, the probability of rear-end collision caused by fatigue driving was 55% higher than that caused by non-fatigue driving.
Causal analysis
Fatigue driving greatly reduces the driver’s reaction speed and is mainly manifested in distraction, impaired judgment, and, in severe cases, sleepiness. The fatigued driver’s estimate of the safe distance to the vehicle ahead therefore becomes unreliable, which increases the probability of rear-end collision.
Driving speed
The results showed that under the significance level of 0.001, the probability of rear-end collision caused by overspeed was 40% higher than that caused by non-overspeed.
Causal analysis
Overspeed driving impairs the driver’s vision and judgment. Tunnel vision forms easily, especially on the freeway. Prolonged high-speed driving makes drivers fatigued and sleepy; they cannot see roadside signs and scenery clearly, and their emergency response ability deteriorates. In an emergency, a rear-end collision becomes more likely because less time is available to take measures. Overspeed driving also greatly increases the braking distance: whenever the vehicle speed doubles, the braking distance becomes about four times as long. The braking distance is longer still when road conditions are poor. Once the vehicle in front decelerates suddenly, a rear-end collision can easily occur.
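The fourfold relationship between speed and braking distance follows from idealized constant-friction braking, where the distance is v² / (2μg). The sketch below uses that textbook formula; the friction coefficients are illustrative assumptions (not measurements from this study), and driver reaction distance is ignored.

```python
G = 9.81  # gravitational acceleration in m/s^2


def braking_distance(speed_kmh, friction):
    """Idealized braking distance in meters for constant-friction braking."""
    v = speed_kmh / 3.6  # convert km/h to m/s
    return v * v / (2.0 * friction * G)


# Assumed friction coefficients for illustration only.
DRY, WET = 0.7, 0.4

d_60 = braking_distance(60, DRY)
d_120 = braking_distance(120, DRY)
print(d_60, d_120, d_120 / d_60)      # doubling the speed quadruples the distance
print(braking_distance(120, WET))     # lower adhesion lengthens it further
```

The same formula also reflects the weather effect discussed earlier: a lower road adhesion coefficient gives a longer braking distance at any given speed.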
Conclusions
Based on the quasi-induced exposure theory, 638 two-car rear-end collision accidents in China during 2010–2015 were selected as the research object in this article, and gender, age, driving age, vehicle safety, load, weather, fatigue, driving speed, road alignment, accident time, and visibility were preliminarily selected as the important factors that might affect rear-end collision. Then, based on the logistic regression model, the influencing factors analysis model of freeway rear-end collision was established, and the significant influencing factors were screened out from the candidate independent variables by the stepwise regression method. Finally, the specific influence of driving age, overload, weather, accident time, visibility, fatigue, and driving speed on freeway rear-end collision was discussed.
The research showed that the drivers with 6–10 years of driving experience had the highest probability of rear-end collision. Under the significance level of 0.026, the probability of rear-end collision for the non-overload vehicles was 30% lower than that for the overload vehicles. Under the significance level of 0.001, the probability of rear-end collision under the non-fine weather was 1.53 times higher than that under the fine weather. Under the significance level of 0.001, the probability of rear-end collision at night was 2.14 times higher than that in the daytime. When the visibility was less than 50 miles, the probability of rear-end collision was the highest. Moreover, fatigue driving and overspeed driving increased the probability of rear-end collision by 55% and 40%, respectively.
The analysis of the significant influencing factors suggests that, to reduce the occurrence of freeway rear-end collisions as much as possible, drivers should always abide by traffic regulations, standardize their driving behavior, choose road environments with clear sight for traveling wherever possible, control their continuous driving time, and arrange reasonable rest time. Traffic management departments should continue to improve the signs and markings on each section of the freeway, strengthen law enforcement, use emerging media to expand the publicity of civilized driving, strengthen patrols, and severely crack down on illegal driving such as speeding and overloading on the freeway. Moreover, to strengthen traffic management under severe weather conditions, management departments should formulate feasible emergency plans as soon as possible, establish a linkage mechanism with the meteorological departments, and provide early warning information to drivers. This article might provide a new idea for the cause analysis of freeway rear-end collisions. Its conclusions provide a theoretical basis for scientific countermeasures to prevent rear-end collisions and can help to further improve the traffic safety level on the freeway.
Future studies
The accident data were derived from the statistical report of road traffic accidents. The responsibility division of the two-car accidents was determined by the traffic police department, and the distribution of non-responsible drivers might carry a natural bias (for example, a large proportion of defensive drivers). This article did not find a suitable method to ensure the complete fairness of the responsibility division. In addition, the breadth and depth of road traffic accident information collection in China still had some deficiencies. Most of the accident information was stored in the accident files as text and images and did not form a standardized data structure. Therefore, after applying the quasi-induced exposure theory, this article did not test the screened results at the aggregate level but relied on the regression analysis instead, which might affect the reliability and validity of the results to a certain extent. In the future, it is necessary to further quantify the direction and magnitude of the deviation caused by each factor affecting rear-end collision, on the basis of improved preprocessing, depth, and scope of the accident data.
Footnotes
Handling Editor: Liping Jiang
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
