Sage Journals: Discover world-class research

Abstract

The study of expressway rear-end conflicts is important for analyzing driving behaviors and improving traffic safety. However, research on the classification and modeling of conflict patterns is still lacking. This study aimed to explore conflict patterns and their relationship with influencing factors. The conflict data were extracted from a trajectory data set covering 3 h of the morning peak period on an expressway in Shanghai, China. An improved k-means algorithm, which automatically determines the optimal number of clusters, was used to classify the conflict events into six conflict patterns. The conflict patterns were interpreted from five aspects: risk level, speed of risk-changing, risk-avoidance response, risk-avoidance attitude, and risk-avoidance action. Furthermore, a multivariate Poisson-lognormal (MVPLN) model considering spatial–temporal correlation was applied to obtain the relationship between the independent variables and the number of conflicts of each pattern within each spatial–temporal unit. The root mean square error of the MVPLN model was 0.81. Compared with the univariate Poisson, univariate negative binomial, and univariate Poisson-lognormal models, the MVPLN model improved accuracy by 73.8%, 81.3%, and 29.6%, respectively. The proposed approach can classify expressway rear-end conflict patterns and estimate the number of conflicts of each pattern within spatial–temporal units from available traffic data.

Keywords

rear-end conflict, conflict pattern, improved k-means, multivariate Poisson-lognormal model, spatial–temporal correlation

Expressways are of great importance to a transportation system because they carry the majority of long-distance trips in cities ( 1 ). In China, an expressway refers to a two-way carriageway with a divider in the center. To provide rapid long-distance urban transportation, expressways are designed for speeds between 60 km/h and 100 km/h. Because of high traffic volumes and speeds, traffic crashes occur frequently on expressways. The traffic flow characteristics at merging, diverging, and weaving sections differ markedly from those on basic segments: vehicles bound for different destinations frequently change lanes and interweave, leading to high crash risks. Because expressway speeds are higher than those on ordinary roads, crashes on expressways are also more severe ( 2 ). Reducing traffic crashes on expressways therefore plays an important role in traffic safety and efficiency.

However, crash data require long observation periods and are difficult to obtain, which limits the application of crash-based traffic safety methods. Currently, traffic conflicts are widely used as a surrogate safety measure (SSM) for traffic crashes ( 3 , 4 ). Traffic conflict data are relatively easy to obtain and can be used for safety evaluation without relying on a large amount of crash data ( 5 ). Traffic conflicts are influenced by traffic flow, road design, and other factors. A traffic conflict model quantifies the relationship between the number of traffic conflicts and these influencing factors, providing a theoretical and technical basis for traffic safety design.

Meanwhile, previous research has shown a strong correlation between traffic safety and driving behaviors ( 6 , 7 ). Because of external circumstances and individual differences, drivers’ risk-avoidance behaviors vary in many aspects, such as the attitude, speed, and intensity of reaction. As a result, conflicts take on different patterns. There is considerable potential for increasing road safety through behavior modification ( 8 ). Considering pattern classification in conflict models can help provide more targeted driving assistance and a safer driving experience.

Research on Pattern Classification

Pattern classification theory addresses the problem of assigning data to one of several categories ( 9 ). In traffic safety research, distinct driving behaviors can be identified by exploring driving patterns. Commonly used methods can be divided into model-based and learning-based approaches: model-based methods classify driving patterns by establishing driving models, whereas learning-based methods analyze driving data directly. Different terms and methods have been used to classify driving style based on individual, social, and technological factors ( 8 , 10 , 11 ). Longitudinal driving behavior types have been determined and distinguished according to driving and driver characteristics ( 12 ). The efficient k-means clustering algorithm has been used to classify longitudinal driving behavior into driving patterns ( 13 ). The impacts of driving tendency on the dilemma zone distribution and the probability of rear-end crashes have also been explored, with three types of driving tendency (conservative, normal, and aggressive) classified based on driving variables ( 14 ). Other researchers have studied specific driving styles, such as longitudinal behavior, lateral control, and gap acceptance ( 15 – 17 ). As these studies show, research on driving patterns focuses on irregular driver behaviors in specific driving scenarios, such as car following and intersection dilemma-zone decisions. The lack of research on conflict patterns has limited the precision of traffic safety measures.

Feature parameter selection is a key issue in driving pattern classification. Although rule-based strategies can classify most drivers into different categories, it is difficult to select a pair of feature parameters that fully represents or defines all aggressive (or normal) drivers ( 17 ). Average speed, speed standard deviation, and the percentage of time spent speeding are commonly used feature parameters in driving style classification ( 18 – 20 ). Other microscopic features, such as average acceleration/deceleration, acceleration/deceleration standard deviation, and time headway, are also used in some studies ( 21 , 22 ). The impact of sociodemographic and trip generation parameters on real-time traffic safety has been explored to predict real-time collision risk on highways ( 23 ). In these studies, driving parameters were chosen with little consideration of how conflict risk changes over time, and the lack of risk-changing parameters left driving pattern classification insufficiently detailed. In addition, the road and traffic environment are important factors that affect driving patterns; neglecting these external factors can degrade the performance of driving pattern models.

Research on Conflict Models

Traffic conflict models are designed to model the number of conflicts within a specific period at a certain location or segment. Widely used methods include various statistical models and machine learning methods. Nested logit, multinomial logit, and random parameters logistic models have been calibrated to estimate the probabilities of rear-end crashes and determine the relevant variables ( 24 – 26 ). Generalized linear regression models, such as the Poisson and negative binomial models, are widely used in conflict and crash prediction ( 27 , 28 ). The Poisson-lognormal model is another generalized Poisson model and has been used to estimate crash counts by severity ( 29 , 30 ); Poisson-lognormal models have advantages in handling outliers ( 31 , 32 ). The multivariate Poisson-lognormal model has been shown to fit better than independent univariate models ( 33 ). Other models, such as the Bayesian Tobit model, hierarchical Bayesian model, multilevel Bayesian logistic regression model, and extreme value theory, have been used to estimate conflict or crash frequency ( 34 – 37 ). Machine learning methods such as neural networks and hidden Markov models are also receiving increasing attention in traffic conflict modeling ( 3 , 38 – 41 ). Other methods, such as combining simulated trajectories with real-time safety models, have also been applied to predict conflicts ( 42 ). Current traffic conflict models mainly focus on predicting the number of conflicts. However, traffic conflicts are usually caused by irregular behaviors of road users, so the role of road users in traffic conflict models cannot be ignored. Driver behavior and characteristics have been studied to help reduce the probability of rear-end collisions and thereby improve vehicle safety ( 26 ). Because drivers differ in their behaviors and reactions to risk, conflicts can take different patterns, and driving patterns should therefore be incorporated into conflict models.

Spatial–temporal correlation has received increasing attention in traffic conflict models. Because conflict observation time is limited, traffic conflicts are usually studied in short time intervals. Because the observed vehicles and other traffic objects are correlated across spatial–temporal units, the conflicts and their characteristics in different units are also spatially and temporally correlated. A negative binomial model was chosen to study the spatial relationship between conflicts and crashes ( 5 ). A temporal and spatial analysis of rear-end collision data at signalized intersections revealed a high correlation between longitudinally or spatially correlated rear-end collisions ( 43 ). Considering the possible temporal correlation of conflict-frequency data from the same road entity in short time intervals, random effects models were used to predict the number of conflicts ( 44 ). Some studies have used Bayesian hierarchical models to incorporate spatial and temporal correlations in traffic conflicts ( 45 – 47 ). Taking exogenous variables such as climate, seasonal events, and road properties into consideration, a log-Gaussian Cox model has been used to describe the spatial–temporal stochastic process ( 48 ). Machine learning models, such as convolutional neural networks and long short-term memory networks, have also been widely used to capture this correlation ( 49 , 50 ). These studies suggest that the counts of different conflict patterns within spatial–temporal units are not completely independent of each other. The dependence between conflict patterns across time intervals and road segments therefore needs more attention, and the spatial–temporal correlations between patterns require further research.

This study will focus on the following two components:

Classify conflict patterns based on risk formation characteristics and risk-avoidance behavior characteristics.

Explore the relationship between conflict pattern frequency and influencing factors considering spatial–temporal correlation.

This paper is organized into five parts. The second part presents the data preparation. The third part presents the methodologies of the improved k-means algorithm and multivariate Poisson-lognormal model considering spatial–temporal correlation. The fourth part provides conflict pattern clustering results and regression model results. The fifth part summarizes the conclusions and limitations of this study.

Data Preparation

Data Source

The data used in this study are from the MAGIC data set, a trajectory data set extracted from aerial video recorded by a group of six unmanned aerial vehicles (UAVs). The experiment was conducted from 7:40 to 10:40 a.m. on a section of the Shanghai Inner Ring, Shanghai, China, with a total length of 4000 m in both directions, including a large-radius curve and six ramps. This section contains merge, diverge, weaving, and basic segments. The widest part of the main road has three lanes in each direction, and the narrowest part has two ( 51 ). Vehicle characteristics such as vehicle ID, vehicle type, speed, acceleration, lane, and location can be extracted from the MAGIC data set. Traffic conflicts can be measured from this data set; no crashes were observed in it. Following the coverage areas of the aerial cameras, the road section is divided into six sections, numbered as shown in Figure 1. More details can be obtained at: https://magic.tongji.edu.cn/english/ACHIEVEMENTS/MAGIC_Dataset.htm.

Figure 1.

Road section numbers.

Data Preprocessing

Time to collision (TTC) was used as the SSM for traffic safety performance evaluation in this study. As a time-based SSM, TTC is easy to measure and is the most commonly used measure in traffic conflict studies ( 52 ). Many traffic conflict studies treat traffic events with a TTC below 3 s as conflicts ( 53 – 56 ). Therefore, in this paper, the period from when TTC falls below 3 s until it rises back above 3 s was treated as one conflict event. Based on previous studies ( 57 – 59 ), a TTC of 2 s was used as the threshold for high-risk conflicts.
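The event definition above can be sketched as follows for one follower–leader pair. This is a minimal illustration, not the authors' extraction code: it assumes a bumper-to-bumper gap in metres and speeds in m/s sampled at a fixed rate, and the function names are hypothetical.

```python
import numpy as np

def ttc_series(gap, v_follow, v_lead):
    """Rear-end TTC (s): gap / closing speed; inf when the follower is not closing in."""
    gap = np.asarray(gap, dtype=float)
    closing = np.asarray(v_follow, dtype=float) - np.asarray(v_lead, dtype=float)
    ttc = np.full_like(gap, np.inf)
    mask = closing > 0
    ttc[mask] = gap[mask] / closing[mask]
    return ttc

def conflict_events(ttc, threshold=3.0):
    """(start, end) index pairs of runs with TTC below the threshold; end is exclusive,
    i.e. the first sample at which TTC is back at or above the threshold."""
    below = np.asarray(ttc) < threshold
    # Pad with False so every run has both a rising and a falling edge.
    edges = np.diff(np.concatenate(([False], below, [False])).astype(int))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))

def is_high_risk(ttc, start, end, high_risk=2.0):
    """An event is high-risk when its minimum TTC falls below 2 s."""
    return float(np.min(ttc[start:end])) < high_risk
```

For example, the TTC series `[5.0, 2.8, 1.9, 2.5, 3.2, 2.9, 3.5]` contains two conflict events, of which only the first reaches the high-risk range.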

TTCs of rear-end conflicts were extracted from the MAGIC data set, and every period with TTC below 3 s was recorded as a traffic conflict event. After data extraction and denoising, the data set contained 7847 rear-end conflict events. The number of minimal points of each TTC curve was determined by polynomial fitting: an eighth-degree polynomial was fitted to each TTC curve, and the extreme points of the fitted curve were used to capture the fluctuation of TTC. As presented in Figure 2, 7262 curves had one minimal point, 549 curves had two minimal points, and 39 curves had more than two minimal points.
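A minimal sketch of the minimum-counting step, assuming each TTC curve is given as sample times and values. Instead of solving for the roots of the derivative, it evaluates the fitted slope on a dense grid and counts sign changes from negative to non-negative, which is a simple, numerically robust stand-in; the grid size is an arbitrary choice.

```python
import numpy as np
from numpy.polynomial import Polynomial

def count_minima(t, ttc, degree=8, grid_size=1000):
    """Count interior local minima of a TTC curve via a least-squares polynomial fit."""
    t = np.asarray(t, dtype=float)
    ttc = np.asarray(ttc, dtype=float)
    deg = min(degree, len(t) - 1)       # a degree-8 fit needs at least 9 samples
    poly = Polynomial.fit(t, ttc, deg)  # fitted on a scaled domain for numerical stability
    slope = poly.deriv()(np.linspace(t.min(), t.max(), grid_size))
    # A minimum is where the fitted slope changes from negative to non-negative.
    return int(np.count_nonzero((slope[:-1] < 0) & (slope[1:] >= 0)))
```

Curves returning 1 would correspond to the single-minimum group retained for further analysis.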

Figure 2.

Classification of the number of minimal points.

Conflict event curves differed in minimum TTC, slope of decline and rise, and other characteristics, so it was necessary to explore the differences between conflict patterns. Only curves with a single minimal point were analyzed in this paper.

Conflict Characteristics

To explore conflict patterns in depth, the conflict characteristics were defined in more detail as shown in Figure 3. The definitions and symbols of characteristics are presented in Table 1.

Figure 3.

Conflict characteristics.

Table 1.

Conflict Characteristics

Symbol	Definition	Max	Min	Mean	Std
Risk formation characteristic
Risk_level	Conflict risk level, the minimum value of TTC in a conflict event (s)	3.00	0.53	2.54	0.34
Risk_duration	Duration of a conflict event (s)	4.88	0.08	0.90	0.59
Risk_deteriorate	Risk deterioration rate, the slope of the descent of TTC from 3 s to the minimum in a conflict event (s/s)	−0.05	−3.87	−0.99	0.50
Risk_disengage	Risk disengagement rate, the slope of the ascent of TTC from minimum to 3 s in a conflict event (s/s)	4.07	0.00	1.02	0.54
Risk-avoidance behavior characteristic
Avoid_response	Risk-avoidance response speed, the time when the maximum deceleration occurs from the start of the conflict event (s)	3.92	0.00	0.63	0.46
Avoid_stability	Risk-avoidance stability, the standard deviation of deceleration in a conflict event (m/s²)	4.83	0.00	0.53	0.46
Avoid_short_intensity	Short-term risk-avoidance intensity, the maximum deceleration in a conflict event (m/s²)	0.98	−9.90	−2.37	1.17
Avoid_long_intensity	Long-term risk-avoidance intensity, the average deceleration in a conflict event (m/s²)	1.35	−5.04	−1.46	0.74

Note: TTC = time to collision.

The risk characteristics of a conflict event were described by conflict risk level, duration of conflict, risk deterioration rate, and risk disengagement rate. The minimum value of TTC was an important characteristic to measure the risk severity of a conflict event. The duration of conflict indicated how long the high risk lasted. The risk deterioration rate and risk disengagement rate indicated how quickly the risk level changed over a conflict event. Risk characteristics could be influenced by external road and traffic environmental factors, as well as by internal driver factors.
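The four risk formation characteristics in Table 1 can be sketched for one single-minimum event as below. The sketch assumes the event's samples begin and end at (or next to) the 3 s threshold crossings and measures the deterioration and disengagement rates as chord slopes between the threshold and the minimum; whether the study used chord slopes or fitted slopes is not stated, so this is an assumption.

```python
import numpy as np

def risk_formation_features(t, ttc, threshold=3.0):
    """Risk_level, Risk_duration, Risk_deteriorate, Risk_disengage for one event.

    t, ttc: sample times (s) and TTC values (s) inside the conflict event."""
    t = np.asarray(t, dtype=float)
    ttc = np.asarray(ttc, dtype=float)
    i_min = int(np.argmin(ttc))
    risk_level = ttc[i_min]
    fall = t[i_min] - t[0]   # time from event start to minimum TTC
    rise = t[-1] - t[i_min]  # time from minimum TTC to event end
    return {
        "Risk_level": risk_level,
        "Risk_duration": t[-1] - t[0],
        # Negative: TTC drops from the threshold to its minimum.
        "Risk_deteriorate": (risk_level - threshold) / fall if fall > 0 else np.nan,
        # Positive: TTC recovers from its minimum back to the threshold.
        "Risk_disengage": (threshold - risk_level) / rise if rise > 0 else np.nan,
    }
```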

The risk-avoidance behavior characteristics were mainly related to deceleration. Together, these four characteristics depict a driver’s response to the risk during a conflict event. Risk-avoidance response speed reflected the time a driver took to brake after perceiving a high risk, measured from the start of the conflict event to the moment of maximum deceleration. Risk-avoidance stability, the standard deviation of deceleration, reflected how steadily the driver braked. The short-term risk-avoidance intensity was reflected by the maximum deceleration during a conflict event, and the long-term risk-avoidance intensity by the average deceleration. In this study, the risk-avoidance characteristics were considered to be related to internal driver factors.
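A matching sketch for the four risk-avoidance characteristics, assuming the following vehicle's longitudinal acceleration is available over the event with braking as negative values, which is consistent with the negative intensities in Table 1. The use of the population standard deviation (ddof = 0) is an assumption.

```python
import numpy as np

def risk_avoidance_features(t, accel):
    """Avoid_response, Avoid_stability, Avoid_short_intensity, Avoid_long_intensity.

    t: sample times (s); accel: acceleration (m/s^2), negative while braking, so the
    maximum deceleration is the most negative value."""
    t = np.asarray(t, dtype=float)
    a = np.asarray(accel, dtype=float)
    i_peak = int(np.argmin(a))  # sample with the strongest braking
    return {
        "Avoid_response": t[i_peak] - t[0],         # time from event start to peak braking
        "Avoid_stability": float(np.std(a)),        # population standard deviation
        "Avoid_short_intensity": a[i_peak],         # maximum deceleration
        "Avoid_long_intensity": float(np.mean(a)),  # average deceleration
    }
```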

Independent Variable Definition and Test for Multicollinearity

Defining and testing the independent variables was the basis for building a generalized linear regression model. Each spatial–temporal unit was defined as 100 m by 10 min in this study, and variables such as distance to the ramp, traffic volume, and average speed in Table 2 were calculated for each unit. As shown in Table 2, seven independent variables reflecting road characteristics and the traffic environment were selected. These variables can be obtained from the speed, acceleration, vehicle type, and other information in the trajectory data. Correlation between independent variables can distort the model results, so the independent variables were tested for multicollinearity before modeling using the variance inflation factor (VIF). All VIFs were smaller than 10, a commonly used threshold, as shown in Table 2, so the multicollinearity between independent variables was acceptable.

Table 2.

Independent Variables

Symbol	Definition	Max	Min	Mean	Std	VIF
Road characteristic
Seg_type	Type of segment (0: basic; 1: weaving; 2: merge; 3: diverge)	3	0	–	–	1.14
Lane_num	Number of lanes	3.00	2.00	–	–	1.70
Ramp_dist	Distance from the start of the segment to the nearest ramp (km)	1.38	0.02	0.52	0.43	2.52
Traffic environment
Heavy_percent	Percentage of heavy vehicles in the spatial–temporal unit (%)	0.07	0.00	0.03	0.02	2.40
Volume	Volume in the spatial–temporal unit (vehicles per hour)	2027.00	0.00	534.08	338.61	2.92
Avg_speed	Average speed in the spatial–temporal unit (m/s)	62.22	0.00	20.88	14.65	2.44
Std_speed	Standard deviation of speed in the spatial–temporal unit (m/s)	24.19	0.00	9.38	6.11	1.48

Note: VIF = variance inflation factor. Means and standard deviations are not given for nominal and interval variables; dashes are used instead.
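The VIF check can be sketched with ordinary least squares: each variable is regressed on all others plus an intercept, and VIF_j = 1 / (1 − R_j²). This NumPy version is only illustrative; applying it to a nominal variable such as Seg_type requires a numeric coding, and how that coding was done is not stated.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept and all remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out
```

Independent columns give VIFs close to 1, while a column that is nearly a linear combination of others pushes the VIFs of all involved columns far above 10.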

Methodology

This study consists of conflict pattern classification and modeling. In the first part, conflict patterns are classified based on risk formation characteristics and risk-avoidance behavior characteristics extracted from trajectory data. In the second part, a conflict pattern model is built from the conflict-pattern count data obtained in the first part. The complete algorithm flowchart is shown in Figure 4. The improved k-means algorithm and the multivariate Poisson-lognormal model, marked by the two red boxes in Figure 4, are described in detail in this section. The other steps are described in the Data Preparation and Results sections.

Figure 4.

Flowchart of the algorithm.

In the first part, clustering is based on the TTC curves of conflict events. Each curve represents how risk changes over a conflict event: the curve starts when TTC falls below the threshold, declines to its lowest point, and ends when TTC rises back above the threshold. Based on observation of the TTC curves, the relevant characteristic parameters are extracted for clustering.

In the second part, a multivariate Poisson-lognormal model considering spatial–temporal correlation is applied to the conflict pattern count data. Spatial–temporal correlation is incorporated through basis functions generated by a spatial–temporal prediction method called fixed rank kriging (FRK). The model describes the relationship between the influencing factors and the number of conflicts of each pattern in each spatial–temporal unit. Accuracy and goodness of fit are compared between the MVPLN model and three univariate count models.

Improved K-Means Algorithm

Original K-Means Algorithm

Clustering divides a data set into several categories such that objects in the same category are more similar to each other than to objects in other categories. The k-means algorithm is a commonly used unsupervised learning algorithm for clustering. It iteratively assigns each point to its nearest cluster center and updates each center to the mean of its assigned points until the objective function stops decreasing. The objective function is the sum of squared distances from each point to its cluster center, as shown by

E = \sum_{i = 1}^{K} \sum_{x \in C_{i}} \lVert x - μ_{i} \rVert^{2}

(1)

μ_{i} = \frac{1}{| C_{i} |} \sum_{x \in C_{i}} x

(2)

where $K$ is the number of clusters; $C_{i}$ is the $i$ th cluster and $| C_{i} |$ is the number of points it contains; $x$ is a point vector in cluster $C_{i}$; and $μ_{i}$ is the mean vector of cluster $C_{i}$ .
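For reference, a plain NumPy sketch of the original k-means iteration (Lloyd's algorithm), alternating nearest-center assignment with the mean update of Eq. (2) and reporting the objective E of Eq. (1). It uses a single random initialization for brevity; practical implementations usually add k-means++ seeding and several restarts. The function is illustrative, not the authors' code.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means. Returns (labels, centers, E), with E the objective of Eq. (1)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step (Eq. 2); an empty cluster keeps its previous center.
        new = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
                        for i in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    E = float(d2[np.arange(len(X)), labels].sum())
    return labels, centers, E
```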

Improved K-Means Algorithm

In the original k-means algorithm, k must be specified in advance and cannot be determined by the algorithm itself. Therefore, the silhouette score is used to select the optimal number of clusters. The silhouette coefficient of a single sample is calculated as

s = \frac{y - x}{\max (x, y)}

(3)

where $x$ is the mean distance from the sample to the other samples in its own cluster (mean intra-cluster distance) and $y$ is the mean distance from the sample to the samples of the nearest other cluster (mean nearest-cluster distance). The coefficient ranges from −1 to 1, and the silhouette score is its mean over all samples. A larger silhouette score indicates better clustering, so the $k$ that maximizes the silhouette score is taken as the optimal number of clusters. The flowchart of the improved k-means algorithm is shown in Figure 5.
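The model-selection loop of the improved algorithm can be sketched as follows: compute the mean silhouette coefficient of Eq. (3) for each candidate k and keep the maximizer. Any clustering routine returning integer labels can be passed as `cluster_fn` (for example a k-means implementation); the names here are hypothetical, and the default candidate range of 2 to 6 mirrors the range examined in the Results. At least two clusters are required.

```python
import numpy as np

def silhouette_score(X, labels):
    """Mean silhouette coefficient over all samples (Eq. 3), Euclidean distances."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum() - 1
        if n_own == 0:
            continue  # convention: a singleton cluster has coefficient 0
        x_intra = D[i, own].sum() / n_own  # D[i, i] = 0, so only the count excludes i
        y_nearest = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (y_nearest - x_intra) / max(x_intra, y_nearest)
    return float(s.mean())

def choose_k(X, cluster_fn, ks=range(2, 7)):
    """Return the k with the largest silhouette score, plus the score for every k."""
    scores = {k: silhouette_score(X, cluster_fn(X, k)) for k in ks}
    return max(scores, key=scores.get), scores
```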

Figure 5.

Flowchart of improved k-means algorithm.

Multivariate Poisson-Lognormal Model Considering Spatial–Temporal Correlation

Different road characteristics and traffic states in different spatial–temporal units cause the number of conflicts of each pattern to vary. The counts of different conflict patterns within a spatial–temporal unit are not completely independent of each other. Therefore, the generalized linear regression model needs to account for both spatial–temporal factors and the correlation between patterns. The multivariate Poisson-lognormal (MVPLN) model considering spatial–temporal correlation proposed in this paper is designed to address this problem.

FRK

In this model, spatial–temporal correlation is incorporated through basis functions. To measure the effect of spatial–temporal correlation on the dependent variable, the FRK spatial–temporal prediction method is applied. FRK is a modification of a spatial prediction method called kriging, which provides the best linear unbiased prediction of a regionalized variable within a finite region.

FRK reduces the complexity of the kriging algorithm for large-scale data by dividing the fitted variables into two parts: fixed effects and random effects. FRK hinges on the use of a spatial random effects (SRE) model, in which a spatially correlated mean-zero random process is decomposed using a linear combination of spatial basis functions with random coefficients plus a term that captures the random process’s fine-scale variation ( 60 ). The SRE model has a spatial covariance function that is always nonnegative-definite and, because any basis functions can be used, it can be constructed to approximate standard families of covariance functions ( 61 ). The FRK prediction formula is

\hat{Y}_{S_{0}} = T_{S_{0}} β + k_{S_{0}}^{\prime} Σ^{- 1} \tilde{Z}

(4)

where $\hat{Y}_{S_{0}}$ is the predicted value of the regionalized random variable at location $S_{0}$; $T_{S_{0}}$ is a vector of known covariates at $S_{0}$; $β$ is a vector of unknown coefficients; $k_{S_{0}}$ is the vector of spatial covariances between $S_{0}$ and the observation locations; $Σ$ is the covariance matrix of the observations; and $\tilde{Z}$ is the vector of observed values of the random variable.

The spatial covariance function is the core of the kriging method. Traditional kriging fits a covariance function for the whole region by selecting a suitable model based on the covariance values at different lag distances. Many types of basis functions can be chosen, such as exponential, wavelet, and harmonic functions. Following the basis-function idea in FRK, several spatial–temporal basis functions are constructed and incorporated into the model as independent variables in this study. Thirty-six basis functions were generated automatically at a single resolution using the default Gaussian radial basis function; they are denoted B1–B36 among the independent variables.
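To make the basis-function idea concrete, the sketch below evaluates Gaussian radial basis functions over (position, time) and returns a design matrix whose 36 columns would play the role of B1–B36. The 6 × 6 grid of centres, the 2000 m by 180 min extent, and the scale parameters are hypothetical choices for illustration; the automatic basis placement used in the study may differ.

```python
import numpy as np

def gaussian_st_basis(pos_m, time_min, c_pos, c_time, scale_pos, scale_time):
    """Gaussian radial basis functions over space-time.

    Returns an (n_points x n_basis) matrix; column j is exp(-0.5 * r_j^2), where r_j is
    the scaled distance from each point to centre j."""
    dp = (np.asarray(pos_m, float)[:, None] - np.asarray(c_pos, float)[None, :]) / scale_pos
    dt = (np.asarray(time_min, float)[:, None] - np.asarray(c_time, float)[None, :]) / scale_time
    return np.exp(-0.5 * (dp ** 2 + dt ** 2))

# Hypothetical single-resolution layout: 6 spatial x 6 temporal centres = 36 functions.
grid_pos, grid_time = np.meshgrid(np.linspace(0, 2000, 6), np.linspace(0, 180, 6), indexing="ij")
centres_pos, centres_time = grid_pos.ravel(), grid_time.ravel()

# Midpoints (m, min) of three example spatial-temporal units.
B = gaussian_st_basis([50, 950, 1950], [5, 85, 175], centres_pos, centres_time,
                      scale_pos=400.0, scale_time=36.0)
```

Each row of `B` would then be appended to the road and traffic covariates of the corresponding unit.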

MVPLN Model

There are two approaches for modeling count data with multiple categories. One fits a separate univariate generalized linear regression model for each category; the other fits a single multivariate generalized linear regression model. Ignoring the correlation between conflict patterns may bias the fitted results or lead to incorrect statistical inference. The MVPLN model is a generalized Poisson model for multi-category count data and can be described by

f (y_{ik} | λ_{ik}) = \frac{λ_{ik}^{y_{ik}}}{y_{ik}!} e^{- λ_{ik}}

(5)

\ln (λ_{ik}) = \ln (μ_{ik}) + ε_{ik}

(6)

\ln (μ_{ik}) = θ_{k 0} + θ_{k 1} x_{i 1} + \dots + θ_{kn} x_{in}

(7)

ε_{i} \sim N_{K} (\mathbf{0}, Σ)

(8)

ε_{i} = \begin{pmatrix} ε_{i 1} \\ ε_{i 2} \\ \vdots \\ ε_{iK} \end{pmatrix}, \quad Σ = \begin{pmatrix} σ_{11} & σ_{12} & \cdots & σ_{1K} \\ σ_{21} & σ_{22} & \cdots & σ_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ σ_{K1} & σ_{K2} & \cdots & σ_{KK} \end{pmatrix}

(9)

where $K$ is the number of conflict patterns and $k = 1, \dots, K$ indexes them; $y_{ik}$ is the frequency of the $k$ th conflict pattern in the $i$ th spatial–temporal unit; $λ_{ik}$ is the expected number of conflicts of the $k$ th pattern in the $i$ th spatial–temporal unit; $f (y_{ik} | λ_{ik})$ is the probability of observing $y_{ik}$ conflicts of the $k$ th pattern in the $i$ th spatial–temporal unit; $θ_{kn}$ is the coefficient of the $n$ th independent variable for the $k$ th conflict pattern; and $ε_{i}$ is a multivariate Gaussian error term with mean vector $\mathbf{0}$ and covariance matrix $Σ$ .

Let $X$ and $Θ$ denote the matrix of covariates and the vector of regression coefficients, respectively. Given $(X, Θ, Σ)$ , the vectors $λ_{i}$ are independently distributed as $K$-dimensional log-normal random vectors with density

f (λ_{i} | X, Θ, Σ) = \frac{\exp [- 0.5 (λ_{i}^{*} - μ_{i}^{*})^{\prime} Σ^{- 1} (λ_{i}^{*} - μ_{i}^{*})]}{(2 π)^{K / 2} (\prod_{k = 1}^{K} λ_{ik}) | Σ |^{1 / 2}}

(10)

where $λ_{i}^{*} = (\ln (λ_{i 1}), \ln (λ_{i 2}), \dots, \ln (λ_{iK}))^{\prime}$ and $μ_{i}^{*} = (\ln (μ_{i 1}), \ln (μ_{i 2}), \dots, \ln (μ_{iK}))^{\prime}$ .
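A small forward simulation shows what the MVPLN specification implies: correlated log-scale errors make the counts of different patterns in the same unit correlated, which independent univariate models cannot capture. This sketch samples counts from Eqs. (5) to (7) assuming a mean-zero multivariate normal log-scale error with covariance Σ, the usual MVPLN convention under which Eq. (10) is the density of λ_i. It does not estimate parameters (the study uses Bayesian estimation), and all numbers are made up for illustration.

```python
import numpy as np

def simulate_mvpln(X, theta, Sigma, seed=0):
    """Draw pattern counts from an MVPLN model.

    X: (N x n) covariates with a leading column of ones for the intercept;
    theta: (K x n) coefficients, row k for pattern k; Sigma: (K x K) error covariance."""
    rng = np.random.default_rng(seed)
    log_mu = X @ theta.T                                    # Eq. (7): ln(mu_ik), shape (N, K)
    eps = rng.multivariate_normal(np.zeros(len(Sigma)), Sigma, size=len(X))  # mean-zero errors
    lam = np.exp(log_mu + eps)                              # Eq. (6)
    return rng.poisson(lam)                                 # Eq. (5)
```

With a strongly positive off-diagonal σ₁₂, the simulated counts of the two patterns are clearly positively correlated even though the covariates are identical.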

Bayesian estimation, one of the most common approaches in conflict-frequency modeling, is used for parameter estimation ( 62 ). The deviance information criterion (DIC), considered the Bayesian equivalent of the Akaike information criterion, is used to compare the goodness of fit of the Poisson-lognormal models; a smaller DIC indicates a better model fit ( 63 ). DIC is defined as

DIC = D (\bar{θ}) + 2 p_{D} = \bar{D} + p_{D}

(11)

where $D (\bar{θ})$ is the deviance evaluated at the posterior mean of the parameters $\bar{θ}$ ; $\bar{D}$ is the posterior mean of the deviance; and $p_{D} = \bar{D} - D (\bar{θ})$ is the effective number of parameters in the model.
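As a worked illustration of Eq. (11), DIC can be computed from posterior draws. The sketch assumes the draws are stored as Poisson means λ (one row per MCMC draw) and evaluates D(θ̄) by plugging in the posterior mean of λ, which is one common plug-in choice; the software used in the study may parameterize this differently.

```python
import numpy as np
from math import lgamma

def poisson_deviance(y, lam):
    """Deviance D = -2 log L for counts y with Poisson means lam (Eq. 5)."""
    y = np.asarray(y, dtype=float)
    lam = np.asarray(lam, dtype=float)
    log_fact = np.array([lgamma(v + 1.0) for v in y])  # log(y!)
    return float(-2.0 * np.sum(y * np.log(lam) - lam - log_fact))

def dic(y, lam_draws):
    """DIC and p_D from posterior draws of the Poisson means (rows = MCMC draws)."""
    lam_draws = np.asarray(lam_draws, dtype=float)
    d_bar = np.mean([poisson_deviance(y, lam) for lam in lam_draws])  # posterior mean deviance
    d_hat = poisson_deviance(y, lam_draws.mean(axis=0))              # deviance at posterior mean
    p_d = d_bar - d_hat                                              # effective no. of parameters
    return d_hat + 2.0 * p_d, p_d                                    # equals d_bar + p_d
```

If every draw is identical, p_D is zero and DIC reduces to the deviance; posterior spread makes p_D positive because the Poisson deviance is convex in λ.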

For comparison, univariate Poisson regression model, univariate negative binomial regression model, and univariate Poisson-lognormal regression model are considered in this study. The root mean square error (RMSE) is used to compare the accuracy of fit between different models, which is expressed as

RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} \sum_{k = 1}^{K} (p_{ik} - a_{ik})^{2}}

(12)

where $N$ is the number of spatial–temporal units; $p_{ik}$ is the predicted frequency of the $k$ th conflict pattern in the $i$ th spatial–temporal unit; and $a_{ik}$ is the actual frequency of the $k$ th conflict pattern in the $i$ th spatial–temporal unit.
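Eq. (12) translates directly into code. As written, the squared errors are summed over the K patterns and divided only by the number of units N; dividing by N·K instead would give a per-cell RMSE, so the same normalization should be used when comparing models.

```python
import numpy as np

def rmse(pred, actual):
    """RMSE of Eq. (12); pred and actual are (N units x K patterns) count arrays."""
    diff = np.asarray(pred, dtype=float) - np.asarray(actual, dtype=float)
    return float(np.sqrt((diff ** 2).sum() / diff.shape[0]))  # divide by N, not N * K
```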

Results

Conflict Pattern Clustering Results

First, conflict events were separated into low-risk and high-risk events: the minimal points of all curves were divided into two categories using the 2 s TTC threshold, as shown in Figure 6.

Figure 6.

Conflict event severity classification results.

The high-risk and low-risk conflict events were clustered separately using the improved k-means algorithm introduced above. The clustering features were the seven conflict characteristics other than Risk_level: Risk_duration, Risk_deteriorate, Risk_disengage, Avoid_response, Avoid_stability, Avoid_short_intensity, and Avoid_long_intensity.

For the low-risk conflict events, the silhouette scores for $k$ from 2 to 6 are shown in Figure 7. The silhouette score dropped at $k = 4$, and the optimal number of clusters was 3, so the low-risk TTC curves were classified into three clusters. For the high-risk conflict events, the silhouette scores for $k$ from 2 to 6 are also shown in Figure 7. The score again dropped at $k = 4$, and the optimal number of clusters was also 3, giving three high-risk clusters. The resulting six conflict patterns are shown in Figure 7. Note that each cluster-center curve is the Euclidean (pointwise) average of all curves in the cluster, whereas the minimum TTC among the conflict characteristics is the average of the minimum TTCs of the individual curves, so the two differ. In addition, because not all clustering features, such as the risk-avoidance behavior characteristics, can be shown in the figure, the distances between curves in Figure 7 do not fully reflect the similarity between clusters.

Figure 7.

Clustering results.

The inter-cluster distances between the six clusters were calculated by Euclidean distance as shown in Table 3.

Table 3.

Inter-Cluster Distances

	Pattern 1	Pattern 2	Pattern 3	Pattern 4	Pattern 5	Pattern 6
Pattern 1	–	1.77	2.78	2.60	1.93	1.59
Pattern 2	1.77	–	4.22	3.99	3.47	2.04
Pattern 3	2.78	4.22	–	1.69	2.51	3.51
Pattern 4	2.60	3.99	1.69	–	1.85	2.62
Pattern 5	1.93	3.47	2.51	1.85	–	1.99
Pattern 6	1.59	2.04	3.51	2.62	1.99	–

After min-max normalization, the radar map of characteristics in each cluster is shown in Figure 8.
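Table 3 and the radar map rely on two simple operations, sketched below with hypothetical function names: pairwise Euclidean distances between cluster centres and column-wise min-max scaling of the characteristic values. Which feature scaling was applied before computing Table 3 is not stated, so the sketch leaves that to the caller.

```python
import numpy as np

def center_distances(centers):
    """Pairwise Euclidean distances between cluster centres (rows of `centers`)."""
    c = np.asarray(centers, dtype=float)
    return np.sqrt(((c[:, None, :] - c[None, :, :]) ** 2).sum(axis=2))

def min_max(values):
    """Scale each column to [0, 1]; constant columns are mapped to 0."""
    v = np.asarray(values, dtype=float)
    lo, hi = v.min(axis=0), v.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (v - lo) / span
```

For the radar map, `min_max` would be applied to the six patterns' mean characteristic values, one column per characteristic.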

Figure 8.

Radar map of characteristics.

After the clusters were obtained, the traffic meaning of each of the six clusters was interpreted. To obtain a more intuitive explanation, the fuzzy C-means algorithm was used to grade each characteristic into levels, again using the silhouette score to determine the optimal number of levels. The characteristic values and the corresponding levels are shown in Table 4.
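A minimal one-dimensional fuzzy C-means sketch for grading a single characteristic into levels, using the standard fuzzifier m = 2. Initializing the centres at evenly spaced quantiles is an assumption made here for deterministic results, and whether the study graded individual events or cluster means is not specified. Each value is assigned to the level with the highest membership, and the levels can be named (for example Low, Medium, High) by sorting the centres.

```python
import numpy as np

def fuzzy_cmeans_1d(x, c, m=2.0, n_iter=300, tol=1e-9):
    """Fuzzy C-means on a 1-D array. Returns (centres, U); U[i, j] = membership of x[j] in level i."""
    x = np.asarray(x, dtype=float)
    centres = np.quantile(x, np.linspace(0.0, 1.0, c))  # deterministic start (assumption)
    for _ in range(n_iter):
        d = np.abs(x[None, :] - centres[:, None]) + 1e-12  # (c, n); epsilon avoids 0/0
        # u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1)); each column of U sums to 1
        U = 1.0 / ((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))).sum(axis=1)
        W = U ** m
        new_centres = (W @ x) / W.sum(axis=1)  # membership-weighted means
        converged = np.max(np.abs(new_centres - centres)) < tol
        centres = new_centres
        if converged:
            break
    return centres, U
```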

Table 4.

Characteristic Levels of Conflict Patterns

Conflict pattern	Number	Percentage (%)	Risk_level	Risk_duration	Risk_deteriorate	Risk_disengage	Avoid_response	Avoid_stability	Avoid_short_intensity	Avoid_long_intensity
1	2891	39.81	2.52 (Low)	1.07 (Short)	−0.89 (Slow)	0.99 (Slow)	0.74 (Medium)	0.53 (High)	−2.87 (Weak)	−1.95 (Strong)
2	3423	47.14	2.70 (Low)	0.61 (Short)	−0.99 (Slow)	0.92 (Slow)	0.43 (Fast)	0.36 (High)	−1.54 (Weak)	−0.95 (Weak)
3	346	4.76	2.49 (Low)	0.99 (Short)	−1.14 (Fast)	1.14 (Fast)	0.60 (Medium)	1.78 (Low)	−5.32 (Strong)	−2.04 (Strong)
4	114	1.57	1.76 (High)	1.33 (Long)	−1.78 (Fast)	2.13 (Fast)	1.02 (Medium)	1.51 (Low)	−4.57 (Strong)	−2.21 (Strong)
5	175	2.41	1.81 (High)	2.21 (Long)	−0.96 (Slow)	1.40 (Fast)	1.54 (Slow)	0.78 (High)	−3.81 (Strong)	−2.34 (Strong)
6	313	4.31	1.81 (High)	1.63 (Long)	−1.46 (Fast)	1.74 (Fast)	1.16 (Slow)	0.61 (High)	−2.35 (Weak)	−1.35 (Weak)

Note: Levels are given in parentheses. Risk formation characteristics: Risk_level and Risk_duration describe risk; Risk_deteriorate and Risk_disengage describe risk change. Risk-avoidance behavior characteristics: Avoid_response describes response; Avoid_stability describes attitude; Avoid_short_intensity and Avoid_long_intensity describe action. Percentages are of the 7262 single-minimum conflict events.

Correlations can be seen between conflict characteristics. The higher the conflict risk level, the longer the conflict event lasts. The risk deterioration rate and the risk disengagement rate roughly correspond to each other, and the short-term and long-term risk-avoidance intensities are also mostly consistent.

Based on the above clustering of patterns and grading of characteristics, the conflict patterns were named according to five aspects: risk level, speed of risk-changing, risk-avoidance response, risk-avoidance attitude, and risk-avoidance action, as shown in Figure 9.

Figure 9.

Naming of conflict patterns.

From the perspective of risk formation characteristics, the six conflict patterns can be grouped into four risk formation patterns. As an example, Figure 10a shows the heat map of the first risk formation pattern (Low risk - Slow change) on the road sections. From the perspective of risk-avoidance behavior characteristics, there are five different risk-avoidance behavior patterns. As an example, Figure 10b shows the heat map of the fourth (Neutral response - Panic attitude - Tough action). Although the spatial distribution varies between conflict patterns, every pattern occurs most frequently in section 6. Focusing on road section 6 for further analysis, Figure 10c shows the distribution of conflict pattern 1. Its frequency differs between flow directions and between lanes in the same direction.

Figure 10.

Heat map of conflict patterns: (a) first risk formation pattern, (b) fourth risk-avoidance behavior pattern, and (c) road section 6 distribution.

In summary, the frequency of each conflict pattern appears to be related to the road and traffic environment characteristics introduced earlier. The next part builds a regression model that relates these influencing factors to the frequency of each conflict pattern.

Multivariate Poisson-Lognormal Model Considering Spatial–Temporal Correlation

Model Parameters

Each spatial–temporal unit covers 100 m of roadway and 10 min of time. In the upward direction (east to west), the road is divided into Sections 1–20, and in the downward direction (west to east) into Sections 21–40. Sections 17–19 and 36–39 are weaving segments, Sections 1–5 are merge segments, and Sections 21–26 are diverge segments.
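A minimal sketch of the unit bookkeeping follows. It assigns a conflict event to its (section, 10-min window) unit, looks up the segment type from the section numbers above, and tallies pattern counts per unit, which form the response of the count model. Two points are assumptions. First, sections not listed above are labeled "basic". Second, all 3 h of recording are split into windows, giving 40 × 18 = 720 units. The event list is hypothetical, and the mapping from these segment types to the seg_type codes is not reproduced here.

```python
from collections import Counter

UNIT_DURATION_S = 10 * 60                  # 10 min per time window (from the text)
N_SECTIONS = 40                            # Sections 1-20 upward, 21-40 downward
N_WINDOWS = 3 * 3600 // UNIT_DURATION_S    # 18, assuming all 3 h of data are used


def segment_type(section):
    """Segment type of a 100 m section, using the section numbers given in the text."""
    if not 1 <= section <= N_SECTIONS:
        raise ValueError(f"section {section} is outside 1-{N_SECTIONS}")
    if 17 <= section <= 19 or 36 <= section <= 39:
        return "weaving"
    if 1 <= section <= 5:
        return "merge"
    if 21 <= section <= 26:
        return "diverge"
    return "basic"  # assumption: sections not listed are basic segments


def unit_key(section, t_seconds):
    """Key (section, time window) of the spatial-temporal unit containing an event."""
    return section, int(t_seconds // UNIT_DURATION_S)


# Hypothetical conflict events: (section, seconds since recording start, pattern id).
events = [(18, 30.0, 1), (18, 590.0, 2), (18, 610.0, 1), (3, 1200.0, 4)]
counts = Counter((unit_key(s, t), p) for s, t, p in events)
```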

The categorical variable seg_type enters the model as dummy variables D1–D3, and the spatial–temporal basis functions B1–B36 enter as independent variables. The estimated parameters of the MVPLN model are shown in Table 5, where numbers outside parentheses are coefficients of the independent variables and numbers in parentheses are standard errors.
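The basis functions B1–B36 are derived from fixed rank kriging (FRK). A common construction is a tensor product of spatial and temporal bisquare functions, where φ(d) = (1 − (d/r)²)² inside radius r and 0 outside. The sketch below shows how such basis values could be evaluated for a unit. Its layout is hypothetical: 6 spatial × 6 temporal centres, a 2 km one-dimensional corridor coordinate, and radii of 1.5 times the centre spacing. The text does not report the resolution, the radii, or how the two travel directions were placed on the spatial axis.

```python
def bisquare(d, r):
    """Bisquare basis value: (1 - (d/r)^2)^2 inside radius r, 0 outside."""
    return (1.0 - (d / r) ** 2) ** 2 if abs(d) < r else 0.0


def evenly_spaced(lo, hi, k):
    """k evenly spaced centres from lo to hi inclusive."""
    step = (hi - lo) / (k - 1)
    return [lo + i * step for i in range(k)]


# Hypothetical layout: 6 x 6 centres -> 36 space-time basis functions.
S_CENTRES = evenly_spaced(0.0, 2000.0, 6)      # metres along an assumed 2 km corridor
T_CENTRES = evenly_spaced(0.0, 10800.0, 6)     # seconds over the 3 h recording
S_RADIUS = 1.5 * (S_CENTRES[1] - S_CENTRES[0]) # assumed overlap factor
T_RADIUS = 1.5 * (T_CENTRES[1] - T_CENTRES[0])


def basis_row(s, t):
    """Values of B1..B36 at location s (m) and time t (s) via a tensor product."""
    return [bisquare(s - cs, S_RADIUS) * bisquare(t - ct, T_RADIUS)
            for cs in S_CENTRES for ct in T_CENTRES]


row = basis_row(1000.0, 5400.0)  # e.g. evaluated at a unit's centroid
```

Each unit's row of 36 values then enters the design matrix alongside the other covariates.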

Table 5.

Coefficients and Standard Errors of Independent Variables

Conflict pattern	1	2	3	4	5	6
(Intercept)	3.3651 (0.4299)	3.0623 (0.6314)	−3.7153 (0.7367)	−2.2363 (1.3761)	−0.4548 (1.8966)	1.2078 (2.1443)
D1 (seg_type=1)	1.3193 (0.2055)**	1.6410 (0.3042)**	−0.3127 (0.2816)	1.0145 (0.5786)*	0.9904 (0.9132)	1.6339 (1.0374)
D2 (seg_type=2)	0.9997 (0.2250)**	1.0805 (0.3277)**	−0.5841 (0.4586)	−0.9197 (0.8029)	0.8781 (1.0066)	0.7495 (1.1181)
D3 (seg_type=3)	−0.2247 (0.1071)**	−0.1684 (0.1412)	0.3609 (0.4375)	−0.1058 (0.5862)	0.7467 (0.4582)	−0.4261 (0.4378)
Lane_num	−0.6599 (0.2050)**	−0.8257 (0.3037)**	1.4095 (0.2906)**	0.5831 (0.5905)	−0.5127 (0.9062)	−0.9221 (1.0349)
Ramp_dist	0.3122 (0.0927)**	0.5775 (0.1219)**	−0.6024 (0.3556)*	1.2394 (0.5207)**	−0.5254 (0.4199)	0.7738 (0.4111)*
Volume	0.0010 (0.0001)**	0.0012 (0.0001)**	0.0019 (0.0002)**	0.0021 (0.0004)**	0.0001 (0.0004)	0.0010 (0.0003)**
Avg_speed	−0.0613 (0.0033)**	−0.0765 (0.0045)**	−0.0342 (0.0108)**	−0.1060 (0.0200)**	−0.0240 (0.0137)*	−0.0738 (0.0149)**
Std_speed	0.0462 (0.0051)**	0.0574 (0.0068)**	0.0070 (0.0190)	0.0571 (0.0270)**	0.0849 (0.0249)**	0.0559 (0.0225)**
Heavy_percent	−2.5643 (1.5136)*	−2.8717 (1.9980)	−25.7793 (6.1425)**	−37.1840 (9.2704)**	−6.2691 (7.4157)	−10.4661 (6.9846)
B1–B36	15.4416 (3.4916)	12.6766 (4.8607)	31.7138 (12.5462)	−6.0857 (24.1561)	−5.7977 (15.3146)	22.9623 (12.9939)

Note: Numbers outside parentheses are coefficients of the independent variables, and numbers in parentheses are standard errors. Conflict patterns:

1: low risk - slow change - neutral response - calm attitude - tough action;

2: low risk - slow change - sensitive response - calm attitude - tender action;

3: low risk - intense change - neutral response - panic attitude - tough action;

4: high risk - intense change - neutral response - panic attitude - tough action;

5: high risk - moderate change - insensitive response - calm attitude - tough action;

6: high risk - intense change - insensitive response - calm attitude - tender action.

*Variable significant at the 90% confidence level; **variable significant at the 95% confidence level.

The road characteristic variables mainly affect the three low-risk conflict patterns. Weaving and merge segments on expressways involve frequent acceleration, deceleration, and lane changing, which may cause traffic disorder and traffic conflicts. Conflict patterns 1 and 2, which account for the largest shares of conflicts, are more likely to occur in weaving segments. When there are fewer lanes, conflicts tend toward the Low risk - Slow change pattern, and the risk-avoidance attitude is calmer. This may be because drivers are exposed to fewer distracting factors when there are fewer lanes. The number of lanes is not significant in the high-risk patterns. A plausible explanation is that in high-risk situations, traffic volume and speed directly affect the driver's short-term reactions, whereas in low-risk situations the indirect influence of road characteristics also matters. For the same reason, risk-avoidance behaviors are more moderate and conflict patterns are milder on sections farther from the ramp.

The traffic environment variables have a significant effect on almost all conflict patterns, especially the average speed of the road segment, which is significant for all six. Traffic volume has similar positive effects on conflict patterns 1, 2, 3, 4, and 6, meaning that rear-end conflicts are more likely when volume is higher. Lower speeds indicate congestion and frequent bottlenecks, which also increase the number of conflicts. Conflict pattern 4 is the most sensitive to speed. Under the log link of the Poisson-lognormal model, and assuming average speed enters untransformed, its coefficient of −0.1060 means that each one-unit decrease in average speed multiplies the expected count by exp(0.1060) ≈ 1.11. This is an increase of about 11% per unit of speed, not per 1% change in speed. A larger speed standard deviation indicates lower speed stability and a higher probability of conflict, and in such conditions drivers' risk-avoidance attitude is relatively calmer. Finally, a lower proportion of heavy vehicles has a strong effect on conflict patterns 1, 3, and 4. When there are few heavy vehicles, lane changes may be more frequent and faster, so conflicts with intensely changing risk are more likely. In such scenarios, drivers also tend to respond at medium speed, adopt a more panicked attitude, and brake harder.
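The coefficients in Table 5 can be read on the count scale by exponentiating. The sketch below assumes the standard log link of Poisson-lognormal models and a normal approximation for the Wald statistic. It converts the Avg_speed coefficients into the percentage change in expected count per one-unit decrease in speed, in whatever units the study used, and recomputes two-sided p-values as a cross-check of the significance stars.

```python
from math import erf, exp, sqrt

# Avg_speed coefficient and standard error for conflict patterns 1-6 (Table 5).
AVG_SPEED = {
    1: (-0.0613, 0.0033), 2: (-0.0765, 0.0045), 3: (-0.0342, 0.0108),
    4: (-0.1060, 0.0200), 5: (-0.0240, 0.0137), 6: (-0.0738, 0.0149),
}


def pct_change_per_unit_decrease(beta):
    """Exact % change in the expected count when the variable falls by one unit (log link)."""
    return 100.0 * (exp(-beta) - 1.0)


def two_sided_p(z):
    """Two-sided p-value of a Wald z statistic under the standard normal."""
    return 1.0 - erf(abs(z) / sqrt(2.0))


effects = {}
for k, (beta, se) in AVG_SPEED.items():
    z = beta / se
    effects[k] = pct_change_per_unit_decrease(beta)
    print(f"pattern {k}: {effects[k]:+.1f}% per unit decrease, "
          f"z = {z:.2f}, p = {two_sided_p(z):.3f}")
```

For pattern 5, z ≈ −1.75 and p ≈ 0.08. This is consistent with its single star, meaning significance at the 90% level but not at the 95% level.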

Model Comparison

For comparison, three other generalized linear regression models were fitted: a univariate Poisson model, a univariate negative binomial (NB) model, and a univariate Poisson-lognormal (PLN) model. Root mean square error (RMSE) was used to evaluate accuracy, and the results are shown in Table 6.

Table 6.

Root Mean Square Error for Four Models

Conflict pattern	Univariate Poisson model	Univariate negative binomial model	Univariate Poisson-lognormal model	Multivariate Poisson-lognormal model
Weighted average	3.09	4.33	1.15	0.81
1	5.93	8.86	1.42	1.31
2	4.34	5.36	1.33	1.10
3	1.41	2.01	1.47	0.54
4	0.74	0.78	0.99	0.43
5	0.71	0.73	0.88	0.50
6	0.52	0.52	0.53	0.51

Among the four models, the MVPLN model had a clear advantage in fitting accuracy. Its weighted-average RMSE was 73.8%, 81.3%, and 29.6% lower than those of the univariate Poisson, NB, and PLN models, respectively. The advantage was most pronounced for conflict patterns 1 and 2. For conflict pattern 1, the RMSE of the MVPLN model was 77.9% and 85.2% lower than those of the Poisson and NB models, and for conflict pattern 2 it was 74.7% and 79.5% lower. For conflict pattern 3, the reductions relative to all three models were also above 50%.
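The improvement percentages are relative RMSE reductions, (RMSE_baseline − RMSE_MVPLN) / RMSE_baseline. The sketch below reproduces them from the values in Table 6. The weights behind the "Weighted average" row are not stated, so the sketch starts from the reported weighted averages rather than recomputing them.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean square error between observed and fitted counts."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def reduction(baseline, model):
    """Percentage reduction in RMSE relative to a baseline model."""
    return 100.0 * (baseline - model) / baseline


# Weighted-average RMSE values from Table 6.
weighted = {"Poisson": 3.09, "NB": 4.33, "PLN": 1.15}
mvpln = 0.81
improvement = {name: round(reduction(v, mvpln), 1) for name, v in weighted.items()}
print(improvement)
```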

In addition, the deviance information criterion (DIC) was calculated for the Poisson-lognormal models to measure goodness of fit. The DIC of the MVPLN model was 6853.39, and the DICs of the univariate PLN models for conflict patterns 1–6 were 2241.21, 1844.48, 952.60, 477.88, 619.18, and 810.03, respectively. Because the MVPLN DIC is lower than the sum of the six univariate DICs (6945.38), the MVPLN model fits better than the six univariate PLN models combined. Therefore, among the models applied in this study, the MVPLN model was the most suitable.
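The DIC comparison can be reproduced from the values above. The helper `dic` shows the usual definition from posterior deviance samples, DIC = D̄ + p_D with p_D = D̄ − D(θ̄). The text does not state which p_D variant the study's software used, so treat the helper as illustrative.

```python
def dic(deviance_samples, deviance_at_posterior_mean):
    """DIC = D_bar + p_D, where p_D = D_bar - D(theta_bar)."""
    d_bar = sum(deviance_samples) / len(deviance_samples)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d


# DIC values reported in the text.
univariate_pln_dic = [2241.21, 1844.48, 952.60, 477.88, 619.18, 810.03]  # patterns 1-6
mvpln_dic = 6853.39

total = sum(univariate_pln_dic)
print(f"sum of univariate DICs = {total:.2f}; MVPLN lower by {total - mvpln_dic:.2f}")
```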

Conclusions

This study explored the classification and modeling of rear-end conflict patterns on an expressway in China. In the first part, a parameter system of conflict pattern characteristics was constructed from both risk formation characteristics and risk-avoidance behavior characteristics. An improved k-means algorithm was used to classify the conflict events, yielding six conflict patterns. These six patterns differed in five aspects and were named based on their risk formation and risk-avoidance behavior characteristics. The number of each conflict pattern within the spatial–temporal units was counted, and heat maps were drawn. In the second part, seven independent variables were selected from the road characteristics and traffic environment perspectives. Spatial–temporal correlation was added to the model through basis functions derived from fixed rank kriging (FRK). An MVPLN model was constructed for quantitative analysis, and its parameters were estimated and interpreted in traffic terms. The results showed that the MVPLN model performed better than the other three univariate models.

The differences between conflict patterns were described by risk level, speed of risk-changing, risk-avoidance response, risk-avoidance attitude, and risk-avoidance action. Conflict patterns 1 and 2 account for a much larger proportion of events than the other four patterns. In most conflict events, the vehicle speed is moderate and the deceleration process is stable. When the risk changes drastically, drivers may become more panicked. The distribution of the six conflict patterns varies along the expressway, which may be attributed to differences in roadway type, number of lanes, traffic volume, average speed, and other factors. A qualitative analysis grouped the possible influencing factors into road characteristics and traffic environment.

Variability in road-user behavior was accounted for by modeling the counts of each conflict pattern. The effects of seven independent variables on the number of each of the six conflict patterns within spatial–temporal units were quantified using the MVPLN model. Results showed that road characteristics had a more significant effect on the low-risk conflict patterns. The traffic environment variables had a significant effect on most of the conflict patterns. The significant independent variables differed between conflict patterns, and every independent variable was significant for at least one pattern. The most influential significant variables were traffic volume, average speed, and the standard deviation of speed.

Three other generalized linear regression models were also fitted to the conflict pattern count data. These three models were univariate, whereas the MVPLN model is multivariate and accounts for the correlation between the counts of different conflict patterns. In terms of RMSE, the MVPLN model fitted much better than the other three, with a total fitting error 73.8%, 81.3%, and 29.6% lower than theirs, respectively. In terms of DIC, the MVPLN model also outperformed the univariate PLN models. Therefore, the MVPLN model was optimal among the models applied in this study.
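The correlation that the multivariate model captures comes from correlated error terms in the log-means of the different patterns. The small simulation below illustrates this. It uses two patterns instead of six, and its parameter values (mu = 1, sigma = 0.8, rho = 0.8) are arbitrary assumptions. Counts whose log-means share correlated normal errors end up correlated, which univariate models fitted separately cannot represent.

```python
import math
import random


def poisson_draw(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for modest means)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_pln_pair(n, mu=1.0, sigma=0.8, rho=0.8, seed=0):
    """Counts of two patterns whose log-means share bivariate-normal errors."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        e1 = sigma * z1
        e2 = sigma * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)  # 2x2 Cholesky factor
        out.append((poisson_draw(math.exp(mu + e1), rng),
                    poisson_draw(math.exp(mu + e2), rng)))
    return out


def count_correlation(pairs):
    """Pearson correlation between the two count columns."""
    n = len(pairs)
    xs, ys = [a for a, _ in pairs], [b for _, b in pairs]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


r_shared = count_correlation(simulate_pln_pair(5000, rho=0.8))
r_independent = count_correlation(simulate_pln_pair(5000, rho=0.0))
```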

The conclusions and extensions of this study can provide a basis for developing functions of advanced driver assistance systems (ADAS). Several of the independent variables selected in this study can be obtained from roadside detectors, including volume, average speed, the standard deviation of speed, and the percentage of heavy vehicles. Real-time detector data could therefore be used to predict the number of specific conflict patterns in practical applications. Future research should classify the driver's risk-avoidance style over the whole conflict process. By recording the historical style of a specific driver and designing corresponding assistance functions, drivers could then be given more accurate guidance in a connected environment. This offers new directions for applying traffic conflict studies.
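As a rough illustration of the real-time use case, the expected count of a conflict pattern can be computed from detector readings by exponentiating the linear predictor. For a Poisson-lognormal count with zero-mean normal error of variance σ², the marginal mean is exp(x′β + σ²/2). This sketch makes three simplifications. First, it uses only the pattern 1 coefficients for the detector-observable variables in Table 5. Second, it omits the dummy, lane, ramp, and basis-function terms. Third, it treats σ² as a user-supplied value, because the error variance is not reported here. The readings are hypothetical, and their units are whatever the study used.

```python
from math import exp

# Pattern 1 coefficients from Table 5 (intercept plus detector-observable variables).
PATTERN1 = {
    "(Intercept)": 3.3651, "Volume": 0.0010, "Avg_speed": -0.0613,
    "Std_speed": 0.0462, "Heavy_percent": -2.5643,
}


def expected_count(coefs, x, sigma2=0.0):
    """Marginal mean of a Poisson-lognormal count, exp(x'beta + sigma^2 / 2).

    Variables missing from x contribute zero, so the omitted terms are simply
    left out of the linear predictor (a simplification).
    """
    eta = coefs["(Intercept)"] + sum(coefs[name] * value for name, value in x.items())
    return exp(eta + sigma2 / 2.0)


# Hypothetical detector readings for one unit.
reading = {"Volume": 500.0, "Avg_speed": 40.0, "Std_speed": 10.0, "Heavy_percent": 0.05}
slower = dict(reading, Avg_speed=30.0)
ratio = expected_count(PATTERN1, slower) / expected_count(PATTERN1, reading)
```

Comparing two readings through a ratio, as above, cancels the intercept, the omitted terms, and σ². The ratio then depends only on the coefficient of the variable that changed, here exp(0.613) for a 10-unit speed drop.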

This study has some limitations. First, it only analyzed conflict events whose time-to-collision (TTC) curves had a single minimum in the high-risk range, so more research is needed on conflict events with two or more minima. Second, although the study was based on trajectory data, the conflict style of individual drivers was not examined. Future studies should pay more attention to the conflict patterns of the same vehicle across multiple road segments.

Footnotes

Author Contributions

The authors confirm their contribution to the paper as follows: study conception and design: Ling Wang, Yunting Miao, and Wanjing Ma; analysis and interpretation of results: Yunting Miao and Ling Wang; draft manuscript preparation: Yunting Miao; revision: Ziliang He, Ling Wang, Wanjing Ma, and Mohamed Abdel-Aty. All authors reviewed the results and approved the final version of the manuscript.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study is supported by National Natural Science Foundation of China (52131204, 52102415), Fundamental Research Funds for the Central Universities (22120220137).

ORCID iDs

Wanjing Ma

Yunting Miao

Ling Wang

Mohamed Abdel-Aty

Supplemental Material

The MAGIC data set is available online at: .

References

1. DOT, US. Highway functional classification concepts, criteria and procedures. FHWA-PL-13-026, US Department of Transportation, Federal Highway Administration, 2013.

2. Twomey, J. M., M. L. Heckman, J. C. Hayward, and R. J. Zuk. Accidents and Safety Associated With Interchanges. Transportation Research Record: Journal of the Transportation Research Board, 1993. 1385: 100–105.

3. Formosa, Quddus, Ison, Abdel-Aty, and Yuan. Predicting Real-Time Traffic Conflicts Using Deep Learning. Accident Analysis & Prevention, Vol. 136, 2020, p. 105429.

4. Sayed and Zheng. Multi-Type Bayesian Hierarchical Modeling of Traffic Conflict Extremes for Crash Estimation. Accident Analysis & Prevention, Vol. 160, 2021, p. 106309.

5. Charly and T. V. Mathew. Estimation of Traffic Conflicts Using Precise Lateral Position and Width of Vehicles for Safety Assessment. Accident Analysis & Prevention, Vol. 132, 2019, p. 105264.

6. Mallia, Lazuras, Violani, and Lucidi. Crash Risk and Aberrant Driving Behaviors Among Bus Drivers: The Role of Personality and Attitudes Towards Traffic Safety. Accident Analysis & Prevention, Vol. 79, 2015, pp. 145–151.

7. Ali, Sharma, M. M. Haque, Zheng, and Saifuzzaman. The Impact of the Connected Environment on Driving Behavior and Safety: A Driving Simulator Study. Accident Analysis & Prevention, Vol. 144, 2020, p. 105643.

8. Sagberg, Selpi, G. F. Bianchi Piccinini, and Engström. A Review of Research on Driving Styles and Road Safety. Human Factors, Vol. 57, No. 7, 2015, pp. 1248–1275.

9. Chiang, L. H., E. L. Russell, and R. D. Braatz. Pattern Classification. In Fault Detection and Diagnosis in Industrial Systems. Advanced Textbooks in Control and Signal Processing, Springer, London, 2001, pp. 27–31.

10. Bejani, M. M., and Ghatee. A Context Aware System for Driving Style Evaluation by an Ensemble Learning on Smartphone Sensors Data. Transportation Research Part C: Emerging Technologies, Vol. 89, 2018, pp. 303–320.

11. Vaitkus, Lengvenis, and Žylius. Driving Style Classification Using Long-Term Accelerometer Information. 19th International Conference on Methods and Models in Automation and Robotics (MMAR). IEEE, Miedzyzdroje, 2014, pp. 641–644.

12. Wang. Characterization of Longitudinal Driving Behavior by Measurable Parameters. Transportation Research Record: Journal of the Transportation Research Board, 2010. 2185: 15–23.

13. Higgs and M. M. Abbas. Segmentation and Clustering of Car-Following Behavior: Recognition of Driving Patterns. IEEE Transactions on Intelligent Transportation Systems, Vol. 16, 2015, pp. 81–90.

14. Wang and Wan. Estimating Rear-End Accident Probabilities With Different Driving Tendencies at Signalized Intersections in China. Journal of Advanced Transportation, Vol. 2019, 2019, p. 4836908.

15. Crundall, Chapman, Trawley, Collins, Van Loon, Andrews, and Underwood. Some Hazards are More Attractive Than Others: Drivers of Varying Experience Respond Differently to Different Types of Hazard. Accident Analysis & Prevention, Vol. 45, 2012, pp. 600–609.

16. Farah, Bekhor, Polus, and Toledo. A Passing Gap Acceptance Model for Two-Lane Rural Highways. Transportmetrica, Vol. 5, No. 3, 2009, pp. 159–172.

17. Han and Wang. Statistical-Based Approach for Driving Style Recognition Using Bayesian Probability With Kernel Density Estimation. IET Intelligent Transport Systems, Vol. 13, No. 1, 2019, pp. 22–30.

18. Constantinescu, Marinoiu, and Vladoiu. Driving Style Analysis Using Data Mining Techniques. International Journal of Computers, Communications & Control (IJCCC), Vol. 5, 2010, pp. 654–663.

19. Manzoni, Corti, De Luca, and S. M. Savaresi. Driving Style Estimation Via Inertial Measurements. 13th International IEEE Conference on Intelligent Transportation Systems, Funchal, Portugal, 2010, pp. 777–782.

20. Miyajima, Nishiwaki, Ozawa, Wakita, Itou, Takeda, and Itakura. Driver Modeling Based on Driving Behavior and Its Evaluation in Driver Identification. Proceedings of the IEEE, Vol. 95, No. 2, 2007, pp. 427–437.

21. Doshi and Trivedi. Examining the Impact of Driving Style on the Predictability and Responsiveness of the Driver: Real-World and Simulator Analysis. 2010 IEEE Intelligent Vehicles Symposium, IEEE, La Jolla, CA, 2010, pp. 232–237.

22. Žylius, Vaitkus, and Lengvenis. Driving Style Classification using Long-Term Accelerometer Information. 19th International Conference on Methods and Models in Automation and Robotics (MMAR), Międzyzdroje, Poland, 2014, pp. 641–644.

23. Wang, Abdel-Aty, Lee, and Shi. Analysis of Real-Time Crash Risk for Expressway Ramps Using Traffic, Geometric, Trip Generation, and Socio-Demographic Predictors. Accident Analysis & Prevention, Vol. 122, 2019, pp. 378–384.

24. Abdel-Aty and Abdelwahab. Modeling Rear-End Collisions Including the Role of Driver’s Visibility and Light Truck Vehicles Using a Nested Logit Structure. Accident Analysis & Prevention, Vol. 36, No. 3, 2004, pp. 447–456.

25. Dimitriou, Stylianou, and M. A. Abdel-Aty. Assessing Rear-End Crash Potential in Urban Locations Based on Vehicle-by-Vehicle Interactions, Geometric Characteristics and Operational Conditions. Accident Analysis & Prevention, Vol. 118, 2018, pp. 221–235.

26. Abdel-Aty, Park, and Zhu. Effects of Crash Warning Systems on Rear-End Crash Avoidance Behavior Under Fog Conditions. Transportation Research Part C: Emerging Technologies, Vol. 95, 2018, pp. 481–492.

27. Katrakazas, Theofilatos, M. A. Islam, Papadimitriou, Dimitriou, and Antoniou. Prediction of Rear-End Conflict Frequency Using Multiple-Location Traffic Parameters. Accident Analysis & Prevention, Vol. 152, 2021, p. 106007.

28. Khattak, M. W., Pirdavani, De Winne, Brijs, and De Backer. Estimation of Safety Performance Functions for Urban Intersections Using Various Functional Forms of the Negative Binomial Regression Model and a Generalized Poisson Regression Model. Accident Analysis & Prevention, Vol. 151, 2021, p. 105964.

29. Wang, Bhowmik, Zhao, Eluru, and Jackson. Highway Safety Assessment and Improvement Through Crash Prediction by Injury Severity and Vehicle Damage Using Multivariate Poisson-Lognormal Model and Joint Negative Binomial-Generalized Ordered Probit Fractional Split Model. Journal of Safety Research, Vol. 76, 2021, pp. 44–55.

30. Chen. Multivariate Space-Time Modeling of Crash Frequencies by Injury Severity Levels. Analytic Methods in Accident Research, Vol. 15, 2017, pp. 29–40.

31. Sacchi and Sayed. Bayesian Estimation of Conflict-Based Safety Performance Functions. Journal of Transportation Safety & Security, Vol. 8, No. 3, 2016, pp. 266–279.

32. Zheng, Sayed, and Mannering. Modeling Traffic Conflicts for Use in Road Safety Analysis: A Review of Analytic Methods and Future Directions. Analytic Methods in Accident Research, Vol. 29, 2021, p. 100142.

33. El-Basyouny and Sayed. Collision Prediction Models Using Multivariate Poisson-Lognormal Regression. Accident Analysis & Prevention, Vol. 41, No. 4, 2009, pp. 820–828.

34. Arun, M. M. Haque, Washington, Sayed, and Mannering. How Many Are Enough?: Investigating the Effectiveness of Multiple Conflict Indicators for Crash Frequency-by-Severity Estimation by Automated Traffic Conflict Analysis. Transportation Research Part C: Emerging Technologies, Vol. 138, 2022, p. 103653.

35. Guo, Sayed, and Essa. Real-Time Conflict-Based Bayesian Tobit Models for Safety Evaluation of Signalized Intersections. Accident Analysis & Prevention, Vol. 144, 2020, p. 105660.

36. Guo, Sayed, and Zheng. A Hierarchical Bayesian Peak Over Threshold Approach for Conflict-Based Before-After Safety Evaluation of Leading Pedestrian Intervals. Accident Analysis & Prevention, Vol. 147, 2020, p. 105772.

37. Wang, Abdel-Aty, Shi, and Park. Real-Time Crash Prediction for Expressway Weaving Segments. Transportation Research Part C: Emerging Technologies, Vol. 61, 2015, pp. 1–10.

38. Rahim, M. A., and H. M. Hassan. A Deep Learning Based Traffic Crash Severity Prediction Framework. Accident Analysis & Prevention, Vol. 154, 2021, p. 106090.

39. Xie, Yang, Ozbay, and Yang. Use of Real-World Connected Vehicle Data in Identifying High-Risk Locations Based on a New Surrogate Safety Measure. Accident Analysis & Prevention, Vol. 125, 2019, pp. 311–319.

40. Yuan, Huang, Wang, and Sun. Using Traffic Flow Characteristics to Predict Real-Time Conflict Risk: A Novel Method for Trajectory Data Analysis. Analytic Methods in Accident Research, Vol. 35, 2022, p. 100217.

41. Ziakopoulos, Vlahogianni, Antoniou, and Yannis. Spatial Predictions of Harsh Driving Events Using Statistical and Machine Learning Methods. Safety Science, Vol. 150, 2022, p. 105722.

42. Essa and Sayed. Comparison Between Surrogate Safety Assessment Model and Real-Time Safety Models in Predicting Field-Measured Conflicts at Signalized Intersections. Transportation Research Record: Journal of the Transportation Research Board, 2020. 2674: 100–112.

43. Wang and Abdel-Aty. Temporal and Spatial Analyses of Rear-End Crashes at Signalized Intersections. Accident Analysis & Prevention, Vol. 38, No. 6, 2006, pp. 1137–1150.

44. Essa and Sayed. Full Bayesian Conflict-Based Models for Real Time Safety Evaluation of Signalized Intersections. Accident Analysis & Prevention, Vol. 129, 2019, pp. 367–381.

45. Mannering, F. L., Shankar, and C. R. Bhat. Unobserved Heterogeneity and the Statistical Analysis of Highway Accident Data. Analytic Methods in Accident Research, Vol. 11, 2016, pp. 1–16.

46. Osama and Sayed. Investigating the Effect of Spatial and Mode Correlations on Active Transportation Safety Modeling. Analytic Methods in Accident Research, Vol. 16, 2017, pp. 60–74.

47. Zheng and Sayed. A Bivariate Bayesian Hierarchical Extreme Value Model for Traffic Conflict-Based Crash Estimation. Analytic Methods in Accident Research, Vol. 25, 2020, p. 100111.

48. Ramírez, A. F., and Valencia. Spatiotemporal Correlation Study of Traffic Accidents With Fatalities and Injuries in Bogota (Colombia). Accident Analysis & Prevention, Vol. 149, 2021, p. 105848.

49. Honglei, Wang, and Lei. A Deep Learning Approach to the Citywide Traffic Accident Risk Prediction. 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, 2018, pp. 3346–3351.

50. Yuan, Zhou, and Yang. Hetero-ConvLSTM: A Deep Learning Approach to Traffic Accident Prediction on Heterogeneous Spatio-Temporal Data. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2018, New York, NY, 2018, pp. 984–992.

51. Zhong, Wang, Jiang, and Abdel-Aty. MAGIC Dataset: Multiple Conditions Unmanned Aerial Vehicle Group-Based High-Fidelity Comprehensive Vehicle Trajectory Dataset. Transportation Research Record: Journal of the Transportation Research Board, 2022. 2676: 793–805.

52. Wang, Xie, Huang, and Liu. A Review of Surrogate Safety Measures and Their Applications in Connected and Automated Vehicles Safety Modeling. Accident Analysis & Prevention, Vol. 157, 2021, p. 106157.

53. Autey, Sayed, and M. H. Zaki. Safety Evaluation of Right-Turn Smart Channels Using Automated Traffic Conflict Analysis. Accident Analysis & Prevention, Vol. 45, 2012, pp. 120–130.

54. Chai, Zeng, and Wang. Safety Evaluation of Responsibility-Sensitive Safety (RSS) on Autonomous Car-Following Maneuvers Based on Surrogate Safety Measurements. 2019 IEEE Intelligent Transportation Systems Conference (ITSC), IEEE, Auckland, New Zealand, 2019, pp. 175–180.

55. Chen, Bai, and Sze. Factors Affecting the Severity of Rear-End Conflicts: A Driving Simulator Study. 2019 5th International Conference on Transportation Information and Safety (ICTIS), IEEE, Liverpool, 2019, pp. 1182–1187.

56. Shariat, Kashani, Nosrati, and Ranjbari. Identifying Significant Predictors of Head-on Conflicts on Two-Lane Rural Roads Using Inductive Loop Detectors Data. Traffic Injury Prevention, Vol. 12, 2011, pp. 636–641.

57. Debnath, Wilson, and Haworth. Proactive Safety Assessment in Roadwork Zones: A Synthesis of Surrogate Measures of Safety. In Proceedings of the 2nd Occupational Safety in Transport Conference, CARRS-Q, Queensland University of Technology, Australia, 2014, pp. 1–10.

58. Guo, Zhou, and Chen. In-Depth Analysis of Traffic Conflicts at Interchange Merging Areas Based on Bayesian Network and Conflict Chain. 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), IEEE, Macau, China, 2022, pp. 2258–2264.

59. Zheng and X. H. Meng. Research on Traffic Characteristics and Traffic Conflicts of the One-Way-Closure Work Zone on Freeway. Advanced Engineering Forum, Vol. 5, 2012, pp. 26–31.

60. Zammit-Mangion and Cressie. FRK: An R Package for Spatial and Spatio-Temporal Prediction With Large Datasets. Journal of Statistical Software, Vol. 98, 2021, pp. 1–48.

61. Cressie and Johannesson. Fixed Rank Kriging for Very Large Spatial Data Sets. Journal of the Royal Statistical Society Series B (Statistical Methodology), Vol. 70, No. 1, 2008, pp. 209–226.

62. Lord and Mannering. The Statistical Analysis of Crash-Frequency Data: A Review and Assessment of Methodological Alternatives. Transportation Research Part A: Policy and Practice, Vol. 44, No. 5, 2010, pp. 291–305.

63. Spiegelhalter, D. J., N. G. Best, B. P. Carlin, and Van der Linde. The Deviance Information Criterion: 12 Years On. Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 76, No. 3, 2014, pp. 485–493.

Expressway Rear-End Conflict Pattern Classification and Modeling