Analysis of traffic state variation patterns for urban road network based on spectral clustering

Abstract

The traffic state evolution of an urban road network is complicated and varies significantly across roads, functional zones, and social activities. Because human travel activities are regular, typical traffic state variation patterns of a road network can be extracted from a long-term perspective. To extract traffic variation features, spectral clustering, an unsupervised learning method, is applied to analyze daily traffic state variation for a regional road network based on a section-based traffic speed dataset. The method transforms the traditional clustering problem into a graph partition problem and, through dimension reduction, is suitable for clustering problems with multiple attributes. In this study, five daily traffic state variation clusters with different regularities and ranges are efficiently identified. The frequency distributions of the sections in each cluster are related to the hierarchies, locations, and functions of roads. Long-term heavy-traffic road sections and abnormal traffic states caused by festivals are detected through analysis of the clusters. Knowledge of the spatiotemporal diversity, similarity, and relatedness of network traffic state variation can thus be discovered. Traffic variation patterns could be incorporated into network-level traffic-prediction and route-guidance algorithms in intelligent transportation systems.

Keywords

Traffic state, spectral clustering, road network, pattern recognition, spatiotemporal analysis

Introduction

With the rapid growth of motor vehicles, recurrent and non-recurrent traffic congestion in urban road networks causes travel delays, traffic pollution, and accident risks. Such congestion usually arises because road capacity cannot meet the increasing traffic demand. Conventional remedies, such as constructing new roads or widening congested ones to increase capacity, are becoming costly and impractical given the limited land resources in cities.¹ Intelligent transportation systems (ITSs), such as advanced traffic management systems (ATMSs),² can effectively ease traffic congestion by monitoring and controlling urban traffic conditions.

Owing to strongly recurrent daily activities, urban traffic state variation follows regular patterns that emerge repeatedly,³ for example, the morning and evening peak hours caused by travel to work or school. With the development of advanced traffic-sensing technologies, massive amounts of diverse traffic data can be collected by multiple sensors, including microwave radars, video cameras, inductive loop detectors, global positioning system (GPS)-equipped vehicles, and mobile phones.⁴ Hidden compact patterns and knowledge can be extracted from these data to support urban traffic management.⁵ The extracted traffic patterns are important as inputs to macroscopic traffic models, for traffic flow prediction, and for the imputation of missing data.⁶

Several techniques have been applied to traffic pattern identification, including principal component analysis (PCA), matrix factorization, and clustering. PCA was used to extract features for network-level traffic pattern recognition.⁷ Non-negative matrix factorization was proposed to extract spatiotemporal traffic patterns and conduct long-term prediction.⁸ Clustering techniques are widely used to analyze traffic state and uncover hidden structures in large traffic datasets. Applications include classifying traffic jams and predicting traffic volume,⁹ identifying road traffic state dynamically,¹⁰ extracting daily traffic congestion patterns,¹¹ and detecting non-recurrent traffic congestion on urban road networks.¹² The dimensionality of the input data significantly influences clustering performance, especially for partitional clustering algorithms. The density of points in a high-dimensional space can be low, which makes it difficult to identify clusters in such a sparse space.¹³ Moreover, distance functions computed over many dimensions can become ineffective because of noise or a uniform distribution of values.¹⁴

K-means, the most common partitional algorithm, has been applied to extract traffic patterns from historical data because of its simplicity, efficiency, and empirical success.¹⁵ For example, K-means and expectation maximization (EM) algorithms were employed to extract an eigenvector from loop detector data describing the traffic condition of an intersection.¹⁶ Using K-means on traffic occupancy data, freeway traffic flow was divided into five states with corresponding safety levels, and these traffic states were identified with high accuracy.¹⁷ However, K-means is a greedy algorithm.¹⁵ It can fall into a local optimum when the sample space is not convex, which yields unsatisfactory clusters.¹⁸ In addition, the computational efficiency of K-means is greatly affected by the dimensionality of the samples.

In this study, the complexity and dynamics of traffic variation in urban road networks are taken into account. Daily traffic patterns, which represent the general regularity of variation over time, are extracted from a time-series dataset. Because the input time-series of daily traffic speed variation are high-dimensional, dimension reduction is needed before classical clustering methods such as K-means can be applied. To this end, spectral clustering is proposed to identify typical daily traffic variation patterns. The method can converge effectively to the globally optimal solution regardless of the shape of the sample space. It transforms the traditional clustering problem into a graph partition problem and reduces the dimensionality of the original time-series speed data via feature extraction.¹⁹

Spectral clustering has previously been applied mainly to trajectory analysis. For example, it was used to extract trajectory patterns²⁰ and to learn traffic patterns and the layout of intersections automatically from vehicle trajectories.²¹ The computational efficiency and performance of spectral clustering are better than those of K-means.²² Moreover, spectral clustering makes no assumptions about the distribution of the data and relies mainly on the eigendecomposition of a similarity matrix.¹⁸,²²

The main contribution of this study is to extract typical traffic variation patterns from section-based traffic speed data of an urban road network, revealing the underlying variation characteristics and regularities. Spectral clustering is applied to analyze traffic variation patterns for a regional road network in Beijing over different time periods. Based on the clusters, spatiotemporal distribution features of these patterns are identified, including differences between weekdays and holidays and among road section hierarchies. Long-term heavy-traffic road sections and abnormal traffic states can also be detected from the clusters. The traffic state variation patterns of the road network can be incorporated into network-level traffic-prediction and route-guidance algorithms in ITS.

Characteristics of traffic state variation

The urban traffic system is a complex system characterized by randomness, fuzziness, dynamics, and uncertainty.¹⁰ Urban residents follow strongly recurrent activity regularities. Therefore, regular patterns with spatiotemporal features can be extracted from the historical traffic data of a road network.

Traffic state on a road network can be described by a series of traffic parameters, including traffic volume, speed, and occupancy.²³ There is no unique standard for quantitatively identifying traffic state, and in this study average traffic speed is used to describe the traffic operation state.

Figure 1 compares the daily traffic speed variation on weekdays and weekends for three representative sections (a, b, and c) of the Beijing road network. Section a (East Third Ring Road) is an arterial road, while Section b (South Sanlitun Road) and Section c (Sanlitun Road) are collector roads. The traffic variation of Sections a and b is similar. On both sections, traffic speed is higher on weekends than on weekdays, and workdays show obvious morning and evening rush hours caused by school and work activities. Because of different speed limits, the traffic speed of Section b is generally lower than that of Section a. Section c is located in a recreational area, and its average speed variation is completely different from that of Sections a and b. Section c has no morning rush hour. Its traffic is generally more congested in the afternoon and evening than at other times, in accordance with the intensity of recreational activities nearby. Table 1 gives the statistical analysis for these three sections.

Figure 1.

Daily traffic speed variation of weekday and weekend for three sections.

Table 1.

Statistical analysis for the hourly average speed (km/h) of Sections a, b, and c.

Section	Mean	Standard deviation	Minimum	Maximum	25th percentile	50th percentile	75th percentile
Section a (weekday)	37.70	4.97	28.78	45.21	33.42	37.14	42.17
Section a (weekend)	42.74	4.13	33.37	50.96	39.60	42.42	45.65
Section b (weekday)	20.09	4.91	11.10	28.11	16.10	20.14	25.33
Section b (weekend)	23.60	4.05	17.58	29.91	19.79	22.97	27.71
Section c (weekday)	24.75	5.17	15.25	34.90	20.94	24.26	29.39
Section c (weekend)	24.37	4.66	18.25	36.31	19.98	23.82	27.64

Sections in the road network exhibit various daily traffic variations in space and time. These variations are related to many factors, including daily activities (e.g. work, school, business, and recreation), large-scale social events (e.g. sports events and concerts), holidays, and weather.

Spectral clustering method

Traffic state variation of the road network is described by hourly average section-based traffic speed data, and the daily speed time-series of each section can be treated as a single object.²⁴ Time-series with similar traffic state variation are grouped into clusters using spectral clustering. Sample points in a high-dimensional space can be mapped to a low-dimensional space through the eigenvectors of a Laplacian matrix derived from a similarity graph,²⁵ which achieves better clustering performance and lower computational complexity.¹⁹ In this study, the normalized spectral clustering algorithm of Ng, Jordan, and Weiss (the NJW algorithm)¹⁸ is applied to the pattern analysis of daily traffic state variation.

Basic theories

In spectral clustering, a similarity graph is constructed to describe the relationship between every pair of data points. The dimensionality of the original data can be reduced through the eigenvectors of the graph Laplacian. The transformation of the clustering problem can be explained by graph cut theory.

Similarity graphs

First, the data points in the original dataset $X = \{x_{1}, x_{2}, \dots, x_{n}\}$ are transformed into the corresponding vertex set $V = \{v_{1}, v_{2}, \dots, v_{n}\}$ of a graph. The edge between vertices $v_{i}$ and $v_{j}$ is weighted by the similarity $w_{ij}$ ($w_{ij} \geq 0$), which reflects the relationship between the two data points. The vertex set V and the edges, with weighted adjacency matrix $W = (w_{ij})_{i, j = 1, \dots, n}$ ($w_{ij} = w_{ji}$), constitute the undirected similarity graph G of spectral clustering. For daily traffic pattern analysis, $x_{i}$ ($i = 1, 2, \dots, n$) is the hourly average speed time-series of one section on one day. n denotes the total number of speed time-series, that is, n = number of sections × number of days. $v_{i}$ ($i = 1, 2, \dots, n$) is the vertex corresponding to $x_{i}$ in the graph, and $w_{ij}$ represents the similarity between two hourly average speed time-series.

The goal of spectral clustering is to find a graph partition with low-weight edges between different clusters and high-weight edges within the same cluster, namely, a min-cut problem.¹⁹ In this study, a fully connected similarity graph is constructed, and the edge weights are calculated by the Gaussian similarity function

w_{ij} = \exp\left(-\frac{\lVert x_{i} - x_{j} \rVert^{2}}{2 \sigma^{2}}\right)

(1)

where the parameter $\sigma$ controls the width of the neighborhoods.¹⁹ In this study, $\sigma = 1$.
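As a minimal sketch of equation (1), the NumPy function below builds the fully connected Gaussian similarity matrix W for n daily speed profiles, which are stored as the rows of an n × 24 array. The function name `gaussian_similarity` and the toy data are illustrative assumptions. The diagonal is left at $w_{ii} = 1$, as equation (1) gives; Ng et al. instead set $w_{ii} = 0$.

```python
import numpy as np


def gaussian_similarity(X: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Fully connected similarity matrix W from equation (1).

    X has one row per daily speed time-series (n rows, one column per hour).
    """
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negative round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


# Toy example: three hypothetical "daily profiles" with 4 hourly values each
X = np.array([[0.0, 0.0, 0.0, 0.0],
              [0.1, 0.0, 0.0, 0.0],
              [3.0, 3.0, 3.0, 3.0]])
W = gaussian_similarity(X, sigma=1.0)
```

The first two profiles are nearly identical, so they receive a weight close to 1. The third profile is far from both and receives a weight close to 0. With σ = 1 on Z-scored data, σ sets the distance scale at which two days stop being considered similar.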

Graph Laplacians

The degree of vertex $v_{i} \in V$ is defined as $d_{i} = \sum_{j = 1}^{n} w_{ij}$, and the degree matrix D is the diagonal matrix with the degrees $d_{1}, \dots, d_{n}$ on its diagonal. The unnormalized graph Laplacian L and the normalized graph Laplacian $L_{norm}$ are defined as

L = D - W

(2)

L_{norm} = D^{- 1 / 2} L D^{- 1 / 2} = I - D^{- 1 / 2} W D^{- 1 / 2} = I - L'

(3)

$L_{norm}$ is positive semi-definite and has n non-negative real-valued eigenvalues $0 = \lambda_{1} \leq \dots \leq \lambda_{n}$. For every $f \in \mathbb{R}^{n}$, $L_{norm}$ satisfies equation (4)²⁶

f' L_{norm} f = \frac{1}{2} \sum_{i, j = 1}^{n} w_{ij} \left(\frac{f_{i}}{\sqrt{d_{i}}} - \frac{f_{j}}{\sqrt{d_{j}}}\right)^{2}

(4)

The high-dimensional sample points $x_{i}$ can then be mapped to a low-dimensional space through the eigenvectors of the graph Laplacian matrix.
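The sketch below builds D, L, and $L_{norm}$ from equations (2) and (3). It then checks numerically that equation (4) holds and that the smallest eigenvalue of $L_{norm}$ is 0. It assumes NumPy and uses a random symmetric weight matrix as a stand-in for the traffic similarity graph. The helper names are hypothetical.

```python
import numpy as np


def laplacians(W: np.ndarray):
    """Return D, L = D - W (eq. 2) and L_norm = D^{-1/2} L D^{-1/2} (eq. 3)."""
    d = W.sum(axis=1)                      # vertex degrees d_i
    D = np.diag(d)
    L = D - W
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Row/column scaling is the same as the product D^{-1/2} L D^{-1/2}
    L_norm = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    return D, L, L_norm


def quadratic_form_rhs(W: np.ndarray, f: np.ndarray) -> float:
    """Right-hand side of equation (4)."""
    g = f / np.sqrt(W.sum(axis=1))         # f_i / sqrt(d_i)
    return 0.5 * np.sum(W * (g[:, None] - g[None, :]) ** 2)


rng = np.random.default_rng(0)
A = rng.random((6, 6))
W = (A + A.T) / 2                          # symmetric, non-negative weights
np.fill_diagonal(W, 0.0)
D, L, L_norm = laplacians(W)

f = rng.standard_normal(6)
lhs = f @ L_norm @ f                       # left-hand side of equation (4)
rhs = quadratic_form_rhs(W, f)
eigvals = np.linalg.eigvalsh(L_norm)       # ascending; eigvals[0] should be ~0
```

Because the right-hand side of equation (4) is a weighted sum of squares, it can never be negative. This is why $L_{norm}$ is positive semi-definite.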

Graph cut theory

The normalized spectral clustering algorithm used here for traffic state clustering is derived by relaxing the normalized cut (Ncut),²⁷ which is defined as

Ncut (A_{1}, \dots, A_{k}) = \frac{1}{2} \sum_{i = 1}^{k} \frac{W (\bar{A_{i}}, A_{i})}{vol (A_{i})}

(5)

where $\bar{A_{i}}$ is the complement of $A_{i}$, $W (\bar{A_{i}}, A_{i}) = \sum_{p \in \bar{A_{i}}, q \in A_{i}} w_{pq}$, and $vol (A_{i}) = \sum_{p \in A_{i}} d_{p}$ is the sum of the degrees of the vertices in $A_{i}$. The $vol (A_{i})$ term balances the sizes of the clusters. Because of this balance condition, minimizing the following objective over k-way partitions of the vertices is a non-deterministic polynomial-time (NP)-hard problem¹⁹,²⁸

\min_{A_{1}, \dots, A_{k}} \sum_{i = 1}^{k} \frac{W (\bar{A_{i}}, A_{i})}{vol (A_{i})}

(6)

First, the indicator vectors $h_{j} = (h_{1, j}, \dots, h_{n, j})'$ are defined as

h_{i, j} = \begin{cases} 1 / \sqrt{vol (A_{j})} & \text{if } v_{i} \in A_{j} \\ 0 & \text{otherwise} \end{cases}

(7)

Because $f' L f = \frac{1}{2} \sum_{p, q = 1}^{n} w_{pq} (f_{p} - f_{q})^{2}$ for every $f \in \mathbb{R}^{n}$, each indicator vector satisfies

h'_{i} L h_{i} = W (\bar{A_{i}}, A_{i}) / vol (A_{i})

(8)

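The indicator vectors are collected as the columns of $H = [h_{1}, \dots, h_{k}] \in \mathbb{R}^{n \times k}$. Minimizing Ncut is then equivalent to

\min_{A_{1}, \dots, A_{k}} \operatorname{Tr}(H' L H) \quad \text{subject to} \quad H' D H = I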

(9)
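The NumPy sketch below makes equations (7) to (9) concrete. It uses a random weight matrix and an arbitrary, hypothetical 3-way partition. It builds the indicator matrix H and confirms two properties: $H' D H = I$, and $\operatorname{Tr}(H' L H)$ equals the objective minimized above (the sum of $W(\bar{A_{i}}, A_{i}) / vol(A_{i})$). That objective is twice the Ncut of equation (5) because of the factor 1/2 in equation (5). The factor does not change which partition is optimal.

```python
import numpy as np


def indicator_matrix(labels: np.ndarray, W: np.ndarray, k: int) -> np.ndarray:
    """H from equation (7): h_{i,j} = 1/sqrt(vol(A_j)) if v_i in A_j, else 0."""
    d = W.sum(axis=1)
    H = np.zeros((len(labels), k))
    for j in range(k):
        members = labels == j
        H[members, j] = 1.0 / np.sqrt(d[members].sum())   # vol(A_j)
    return H


def ncut_objective(labels: np.ndarray, W: np.ndarray, k: int) -> float:
    """Sum over clusters of W(complement of A_j, A_j) / vol(A_j)."""
    d = W.sum(axis=1)
    total = 0.0
    for j in range(k):
        inside = labels == j
        cut = W[np.ix_(~inside, inside)].sum()            # edges leaving A_j
        total += cut / d[inside].sum()
    return total


rng = np.random.default_rng(1)
A = rng.random((8, 8))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)
labels = np.array([0, 0, 0, 1, 1, 2, 2, 2])               # hypothetical partition
D = np.diag(W.sum(axis=1))
L = D - W
H = indicator_matrix(labels, W, k=3)
```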

The discrete Ncut problem is relaxed by allowing the entries of H to take arbitrary real values and substituting $T = D^{1/2} H$, which gives

\min_{T \in \mathbb{R}^{n \times k}} \operatorname{Tr}(T' D^{-1/2} L D^{-1/2} T) \quad \text{subject to} \quad T' T = I

(10)

This is a standard trace minimization problem. By the Rayleigh–Ritz theorem, the optimal T is the matrix whose columns are the first k eigenvectors of $L_{norm} = D^{-1/2} L D^{-1/2}$.²⁹ The relaxed normalized cut problem can therefore be solved by computing the k smallest eigenvalues and the corresponding eigenvectors of $L_{norm}$

D^{-1/2} L D^{-1/2} x = \lambda x

(11)

which is equivalent to computing the k largest eigenvalues and the corresponding eigenvectors of $D^{-1/2} W D^{-1/2}$, as follows

D^{-1/2} (D - W) D^{-1/2} x = \lambda x \Rightarrow D^{-1/2} W D^{-1/2} x = (1 - \lambda) x

(12)
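Equation (12) can be checked numerically. The sketch below uses NumPy, with random symmetric weights as an assumed stand-in for the similarity graph. It shows that the k largest eigenvalues of $L' = D^{-1/2} W D^{-1/2}$ equal one minus the k smallest eigenvalues of $L_{norm}$. It also shows that the two sets of eigenvectors span the same subspace, compared through their orthogonal projectors so that sign and rotation ambiguities do not matter.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((7, 7))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)

d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
L_prime = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]   # D^{-1/2} W D^{-1/2}
L_norm = np.eye(7) - L_prime                               # equation (3)

lam, V_norm = np.linalg.eigh(L_norm)     # eigenvalues of L_norm, ascending
mu, V_prime = np.linalg.eigh(L_prime)    # eigenvalues of L', ascending

k = 3
smallest_of_L_norm = lam[:k]
largest_of_L_prime = mu[::-1][:k]

# Projectors onto the two k-dimensional eigenspaces
P_norm = V_norm[:, :k] @ V_norm[:, :k].T
P_prime = V_prime[:, -k:] @ V_prime[:, -k:].T
```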

Normalized spectral clustering algorithm

Ng et al.¹⁸ selected the first k eigenvectors and performed clustering in $\mathbb{R}^{k}$, where each point corresponds to one original data point. The dataset $X = \{x_{1}, x_{2}, \dots, x_{n}\}$ is to be clustered into k groups, and the parameter k must be set in advance. The NJW clustering algorithm is as follows:

Step 1. Transform the original data points into the undirected graph G, and construct the similarity matrix W with the Gaussian similarity function in equation (1).

Step 2. Construct the degree matrix D and the matrix $L' = D^{-1/2} W D^{-1/2}$.

Step 3. Compute the eigenvectors $u_{1}, u_{2}, \dots, u_{k}$ of $L'$ corresponding to its k largest eigenvalues $\lambda_{1}, \lambda_{2}, \dots, \lambda_{k}$. Construct the matrix $U = [u_{1}, u_{2}, \dots, u_{k}] \in \mathbb{R}^{n \times k}$ with these vectors as columns.

Step 4. Normalize the rows of U to unit length to obtain the matrix $E \in \mathbb{R}^{n \times k}$, that is, $e_{ij} = u_{ij} / \left(\sum_{l = 1}^{k} u_{il}^{2}\right)^{1/2}$.

Step 5. Cluster the rows $e_{1}, e_{2}, \dots, e_{n}$ of E into k clusters $E_{1}, \dots, E_{k}$ using the K-means algorithm.

Step 6. Assign the original point $x_{i}$ to cluster $X_{j}$ if and only if row $e_{i}$ was assigned to cluster $E_{j}$ in Step 5 ($i = 1, \dots, n$; $j = 1, \dots, k$). This yields the final clustering result.
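Steps 1 to 6 can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Step 5 uses a small hand-written Lloyd K-means with k-means++ seeding, the diagonal of W is set to zero as in Ng et al., and the function names and toy data are hypothetical. For the n = 1540 items in this study, the dense 1540 × 1540 matrices need only about 19 MB each.

```python
import numpy as np


def kmeans(E: np.ndarray, k: int, n_iter: int = 100, seed: int = 0) -> np.ndarray:
    """Lloyd's K-means with k-means++ seeding; returns one label per row of E."""
    rng = np.random.default_rng(seed)
    centers = [E[rng.integers(len(E))]]
    for _ in range(1, k):
        # k-means++: pick the next center with probability proportional to
        # the squared distance to the nearest center chosen so far
        d2 = np.min([np.sum((E - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(E[rng.choice(len(E), p=d2 / d2.sum())])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([E[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def njw_spectral_clustering(X: np.ndarray, k: int, sigma: float = 1.0,
                            seed: int = 0) -> np.ndarray:
    # Step 1: fully connected Gaussian similarity graph, equation (1)
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)               # no self-loops, following Ng et al.
    # Step 2: degrees and L' = D^{-1/2} W D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_prime = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Step 3: eigenvectors of the k largest eigenvalues (eigh sorts ascending)
    _, eigvecs = np.linalg.eigh(L_prime)
    U = eigvecs[:, -k:]
    # Step 4: normalize each row of U to unit length to get E
    E = U / np.linalg.norm(U, axis=1, keepdims=True)
    # Steps 5-6: K-means on the rows of E; x_i inherits the label of row e_i
    return kmeans(E, k, seed=seed)


# Usage: two well-separated groups of toy 4-hour "daily profiles"
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.1, size=(10, 4)),
               rng.normal(3.0, 0.1, size=(10, 4))])
labels = njw_spectral_clustering(X, k=2, sigma=1.0)
```

Row normalization in Step 4 is what sets NJW apart from simply running K-means on U. When the graph is nearly block-diagonal, it maps all members of a cluster to almost the same unit vector, so the final K-means step becomes easy.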

Clustering performance index

Different input parameter values in a clustering algorithm yield different data partitions. The result with the most compact and well-separated clusters is preferred for the original dataset. Various indices have been proposed to select the best number of clusters for a specific dataset, for example, Dunn, CH, CI, and Davies–Bouldin (DB).³⁰

In this study, the DB criterion³¹ is used as the clustering index to evaluate the performance of spectral clustering with different numbers of clusters. It is defined in equations (13) and (14)

D_{i, j} = \frac{d_{i} + d_{j}}{d_{i, j}}

(13)

DB = \frac{1}{k} \sum_{i = 1}^{k} \max_{j \neq i} D_{i, j}, \quad i, j = 1, 2, \dots, k

(14)

where $d_{i}$ is the average Euclidean distance between the data points of cluster i and the centroid of cluster i, and $d_{j}$ is the corresponding average distance for cluster j. Here $d_{i}$ denotes a within-cluster distance, not the vertex degree defined earlier. $d_{i, j}$ is the Euclidean distance between the centroids of clusters i and j. $D_{i, j}$ is the ratio of the combined within-cluster scatter of clusters i and j to the distance between their centroids, and it reflects the compactness and separation of a data partition. The DB index is widely used to choose a suitable clustering algorithm and number of clusters. A smaller DB value indicates more compact and better separated clusters, so the clustering scheme with the smallest DB value represents the best clustering performance.
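Equations (13) and (14) translate directly into code. The NumPy sketch below computes the DB index for a given labeling. The function name is an assumption. scikit-learn's `davies_bouldin_score` uses the same definition and should give the same value.

```python
import numpy as np


def davies_bouldin(X: np.ndarray, labels: np.ndarray) -> float:
    """DB index of equations (13)-(14); smaller is better. Requires k >= 2."""
    clusters = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in clusters])
    # d_i: average Euclidean distance from cluster i's points to its centroid
    scatter = np.array([np.linalg.norm(X[labels == c] - centroids[i], axis=1).mean()
                        for i, c in enumerate(clusters)])
    k = len(clusters)
    worst = np.empty(k)
    for i in range(k):
        # D_{i,j} = (d_i + d_j) / d_{i,j}, equation (13)
        ratios = [(scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
                  for j in range(k) if j != i]
        worst[i] = max(ratios)
    return float(worst.mean())           # equation (14): mean of the worst ratios


X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
good = davies_bouldin(X, np.array([0, 0, 1, 1]))  # compact, far-apart clusters
bad = davies_bouldin(X, np.array([0, 1, 0, 1]))   # spread-out, close centroids
```

In the good partition, each cluster has scatter 1 and the centroids are 10 apart, giving DB = 2/10 = 0.2. In the bad partition, each cluster has scatter 5 and the centroids are 2 apart, giving DB = 5. To choose k, one would run the clustering for each candidate k and keep the value with the smallest DB. For a single cluster the index is undefined, because the maximum in equation (14) runs over an empty set.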

Case study

Data description

In this study, daily traffic state variation patterns are extracted from a section-based traffic speed dataset. The study area contains 55 road sections in the northeast region of Beijing, including 33 arterial roads and 22 collector roads, as shown in Figure 2. Average traffic speed data are obtained from open online data provided by NavInfo Co., Ltd, which integrates different sources of measured data, including GPS data from taxis and private cars. Traffic speed data were collected for 28 consecutive days (4 weeks, from 28 December 2015 to 24 January 2016), giving n = 1540 (55 × 28) input data items in total. Each record is a tuple of the form $x_{i} = \{s_{0}, s_{1}, s_{2}, \dots, s_{23}\}$, $i = 1, 2, \dots, 1540$, where $s_{0}, s_{1}, \dots, s_{23}$ are the hourly average traffic speeds for the 24 hours of one day. These values are treated as 24 attributes describing the daily traffic variation of a road section, forming a high-dimensional space.

Figure 2.

Regional road network in Beijing.

Before clustering, the traffic speed time-series data are normalized with the Z-score technique.³² Normalized items with similar variation are grouped together by spectral clustering, with each data item treated as a vertex of the graph. A fully connected similarity graph with weight matrix $W \in \mathbb{R}^{1540 \times 1540}$ is constructed using the Gaussian similarity function. Following the NJW algorithm, the graph is divided into several connected sub-graphs, which directly yields the daily traffic speed variation clusters. In this way, traffic state variation can be identified from long-term, network-level hourly average speed data and grouped into typical clusters that reflect daily traffic fluctuation and congestion duration.
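The sketch below shows one way to arrange the data described above. A (sections, days, hours) speed array is flattened into the 1540 × 24 matrix of daily profiles, and each hourly attribute is then Z-score normalized. Two choices are assumptions, since the text does not specify them. The first is the section-major row order. The second is that Z-scores are taken per hourly column across all items; normalizing each daily series separately is the other common option. The random speeds are placeholders, not the NavInfo data, and IDs are 0-based.

```python
import numpy as np

N_SECTIONS, N_DAYS, N_HOURS = 55, 28, 24


def build_feature_matrix(speeds: np.ndarray) -> np.ndarray:
    """Flatten a (sections, days, hours) array into n = sections * days rows."""
    n_sec, n_day, n_hour = speeds.shape
    return speeds.reshape(n_sec * n_day, n_hour)    # row = section * n_day + day


def zscore_columns(X: np.ndarray) -> np.ndarray:
    """Z-score each hourly attribute s_0..s_23 across all rows (assumed variant)."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    return (X - mu) / np.where(sd > 0, sd, 1.0)      # guard constant columns


# Hypothetical placeholder speeds in km/h, standing in for the real records
rng = np.random.default_rng(3)
speeds = rng.uniform(10.0, 60.0, size=(N_SECTIONS, N_DAYS, N_HOURS))
X = build_feature_matrix(speeds)
Xz = zscore_columns(X)
# Recover the (section, day) pair behind each row, e.g. for frequency tables
section, day = np.divmod(np.arange(len(X)), N_DAYS)
```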

Number of clusters

Clustering performance depends strongly on the number of clusters k. If k is set too large, the differences between some clusters become small. Based on experiments with different parameter settings, and to keep visible differences among patterns, cluster numbers from 1 to 10 were tested in this study.

The clustering was run for each number of clusters, and the corresponding DB index values are shown in Figure 3. The DB index reaches its minimum at k = 5, indicating the best clustering performance. Therefore, the daily traffic speed variation of the regional road network over the 4 weeks is clustered into five groups.

Figure 3.

DB index values according to different cluster numbers.

Analysis for clusters

The sizes of the five clusters are not uniform, and the members of each cluster are markedly heterogeneous, as shown in Tables 2 and 3. Clusters 2 and 3 together contain 71.7% of all daily traffic speed time-series (1104 of 1540), indicating that patterns 2 and 3 occur most frequently in this road network. Each cluster consists of the daily traffic speeds of different sections on different days. The daily traffic state variation of one section may belong to different clusters on different days. For example, Sections 2, 6, and 10 appear in both Clusters 3 and 4. The frequency distributions of the sections differ between clusters. For example, Figure 4 shows the frequency distributions of Clusters 2 and 3.

Table 2.

The number of daily traffic speed variation time-series for each cluster.

Cluster	1	2	3	4	5
Number	109	551	553	284	43

Table 3.

Road section ID for each cluster.

Clusters	Road section ID
1	1, 8, 20, 21, 26, 36, 37, 39, 46, 54
2	7, 11, 14, 15, 16, 18, 21, 22, 23, 26, 27, 30, 32, 33, 34, 35, 36, 37, 38, 39, 42, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55
3	2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 16, 17, 19, 22, 23, 24, 27, 29, 30, 31, 32, 33, 34, 38, 40, 41, 42, 43, 44, 46, 48, 49, 50, 51, 52, 53, 55
4	2, 5, 6, 10, 12, 13, 19, 24, 25, 27, 28, 29, 31, 40, 43, 44, 45
5	25, 28, 45

Figure 4.

The frequency distribution of Cluster 2 and Cluster 3.

Each cluster has a centroid, and the set of centroids is $C = \{C_{1}, C_{2}, C_{3}, C_{4}, C_{5}\}$. The coordinates of centroid $C_{m}$ ($m = 1, 2, \dots, 5$) in the multidimensional space form the vector $Y_{m} = \{y_{m, 0}, y_{m, 1}, \dots, y_{m, 23}\}$. Its entries are the hourly average traffic speeds over all data points in cluster m. The centroid vectors $Y_{1}, \dots, Y_{5}$ reflect the daily traffic speed variation of the five clusters, as shown in Figure 5.
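Figure 5 reports the centroids in km/h, which suggests that the vectors $Y_{m}$ are averages of the raw (not Z-scored) hourly speeds of each cluster's members. The NumPy sketch below assumes this. It also locates the hour at which each centroid reaches its minimum, which corresponds to the evening-peak observation discussed next. The function name and toy numbers are hypothetical.

```python
import numpy as np


def cluster_centroids(speeds: np.ndarray, labels: np.ndarray, k: int) -> np.ndarray:
    """Row m is Y_m: the hourly mean speed over all daily series in cluster m."""
    return np.vstack([speeds[labels == m].mean(axis=0) for m in range(k)])


# Hypothetical example: 4 daily series with 3 "hours" each, 2 clusters
speeds = np.array([[30.0, 20.0, 40.0],
                   [34.0, 24.0, 44.0],
                   [50.0, 45.0, 55.0],
                   [52.0, 47.0, 57.0]])
labels = np.array([0, 0, 1, 1])
Y = cluster_centroids(speeds, labels, k=2)
slowest_hour = Y.argmin(axis=1)   # index of the hour with the lowest mean speed
```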

Figure 5.

Average traffic speed of clustering centers (km/h).

The speed values of the cluster centroids differ markedly. These differences reflect the fluctuation features and ranges of daily traffic state in each cluster, as well as the congestion levels of the different clusters. The hourly average traffic speed of each centroid (except that of Cluster 5) reaches its minimum between 18:00 and 19:00, in accordance with the evening peak hour. In addition, the traffic speeds of the centroids are lower during the daytime than at night.

Spectral clustering can detect the key features of traffic dynamics in an urban road network. The data points in Figure 5 are only the centroids of the five clusters, and the speed variation of the five clusters differs considerably. To present the speed variation characteristics of the clusters, representative sections are chosen: Sections 1, 15, 3, 5, and 25, which have the highest frequencies in Clusters 1, 2, 3, 4, and 5, respectively. Their traffic state variation on 11 December is shown in Figure 6. The speed ranges and fluctuation trends of the five representative sections differ clearly because of their different locations, road hierarchies, and functions. Section 1, which has the lowest speed (Cluster 1), is located in the center of the entertainment district, whereas the traffic state of Section 25 is smooth throughout the day. Sections 3 and 5, both on main roads, reach their lowest speeds during the evening rush hour because of heavy commuting and recreational traffic. In general, the traffic speed of each cluster follows its own characteristic variation, and the curves do not cross one another. Clusters 1, 2, 3, 4, and 5 can therefore be viewed as five typical daily traffic variation patterns, referred to as patterns 1, 2, 3, 4, and 5, respectively.

Figure 6.

The representative daily traffic state variation for five clusters.

Spatial and temporal distribution of patterns

The spatial and temporal distribution features of the daily traffic variation in each cluster are identified from the above clustering results. The daily speed variation of one section can belong to different patterns on different days. Likewise, on a given day, the sections of the regional network can belong to different patterns. The distribution of patterns provides insight into various long-term and short-term features of the regional road network, which can be applied to traffic prediction and route guidance in urban road networks.

Spatial distribution characteristics

Differences of sections

The frequencies of different sections vary greatly within each cluster, and the sections with the highest frequency in each cluster are listed in Table 4. Several sections belong entirely to one pattern, indicating that their daily traffic speed fluctuation remained steady over the 4 weeks. The daily variation of Sections 1, 8, and 20 belongs to pattern 1 on all 28 days. These sections remain congested in the long term, with an average traffic speed of 25 km/h, so traffic management measures are needed to ease congestion. Similarly, Sections 15, 18, 35, and 47 belong entirely to pattern 2, and Sections 3, 4, 9, 17, and 41 belong entirely to pattern 3.

Table 4.

Sections with the highest frequency for each cluster.

Cluster	Highest frequency (days)	Corresponding sections
1	28	1, 8, 20
2	28	15, 18, 35, 47
3	28	3, 4, 9, 17, 41
4	27	5, 13, 19, 31
5	24	25

For some sections, the traffic state variation stays in one pattern on most days, while a few days belong to other patterns. These days can be treated as abnormal traffic states. For example, the daily variation of Sections 34, 49, 50, and 52 belongs to pattern 2 on 27 days, while the remaining day (3 January 2016) belongs to pattern 3 for all four sections. The reason is that work and school traffic is reduced during the New Year's holiday, which improves traffic conditions. Conversely, the traffic state variation of Section 39 belongs to pattern 2 on every day except New Year's Day, which is assigned to pattern 1. Section 39 is next to a park where celebration activities attract a large number of visitors and seriously affect traffic conditions. Therefore, the effect of holidays on traffic state varies across sections to some extent. These differences also reveal the main function of different sections: Sections 34, 49, 50, and 52 mainly serve work or school traffic, whereas Section 39 mainly serves tourism traffic.
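Section frequencies such as those in Tables 2 to 4, and abnormal days such as those described above, can be read off a section × cluster count matrix. In the hypothetical NumPy sketch below, rows are assumed to be ordered section-major (row = section × days + day), and IDs are 0-based. A day is flagged as abnormal when its cluster differs from the section's most frequent (dominant) cluster; this flagging rule is an illustrative assumption.

```python
import numpy as np


def section_cluster_frequency(labels: np.ndarray, n_sections: int, n_days: int,
                              k: int) -> np.ndarray:
    """counts[s, c] = number of days on which section s falls in cluster c."""
    section = np.arange(len(labels)) // n_days   # assumes section-major rows
    counts = np.zeros((n_sections, k), dtype=int)
    np.add.at(counts, (section, labels), 1)      # unbuffered per-item increment
    return counts


# Hypothetical labels: 3 sections x 4 days, k = 2 clusters
labels = np.array([0, 0, 0, 0,    # section 0: always cluster 0
                   0, 1, 1, 1,    # section 1: cluster 1 except on day 0
                   1, 1, 1, 1])   # section 2: always cluster 1
counts = section_cluster_frequency(labels, n_sections=3, n_days=4, k=2)
cluster_sizes = counts.sum(axis=0)                 # analogue of Table 2
highest = counts.max(axis=0)                       # highest frequency per cluster
top_sections = [np.flatnonzero(counts[:, c] == highest[c]) for c in range(2)]
dominant = counts.argmax(axis=1)                   # each section's usual pattern
abnormal = np.argwhere(labels.reshape(3, 4) != dominant[:, None])  # (section, day)
```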

Section connectivity within the clusters

Sections that occur 24 to 28 times in Clusters 2 and 3 are treated as strongly belonging to patterns 2 and 3. They are marked in red in Figure 7(a) and (b), respectively, and the study area is bounded by the blue links. These sections are partly connected and share the same daily traffic state variation pattern. We can thus infer that the traffic state variations of adjacent sections are correlated. The clustering technique can therefore be applied to divide a large-scale road network into regions with similar section-based traffic state variation.

Figure 7.

Sections with high occurrence frequencies in Clusters 2 and 3 (red highlight): (a) pattern 2 and (b) pattern 3.

Road hierarchy composition for each cluster

Roads of each hierarchy play specific roles in the urban road network and generally exhibit different traffic state variation patterns. The study area includes arterial roads and collector roads, and the frequency proportions of these two road hierarchies in each cluster are shown in Figure 8. The proportion of arterial roads is higher in the later patterns than in the earlier ones, and pattern 5 consists entirely of arterial roads. Conversely, collector roads account for the largest proportion of pattern 1. Traffic speed generally increases from pattern 1 to pattern 5. Thus, the daily traffic state of collector roads tends to remain at lower speeds than that of arterial roads, which is consistent with road design specifications.

Figure 8.

Road section proportions of different hierarchies for each cluster.

Temporal distribution characteristics

Several patterns are present in the network-level daily traffic variation on each day. Figure 9 shows the frequency of each pattern from Monday to Sunday over the 4 weeks to reflect the temporal distribution. There is no apparent difference in pattern distribution from Monday to Sunday. Patterns 2, 3, and 4 make up the main components of the network traffic condition on any given day.

Figure 9.

Frequencies of each pattern from Monday to Sunday for 4 weeks.

To assess the impact of the New Year's holiday on traffic state, the pattern distribution on holidays is compared with that on non-holidays. Figure 10 shows the first week as an example: the first 4 days are working days, while the last 3 days are holidays. Notably, no section belongs to pattern 5 on Friday (1 January 2016, New Year's Day). Holidays may therefore lead to worse traffic states at the network level.

Figure 10.

Frequencies of each pattern from Monday to Sunday for the first week.

Conclusion

In this study, spectral clustering is applied to analyze daily traffic state variation and extract typical patterns for an urban regional road network. The method transforms the traditional clustering problem into a graph partition problem and is suitable for extracting traffic variation patterns with multiple attributes in a high-dimensional space.

Five typical daily traffic state variation clusters with different regularities are identified. The sizes of the five clusters are not uniform. Clusters 2 and 3 occur most frequently, accounting for 71.7% of the total. The members of each cluster are heterogeneous, and the frequency distributions of the sections in each cluster are related to road hierarchies and functions. Significant traffic speed gaps exist among the clusters, reflecting their different congestion levels. The spatial and temporal similarities of road network traffic state are extracted. Long-term heavy-traffic road sections and abnormal traffic states caused by holidays and festivals are also detected based on the clusters.

Clustering analysis can uncover the spatiotemporal diversity, similarity, and relatedness of urban road network traffic state variation. The daily traffic state variation patterns of the road network provide a basis for urban traffic management and control. In future work, the spatiotemporal traffic variation patterns of large-scale road networks can be incorporated into traffic-prediction and route-guidance algorithms.

Footnotes

Academic Editor: Xiaobei Jiang

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Zhang

Wang

et al . Data-driven intelligent transportation systems: a survey. IEEE T Intell Transp 2011; 12: 1624–1639.

Lee

Tseng

Shieh

et al . Discovering traffic bottlenecks in an urban network by spatiotemporal data mining on location-based services. IEEE T Intell Transp 2011; 12: 1047–1056.

3. Soriguera. Deriving traffic flow patterns from historical data. J Transp Eng: ASCE 2012; 138: 1430–1441.

4. Leduc. Road traffic data: collection methods and applications. Working papers on Energy, Transport and Climate Change, no. 55, 2008, vol. 1, http://ftp.jrc.es/EURdoc/JRC47967.TN.pdf

5. Elhenawy, Chen, Rakha HA. Dynamic travel time prediction using data clustering and genetic programming. Transport Res C: Emer 2014; 42: 82–98.

6. Weijermars, Van Berkum. Analyzing highway flow patterns using cluster analysis. In: Proceedings of the 2005 IEEE intelligent transportation systems, Vienna, 16 September 2005, pp.308–313. New York: IEEE.

7. Ren, Zhang, et al. Research on network-level traffic pattern recognition. In: Proceedings of the IEEE 5th international conference on intelligent transportation systems, Singapore, 6 September 2002, pp.500–504. New York: IEEE.

8. Han, Moutarde. Statistical traffic state analysis in large-scale transportation networks using locality-preserving non-negative matrix factorisation. IET Intell Transp Sy 2013; 7: 283–295.

9. Stutz, Runkler TA. Classification and prediction of road traffic using application-specific fuzzy clustering. IEEE T Fuzzy Syst 2002; 10: 297–308.

10. Jiang, Wang, Zhang, et al. The study on the application of fuzzy clustering analysis in the dynamic identification of road traffic state. In: Proceedings of the 2003 intelligent transportation systems, Shanghai, China, 12–15 October 2003, vol. 2, pp.1149–1152. New York: IEEE.

11. Wen, Sun, Zhang. Study on traffic congestion patterns of large city in China taking Beijing as an example. Procd Soc Behv 2014; 138: 482–491.

12. Anbaroglu, Heydecker, Cheng. Spatio-temporal clustering for non-recurrent traffic congestion detection on urban road networks. Transport Res C: Emer 2014; 48: 47–65.

13. Berchtold, Böhm, Keim, et al. A cost model for nearest neighbor search in high-dimensional data space. In: Proceedings of the 16th ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems, Tucson, AZ, 11–15 May 1997, pp.78–86. New York: ACM.

14. Agrawal, Gehrke, Gunopulos, et al. Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD Rec 1998; 27: 94–105.

15. Jain AK. Data clustering: 50 years beyond K-means. Pattern Recogn Lett 2010; 31: 651–666.

16. Guo, Yao, et al. Study on the method for regional traffic flow feature extraction and traffic status evaluation. J Highw Transport Res Dev 2005; 7: 101–104.

17. Liu, Wang, et al. Evaluation of the impacts of traffic states on crash risks on freeways. Accident Anal Prev 2012; 47: 162–171.

18. Ng, Jordan, Weiss. On spectral clustering: analysis and an algorithm. Adv Neur In 2002; 2: 849–856.

19. Von Luxburg. A tutorial on spectral clustering. Stat Comput 2007; 17: 395–416.

20. Porikli. Learning object trajectory patterns by spectral clustering. In: Proceedings of the 2004 IEEE international conference on multimedia and expo (ICME’04), Taipei, Taiwan, 27–30 June 2004, vol. 2, pp.1171–1174. New York: IEEE.

21. Atev, Masoud, Papanikolopoulos. Learning traffic patterns at intersections by spectral clustering of motion trajectories. In: Proceedings of the 2006 IEEE/RSJ international conference on intelligent robots and systems, Beijing, China, 9–15 October 2006, pp.4851–4856. New York: IEEE.

22. Morris, Trivedi. Learning trajectory patterns by clustering: experimental studies and comparative evaluation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR 2009), Miami, FL, 20–25 June 2009, pp.312–319. New York: IEEE.

23. Stathopoulos, Karlaftis MG. A multivariate state space approach for urban traffic flow modeling and prediction. Transport Res C: Emer 2003; 11: 121–135.

24. Wang, Smith, Hyndman. Characteristic-based clustering for time series data. Data Min Knowl Disc 2006; 13: 335–364.

25. Dhillon, Guan, Kulis. Kernel k-means: spectral clustering and normalized cuts. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining, Seattle, WA, 22–25 August 2004, pp.551–556. New York: ACM.

26. Chung FRK. Spectral graph theory (CBMS regional conference series in mathematics, no. 92). Providence, RI: American Mathematical Society, 1996.

27. Shi, Malik. Normalized cuts and image segmentation. IEEE T Pattern Anal 2000; 22: 888–905.

28. Wagner. Between min cut and graph bisection. Berlin, Heidelberg: Springer, 1993, pp.744–750.

29. Lutkepohl. Handbook of matrices. Comput Stat Data An 1997; 2: 243.

30. Zhou, Yang, Ding, et al. On cluster validation. Syst Eng: Theor Pract 2014; 34: 2417–2431.

31. Davies, Bouldin DW. A cluster separation measure. IEEE T Pattern Anal 1979; 1: 224–227.

32. Al Shalabi, Shaaban, Kasasbeh. Data mining: a preprocessing engine. J Comput Sci 2006; 2: 735–739.