
Abstract

The macroscopic fundamental diagram (MFD) is an effective model for monitoring, predicting, and controlling urban traffic. A critical factor in accurately observing an empirical MFD is the spatial homogeneity of traffic states, which can be achieved through effective clustering of urban networks. This study explores and compares various machine learning (ML) clustering algorithms, including k-means, Gaussian mixture models (GMM), density-based spatial clustering of applications with noise (DBSCAN), and hierarchical agglomerative clustering (HAC), to enhance traffic homogeneity and spatial connectivity. Our methodology evaluates these algorithms using traffic-relevant indicators such as homogeneity, connectivity, MFD shape, and travel time estimation accuracy, contrasting them with clustering methods from the literature, such as snake similarity and sensor-bias-corrected community detection (SBCCD). The results demonstrate that ML algorithms, especially when their hyperparameters are optimized, consistently surpass state-of-the-art methods in computational performance and, in most cases, achieve a good balance between homogeneity and connectivity of the resulting clusters. Specifically, GMM produces highly homogeneous clusters ideal for routing and macroscopic travel time estimation, although they lack connectivity. In contrast, HAC ensures connected clusters suitable for traffic management strategies such as perimeter control and congestion pricing. k-means achieves a balanced performance between homogeneity and connectivity, while DBSCAN effectively identifies congestion pockets that may interest transport planners. These findings highlight ML algorithms as promising alternatives to more complex methods.

Keywords

Urban traffic, macroscopic fundamental diagram (MFD), traffic homogeneity, network connectivity, clustering, machine learning, simulation

In analogy to the traffic fundamental diagram (FD) at the link level, Daganzo ( 1 ) and Geroliminis and Daganzo ( 2 , 3 ) have demonstrated the existence of a relationship between the average flow and the average density at the network level, known as the macroscopic fundamental diagram (MFD) or the network fundamental diagram (NFD). The MFD can be approximated analytically ( 4 – 7 ) or measured empirically ( 8 – 10 ). The analytical approximation is valid only for arterial roads without any net turning flows at intersections, or for symmetrical grid networks that can be approximated as arterials. The corresponding MFD is therefore considered independent of demand and influenced solely by supply-related characteristics. As such, the analytical MFD represents the upper bound of traffic performance, observable under regularity conditions, characterized by slowly varying and well-distributed demand. However, real networks are rarely homogeneous and turning flows do affect congestion patterns. Consequently, empirical MFDs are clearly demand-dependent and are particularly influenced by the spatial and temporal distribution of vehicles within the network, as demonstrated, for example, by the hysteresis phenomenon.

Many studies have been devoted to understanding the effect of relaxing the regularity conditions, specifically the heterogeneous spatial distribution of traffic density ( 8 , 11 – 13 ). It has been shown that travel demand ( 14 – 17 ), road topology ( 8 , 18 , 19 ), control strategies ( 17 , 20 ), and route choice ( 5 , 21 ) significantly affect the spatial distribution of traffic density and, therefore, the shape of the MFD.

The MFD provides a basis for efficiently modeling, monitoring, and controlling urban traffic at a low computational cost. As a result, MFD-based traffic management and control applications have gained increasing attention in research. These include perimeter control ( 22 ), macroscopic route guidance ( 21 ), parking management ( 23 ), mobility pricing ( 24 , 25 ), and road space allocation ( 26 ). The fundamental idea behind many MFD applications is to monitor and predict traffic conditions at the regional level under certain constraints, and to react accordingly to maintain traffic densities within a favorable range. However, MFD scatter or hysteresis can reduce the accuracy of traffic state predictions. If the traffic state in a network cannot be predicted accurately, the quality of control based on these predictions will consequently decrease. Therefore, identifying homogeneous regions within the network is crucial to observing a well-defined MFD and ultimately improving the effectiveness of MFD-based applications. Several clustering algorithms have been proposed in the literature for this purpose.

The remainder of this paper is organized as follows. The literature review section outlines the recent state of the art and presents the contribution of this work. The methodology section describes the approach used in our comparative analysis, including the algorithms evaluated. The case studies that form the basis of the analysis are introduced in the case studies section. This is followed by a presentation and interpretation of the results in the results section. Finally, we conclude with a discussion of the implications of our findings in the practical applicability and generalizability and Conclusion sections.

Literature Review

The main goal of any clustering method is to group urban network links based on an underlying measure of similarity to facilitate future applications. A crucial consideration is how to effectively cluster urban networks into regional sub-networks. Ideally, the partitioned regions should balance several factors: they should be of similar size, topologically connected, compact, and well separated. At the same time, traffic conditions within each region should be relatively homogeneous to reduce MFD scatter or hysteresis. These objectives can often be contradictory, requiring trade-offs. Different MFD applications may prioritize one objective over another, depending on their specific needs and goals.

For routing applications and fleet control, it is important that the traffic state within each cluster is homogeneous ( 27 – 29 ). For multi-reservoir MFD modeling, the compactness and connectivity of each region are crucial ( 30 ); although not strictly required, these properties also simplify the estimation of the region’s average trip length ( 31 ). When congestion propagation is modeled with reaction–diffusion methods, partitioning the urban road network into regions that share similar congestion dynamics enables accurate modeling ( 32 ). Finally, for perimeter-control strategies, compact and well-connected regions are again essential ( 33 ).

Various researchers have explored different methodologies for partitioning urban networks for MFD applications. Ji and Geroliminis ( 34 ) proposed a static method involving: (i) initial segmentation based on Gaussian kernel similarity and the normalized cut algorithm, (ii) merging, and (iii) boundary adjustment. While foundational, this method was tested on a symmetric network and did not account for directional flows or similarities between adjacent links. Saeedmanesh and Geroliminis ( 35 ) developed a static clustering approach using “snake” similarities, which, despite capturing spatial correlations, is computationally intensive, particularly for large-scale networks. An et al. ( 36 ) suggested a static methodology combining lambda-connectedness and region-growing segmentation, requiring minimal input data. Nonetheless, uneven cluster sizes affected MFD estimation. Ambühl et al. ( 37 ) proposed a Monte-Carlo-like partitioning method based on community detection and random-walk search to generate homogeneous and connected partitions using stationary sensor data. Batista et al. ( 38 ) employed Gaussian mixture models (GMM) for static clustering based on topological characteristics, resulting in connected clusters; however, the homogeneity of the resulting clusters was not examined.

Incorporating the temporal dimension and thereby accounting for the dynamic aspect of urban traffic, Zhou et al. ( 39 ) proposed a method to calculate the correlation between adjacent intersections, considering both physical features and dynamic traffic measurements. They then employed a community detection method to cluster the network. Although the method produced homogeneous, connected, and compact regions, it was tested only on a synthetic grid-type network. Ji et al. ( 40 ) utilized the concept of maximum connected components from graph theory to dynamically identify congested parts of the network, providing insights into congestion propagation, but encountered practical challenges in perimeter control owing to non-compact cluster borders. Pascale et al. ( 41 ) proposed a dynamic method that extends ( 34 ) by incorporating time-dependent density measurements and refining similarity estimation. Rempe and Bogenberger ( 42 ) applied clustering to extract congestion-prone regions, analyzed their correlations, and utilized this analysis to formulate a k-nearest neighbors (KNN) predictor for estimating travel time losses. Saeedmanesh and Geroliminis ( 43 ) further extended their snake method to include the temporal dimension, ensuring high connectivity and homogeneity, albeit at a high computational cost, thereby raising questions about its practical feasibility. Lopez et al. ( 44 ) compared various clustering methods using speed measurements from travel time data, finding k-means marginally superior. This was tested on an abstract representation of Amsterdam’s road network. Casadei et al. ( 45 ) developed a graph theory-based clustering method for urban networks, in which nodes are clustered if connected by paths meeting specific criteria related to road type and speed. Clusters are formed by removing non-conforming edges and identifying weakly connected components. This method yielded connected paths rather than compact regions. Bellocchi et al. ( 46 ) proposed a dynamical efficiency measure in multimodal multilayer networks that detects evolving congestion clusters and critical intermodal junctions, enhancing understanding of urban traffic dynamics. Jiang et al. ( 47 ) explored partitioning urban networks with polycentric congestion patterns. Their dynamic method involves a six-step algorithm for partitioning large-scale urban networks through the integration of traffic data and geographic connectivity. This process includes graph definition, data pre-processing, feature handling, identification of clusters and partitions, and boundary reshaping.

The studies reviewed above demonstrate the significant interest in the clustering of urban networks within the research community. However, it is well established that no single method is optimal for clustering urban networks, which justifies the diversity of existing approaches ( 48 ). Most of these methods focus on clustering based on graph theory and involve complex estimations of similarity between each pair of links within the network. These approaches can be computationally intensive, a constraint that becomes more pronounced as the network size increases. The goal of this study is to compare multiple clustering algorithms and methods to evaluate the trade-offs between traffic homogeneity, spatial connectivity, and computational cost for practice-relevant applications such as route guidance and perimeter control. In pursuing this goal and addressing the related research gap, the present work makes the following contributions to the state of research:

We compare machine learning (ML) clustering algorithms and other state-of-the-art methods to assess trade-offs between traffic homogeneity, spatial connectivity, travel time estimation accuracy, and computational cost for specific practice-relevant applications.

We conduct the comparison using calibrated traffic simulations based on real networks to enhance the practical relevance of our study.

We perform the study on different networks to mitigate case-specific effects and increase the generality of our conclusions.

Methodology

As stated earlier, the goal of this study is to compare ML clustering algorithms and other state-of-the-art methods. ML clustering algorithms vary in their mathematical formulation, hyperparameters, and computational complexity. Accordingly, the quality of results for applications varies, and the appropriate clustering algorithm for a specific application often needs to be selected experimentally ( 49 ). Therefore, we apply different ML algorithms to the network clustering problem. The focus is on clustering algorithms that use Euclidean distance as a measure of similarity between data points. Owing to their simplicity and computational efficiency, such algorithms are well suited for large-scale and real-time applications.

Building on this foundation, within the proposed methodological framework for clustering urban networks, the algorithms evaluated include k-means, GMM, density-based spatial clustering of applications with noise (DBSCAN), and hierarchical agglomerative clustering (HAC). The schematic illustration in Figure 1 outlines the process, beginning with the collection of static link attributes and traffic data from a simulation scenario. This is followed by the optimization of clustering hyperparameters and feature weights to achieve optimal clustering results. The performance of these algorithms is assessed using a set of traffic-relevant indicators that reflect the objectives of the clustering process, namely homogeneity, connectivity, and travel time estimation accuracy. The ML algorithms are compared against state-of-the-art methods, primarily the snake similarity and sensor-bias-corrected community detection (SBCCD). The following sections introduce the performance indicators, detail the formulation of the clustering problem, describe the solution algorithm, and outline the implementation procedure.

Figure 1.

Schematic illustration of utilised methods.

Performance Indicators

The quality of the clustering results is evaluated using indicators in three categories, reflecting the main objectives of creating homogeneous, connected clusters that facilitate MFD-based applications.

Homogeneity

Homogeneity is assessed using two indicators. The first is the normalized total variance, $T V_{n}$ , defined as the ratio of the total within-cluster variance to the global variance of the network. To account for cluster size, the variance of each cluster is weighted by its length, as given by Equation 1.

T V_{n} = \frac{\sum_{c \in C} l_{c} \cdot var (c)}{l_{G} \cdot var (G)},

(1)

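where $C = \{c_{1}, \dots, c_{K}\}$ is the set of clusters, $l_{c}$ is the length of cluster $c$ (in km-ln), $l_{G}$ is the total length of the network (in km-ln), and $var (c)$ and $var (G)$ denote the variance of the traffic variable within cluster $c$ and over the whole network $G$ , respectively. A smaller $T V_{n}$ indicates more homogeneous clustering results.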
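As a minimal sketch of Equation 1, the indicator could be computed as follows. The function name and argument layout are hypothetical, and the sketch assumes that the variance is the unweighted population variance of link densities, which the text does not state explicitly.

```python
from statistics import pvariance


def normalized_total_variance(density, length, labels):
    """TV_n: length-weighted within-cluster variance over length-weighted global variance.

    density[i], length[i], labels[i] describe link i (assumed layout).
    """
    clusters = {}
    for k_i, l_i, c in zip(density, length, labels):
        clusters.setdefault(c, []).append((k_i, l_i))

    within = 0.0
    for members in clusters.values():
        values = [k for k, _ in members]
        l_c = sum(l for _, l in members)      # cluster length l_c
        within += l_c * pvariance(values)     # l_c * var(c)

    # l_G * var(G); raises ZeroDivisionError if all densities are identical
    return within / (sum(length) * pvariance(density))
```

A value of zero means every cluster is perfectly uniform, while a single cluster covering the whole network gives one.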

Second, the scatter of the MFD is quantitatively evaluated. The MFD is estimated by averaging link flows, $q_{i} (t)$ , and densities, $k_{i} (t)$ , weighted by the link length $l_{i}$ ( 3 ). For a network comprising $N$ links, the network average flows, $Q (t)$ , and densities, $K (t)$ , are calculated using Equations 2 and 3, respectively.

Q (t) = \frac{\sum_{i = 1}^{N} q_{i} (t) l_{i}}{\sum_{i = 1}^{N} l_{i}},

(2)

K (t) = \frac{\sum_{i = 1}^{N} k_{i} (t) l_{i}}{\sum_{i = 1}^{N} l_{i}} .

(3)

The area covered by the MFD points is used as an indicator of the scatter of the MFD, reflecting the heterogeneity within the subnetwork: the smaller the area, the more homogeneous the subnetwork’s traffic state. To compute it, $Q (t)$ and $K (t)$ are first scaled to the range between zero and one using their respective minima and maxima. The envelope of the scaled points is then defined by their convex hull, and the mean hull area across all MFDs serves as the indicator of homogeneity.
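The scatter indicator can be sketched with the standard library alone. The helper names below are hypothetical; the convex hull uses Andrew's monotone chain and the area uses the shoelace formula, which are common choices and not necessarily those of the original implementation.

```python
def network_averages(flows, densities, lengths):
    """Length-weighted Q(t) and K(t) (Equations 2 and 3); flows[i][t], densities[i][t]."""
    total = sum(lengths)
    steps = range(len(flows[0]))
    Q = [sum(q[t] * l for q, l in zip(flows, lengths)) / total for t in steps]
    K = [sum(k[t] * l for k, l in zip(densities, lengths)) / total for t in steps]
    return Q, K


def min_max(values):
    """Scale a series to [0, 1]; a constant series maps to zeros."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def mfd_scatter_area(Q, K):
    """Area of the convex hull of the min-max scaled (K, Q) points."""
    hull = convex_hull(list(zip(min_max(K), min_max(Q))))
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]):
        twice_area += x1 * y2 - x2 * y1   # shoelace formula
    return abs(twice_area) / 2.0
```

Because both axes are scaled to the unit interval, the area lies between zero (all points on a line) and one (points filling the unit square).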

Connectivity

Connectivity is quantified using the indicator $C_{conn}$ , defined as the ratio of the length of the largest connected component within a cluster, $L_{cc}$ , to the total length of the cluster, $L_{total}$ . This relationship is expressed in Equation 4 below. A $C_{conn}$ value close to one indicates strong connectivity within the cluster.

C_{conn} = \frac{L_{cc}}{L_{total}}

(4)
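Equation 4 amounts to a connected-component search restricted to the links of one cluster. The sketch below assumes a hypothetical adjacency dictionary mapping each link to its neighboring links and a dictionary of link lengths; the "largest" component is taken to be the longest one.

```python
def connectivity(cluster_links, adjacency, length):
    """C_conn: length of the longest connected component over total cluster length."""
    members = set(cluster_links)
    seen, longest = set(), 0.0
    for start in members:
        if start in seen:
            continue
        # Depth-first search that never leaves the cluster
        stack, component_length = [start], 0.0
        seen.add(start)
        while stack:
            u = stack.pop()
            component_length += length[u]
            for v in adjacency.get(u, ()):
                if v in members and v not in seen:
                    seen.add(v)
                    stack.append(v)
        longest = max(longest, component_length)
    return longest / sum(length[u] for u in members)
```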

Accuracy of Travel Time Estimation

Finally, homogeneous clustering implies that the links within a cluster exhibit similar speeds that are close to the cluster’s average speed. This can be useful for estimating travel times of trips completed within the network while maintaining high computational efficiency.

Given a trip from link $A$ to link $B$ along the route $R = \{r_{A}, \dots, r_{B}\}$ , the travel time is estimated by matching each consecutive link in $R$ to its corresponding cluster’s average speed during the time interval $t$ when the vehicle enters that link. Knowing the lengths of the links, the travel time on each link, as well as the total travel time, can then be estimated. The cluster’s average speed, $v_{c}^{t}$ , is calculated as expressed in Equation 5.

v_{c}^{t} = \frac{\sum_{i \in c} v_{i}^{t} q_{i}^{t}}{\sum_{i \in c} q_{i}^{t}},

(5)

where $q_{i}^{t}$ and $v_{i}^{t}$ denote the flow and speed of lane $i$ at time interval $t$ , respectively.
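The estimation procedure can be sketched as below. All names are hypothetical, consistent units are assumed (for example meters, seconds, and meters per second), and `speed_table` is an assumed lookup from a (cluster, interval index) pair to the cluster speed of Equation 5.

```python
def cluster_speed(speeds, flows):
    """Flow-weighted mean speed of one cluster in one interval (Equation 5).

    Raises ZeroDivisionError when the cluster carries no flow in the interval.
    """
    return sum(v * q for v, q in zip(speeds, flows)) / sum(flows)


def estimate_travel_time(route, link_length, link_cluster, speed_table, depart, interval):
    """Traverse the route link by link, using the cluster speed valid at entry time."""
    clock = depart
    for link in route:
        t = int(clock // interval)                 # interval in which the link is entered
        v = speed_table[(link_cluster[link], t)]   # cluster average speed v_c^t
        clock += link_length[link] / v
    return clock - depart
```

For example, a two-link route of 100 m and 200 m with cluster speeds of 10 m/s and then 5 m/s takes 10 s + 40 s = 50 s.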

For a representative number of trips, the resulting margin of error in travel time estimation reflects the homogeneity of the clusters. The more homogeneous a cluster is, the closer the link speeds are to the cluster’s average speed, and the more accurate the travel time estimation becomes.

We evaluate the accuracy of the travel time estimation using a sample of trips completed in the simulation scenario. The sample size is set to 75% of all trips completed in the network and is randomly selected at each evaluation time. Estimated travel times are then compared with actual travel times obtained from the simulation. The error function used to assess accuracy is the root mean squared normalized error (RMSNE), that is, the square root of the mean squared relative error between the estimated and actual travel times, as defined in Equation 6.

RMSNE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(\frac{{\hat{y}}_{i} - y_{i}}{y_{i}})}^{2}},

(6)

where $y_{i}$ denotes the actual travel time, ${\hat{y}}_{i}$ is the estimated travel time, and $n$ is the number of observations.
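Equation 6 translates directly into a few lines; the function name is hypothetical.

```python
from math import sqrt


def rmsne(actual, estimated):
    """Root mean squared normalized error between estimated and actual travel times."""
    n = len(actual)
    squared_relative = (((y_hat - y) / y) ** 2 for y, y_hat in zip(actual, estimated))
    return sqrt(sum(squared_relative) / n)
```

With actual travel times of 100 s and 200 s and estimates of 110 s and 180 s, both relative errors have magnitude 0.1, so the RMSNE is 0.1.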

Machine Learning Clustering Algorithms

Features Vector Extraction

The input to the ML algorithms is a vector space, where each link in the network is represented by a vector comprising multiple features that describe the link’s attributes and traffic state. Thus, the entire input space can be represented as an $m \times n$ matrix, where $m$ is the number of links and $n$ is the number of features describing each link. Traffic state features can include, among others, density, flow, speed, travel time, or a combination of these. Features can also include a link’s static attributes, such as type, speed limit, or geographic coordinates. These latter attributes are particularly useful for grouping nearby links, thereby enhancing the potential to form connected and compact clusters.

In this study, we selected as features the $x_{i}$ and $y_{i}$ coordinates of each link’s midpoint, along with the traffic density $k_{i} (t_{\max})$ , where $t_{\max}$ is the interval during which the density variance is highest. Unless otherwise stated, every time-dependent variable hereafter is also evaluated at $t_{\max}$ ; the argument $(t_{\max})$ is omitted for brevity. The logic is simple: achieving homogeneity during periods of peak variance suggests that the clusters will likely remain homogeneous at other times. Additionally, traffic density directly maps to traffic states and is therefore an appropriate proxy. These data are extracted from simulation outputs and are normalized using the feature’s global minima and maxima, as given in Equation 7.

x^{'} = \frac{x - \min (x)}{\max (x) - \min (x)}

(7)
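A possible way to assemble the normalized feature matrix is sketched below. The function names and the input layout (`midpoints[i]` as an (x, y) pair and `densities[i][t]` as a time series per link) are assumptions made for illustration.

```python
from statistics import pvariance


def peak_variance_interval(densities):
    """Index t_max of the interval with the highest density variance across links."""
    steps = range(len(densities[0]))
    return max(steps, key=lambda t: pvariance([k[t] for k in densities]))


def min_max_columns(rows):
    """Equation 7 applied per feature (column); constant columns map to zero."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def build_features(midpoints, densities):
    """One row <x_i, y_i, k_i(t_max)> per link, min-max normalized."""
    t_max = peak_variance_interval(densities)
    rows = [[x, y, k[t_max]] for (x, y), k in zip(midpoints, densities)]
    return min_max_columns(rows), t_max
```

Feature weights, where used, could then be applied by multiplying each normalized column by its weight before clustering.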

When an adaptive signal controller is present, practitioners may wish to let the clustering capture its homogenizing effect by adding control-aware features. In our framework this is straightforward: each link’s feature vector is extended with one or more control-sensitive terms, for example the green ratio $g_{i} / C_{i}$ , the queue length $z_{i}$ , or the degree of saturation $DoS_{i}$ , which compactly combines prevailing flow with effective capacity. Any of these terms can be appended to, or substituted for, the density component $k_{i}$ whenever adaptive control is in place. With the green ratio and queue length appended, for instance, the feature vector becomes $\langle x_{i}, y_{i}, k_{i}, g_{i} / C_{i}, z_{i} \rangle$ .

If adaptive-control data are unavailable, the original three-dimensional vector $\langle x_{i}, y_{i}, k_{i} \rangle$ remains sufficient, so the workflow is applicable in both controlled and uncontrolled settings without further modification.

Parameters Fine-tuning

ML algorithms often require specific hyperparameters as input. For example, many clustering algorithms need the number of clusters $K$ to be specified, whereas density-based methods such as DBSCAN depend on parameters like $ϵ$ and $MinPts$ . Choosing these parameters requires prior knowledge of the data structure, which may not be available in complex systems like urban networks, making parameter selection challenging without empirical examination. Additionally, feature weights can be adjusted to put emphasis on a particular link attribute. The selection of clustering hyperparameters and feature weights can therefore be automated by maximizing a multi-criteria objective function. The optimization problem to find the best hyperparameters and weights, $θ$ , is given in Equation 8.

\max_{θ} w_{1} (1 - T V_{n} (θ)) + w_{2} C_{conn} (θ),

(8)

where $θ$ denotes the vector of clustering hyperparameters and feature weights, $w_{1}$ and $w_{2}$ are fixed non-negative weights, and $T V_{n} (θ)$ and $C_{conn} (θ)$ are computed from the clustering results obtained with $θ$ .

The weights $w_{1}$ and $w_{2}$ allow prioritization of homogeneity and connectivity, respectively, and are fixed based on application needs. By systematically varying $θ$ and assessing the objective function, the optimal configuration that maximizes the multi-criteria objective can be identified. Although clustering performance is sensitive to the selection of weights and hyperparameters, this approach ensures each method is evaluated under its best possible configuration, helping to achieve near-optimal performance for the specific scenario.
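One simple way to vary $θ$ systematically is an exhaustive grid search, sketched below. This is an assumption for illustration; the search strategy is not specified beyond systematic variation. The callables `cluster_fn`, `tv_fn`, and `conn_fn` are hypothetical placeholders for a clustering routine and for the two indicators, and how $C_{conn}$ is aggregated over clusters is left to `conn_fn`.

```python
from itertools import product


def grid_search(cluster_fn, grid, tv_fn, conn_fn, w1=0.5, w2=0.5):
    """Maximize w1 * (1 - TV_n) + w2 * C_conn (Equation 8) over a parameter grid.

    grid maps each parameter name in theta to the list of values to try.
    """
    best_theta, best_score = None, float("-inf")
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        theta = dict(zip(names, values))
        labels = cluster_fn(**theta)   # run the clustering with this configuration
        score = w1 * (1.0 - tv_fn(labels)) + w2 * conn_fn(labels)
        if score > best_score:
            best_theta, best_score = theta, score
    return best_theta, best_score
```

A call such as `grid_search(run_kmeans, {"K": [2, 4, 6], "w_k": [0.5, 1.0]}, tv, conn)` would try six configurations, where `run_kmeans`, `tv`, and `conn` are user-supplied functions.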

k-Means

Owing to its simplicity and efficiency, k-means is one of the most widely used clustering algorithms for scientific and industrial applications. Given a data set $X = \{x_{1}, \dots, x_{N}\} \subset \mathbb{R}^{d}$ , k-means partitions the data set into $K$ clusters ( $1 \leq K \leq N$ ) with centers $μ = \{μ_{1}, \dots, μ_{K}\}$ , where each $μ_{j} \in \mathbb{R}^{d}$ . The goal is to minimize the within-cluster sum of squares (WCSS) defined by Equation 9.

\begin{matrix} L (A, μ; K) = \sum_{j = 1}^{K} \sum_{i = 1}^{N} a_{ij} ∥ x_{i} - μ_{j} ∥^{2}, \\ a_{ij} \in \{0, 1\}, \quad \sum_{j = 1}^{K} a_{ij} = 1 \quad \forall i, \end{matrix}

(9)

where $A = (a_{ij})$ is the $N \times K$ binary assignment matrix ( 50 , 51 ).

Finding the exact global minimum of Equation 9 is NP-hard. k-means therefore alternates two steps until convergence ( 50 , 52 ). The algorithm starts with $K$ initial centers. Each point in $X$ is assigned to its nearest center based on the Euclidean distance in Equation 10. Centers are then updated by computing the centroids of their assigned points, as given in Equation 11.

a_{ij} = \begin{cases} 1, & \text{if } ∥ x_{i} - μ_{j} ∥^{2} = \min_{k = 1, \dots, K} ∥ x_{i} - μ_{k} ∥^{2}, \\ 0, & \text{otherwise} . \end{cases}

(10)

μ_{j} = \frac{\sum_{i = 1}^{N} a_{ij} x_{i}}{\sum_{i = 1}^{N} a_{ij}} .

(11)

Because $L (A, μ; K)$ is non-increasing at every assignment–update step, the algorithm can never revisit a previous clustering and thus terminates after a finite number of iterations, $K^{N}$ at most. k-means is sensitive to its initial centers: different initializations can lead to different local optima. To make results more stable and increase the likelihood of reaching the global optimum, we use k-means++ to initialize the $K$ centers ( 51 ).
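The two alternating steps can be sketched in plain Python as follows. The sketch takes the initial centers as an argument and does not implement k-means++; the function name and the handling of empty clusters are assumptions.

```python
def kmeans(points, centers, max_iter=100):
    """Lloyd's iteration: assignment (Equation 10), then center update (Equation 11)."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center by squared Euclidean distance
        new_labels = [
            min(
                range(len(centers)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
            )
            for p in points
        ]
        if new_labels == labels:
            break   # assignments unchanged, so the objective can no longer decrease
        labels = new_labels
        # Update step: each center becomes the centroid of its assigned points
        for j in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:   # assumption: an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers
```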

Gaussian Mixture Model

The GMM is a soft probabilistic clustering method. It assumes that a data set can be represented as a mixture of parameterized multivariate Gaussian distributions, each corresponding to a cluster. Given a data set $X = \{x_{1}, \dots, x_{N}\} \subset \mathbb{R}^{d}$ , the likelihood of $X$ is the product, over all data points, of a weighted sum of $K$ multivariate Gaussian densities, as given in Equation 12.

\begin{matrix} P (X ∣ θ) = \prod_{i = 1}^{N} \left( \sum_{k = 1}^{K} π_{k} N (x_{i} ∣ μ_{k}, Σ_{k}) \right), \\ θ = \{π_{k}, μ_{k}, Σ_{k} ∣ k = 1, \dots, K\}, \\ 0 \leq π_{k} \leq 1, \quad \sum_{k = 1}^{K} π_{k} = 1, \end{matrix}

(12)

where the mixing coefficients $π_{k}$ reflect the relative weight of each component, while $μ_{k}$ and $Σ_{k}$ are its mean vector and covariance matrix, respectively ( 53 , 54 ).

The objective is to find the parameter set $θ$ that maximizes the log-likelihood—equivalently, minimizes the negative log-likelihood—of the observed data in Equation 13 ( 55 ).

\log L (θ ∣ X) = \sum_{i = 1}^{N} \log (\sum_{k = 1}^{K} π_{k} N (x_{i} ∣ μ_{k}, Σ_{k})) .

(13)

Optimization challenges primarily arise from the logarithm of the sum inside Equation 13. To simplify the optimization, latent variables $Z = \{z_{i}\}_{i = 1}^{N}$ are introduced. Each latent variable $z_{i}$ represents the cluster assignment for the $i$ -th data point, indicating which of the $K$ Gaussian components $x_{i}$ belongs to. The introduction of $Z$ transforms the complex log-sum optimization problem into a more manageable form by decoupling the Gaussian components during the computation of the likelihood given in Equation 14.

\log L (θ ∣ X, Z) = \sum_{i = 1}^{N} \log (π_{z_{i}} N (x_{i} ∣ μ_{z_{i}}, Σ_{z_{i}})) .

(14)

Because $Z$ is unknown, the expectation–maximization (EM) algorithm solves this iteratively, alternating between computing posterior probabilities (expectation) with the current values of $θ$ using Equation 15, and updating the parameters in $θ$ (maximization) with the current posterior probabilities using Equations 16, 17, and 18.

p (z_{i} = k ∣ x_{i}, θ) = \frac{π_{k} N (x_{i} ∣ μ_{k}, Σ_{k})}{\sum_{j = 1}^{K} π_{j} N (x_{i} ∣ μ_{j}, Σ_{j})},

(15)

π_{k} = \frac{1}{N} \sum_{i = 1}^{N} p (z_{i} = k ∣ x_{i}, θ),

(16)

μ_{k} = \frac{\sum_{i = 1}^{N} x_{i} p (z_{i} = k ∣ x_{i}, θ)}{\sum_{i = 1}^{N} p (z_{i} = k ∣ x_{i}, θ)},

(17)

Σ_{k} = \frac{\sum_{i = 1}^{N} p (z_{i} = k ∣ x_{i}, θ) (x_{i} - μ_{k}) {(x_{i} - μ_{k})}^{⊤}}{\sum_{i = 1}^{N} p (z_{i} = k ∣ x_{i}, θ)} .

(18)

EM increases the log-likelihood monotonically until convergence to a local maximum. It is sensitive to the starting values of $θ$ ( 55 ). We therefore initialize the means with k-means++ and set the initial covariances to the empirical within-cluster covariances.
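The EM updates are easiest to follow in one dimension, where the covariance matrix $Σ_{k}$ reduces to a scalar variance. The sketch below is this simplified univariate case, not the multivariate model used in the study; the function names, the fixed iteration count, and the variance floor `min_var` are assumptions, and the initial parameters are supplied by the caller.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu, var):
    """Density of a univariate Gaussian with mean mu and variance var."""
    return exp(-((x - mu) ** 2) / (2.0 * var)) / sqrt(2.0 * pi * var)


def em_gmm_1d(xs, weights, means, variances, n_iter=50, min_var=1e-6):
    """EM for a univariate Gaussian mixture (scalar form of Equations 15 to 18)."""
    K, N = len(means), len(xs)
    weights, means, variances = list(weights), list(means), list(variances)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point (Equation 15)
        resp = []
        for x in xs:
            joint = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(K)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate mixing weights, means, and variances (Equations 16 to 18)
        for k in range(K):
            n_k = sum(r[k] for r in resp)
            weights[k] = n_k / N
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / n_k
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / n_k
            variances[k] = max(var_k, min_var)   # floor avoids a collapsing component
    # Hard labels: the component with the highest posterior probability
    labels = [max(range(K), key=lambda k: r[k]) for r in resp]
    return weights, means, variances, labels
```

A production implementation would typically work with log-densities to avoid numerical underflow for points far from every component.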

Density-Based Spatial Clustering of Applications with Noise

DBSCAN partitions the data set into contiguous dense regions (clusters) separated by sparse regions. The standard DBSCAN algorithm proposed by Ester et al. ( 56 ) characterizes dense versus sparse regions by means of two input parameters: neighborhood radius, $ϵ \in \mathbb{R}^{+}$ , and minimum number of points, $MinPts \in \mathbb{Z}^{+}$ . Given a data set $X = \{x_{1}, \dots, x_{N}\} \subset \mathbb{R}^{d}$ , the $ϵ$ -neighborhood of $x_{i}$ is the set of points within a specified radius $ϵ$ around $x_{i}$ , as outlined in Equation 19.

N_{ϵ} (x_{i}) = \{x_{q} \in X ∣ d (x_{i}, x_{q}) \leq ϵ\},

(19)

where $d$ denotes the distance measure; here, we use the Euclidean distance $∥ x_{i} - x_{q} ∥_{2}$ .

If the $ϵ$ -neighborhood of a point $x_{i}$ contains at least the minimum number of points, $| N_{ϵ} (x_{i}) | \geq MinPts$ , then $x_{i}$ is a core point. A point $x_{q}$ is directly density-reachable from a point $x_{i}$ if $x_{q} \in N_{ϵ} (x_{i})$ and $x_{i}$ is a core point.

The algorithm starts with an arbitrary point $x_{i}$ and retrieves its $ϵ$ -neighborhood. If it is a core point, it initiates a new cluster that is expanded to include all density-reachable points in a chain-effect. If additional core points are found in its $ϵ$ -neighborhood, the search is extended to include all points in the $ϵ$ -neighborhood of each newly discovered core point. Once no more core points are found in the expanded $ϵ$ -neighborhood, the cluster is complete, and the remaining points are searched for another core point to start a new cluster. A cluster can contain both core and non-core points. Any non-core point $x_{i}$ in a cluster is defined as a border point. Some non-core points may not belong to any cluster and are regarded as noise points ( 56 , 57 ).

Ester et al. ( 56 ) suggest using the $k$ -distance graph to estimate the value of $ϵ$ by plotting the distance from each data point to its $k = MinPts$ -th nearest neighbor in descending order. A good estimate of $ϵ$ is at the change of curvature of the $k$ -distance graph. In this work, we use this method to estimate $ϵ$ ; thus, only $MinPts$ requires fine-tuning.
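The expansion procedure and the $k$ -distance curve can be sketched as below. This is a brute-force illustration with quadratic cost and hypothetical function names; as in Equation 19, each point counts as a member of its own $ϵ$ -neighborhood.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    n = len(points)
    neighbors = [
        [q for q in range(n) if dist(points[p], points[q]) <= eps] for p in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        if len(neighbors[p]) < min_pts:
            labels[p] = -1            # provisional noise; may later become a border point
            continue
        cluster += 1                  # p is a core point and seeds a new cluster
        labels[p] = cluster
        queue = list(neighbors[p])
        while queue:
            q = queue.pop()
            if labels[q] == -1:
                labels[q] = cluster   # former noise point becomes a border point
            if labels[q] is not None:
                continue              # already assigned, do not expand again
            labels[q] = cluster
            if len(neighbors[q]) >= min_pts:
                queue.extend(neighbors[q])   # q is a core point: keep expanding
    return labels


def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, in descending order."""
    out = []
    for p in points:
        d = sorted(dist(p, q) for q in points)
        out.append(d[k])              # d[0] is the distance of the point to itself
    return sorted(out, reverse=True)
```

Plotting the output of `k_distances` and reading off the point where the curvature changes gives a candidate for $ϵ$ .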

Hierarchical Agglomerative Clustering

HAC builds a hierarchy of clusters by iteratively merging the pair of clusters with the smallest distance. Given a data set $X = \{x_{1}, \dots, x_{N}\} \subset \mathbb{R}^{d}$ , each observation initially forms its own cluster, yielding $N$ clusters $\{C_{1}, C_{2}, \dots, C_{N}\}$ . The pairwise distances between clusters are then computed. Using Ward’s method, which minimizes the increase in WCSS, the distance between clusters $C_{i}$ and $C_{j}$ is given by Equation 20.

D (C_{i}, C_{j}) = \frac{| C_{i} | | C_{j} |}{| C_{i} | + | C_{j} |} ∥ {\bar{x}}_{i} - {\bar{x}}_{j} ∥^{2},

(20)

where $| C_{i} |$ is the number of objects in $C_{i}$ and ${\bar{x}}_{i}$ its centroid. The two clusters with the smallest distance are merged, the distance matrix is updated, and the process repeats until the specified number of clusters is reached ( 58 ).

HAC can also incorporate connectivity constraints that restrict merging to spatially adjacent clusters. This modification is expressed by Equation 21.

D' (C_{i}, C_{j}) = \begin{cases} D (C_{i}, C_{j}), & \text{if } \exists u \in C_{i}, v \in C_{j} : A_{uv} = 1, \\ \infty, & \text{otherwise}, \end{cases}

(21)

where $A$ is the binary adjacency matrix of the network ( 59 , 60 ).

The connectivity constraints are particularly useful for clustering geographical objects, for example, segments of a road network, because they ensure that only spatially contiguous clusters are merged.
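A naive version of connectivity-constrained Ward clustering is sketched below to make Equations 20 and 21 concrete. It recomputes all pairwise distances at every merge, so it is only suitable for small examples; the function names and the representation of adjacency as a list of index pairs are assumptions.

```python
def ward_distance(a, b):
    """Equation 20 for two clusters given as lists of points."""
    centroid_a = [sum(col) / len(a) for col in zip(*a)]
    centroid_b = [sum(col) / len(b) for col in zip(*b)]
    squared = sum((u - v) ** 2 for u, v in zip(centroid_a, centroid_b))
    return len(a) * len(b) / (len(a) + len(b)) * squared


def constrained_hac(points, adjacent_pairs, n_clusters):
    """Merge the closest pair of adjacent clusters until n_clusters remain."""
    clusters = {i: [i] for i in range(len(points))}
    links = {frozenset(pair) for pair in adjacent_pairs}
    while len(clusters) > n_clusters:
        best = None
        for i in clusters:
            for j in clusters:
                if i >= j:
                    continue
                # Equation 21: non-adjacent clusters are at infinite distance
                if not any(
                    frozenset((u, v)) in links for u in clusters[i] for v in clusters[j]
                ):
                    continue
                d = ward_distance(
                    [points[u] for u in clusters[i]], [points[v] for v in clusters[j]]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best is None:
            break   # the graph has more connected components than n_clusters
        _, i, j = best
        clusters[i].extend(clusters.pop(j))
    labels = [0] * len(points)
    for label, members in enumerate(clusters.values()):
        for u in members:
            labels[u] = label
    return labels
```

In the second test case below, the two points closest in feature space are not adjacent in the network, so the constraint prevents them from being merged first.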

State-of-the-art Clustering Methods

Snake Similarity

The snake method was initially introduced in Saeedmanesh and Geroliminis ( 35 ). In this method, each link in the network initiates a sequence termed a “snake” by iteratively adding one adjacent link to the current sequence of links that minimizes the variance of the sequence’s properties, such as traffic flow or density. The formulas for variance, $σ_{k}^{2}$ , and mean, ${\bar{x}}_{k}$ , at each iteration are given by Equations 22 and 23, respectively.

σ_{k}^{2} = \frac{(k - 1) σ_{k - 1}^{2} + (x_{k} - {\bar{x}}_{k}) (x_{k} - {\bar{x}}_{k - 1})}{k}

(22)

{\bar{x}}_{k} = \frac{(k - 1) {\bar{x}}_{k - 1} + x_{k}}{k}

(23)

These snakes are then used to estimate the similarity between link pairs in the network. The similarity metric, $w (i, j)$ , is determined by summing the intersections of corresponding snakes of equal size for each link, as shown in Equation 24.

w (i, j) = \sum_{k = 1}^{N} ϕ^{N - k} \times intersect (S_{ik}, S_{jk})

(24)

In this equation, $S_{ik}$ and $S_{jk}$ represent the snakes of size $k$ for links $i$ and $j$ , $N$ is the total number of links, and $ϕ$ is a weighting coefficient. The function $intersect$ calculates the number of common elements between two sets. A larger $ϕ$ gives more weight to spatial information, resulting in more compact clusters, as links added later in the snakes are weighted less owing to the decreasing weighting coefficient. After calculating the similarity between all link pairs, links are clustered using symmetric non-negative matrix factorization.

For this study, $ϕ$ is set to one, and the snake length is truncated to 10% of the total number of links in the network. The authors of the original paper also suggested this truncation as an alternative to the weighting coefficient and as a way to reduce computation time. Additionally, to be consistent with the ML algorithms, the sequence’s property is traffic density, $k (t_{\max})$ , where $t_{\max}$ represents the time interval during which the highest density variance is observed.

Sensor-Bias-Corrected Community Detection

The SBCCD method introduced by Ambühl et al. ( 37 ) is a two-step process to partition urban traffic networks into homogeneous regions and accurately estimate MFDs. Unlike other methods studied, this approach inherently requires less data by relying solely on stationary sensor data. First, multiple candidate partitions are generated based on the network’s topology using a community detection algorithm based on random walks, ensuring that each region is geographically contiguous and connected. Second, for each candidate partition, MFDs are estimated using a re-sampling method with a correction applied to account for sensor placement biases along the links, as detailed in Equations 25 and 26.

q_{MFD} (t) = \frac{1}{J} \sum_{j = 1}^{J} \sum_{i \in M_{j}} \frac{q_{i} (t) l_{i}}{\sum l_{i}},

(25)

\begin{matrix} k_{MFD} (t) = \frac{1}{J} \sum_{j = 1}^{J} \sum_{i \in M_{j}} \frac{k_{i} (t) l_{i}}{\sum l_{i}}, \\ \text{with } M = \cup_{j \in J} M_{j}, \quad M_{j} = \left\{ i \in M \mid \frac{j - 1}{J} < r_{i} < \frac{j}{J} \right\}, \end{matrix}

(26)

where $M$ is the set of sensors, $J$ is the number of sensor groups formed according to their relative position, and $r_{i}$ denotes the relative position of sensor $i$ along the link of length $l_{i}$ .
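
A minimal sketch of the re-sampling idea behind Equations 25 and 26 is given below. It assumes that the inner denominator $\sum l_{i}$ runs over the sensors of the same group $M_{j}$ and that every group contains at least one sensor; the function name `bias_corrected_average` and the sensor values are hypothetical.

```python
def bias_corrected_average(sensors, n_groups):
    """Average a sensor quantity (flow or density) as in Eqs. 25-26.

    sensors: list of (relative_position, link_length, value) tuples.
    Sensors are grouped by relative position along their link; a
    length-weighted mean is taken per group, and the group means are
    averaged with equal weight 1/J.
    """
    group_means = []
    for j in range(1, n_groups + 1):
        lower, upper = (j - 1) / n_groups, j / n_groups
        # Strict inequalities, as written in Eq. 26.
        members = [(l, x) for r, l, x in sensors if lower < r < upper]
        total_length = sum(l for l, _ in members)  # assumes a non-empty group
        group_means.append(sum(l * x for l, x in members) / total_length)
    return sum(group_means) / n_groups


# (relative position, link length in m, flow in veh/h); values are made up.
sensors = [(0.2, 100.0, 600.0), (0.4, 300.0, 200.0), (0.8, 200.0, 900.0)]
print(bias_corrected_average(sensors, n_groups=2))
```

With two groups, the upstream sensors average to 300 veh/h and the single downstream sensor to 900 veh/h, giving 600 veh/h overall. A plain length-weighted mean over all three sensors would give 500 veh/h, because two of the three sensors sit in the upstream half of their links.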

The heterogeneity measure $α$ quantifies the spatial flow variability within a region, with lower values indicating a more homogeneous region. The heterogeneity of all candidate partitions is evaluated using Equation 27, and the candidate partition with the lowest mean $α$ is selected as the quasi-optimal solution.

α = \frac{Q_{0.5}}{Q_{1}},

(27)

where $Q_{1}$ is the capacity flow (i.e., the 97.5th percentile of observed flows) and $Q_{0.5}$ is the average flow of the top 50% highest-flow links within the region. In this method, Voronoi polygons around intersections are constructed per community to define region boundaries.

Case Studies

The first case study is a microscopic simulation model of Ingolstadt, Germany, covering an area of approximately 10 km by 10 km (Figure 2b). The road network, traffic control system, and traffic demand were modeled and calibrated in Simulation of Urban MObility (SUMO) software ( 61 , 62 ). The network extends for 1,248 km-ln and comprises around 12,000 links connected via 5,600 intersections. Speed limits across the network range from 20 km/h to 120 km/h, depending on the road category. All traffic signals in the network are modeled as multi-phase, fixed-time controlled. However, the impact of adaptive traffic signal control is still considered by averaging green time adaptations and setting a specific traffic light program for each signal on an hourly basis. The demand in the simulation is defined as time-dependent vehicle routes, with a 20% probability of route adaptation based on traffic conditions.

Figure 2.

Case studies. (a) Zurich network. (b) Ingolstadt network.

The second case study is a mesoscopic simulation model of Zurich, Switzerland, covering an area of approximately 13 km by 14 km (Figure 2a). The road network, traffic control system, and traffic demand were modeled and calibrated in SUMO. The network extends for 980 km-ln and comprises around 7,240 links connected via 3,530 intersections. Roads are split into 100 m segments, with traffic dynamics modeled on a segment-to-segment basis. These dynamics are constrained by link capacities and vehicles’ desired headways, allowing for realistic modeling of vehicle queues and individual vehicle movements. Traffic signal timings are defined uniquely for each intersection. For a typical intersection, the cycle time is set to 48 s and the green time to 16 s. These values are based on interviews with local operators. Intersection delays were scaled by a factor of 0.5 to account for intersection capacity and the average level of signal coordination. The time-dependent traffic demand is estimated using an endogenous procedure based on one year of data from loop detectors. Vehicles are periodically rerouted stochastically based on the current traffic state of the network. On entering the network, vehicles are assigned to a stochastic shortest path, with link speeds overestimated by a uniformly distributed random factor between 1 and 1.8.

Both simulation scenarios include a mix of urban and highway roads, featuring varying lane counts and speed limits. The networks exhibit an irregular topology (asymmetric structure) with hierarchical components and varying levels of spatial connectivity. The defined traffic demand results in different spatial and temporal congestion levels, directly influencing the observability and shape of the MFD, which makes the application of the methodology more challenging. A 24 h simulation period is used, with lane density and space-mean speed recorded every 5 min, a duration sufficient to comprise a few traffic signal cycles. Flow is estimated using the identity $Q = K \cdot V$ . Additionally, the total travel time and routes of all completed trips are recorded throughout the simulation.
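
The flow estimate from the identity $Q = K \cdot V$ can be sketched as follows for one 5 min interval. The lane-length weighting used to aggregate links into one MFD point, the function names, and the link values are assumptions for illustration and are not taken from this section.

```python
def link_flow(density, speed):
    """Flow in veh/h/lane from density (veh/km/lane) and space-mean speed (km/h)."""
    return density * speed


def network_state(links):
    """Aggregate links into one (density, flow) point.

    links: list of (lane_km, density, speed) tuples for a single interval.
    Assumption: links are weighted by their lane length.
    """
    total = sum(lane_km for lane_km, _, _ in links)
    density = sum(lane_km * k for lane_km, k, _ in links) / total
    flow = sum(lane_km * link_flow(k, v) for lane_km, k, v in links) / total
    return density, flow


# Two hypothetical links: a short free-flowing one and a long congested one.
links = [(1.0, 20.0, 30.0), (3.0, 40.0, 10.0)]
print(network_state(links))
```

Here the individual link flows are 600 and 400 veh/h/lane, and the aggregated point is a density of 35 veh/km/lane at a flow of 450 veh/h/lane.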

Results

The methodological framework is implemented in Python using the sklearn package ( 63 ). We begin by examining the effects of $w_{1}$ and $w_{2}$ in Equation 8 on the homogeneity and connectivity of the clustering results. Using the Zurich network as an example, Figure 3 demonstrates that the weights $w_{1}$ and $w_{2}$ significantly influence the trade-off between $T V_{n}$ and $C_{conn}$ across all methods. Increasing $w_{1}$ , which emphasizes homogeneity, generally improves $T V_{n}$ but reduces $C_{conn}$ , whereas increasing $w_{2}$ , which emphasizes connectivity, enhances $C_{conn}$ at the expense of $T V_{n}$ . Despite these shifts, the values of $T V_{n}$ and $C_{conn}$ for each clustering method remain within the same order of magnitude across all weight configurations. This suggests that the intrinsic characteristics of each method, such as DBSCAN’s tendency toward lower connectivity or HAC’s ability to achieve maximum connectivity, primarily determine performance, while the weights mainly fine-tune the balance between the two indicators. From this point forward, we assume $w_{1} = w_{2} = 1$ , particularly because our study does not focus on any specific application where the relative importance of homogeneity versus connectivity is predefined.

Figure 3.

Effect of weights in the objective function.

The clustering process begins with fine-tuning the hyperparameters, specifically the number of clusters, $ϵ$ , and $MinPts$ . This step also adjusts the weights assigned to traffic density and coordinates in the feature vector. In this study, we focus on varying the number of clusters and density weights.

Figure 4 presents the fine-tuning results for k-means clustering applied to the Zurich network. Figure 4a shows the objective value, computed as described in Equation 8, plotted against the number of clusters. As the number of clusters increases, the objective value initially increases, reflecting improved clustering performance through a better balance of homogeneity and connectivity. However, beyond a certain point, the curve flattens, indicating diminishing returns in performance gains. For a given traffic weight, the objective value generally remains within the same order of magnitude, suggesting that the traffic feature weight sets a baseline for clustering performance, with the number of clusters fine-tuning the results.

Figure 4.

k-means fine-tuning for Zurich network. (a) Objective evaluation. (b) $T V_{n}$ evaluation. (c) $C_{conn}$ evaluation.

Note: $T V_{n}$ = normalized total variance; $C_{conn}$ = connectivity indicator.

Interestingly, neither higher (brighter shades) nor lower (darker shades) density weights alone guarantee optimal performance; what matters is the balance between homogeneity and connectivity. For Zurich, a density weight of 2, that is, twice the coordinate weight, achieves the best trade-off, improving connectivity and reducing variance without causing excessive cluster fragmentation.

Figure 4b examines $T V_{n}$ , a measure of homogeneity, which consistently decreases as the number of clusters increases. This behavior is expected, as smaller clusters inherently reduce internal variance. Higher density weights amplify this effect, emphasising traffic-based similarities among links, leading to more homogeneous clusters. Conversely, when the density weight is set to zero, clustering relies solely on geographical features. This results in the highest $T V_{n}$ values, since clusters are formed based on spatial proximity alone, failing to capture traffic-based relationships and thus producing more heterogeneous clusters.

Figure 4c focuses on $C_{conn}$ , a measure of connectivity. As the number of clusters increases, $C_{conn}$ generally declines owing to the higher likelihood of link fragmentation. However, lower density weights help counteract this decline by prioritising the grouping of spatially proximate links, thereby maintaining stronger connectivity. This highlights the trade-off between achieving connectivity and homogeneity in clustering. These observations hold consistently across all evaluated algorithms and scenarios.
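
The effect of the density weight on which links are considered similar can be sketched with a weighted feature vector. The sketch assumes that coordinates and density are already scaled to comparable ranges; the names `feature` and `nearest` and the three toy links are hypothetical.

```python
import math


def feature(link, density_weight, coord_weight=1.0):
    """Weighted feature vector (x, y, density) of one link."""
    x, y, k = link
    return (coord_weight * x, coord_weight * y, density_weight * k)


def nearest(target, candidates, density_weight):
    """Name of the candidate link closest to `target` in weighted feature space."""
    t = feature(target, density_weight)
    return min(
        candidates,
        key=lambda name: math.dist(t, feature(candidates[name], density_weight)),
    )


# (x, y, density), all assumed to be scaled to [0, 1].
target = (0.0, 0.0, 0.9)
candidates = {
    "adjacent_uncongested": (0.1, 0.0, 0.1),
    "distant_congested": (1.0, 0.0, 0.9),
}
print(nearest(target, candidates, density_weight=0))  # geography only
print(nearest(target, candidates, density_weight=2))  # traffic dominates
```

With a density weight of zero the spatially adjacent link is the closest one, which favors connectivity; with a weight of 2 the distant link with the same density becomes the closest, which favors homogeneity at the cost of spatial contiguity.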

Given the size of the networks and after empirical assessment, the number of clusters is capped at a maximum of 15. Taking Zurich as an example, Figure 4a shows that increasing the number of clusters beyond this point provides only marginal improvements in the combined objective value of homogeneity and connectivity. Moreover, it is not practically feasible to have many clusters with many state variables, as this increases the computational complexity of the modeling, making it harder to apply real-time traffic control. For each scenario, the number of clusters ranges from 2 to 15, while the density weight ranges from 1 to 10, resulting in a total of 117 configurations for each scenario and method. The coordinate weight is fixed at 1. This method of choosing hyperparameters allows easy tailoring of the clustering outcome to the specific requirements of the application. Feature weights and hyperparameters can be varied within different ranges, or weighting factors can be introduced in Equation 8 to assign greater importance to connectivity or homogeneity. The results of ML algorithm fine-tuning are listed in Table 1.

Table 1.

Parameters After Hyper-Parameter Optimization

	Ingolstadt		Zurich
	No. clusters	Traffic feature weight	No. clusters	Traffic feature weight
k-means	10	1	12	2
GMM	14	6	14	2
DBSCAN	6	2	6	1
HAC	12	1	8	1

Note: DBSCAN = density-based spatial clustering of applications with noise; GMM = Gaussian mixture models; HAC = hierarchical agglomerative clustering.

Based on the fine-tuned parameters in Table 1, Figures 5 and 6 illustrate the clustering results for Zurich and Ingolstadt, respectively. The clusters are distinguished by different colors, as shown in the respective legends. For Zurich, k-means and HAC identify distinct regions, while the GMM clustering outcomes appear to align with major traffic corridors. In contrast, the DBSCAN results tend to be fragmented and uneven in size, highlighting localized density variations. Similar patterns are observed for Ingolstadt; however, the results are also influenced by the network topology.

Figure 5.

Zurich clustering results. (a) k-means. (b) hierarchical agglomerative clustering (HAC). (c) Gaussian mixture models (GMM). (d) density-based spatial clustering of applications with noise (DBSCAN). (e) snake similarity. (f) sensor-bias-corrected community detection (SBCCD).

Figure 6.

Ingolstadt clustering results. (a) k-means. (b) hierarchical agglomerative clustering (HAC). (c) Gaussian mixture models (GMM). (d) density-based spatial clustering of applications with noise (DBSCAN). (e) snake similarity. (f) sensor-bias-corrected community detection (SBCCD).

Table 2 summarizes the indicators used to evaluate the clustering outcomes. In both cities, GMM and k-means show the best performance in regard to $T V_{n}$ , indicating better homogeneity. The low variation in density within the clusters directly influences the shape of the MFD, resulting in both methods showing lower scatter. Consequently, the mean area covered by the MFDs’ point envelopes is smaller. Figure 7 illustrates the estimated MFDs based on the results of GMM and k-means in Ingolstadt, where one can clearly see differences between the clusters’ maximum flow and density, as well as a well-aligned shape with low scatter. The scatter observed in the MFDs is attributed to fewer measurements (links) within the clusters.

Table 2.

Overview of Study Indicators (↑ Preferred when Higher; ↓ Preferred when Lower)

	$C_{conn} ↑$		$T V_{n} ↓$		MFDs mean area ↓		Travel time RMSNE ↓
	Ingolstadt	Zurich	Ingolstadt	Zurich	Ingolstadt	Zurich	Ingolstadt	Zurich
k-means	0.870	0.858	0.383	0.396	0.170	0.072	0.245	0.193
GMM	0.098	0.536	0.157	0.384	0.188	0.056	0.261	0.191
DBSCAN	0.588	0.781	0.582	0.862	0.243	0.044	0.270	0.181
HAC	1.0	1.0	0.694	0.720	0.236	0.092	0.244	0.308
Snake Similarity	0.870	0.909	0.984	0.989	0.153	0.025	0.282	0.184
SBCCD	1.0	1.0	0.967	0.989	0.205	0.047	0.269	0.186

Note: DBSCAN = density-based spatial clustering of applications with noise; GMM = Gaussian mixture models; HAC = hierarchical agglomerative clustering; MFD = macroscopic fundamental diagram; RMSNE = root mean squared normalized error; SBCCD = sensor-bias-corrected community detection.

Figure 7.

Comparative macroscopic fundamental diagrams (MFDs) for Ingolstadt. (a) Ingolstadt MFDs based on Gaussian mixture models (GMM) clusters. (b) Ingolstadt MFDs based on k-means clusters.

Although k-means and GMM show very similar performance in $T V_{n}$ , GMM is marginally better. However, this comes at the expense of very low connectivity, reflecting the contradictory goals mentioned in the introduction. k-means, on the other hand, demonstrates a good trade-off, providing results that are both homogeneous and connected. GMM’s high homogeneity, together with clusters that align with major traffic corridors, makes it a good option for routing applications where the focus is on clusters with homogeneous traffic states and connectivity is a secondary goal. In contrast, k-means is more suitable for traffic management schemes, such as perimeter control and congestion pricing.

With explicit connectivity constraints, HAC always guarantees connected clusters, where any link in a cluster can be reached from any other link. Nonetheless, this affects homogeneity, with $T V_{n}$ being in the middle range compared with the other algorithms. DBSCAN shows a moderate level of homogeneity, which is because the algorithm filters out congestion pockets (outliers), leaving the rest of the network clustered into one large group. This also explains the moderate connectivity compared with the other algorithms.

After estimating the similarity matrix, the number of clusters for the snake similarity method is determined using Equation 8 by varying the number of clusters. Although the snake similarity produces clusters that are nearly connected, the $T V_{n}$ value is very high. Interestingly, this does not align with the mean area of the MFDs, which falls within the range observed for other algorithms.

For the SBCCD method, clustering is performed based on the outcomes of virtual sensors in the simulation, where the sensor locations correspond to the actual positions of loop detectors in the real network. However, the evaluation is based on the full information from the simulation. Similar to HAC, this method results in connected clusters, but with a higher $T V_{n}$ .

Concerning travel time estimation error, all methods exhibit relatively similar performance. One possible explanation is that the errors in travel time estimation on individual links cancel out, resulting in the total travel time at the end of the trip being very close to the actual travel time. This suggests that the estimation of travel time may be insensitive to smaller variations in traffic state variance.
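
The cancellation effect described above can be illustrated with a short sketch. It assumes the common definition of RMSNE, in which each error is normalized by the actual travel time before squaring; the travel times are made up.

```python
import math


def rmsne(estimated, actual):
    """Root mean squared normalized error between two equal-length sequences."""
    terms = [((e - a) / a) ** 2 for e, a in zip(estimated, actual)]
    return math.sqrt(sum(terms) / len(terms))


# One hypothetical trip crossing two clusters (travel times in seconds).
actual_segments = [100.0, 200.0]
estimated_segments = [110.0, 190.0]  # +10 s in the first cluster, -10 s in the second

segment_error = rmsne(estimated_segments, actual_segments)
trip_error = rmsne([sum(estimated_segments)], [sum(actual_segments)])
print(segment_error, trip_error)
```

The per-segment error is clearly non-zero, whereas the error of the total trip time is zero because the over- and underestimation offset each other. This is one way in which clusterings with different levels of homogeneity can still yield similar trip-level RMSNE values.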

Additionally, while the estimate of $T V_{n}$ reflects the homogeneity of the traffic state at a single time interval, indicators such as the travel time estimate, RMSNE, and mean area of the MFDs capture the evolution of homogeneity within the cluster over time. This observation calls for further investigation, including incorporating the temporal dimension into the clustering process.

Generally, computationally efficient algorithms optimize resource usage, enabling faster, scalable, and cost-effective applications. ML algorithms have shown the best performance in this regard, with run times between 1 and 2 s for both cities. On the other hand, the snake similarity, despite producing homogeneous results, is extremely costly in regard to computational resources, taking 234,000 s (approximately 2.7 days) for Ingolstadt and 43,100 s (approximately 0.5 days) for Zurich. SBCCD has moderate run times of 4,080 s for Ingolstadt and 7,500 s for Zurich, balancing performance and efficiency. For static clustering performed offline or at low frequencies, computational efficiency may not be a significant concern. However, congestion patterns can change rapidly within short time frames. To capture these dynamic patterns effectively, time-dynamic clustering with efficient computation becomes essential. In such cases, repeating a static clustering method at each time interval could be a viable solution, though it comes with increased computational demands. Many MFD-based real-time applications operate on control cycles of 3–5 min, requiring updated clustering results within each cycle to respond to evolving traffic conditions ( 47 ). Efficient algorithms ensure that these updates can be performed within the limited time available.
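
As a rough check of the reported run times against the control cycles mentioned above, the following sketch tests whether one clustering run fits into a single cycle. It uses the slowest reported run time per method; the dictionary keys and the helper `fits_cycle` are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Slowest run time per method, in seconds, as reported in the text.
runtime_s = {
    "ML algorithms": 2,
    "SBCCD": 7_500,
    "Snake similarity": 234_000,
}


def fits_cycle(seconds, cycle_min):
    """True if one clustering run completes within a control cycle."""
    return seconds <= cycle_min * 60


for name, seconds in runtime_s.items():
    print(name, fits_cycle(seconds, cycle_min=3), round(seconds / SECONDS_PER_DAY, 1))
```

Only the ML algorithms complete within a 3 min cycle; even with a 5 min cycle, SBCCD and the snake similarity exceed the available time by one to three orders of magnitude.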

Additionally, the scalability of these algorithms becomes evident when considering the network sizes: Ingolstadt’s network consists of approximately 12,000 links, while Zurich’s has 7,240 links. Despite the larger size of Ingolstadt’s network, ML algorithms maintain low computational times, demonstrating their scalability in handling larger data sets efficiently. In contrast, the snake method’s computational time increases significantly with network size, limiting its practicality for large-scale applications.

Practical Applicability and Generalizability

Table 3 condenses the quantitative findings from the Results Section into a practitioner-oriented matrix. Scores for homogeneity ( $H$ ), connectivity ( $C$ ), and runtime ( $RT$ ) are first averaged over both cities, normalized between 0.00 and 1.00, and inverted when a lower raw value denotes better performance. The final column proposes typical MFD-based and, more broadly, traffic management applications that are likely to benefit from each algorithm’s particular strengths. Our findings show a clear segmentation of use cases.

Table 3.

Algorithm Suitability Matrix

	$H$	$C$	$RT$	Overall trade-off	Recommended application
k-means	0.83	0.80	1.00	Balanced	Distance-based pricing, parking-based pricing, mean trip length estimation.
GMM	1.00	0.00	1.00	High $H$ , lower $C$	Route guidance, travel-time estimation/maps.
DBSCAN	0.37	0.54	1.00	Hotspot-oriented	Incident detection, Congestion pocket ID.
HAC	0.38	1.00	1.00	Guaranteed connectivity	Cordon-based pricing, perimeter or hierarchical control, regional modelling.
Snake Similarity	0.27	0.84	0.00	Very high cost	Offline scenario analysis (max. homogeneity).
SBCCD	0.28	1.00	0.96	High $H$ & $C$ , Moderate $RT$	Offline analysis, limited data availability scenario.

Note: C = connectivity; DBSCAN = density-based spatial clustering of applications with noise; GMM = Gaussian mixture models; H = homogeneity; HAC = hierarchical agglomerative clustering; ID = identification; RT = runtime; SBCCD = sensor-bias-corrected community detection.
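
The scoring procedure behind Table 3 (average over both cities, min-max normalization, inversion when lower is better) can be sketched as follows for the $C$ and $RT$ columns. The connectivity values are taken from Table 2 and the run times from the Results section; the 1.5 s run time for the ML algorithms is an assumed midpoint of the reported 1 to 2 s, and the function name `normalize` is hypothetical.

```python
def normalize(scores, lower_is_better=False):
    """Min-max normalize scores to [0, 1]; invert if lower raw values are better."""
    lo, hi = min(scores.values()), max(scores.values())
    result = {}
    for name, value in scores.items():
        scaled = (value - lo) / (hi - lo)
        result[name] = round(1 - scaled if lower_is_better else scaled, 2)
    return result


# C_conn per method as (Ingolstadt, Zurich), from Table 2.
connectivity = {
    "k-means": (0.870, 0.858),
    "GMM": (0.098, 0.536),
    "DBSCAN": (0.588, 0.781),
    "HAC": (1.0, 1.0),
    "Snake Similarity": (0.870, 0.909),
    "SBCCD": (1.0, 1.0),
}
# Run times in seconds as (Ingolstadt, Zurich); the ML value is an assumption.
runtime = {
    "ML algorithms": (1.5, 1.5),
    "Snake Similarity": (234_000, 43_100),
    "SBCCD": (4_080, 7_500),
}

mean_c = {name: sum(pair) / 2 for name, pair in connectivity.items()}
mean_rt = {name: sum(pair) / 2 for name, pair in runtime.items()}

c_scores = normalize(mean_c)
rt_scores = normalize(mean_rt, lower_is_better=True)
print(c_scores)
print(rt_scores)
```

Under these assumptions the sketch reproduces the $C$ and $RT$ columns of Table 3, for example 0.80 for k-means and 0.84 for the snake similarity in $C$ , and 0.96 for SBCCD in $RT$ .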

GMM, which achieves the highest $H$ , is well suited to route guidance and macroscopic travel time estimation, where internal traffic-state consistency outweighs strict spatial contiguity. k-means balances all three criteria, making it a robust choice for dynamic distance-based or parking-based pricing, where full connectivity is helpful but not essential to user comprehension and acceptance. HAC, by construction, guarantees connectivity and therefore excels in interventions that require contiguous zones, such as cordon-based pricing, perimeter control, or hierarchical signal control (i.e., simultaneous control of perimeter and local junctions). DBSCAN’s ability to expose anomalies renders it valuable for incident detection and congestion pocket identification. Finally, although the snake similarity and SBCCD methods exhibit high $H$ and $C$ , their prohibitive run-times confine them to offline strategic studies rather than real-time control.

Although Zurich and Ingolstadt differ considerably in size and network density, both are compact European cities with strong public-transport orientations. For networks that deviate more markedly from our case studies, we believe two factors become decisive:

Topology: grid-like North American downtowns exhibit inherently higher connectivity, diminishing HAC’s relative advantage, whereas highly radial or hierarchical networks may require additional constraints to preserve connectivity.

Traffic regime: cities with pronounced directional peaks (e.g., inbound in the morning, outbound in the evening) tend to inflate intra-cluster variance; in such cases, increasing the number of clusters can help maintain homogeneity.

Practitioners are therefore advised to (i) select features appropriate to their specific problem and data availability, (ii) fine-tune hyperparameters and feature weights, and (iii) calibrate $w_{1}$ and $w_{2}$ to the intended application. Finally, if the clustering results are either mildly disconnected or connected yet non-compact, a boundary adjustment post-processing routine can be applied to enforce the spatial coherence needed for perimeter control and regional modeling. These steps, and their associated limitations, are explicitly revisited in the following conclusion.

Conclusion

This paper contributes to the literature on urban network clustering by providing a comparative analysis of ML clustering algorithms. Our study assesses the trade-offs among traffic homogeneity, spatial connectivity, and computational cost for practical applications. To ensure the study’s relevance, we conduct this comparison on real networks with calibrated traffic simulations. This approach helps mitigate case-specific effects and strengthens the overall conclusions.

The results show that ML algorithms emerge as promising alternatives to the more complex methods found in the literature because of their performance and efficiency. We apply these ML algorithms to two different networks and observe varied performance in regard to homogeneity and connectivity. Specifically, GMM produces highly homogeneous clusters, making it suitable for routing and macroscopic travel time estimation, yet its clusters are often not well connected or compact. In contrast, HAC with connectivity constraints ensures the formation of connected clusters, which benefits traffic management schemes such as perimeter control and congestion pricing. k-means demonstrates balanced performance between homogeneity and connectivity. Although DBSCAN does not perform well on these indicators, it effectively highlights congestion pockets that may warrant further investigation by transport planners.

These insights are likely to hold in cities whose network descriptors fall within the ranges examined here; nonetheless, for networks with strongly divergent topologies, practitioners should first re-tune the homogeneity–connectivity parameters and verify stability through a sensitivity analysis. Extending the empirical validation to grid-based and polycentric networks is a key direction for future work. Future work could also focus on validating these findings with empirical traffic data to enhance their practical relevance. This includes exploring scenarios with limited data availability, such as those relying on loop detectors with limited spatial coverage. Finally, future studies should incorporate the temporal dimension of traffic dynamics, develop post-processing algorithms that correct connectivity and adjust cluster boundaries, and refine performance indicators, particularly the mean area of MFDs.

We make the program code used in this study publicly available at: https://github.com/alayasreih/NetworkClustering.

Footnotes

Acknowledgements

The authors acknowledge Transcality AG for sharing a calibrated simulation model of Zurich for research purposes. We also thank Lukas Ambühl for providing the code for the Sensor-Bias-Corrected Community Detection clustering method.

Author Contributions

The authors confirm contribution to the paper as follows: study conception and design: Y. Alayasreih, G. Tilg, F. Dandl, M. Keyvan-Ekbatani, K. Bogenberger; data collection: Y. Alayasreih, G. Tilg; analysis and interpretation of results: Y. Alayasreih, G. Tilg, F. Dandl; draft manuscript preparation: Y. Alayasreih, G. Tilg, F. Dandl. All authors reviewed the results and approved the final version of the manuscript.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Yamam Alayasreih acknowledges funding by the German Federal Ministry for Education and Research in the framework of the project MCube SUE. Mehdi Keyvan-Ekbatani acknowledges the Royal Society New Zealand – Catalyst: Seeding grant (E8168).

ORCID iDs

Yamam Alayasreih

Gabriel Tilg

Florian Dandl

Mehdi Keyvan-Ekbatani

Klaus Bogenberger

References

1. Daganzo, C. F. Urban Gridlock: Macroscopic Modeling and Mitigation Approaches. Transportation Research Part B: Methodological, Vol. 41, 2007, pp. 49–62. https://doi.org/10.1016/j.trb.2006.03.001.

2. Geroliminis; Daganzo, C. F. Macroscopic Modeling of Traffic in Cities. Presented at 86th Annual Meeting of the Transportation Research Board, Washington, D.C., 2007.

3. Geroliminis; Daganzo, C. F. Existence of Urban-Scale Macroscopic Fundamental Diagrams: Some Experimental Findings. Transportation Research Part B: Methodological, Vol. 42, 2008, pp. 759–770. https://doi.org/10.1016/j.trb.2008.02.002.

4. Daganzo, C. F.; Geroliminis. An Analytical Approximation for the Macroscopic Fundamental Diagram of Urban Traffic. Transportation Research Part B: Methodological, Vol. 42, 2008, pp. 771–781. https://doi.org/10.1016/j.trb.2008.06.008.

5. Leclercq; Geroliminis. Estimating MFDs in Simple Networks with Route Choice. Transportation Research Part B: Methodological, Vol. 57, 2013, pp. 468–484. https://doi.org/10.1016/j.trb.2013.05.005.

6. Laval, J. A.; Castrillón, F. Stochastic Approximations for the Macroscopic Fundamental Diagram of Urban Networks. Transportation Research Part B: Methodological, Vol. 81, 2015, pp. 904–916. https://doi.org/10.1016/j.trb.2015.09.002.

7. Tilg; Ambühl; Batista, S. F.; Menéndez; Leclercq; Busch. From Corridor to Network Macroscopic Fundamental Diagrams: A Semi-Analytical Approximation Approach. Transportation Science, Vol. 57, 2023, pp. 1115–1133. https://doi.org/10.1287/TRSC.2022.0402.

8. Buisson; Ladier. Exploring the Impact of Homogeneity of Traffic Measurements on the Existence of Macroscopic Fundamental Diagrams. Transportation Research Record: Journal of the Transportation Research Board, 2009. 2124: 127–136.

9. Ambühl; Menendez. Data Fusion Algorithm for Macroscopic Fundamental Diagram Estimation. Transportation Research Part C: Emerging Technologies, Vol. 71, 2016, pp. 184–197. https://doi.org/10.1016/j.trc.2016.07.013.

10. Loder; Ambühl; Menendez; Axhausen, K. W. Understanding Traffic Capacity of Urban Networks. Scientific Reports, Vol. 9, 2019, p. 16283. https://doi.org/10.1038/s41598-019-51539-5.

11. Mazloumian; Geroliminis; Helbing. The Spatial Variability of Vehicle Densities as Determinant of Urban Network Capacity. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol. 368, 2010, pp. 4627–4647. https://doi.org/10.1098/rsta.2010.0099.

12. Knoop, V. L.; Lint, H. V.; Hoogendoorn, S. P. Traffic Dynamics: Its Impact on the Macroscopic Fundamental Diagram. Physica A: Statistical Mechanics and its Applications, Vol. 438, 2015, pp. 236–250. https://doi.org/10.1016/j.physa.2015.06.016.

13. Geroliminis; Sun. Properties of a Well-Defined Macroscopic Fundamental Diagram for Urban Traffic. Transportation Research Part B: Methodological, Vol. 45, 2011, pp. 605–617. https://doi.org/10.1016/j.trb.2010.11.004.

14. Daamen; Hoogendoorn; Hoogendoorn-Lanser; Qian. Investigating the Shape of the Macroscopic Fundamental Diagram Using Simulation Data. Transportation Research Record: Journal of the Transportation Research Board, 2010. 2161: 40–48.

15. Leclercq; Parzani; Knoop, V. L.; Amourette; Hoogendoorn, S. P. Macroscopic Traffic Dynamics with Heterogeneous Route Patterns. Transportation Research Procedia, Vol. 7, 2015, pp. 631–650. https://doi.org/10.1016/j.trpro.2015.06.033.

16. Gayah, V. V.; Daganzo, C. F. Clockwise Hysteresis Loops in the Macroscopic Fundamental Diagram: An Effect of Network Instability. Transportation Research Part B: Methodological, Vol. 45, 2011, pp. 643–655. https://doi.org/10.1016/j.trb.2010.11.006.

17. Zhang; Garoni, T. M.; de Gier. A Comparative Study of Macroscopic Fundamental Diagrams of Arterial Road Networks Governed by Adaptive Traffic Signal Systems. Transportation Research Part B: Methodological, Vol. 49, 2013, pp. 1–23. https://doi.org/10.1016/j.trb.2012.12.002.

18. Knoop, V. L.; de Jong; Hoogendoorn, S. P. Network Fundamental Diagrams and Their Dependence on Network Topology. In Traffic and Granular Flow’13 (Chraibi, Boltes, Schadschneider, Seyfried, eds.), Springer, Cham, 2015, pp. 585–590. https://doi.org/10.1007/978-3-319-10629-8_66.

19. Mühlich; Gayah, V. V.; Menendez. Use of Microsimulation for Examination of Macroscopic Fundamental Diagram Hysteresis Patterns for Hierarchical Urban Street Networks. Transportation Research Record: Journal of the Transportation Research Board, 2015. 2491: 117–126.

20. Keyvan-Ekbatani; Gao; Gayah, V. V.; Knoop, V. L. Traffic-Responsive Signals Combined with Perimeter Control: Investigating the Benefits. Transportmetrica B: Transport Dynamics, Vol. 7, 2019, pp. 1402–1425. https://doi.org/10.1080/21680566.2019.1630688.

21. Yildirimoglu; Ramezani; Geroliminis. Equilibrium Analysis and Route Guidance in Large-Scale Networks with MFD Dynamics. Transportation Research Procedia, Vol. 9, 2015, pp. 185–204. https://doi.org/10.1016/j.trpro.2015.07.011.

22. Keyvan-Ekbatani; Kouvelas; Papamichail; Papageorgiou. Exploiting the Fundamental Diagram of Urban Networks for Feedback-Based Gating. Transportation Research Part B: Methodological, Vol. 46, 2012, pp. 1393–1403. https://doi.org/10.1016/j.trb.2012.06.008.

23. Zheng; Geroliminis. Modeling and Optimization of Multimodal Urban Networks with Limited Parking and Dynamic Pricing. Transportation Research Part B: Methodological, Vol. 83, 2016, pp. 36–58. https://doi.org/10.1016/j.trb.2015.10.008.

24. Loder; Bliemer, M. C.; Axhausen, K. W. Optimal Pricing and Investment in a Multi-Modal City – Introducing a Macroscopic Network Design Problem Based on the MFD. Transportation Research Part A: Policy and Practice, Vol. 156, 2022, pp. 113–132. https://doi.org/10.1016/j.tra.2021.11.026.

25. Zheng; Geroliminis. Area-Based Equitable Pricing Strategies for Multimodal Urban Networks with Heterogeneous Users. Transportation Research Part A: Policy and Practice, Vol. 136, 2020, pp. 357–374. https://doi.org/10.1016/j.tra.2020.04.009.

26. Zheng; Geroliminis. On the Distribution of Urban Road Space for Multimodal Congested Networks. Transportation Research Part B: Methodological, Vol. 57, 2013, pp. 326–341. https://doi.org/10.1016/j.trb.2013.06.003.

27. Dandl; Tilg; Rostami-Shahrbabaki; Bogenberger. Network Fundamental Diagram Based Routing of Vehicle Fleets in Dynamic Traffic Simulations. Proc., 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece, IEEE, New York, 2020, pp. 1–8. https://doi.org/10.1109/ITSC45102.2020.9294204; https://ieeexplore.ieee.org/document/9294204/.

28. Zhang; Rempe; Dandl; Tilg; Kraus; Bogenberger. Network Fundamental Diagram Based Dynamic Routing in a Clustered Network. Proc., 2023 8th International Conference on Models and Technologies for Intelligent Transportation Systems, MT-ITS 2023, Nice, France, IEEE, New York, 2023. https://doi.org/10.1109/MT-ITS56129.2023.10241650.

29. Beojone, C. V.; Zhu; Sirmatel, I. I.; Geroliminis. A Hierarchical Control Framework for Vehicle Repositioning in Ride-Hailing Systems. Transportation Research Part C: Emerging Technologies, Vol. 168, 2024, p. 104717. https://doi.org/10.1016/j.trc.2024.104717; https://linkinghub.elsevier.com/retrieve/pii/S0968090X24002389.

30. Mariotte; Leclercq; Batista, S. F.; Krug; Paipuri. Calibration and Validation of Multi-Reservoir MFD Models: A Case Study in Lyon. Transportation Research Part B: Methodological, Vol. 136, 2020, pp. 62–86. https://doi.org/10.1016/j.trb.2020.03.006.

31. Takayasu. Regional Traffic and Trip Characteristics Simulation and Applications for MFD Models Calibration. PhD thesis. Infrastructures de transport. Université de Lyon, 2022. https://theses.hal.science/tel-04041469.

32.

Bellocchi

Geroliminis

Unraveling Reaction-Diffusion-Like Dynamics in Urban Congestion Propagation: Insights from a Large-Scale Road Network. Scientific Reports, Vol. 10, 2020, p. 4876. https://doi.org/10.1038/s41598-020-61486-1.

33.

Haddad

Zheng

Adaptive Perimeter Control for Multi-Region Accumulation-Based Models with State Delays. Transportation Research Part B: Methodological, Vol. 137, 2020, pp. 133–153. https://doi.org/10.1016/j.trb.2018.05.019.

34.

Geroliminis

On the Spatial Partitioning of Urban Transportation Networks. Transportation Research Part B: Methodological, Vol. 46, 2012, pp. 1639–1656. https://doi.org/10.1016/j.trb.2012.08.005.

35.

Saeedmanesh

Geroliminis

Clustering of Heterogeneous Networks with Directional Flows Based on “Snake” Similarities. Transportation Research Part B: Methodological, Vol. 91, 2016, pp. 250–269. https://doi.org/10.1016/j.trb.2016.05.008.

36.

Chiu

Y. C.

Chen

A Network Partitioning Algorithmic Approach for Macroscopic Fundamental Diagram-Based Hierarchical Traffic Network Management. IEEE Transactions on Intelligent Transportation Systems, Vol. 19, 2018, pp. 1130–1139. https://doi.org/10.1109/TITS.2017.2713808.

37.

Ambühl

Loder

Zheng

Axhausen

K. W.

Menendez

Approximative Network Partitioning for MFDs from Stationary Sensor Data. Transportation Research Record: Journal of the Transportation Research Board, 2019. 2673: 94–103.

38.

Batista

S. F.

Lopez

Menéndez

On the Partitioning of Urban Networks for MFD-Based Applications Using Gaussian Mixture Models. Proc., 2021 7th International Conference on Models and Technologies for Intelligent Transportation Systems, MT-ITS 2021, Heraklion, Greece, IEEE, New York, 2021. https://doi.org/10.1109/MT-ITS49943.2021.9529296.

39.

Zhou

Lin

A Dynamic Network Partition Method for Heterogenous Urban Traffic Networks. Proc., International IEEE Conference on Intelligent Transportation Systems, ITSC, Anchorage, AK, IEEE, New York, 2012, pp. 820–825. https://doi.org/10.1109/ITSC.2012.6338712.

40.

Luo

Geroliminis

Empirical Observations of Congestion Propagation and Dynamic Partitioning with Probe Data for Large-Scale Systems. Transportation Research Record: Journal of the Transportation Research Board, 2014. 2422: 1–11.

41.

Pascale

Mavroeidis

Lam

H. T.

Spatiotemporal Clustering of Urban Networks: Real Case Scenario in London. Transportation Research Record: Journal of the Transportation Research Board, 2015. 2491: 81–89.

42.

Rempe

Bogenberger

Feature Engineering for Data-Driven Traffic State Forecast in Urban Road Networks. Presented at 98th Annual Meeting of the Transport Research Board, Washington, D.C., 2019.

43.

Saeedmanesh

Geroliminis

Dynamic Clustering and Propagation of Congestion in Heterogeneously Congested Urban Traffic Networks. Transportation Research Part B: Methodological, Vol. 105, 2017, pp. 193–211. https://doi.org/10.1016/j.trb.2017.08.021.

44.

Lopez

Leclercq

Krishnakumari

Chiabaut

Van Lint

Revealing the Day-to-Day Regularity of Urban Congestion Patterns with 3D Speed Maps. Scientific Reports, Vol. 7, 2017, p. 14029. https://doi.org/10.1038/s41598-017-14237-8.

45.

Casadei

Bertrand

Gouin

de Wit

C. C.

Aggregation and Travel Time Calculation over Large Scale Traffic Networks: An Empiric Study on the Grenoble City. Transportation Research Part C: Emerging Technologies, Vol. 95, 2018, pp. 713–730. https://doi.org/10.1016/j.trc.2018.07.033.

46.

Bellocchi

Latora

Geroliminis

Dynamical Efficiency for Multimodal Time-Varying Transportation Networks. Scientific Reports, Vol. 11, 2021, p. 23065. https://doi.org/10.1038/s41598-021-02418-5.

47.

Jiang

Keyvan-Ekbatani

Ngoduy

Partitioning of Urban Networks with Polycentric Congestion Pattern for Traffic Management Policies: Identifying Protected Networks. Computer-Aided Civil and Infrastructure Engineering, Vol. 38, 2023, pp. 508–527. https://doi.org/10.1111/mice.12895.

48.

Johari

Keyvan-Ekbatani

Leclercq

Ngoduy

Mahmassani

H. S.

Macroscopic Network-Level Traffic Models: Bridging Fifty Years of Development Toward the Next Era. Transportation Research Part C: Emerging Technologies, Vol. 131, 2021, p. 103334. https://doi.org/10.1016/j.trc.2021.103334; https://www.sciencedirect.com/science/article/pii/S0968090X21003375.

49.

Estivill-Castro

Why so Many Clustering Algorithms. ACM SIGKDD Explorations Newsletter, Vol. 4, 2002, pp. 65–75. https://doi.org/10.1145/568574.568575; https://dl.acm.org/doi/10.1145/568574.568575.

50.

Lloyd

S. P.

Least Squares Quantization in PCM. IEEE Transactions on Information Theory, Vol. 28, 1982, pp. 129–137. https://doi.org/10.1109/TIT.1982.1056489.

51.

Arthur

Vassilvitskii

k-means++: The Advantages of Careful Seeding. Proc., eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’07, Society for Industrial and Applied Mathematics, New Orleans, LA, 2007, pp. 1027–1035.

52.

Forgy

E. W.

Cluster Analysis of Multivariate Data: Efficiency vs Interpretability of Classifications. Biometrics, Vol. 21, 1965, pp. 768–780.

53.

Melnykov

Maitra

Finite Mixture Models and Model-Based Clustering. Statistics Surveys, Vol. 4, 2010, pp. 80–116. https://doi.org/10.1214/09-SS053; https://projecteuclid.org/journals/statistics-surveys/volume-4/issue-none/Finite-mixture-models-and-model-based-clustering/10.1214/09-SS053.full.

54.

Banfield

J. D.

Raftery

A. E.

Model-Based Gaussian and Non-Gaussian Clustering. Biometrics, Vol. 49, 1993, pp. 803–831. https://doi.org/10.2307/2532201; https://www.jstor.org/stable/2532201?origin=crossref.

55.

Bilmes

J. A.

A Gentle Tutorial of the EM Algorithm and Its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. International Computer Science Institute, Berkeley, CA, 1998.

56.

Ester

Kriegel

H.-P.

Sander

A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proc., KDD’96: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, 1996, pp. 226–231. https://dl.acm.org/doi/10.5555/3001460.3001507.

57.

Schubert

Sander

Ester

Kriegel

H. P.

DBSCAN Revisited, Revisited: Why and How You Should (Still) Use DBSCAN. ACM Transactions on Database Systems, Vol. 42, No. 3, 2017, pp. 1–21. https://doi.org/10.1145/3068335.

58.

Ward

J. H.

Hierarchical Grouping to Optimize an Objective Function. Journal of the American Statistical Association, Vol. 58, 1963, pp. 236–244. https://doi.org/10.1080/01621459.1963.10500845.

59.

Ferligoj

Batagelj

Clustering with Relational Constraint. Psychometrika, Vol. 47, 1982, pp. 413–426. https://doi.org/10.1007/BF02293706; http://link.springer.com/10.1007/BF02293706.

60.

Randriamihamison

Vialaneix

Neuvial

Applicability and Interpretability of Ward’s Hierarchical Agglomerative Clustering With or Without Contiguity Constraints. Journal of Classification, Vol. 38, 2021, pp. 363–389. https://doi.org/10.1007/s00357-020-09377-y.

61.

Langer

Harth

Preitschaft

Kates

Bogenberger

Calibration and Assessment of Urban Microscopic Traffic Simulation as an Environment for Testing of Automated Driving. Proc., IEEE International Intelligent Transportation Systems Conference, ITSC, Indianapolis, IN, IEEE, New York, September 19–22, 2021, pp. 3210–3216. https://doi.org/10.1109/ITSC48978.2021.9564743.

62.

Harth

Langer

Bogenberger

Automated Calibration of Traffic Demand and Traffic Lights in SUMO Using Real-World Observations. SUMO Conference Proceedings, Vol. 2, 2021, pp. 133–148.

63.

Pedregosa

Varoquaux

Gramfort

Michel

Thirion

Grisel

Blondel

, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, Vol. 12, 2011, pp. 2825–2830. http://scikit-learn.sourceforge.net.
