Unveiling industrial functional networks in megacity regions: A spatial and network-based analysis of agglomeration economies

Abstract

Urban areas function as complex systems that support a diverse range of socioeconomic activities, with industries clustering in specific regions according to different sectors. Although many studies describe and map the spatial distribution of industrial activities, few examine the complex interactions between industrial functions. This study presents a data-driven perspective to investigate the connections among different industrial functions, focusing on how their co-location and sectoral relatedness indicate relationships. Functional keywords are extracted from the names of Points of Interest (POIs) to construct an industrial functional network. In this network, nodes represent keywords, and edges measure spatial co-occurrence within clusters and functional relatedness between industrial sectors. The network’s structure is analyzed using selected metrics, and groups of industrial functions are identified through a community detection algorithm, along with their spatial distribution. Using the Chinese Greater Bay Area (GBA) as the study area, keywords with high degree and strength, such as electronic and food, are identified as central to key manufacturing processes and closely connected to local industrial activities. Additionally, the strength of agglomeration economies is linked to geographic externalities. Industrial activities with strong connections to diverse sectors are mainly located in central areas of major cities. The findings provide practical insights into spatial differences between specialized and diversified industrial economies, highlighting regional inequalities.

Keywords

industrial agglomeration; network analysis; POIs; manufacturing sectors; spatial-functional organizations

Introduction

Urban spaces are complex systems hosting a variety of socioeconomic activities, where industries often cluster in specific areas forming industrial agglomerations based on different functions. Spatial concentration improves industrial efficiency and increases productivity in urban environments (Ellison et al., 2010; Rosenthal and Strange, 2003). Manufacturing industries benefit from clustering by reducing transportation and service costs while enhancing knowledge sharing and collaboration among similar firms, strengthening urban industrial networks. Explicitly, agglomerations in the early centralized stages of development typically follow a core-periphery pattern, where a dominant core concentrates economic activities and peripheral areas provide complementary functions. Subsequently, polycentric urban regions are characterized by multiple centers that give rise to more distributed and interconnected spatial structures, rather than reliance on a single dominant core (Meijers, 2005). To analyze agglomeration mechanisms, the literature focuses on two main dimensions: spatial and functional knowledge (O’Donoghue and Gleave, 2004). The spatial dimension examines geographic patterns and concentration of economic activities around core and peripheral areas, while the functional dimension highlights the interconnectedness of economic activities based on shared characteristics and complementary roles (Vogiatzoglou and Tsekeris, 2013).

Industrial activities within agglomeration economies are diverse, bringing together various professionals and supporting a wide range of manufacturing functions (Neffke et al., 2018). A common feature of agglomeration externalities is that close links between different industrial sectors can be observed within large metropolitan areas (Jacobs, 2016). In recent decades, there has been growing interest in better understanding industrial diversification and its effects on urban economies. This includes identifying the distinct functional and spatial roles that agglomeration economies play within cities and urban regions. To address this, some studies have used aggregated economic data at regional levels, such as industrial value, activity patterns, and employment statistics (Guillain and Le Gallo, 2010; Wu et al., 2022). These studies typically take a discrete spatial approach to analyze the functional categories and their spatial patterns. However, a widely accepted view today is that industrial activities within agglomerations are interconnected and often involve complex product chain systems, especially in the context of regionalization (Yu et al., 2023). There is still a notable gap in understanding how different industrial functions relate to each other, particularly how their co-location and sectoral convergence indicate shared roles or connections.

Conventional approaches and statistical data from previous studies may not adequately capture the internal relationships among industrial functions. Rapid urbanization has led to more complex and diverse socioeconomic activities that often go beyond the limits of traditional, digit-based taxonomies used to classify professional sectors (Raikov et al., 2019; Yu et al., 2022). These existing categories may not fully reflect the range and complexity of local industries, especially in dynamic urban areas where new economic activities develop quickly. While many current case studies focus on describing and mapping the distribution of industrial activities, there is limited research examining the multifaceted interactions of industrial functions within urban agglomerations.

This study adopts a data-driven approach to examine functional relationships within industrial agglomerations by integrating geospatial data with NLP-based network analysis. Functional keywords are extracted from the names of points of interest (POIs) to represent different industrial functions. Using these keywords, an industrial functional network is constructed, where nodes correspond to keywords and edges represent spatial co-occurrence within agglomerations and sectoral convergence. Sectoral convergence is measured using pre-trained language models to capture the relationship or synergy between industrial functions, highlighting collaborative integration across sectors. The network’s structural properties are assessed through selected metrics. Finally, community detection is applied to identify groups of industrial functions, and their spatial distribution within the Greater Bay Area (GBA) in southern China is analyzed and described. Three research questions are answered by this study:

• What functional keywords can be identified from industry-related POIs, and do their spatial co-occurrence patterns, mined with association rules, reveal strong relationships?

• How can a network of functional keywords be constructed with the aid of pre-trained language models, and what are its main structural characteristics?

• By applying community detection, can distinct groups of closely related industrial functions be identified, and how do their spatial and functional patterns differ across these groups?

Literature review

Agglomeration economies in geographic studies are typically examined through two main dimensions: spatial and functional. The spatial dimension addresses the geographic clustering of economic activities, for example, how proximity lowers transportation costs, as seen in some urban industrial centers (O’Donoghue and Gleave, 2004). Existing literature has proposed various metrics to reflect the concentration level of industrial activities, mostly based on the discrete-space model (Yu et al., 2022). For instance, the locational Gini coefficient developed by Krugman (1991) is a common metric for assessing the concentration of specialized employment or patents observed in a local industrial market. The location quotient (LQ) was subsequently introduced to quantify the degree of concentration of economic activities by calculating the ratio between the local and global percentages of socioeconomic data within a particular industrial sector (Glaeser et al., 1992). Fu et al. (2022) employ a spatial Durbin model on panel data from 278 Chinese cities to investigate the impact of urban agglomeration, measured with both scale and distance factors, on urban economic development. Continuous-space approaches have also been applied to identify agglomeration economies. Verstraten et al. (2019) examine the spatial scope of agglomeration economies by constructing concentric rings around geographic centroids. Using density-based cluster mapping and simultaneous equations within each cluster, Singh (2022) identified spatial clusters and verified the significant co-existence and bidirectional causality among agglomeration, labor productivity, and urbanization.

The functional dimension focuses on the operational and relational interactions among different kinds of industries, including supply chain connections, innovation networks, and collaboration (Duranton and Puga, 2005). Reviewing current literature shows that conventional approaches to studying agglomeration largely depend on empirical, category-based indicators derived from digit-based industrial classifications, such as the Standard Industrial Classification (SIC) (Van der Panne, 2004). Previous studies have used the Hirschman–Herfindahl index (HHI) to measure economic concentration by summing the squared proportions of market or area categories. De Lucio et al. (1996) applied HHI to employment data from industrial surveys at the provincial level in Spain. Shannon entropy has also been used to quantify the functional complexity of agglomeration. Attaran (1986) used Shannon entropy on employment data across 56 sectors within a 2-digit SIC framework. Frenken et al. (2007) introduced an entropy-based method to distinguish these functional roles using regional statistics from the Netherlands. Relying on six-digit NAICS industries, Auerswald and Dani (2022) introduced the concept of related specialization, to measure the concentration of high interdependencies between specialized industries within a metropolitan region. Wang et al. (2025) examine the co-agglomeration rules of agribusiness in the Yangtze River Delta urban agglomeration from an industrial chain perspective. Using the co-location quotient and Apriori algorithm, they analyze companies in agriculture and related industries as defined by the National Bureau of Statistics of China. As can be seen, current studies mainly rely on pre-defined industrial classifications or taxonomies, which may restrict the capability to capture the dynamic and complex functional behaviors within urban agglomerations (Chain et al., 2019).

Current literature has expanded beyond the static aspects of industrial activities to examine the relationships among industries (Sarach, 2015). Evidence suggests that regions are more likely to develop industries that share functional and technological similarities with existing industries, promoting interconnected economic activities within urban agglomerations (Grieser et al., 2022; He et al., 2019). Using city-level panel data from 1997 to 2006, Fu et al. (2010) found that specialization in wholesale, retail trade, and construction can positively influence employment rates. Lu et al. (2013) studied the connection between specialization and diversity in manufacturing activities, reporting that specialization significantly enhances local economic growth, while diversity has little impact. In recent years, geographic and economic researchers have constructed measures of relatedness among industrial activities using concepts like industry space or spatial proximity indexes. Boschma et al. (2013) studied the emergence of new industries in Spain by measuring the capability distance between new and existing export products, showing that regions tend to develop industries functionally and technologically similar to those already present. Davies and Maré (2021) develop a refined measure of relatedness between economic activities based on weighted correlations of local employment shares, which explicitly accounts for local specialization and variations in employment data quality across the cities in New Zealand. Grieser et al. (2022) demonstrate that interactions with geographic neighbors significantly influence corporate investment behavior, helping to explain the persistence of industrial clusters such as Silicon Valley. At a smaller spatial scale, some studies have examined the relationship between industrial activities and their spatial patterns within cities. For example, Liu et al. (2021a) applied a data-driven co-location patterns mining approach to directly capture spatial relationships among multiple industrial types. Xu et al. (2025) investigated the coupling relationship between industrial linkage and spatial co-agglomeration of advanced manufacturing and producer services in Beijing using input-output tables and micro-enterprise spatial data.

Advancements in big data and machine learning, especially in network analysis, have become increasingly important in recent years (Cai and Tan, 2025; Ducruet and Beauguitte, 2014; Saleh et al., 2025). Network science is a multidisciplinary field that uses graph theory and models to build, describe, and analyze complex systems (Ghosh et al., 2024). Regarding industrial functions, several case studies have examined the relationships between industrial activities from a network perspective, revealing functional dynamics and connectivity in urban industrial agglomerations (Cheng et al., 2022; Liu et al., 2021b). These approaches demonstrate the value of network science as a tool to understand interactions among industrial activities.

Study area and dataset

The Guangdong-Hong Kong-Macao Greater Bay Area (GBA) in China serves as the study area to examine relationships among industrial functions. The GBA is an emerging megacity region with approximately 87 million residents in 2023 and covers an area of 56,000 km², including nine mainland cities and the Hong Kong and Macao Special Administrative Regions (SARs), as shown in Figure 1(a). Several reasons justify selecting the GBA for this research. First, the region exhibits strong economic performance, with a total GDP of USD 1,958.1 billion and a per capita GDP of USD 22,585 in 2021, contributing over 13% to China’s national GDP (HKTDC, 2024). The manufacturing sector in the Pearl River Delta has been a key driver of this growth, making the GBA a major economic center in the Asia-Pacific. However, recent concerns raised by central, provincial, and SAR governments point to uneven economic development and disparities within the region. In response, policies have been implemented to promote regional integration and cooperation by improving understanding of the spatial and functional patterns of industrial activities in the GBA (Chen and Yeh, 2022).

Figure 1.

(a) The spatial layout of the GBA; (b) the spatial distribution of aggregated industrial POIs density in the GBA; (c) the pre-processing workflow to aggregate industrial activities; (d) the pre-processing workflow to extract functional information of industrial activities.

This study utilizes Points of Interest (POIs) sourced from AutoNavi Map (i.e., a Chinese navigation and location-based services provider) collected in 2023, comprising over 450,000 manufacturing-related records. POIs offer rich spatial and functional insights, enabling researchers to investigate diverse urban functions and human activities (Yu et al., 2022). Each POI record includes attributes such as establishment name, geographic coordinates, address, city, and multi-level categorical labels. Figure 1(b) illustrates the spatial distribution of aggregated POI density. Specifically, as shown in Figure 1(c), a 10 km² resolution grid is employed to aggregate POIs spatially and depict their concentration intensity.

Figure 1(d) illustrates the language pre-processing pipeline used to extract semantic keywords from POI names. All original Chinese POI names are first translated into English using a large language model (LLM). The translated results are then manually validated and corrected one by one by the research team to minimize translation artefacts, such as mistranslations and homonyms. In this step, similar or synonymous terms are merged where appropriate to ensure consistency. The criteria proposed by Yu et al. (2022) are used to identify industrial agglomerations. Macao is excluded from the subsequent analysis, as its economy has shifted predominantly toward gambling and tourism-related services (Wan and Li, 2013).

Methodology

This study proposes a data-driven framework to understand functional relationships within industrial agglomerations. To model the relationships among industrial activities, keywords extracted from the POI name text are used to construct a one-mode network $G = (F, E)$ . Here, $F$ represents a set of nodes reflecting functional characteristics and $E$ refers to a set of edges describing the relationships among different industrial functions. For an edge $e_{a, b}$ , the weight is defined quantitatively from two aspects of spatial and functional interaction, referred to as spatial co-occurrence and sectoral convergence, respectively.

Spatial co-occurrence patterns of industrial functions

To discover the spatial co-occurrence patterns of industrial functions, data mining association analysis is applied to tokenized keywords of establishment name text. Association rules have been a common analytical technique used in business studies to address Market Basket Analysis problems, by unveiling the occurrence pattern in a large dataset (Kotsiantis and Kanellopoulos, 2006). In this study, data mining association analysis is introduced to extract and summarize the functional characteristics of industrial activities for each city. The formula for association rules in data mining is represented as:

\mathrm{Itemset} = \{K_{1}, K_{2}, K_{3}, \dots, K_{n}\}

(1)

\{K_{i}\} \to \{K_{j}\} \quad (\text{where } 1 \leq i, j \leq n)

(2)

where $K_{n}$ denotes a functional keyword extracted from the tokenized POI name text, and $K_{i}$ and $K_{j}$ represent the antecedent and the consequent, respectively. It is important to note that the arrow $\to$ indicates a co-occurrence relationship rather than a causal one. Three key measures are used to evaluate the strength of an association rule:

\mathrm{Support}(\{K_{i}\} \to \{K_{j}\}) = \frac{\text{Number of grid units containing } \{K_{i}, K_{j}\}}{\text{Total number of grid units}}

(3)

\mathrm{Confidence}(\{K_{i}\} \to \{K_{j}\}) = \frac{\text{Number of grid units containing } \{K_{i}, K_{j}\}}{\text{Number of grid units containing } \{K_{i}\}}

(4)

\mathrm{Lift}(\{K_{i}\} \to \{K_{j}\}) = \frac{\mathrm{Support}(\{K_{i}\} \to \{K_{j}\})}{\mathrm{Support}(\{K_{i}\}) \times \mathrm{Support}(\{K_{j}\})}

(5)

Here, the support of an association rule is the number of grid units containing both the antecedent and the consequent, divided by the total number of grid units. The confidence is the number of grid units containing both items divided by the number of grid units containing the antecedent. Lift compares the observed co-occurrence of the antecedent and the consequent with the co-occurrence expected under random conditions, that is, if the two keywords were distributed independently. In this study, rules with lift values larger than 1 are considered to describe a significant spatial co-occurrence of different industrial functions within the same geographic space.
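
As an illustration of equations (3) to (5), the following minimal Python sketch computes the three measures for a single rule. The function name `rule_metrics` and the toy grid contents are hypothetical and are not taken from the study's data; each set stands for the keywords observed in one grid unit.

```python
def rule_metrics(grid_keywords, antecedent, consequent):
    """Return (support, confidence, lift) for antecedent -> consequent.

    grid_keywords: list of sets, one set of keywords per grid unit.
    """
    n = len(grid_keywords)
    n_a = sum(1 for kws in grid_keywords if antecedent in kws)
    n_c = sum(1 for kws in grid_keywords if consequent in kws)
    n_both = sum(
        1 for kws in grid_keywords if antecedent in kws and consequent in kws
    )
    support = n_both / n                           # equation (3)
    confidence = n_both / n_a if n_a else 0.0      # equation (4)
    # Equation (5): joint support over the product of individual supports.
    lift = support / ((n_a / n) * (n_c / n)) if n_a and n_c else 0.0
    return support, confidence, lift


# Hypothetical toy data: four grid units and the keywords they contain.
grids = [
    {"electronic", "hardware"},
    {"electronic", "hardware", "metal"},
    {"electronic"},
    {"metal"},
]
support, confidence, lift = rule_metrics(grids, "electronic", "hardware")
print(support, confidence, lift)
```

In this toy case the lift exceeds 1, so the pair would be read as co-occurring more often than expected under independence.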

Quantifying sectoral convergence (industrial relatedness)

Apart from spatial co-occurrence, the functional dimension of industrial relatedness is also critical for constructing the network’s edge weights. In this study, pre-trained word embeddings are employed to quantify semantic similarity between functional keywords, thereby capturing the underlying relatedness between different industrial activities. The BERT model is adopted as the primary model because its contextual embeddings can capture nuanced semantic relationships in multi-word and domain-specific industrial terms (e.g., precision manufacturing) better than static embeddings such as Word2Vec and GloVe. By encoding keywords as dense vectors in a lower-dimensional space, these embeddings can capture not only synonyms but also subtle functional associations and technological complementarities between industrial activities. This approach enables a more refined measurement of functional relatedness beyond simple spatial co-location.

To capture the functional relationship between two keywords, cosine similarity is used to measure the cosine of the angle between their vector representations in the multi-dimensional embedding space. The cosine similarity ranges from −1 to 1, where a value closer to 1 indicates higher semantic similarity. The formula is given as follows:

\cos (θ) = \frac{A \cdot B}{‖ A ‖ ‖ B ‖} = \frac{\sum_{i = 1}^{n} A_{i} B_{i}}{\sqrt{\sum_{i = 1}^{n} A_{i}^{2}} \sqrt{\sum_{i = 1}^{n} B_{i}^{2}}}

(6)

where $\cos (θ)$ denotes the cosine of the angle $θ$ between the two vectors, $A \cdot B$ is the dot product (i.e., scalar product) of the two vectors, and $‖ A ‖$ and $‖ B ‖$ are the magnitudes (i.e., norms) of the corresponding vectors.

When two vectors are closely aligned, the angle between them approaches zero, producing a cosine similarity value near 1. Keywords with strong semantic similarity are represented by vectors pointing in similar directions and therefore yield higher cosine similarity scores, indicating stronger functional relationships within urban industrial networks.
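
Equation (6) can be sketched in a few lines of Python. The three-dimensional vectors below are hypothetical stand-ins chosen to show the boundary cases; actual keyword embeddings from a pre-trained model would have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (equation 6)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical vectors, not real embeddings.
vec_a = [1.0, 2.0, 0.0]
vec_same = [2.0, 4.0, 0.0]        # same direction as vec_a
vec_opposite = [-1.0, -2.0, 0.0]  # opposite direction
vec_orthogonal = [0.0, 0.0, 3.0]  # perpendicular to vec_a

sim_same = cosine_similarity(vec_a, vec_same)
sim_opposite = cosine_similarity(vec_a, vec_opposite)
sim_orthogonal = cosine_similarity(vec_a, vec_orthogonal)
print(sim_same, sim_opposite, sim_orthogonal)
```

The three cases correspond to the upper bound, the lower bound, and the midpoint of the range described above.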

Constructing industrial functional network

Having quantified the spatial and functional relationships among industrial functional keywords, the weights of network edges can be determined by integrating these relatedness metrics into a unified coefficient. Specifically, lift value from association rule mining and cosine similarity are combined into one integrated coefficient $C$ to represent the edge weight using the following equations:

\mathrm{lift}_{\mathrm{norm}} = \frac{\log (\mathrm{lift})}{\log (\max (\mathrm{lift}))}

(7)

\mathrm{cosine}_{\mathrm{norm}} = \frac{\mathrm{cosine} + 1}{2}

(8)

C = \mathrm{lift}_{\mathrm{norm}} + \mathrm{cosine}_{\mathrm{norm}}

(9)

It should be noted that the integrated coefficient assigns equal weight to the normalized lift value and cosine similarity. This equal weighting is used because the two metrics capture complementary dimensions of industrial relatedness, including spatial co-agglomeration intensity and functional complementarity, which are considered equally important in this study.
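
A minimal sketch of equations (7) to (9) follows. The function name `integrated_coefficient` and the input values are hypothetical. The sketch assumes that only rules with lift larger than 1 enter the network, in which case the normalized lift falls in (0, 1]; the base of the logarithm cancels in the ratio, so the natural logarithm is used here.

```python
import math


def integrated_coefficient(lift, cosine, max_lift):
    """Edge weight C from equations (7)-(9), with equal weighting."""
    if lift <= 0 or max_lift <= 1:
        raise ValueError("requires lift > 0 and max_lift > 1")
    lift_norm = math.log(lift) / math.log(max_lift)  # equation (7)
    cosine_norm = (cosine + 1) / 2                   # equation (8)
    return lift_norm + cosine_norm                   # equation (9)


# Hypothetical values: a rule with lift 2 in a rule set whose largest lift
# is 4, and a cosine similarity of 0.5 between the two keywords.
c = integrated_coefficient(lift=2.0, cosine=0.5, max_lift=4.0)
print(c)
```

Because both normalized terms are bounded by 1 under these assumptions, C cannot exceed 2.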

Network analysis is further conducted to explore the functional characteristics of agglomeration economies within cities. An undirected network is constructed to reflect various structural properties of clustering industrial functions.

As described previously, spatial and functional relationships can be employed to establish an undirected network $G = (F, E)$ . Here, the nodes $F$ are functional keywords related to different industrial activities, while the edge between two nodes $e_{a, b} \in E$ corresponds to the integration of spatial co-occurrence and sectoral convergence between the two functional keywords. Having generated the undirected network of clustering industrial functions, several structural properties are selected to describe the network, and their descriptions are summarized in Table A1 in the appendix.

Community detection of related industrial activities

Another objective of this research is to reveal the community structure of functional networks, which can be considered a critical characteristic of real-world network behaviors. The community structure of a network showcases how nodes cluster into tightly interconnected groups, reflecting functional relationships within urban industrial agglomerations. The Louvain method is employed in this case to detect such community structures. This algorithm begins by treating each node as an independent community and iteratively merges nodes into communities based on the largest increase in modularity score, continuing until no further improvement in modularity is achieved. This process identifies cohesive functional clusters in urban networks. In an undirected network $G$ , the modularity $Q$ can be represented as follows:

Q = \frac{1}{2 m} \sum_{i, j} [w (e_{i, j}) - \frac{k_{i} * k_{j}}{2 m}] δ (c (v_{i}), c (v_{j}))

(10)

In this respect, $w (e_{i, j})$ represents the weight of the edge between nodes $v_{i}$ and $v_{j}$ , $k_{i}$ and $k_{j}$ are the sums of the weights of all edges connected to nodes $v_{i}$ and $v_{j}$ , and $c (v_{i})$ and $c (v_{j})$ are the communities of nodes $v_{i}$ and $v_{j}$ , respectively. $m$ is the sum of all edge weights in the network. The Kronecker delta function $δ (c (v_{i}), c (v_{j}))$ evaluates whether nodes $v_{i}$ and $v_{j}$ are in the same community, returning 1 if they are and 0 otherwise. As the Louvain algorithm is stochastic, its results may vary across runs due to random node ordering during the optimization phase. To ensure robustness, the Louvain algorithm is executed 100 times using different random seeds. The mean, minimum, maximum, standard deviation, and range of the resulting modularity scores are calculated to assess the stability.
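
To make equation (10) concrete, the sketch below evaluates the modularity of a given partition on a small weighted graph. It is not the Louvain optimization itself, only the objective that Louvain tries to increase. The function name and the toy graph (two triangles joined by one edge) are hypothetical, and the code uses the per-community form of Q, which is algebraically equivalent to equation (10) for an undirected graph.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of a weighted undirected graph.

    edges: list of (u, v, weight) tuples, each undirected edge listed once.
    community: dict mapping every node to its community label.
    """
    m = sum(w for _, _, w in edges)   # total edge weight
    strength = defaultdict(float)     # k_i: summed weight of incident edges
    internal = defaultdict(float)     # edge weight inside each community
    for u, v, w in edges:
        strength[u] += w
        strength[v] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    total = defaultdict(float)        # summed k_i per community
    for node, k in strength.items():
        total[community[node]] += k
    # Q = sum over communities of internal/m - (total / 2m)^2
    return sum(internal[c] / m - (total[c] / (2 * m)) ** 2 for c in total)


# Hypothetical toy graph: two triangles connected by a single bridge edge.
edges = [
    (0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
    (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
    (2, 3, 1.0),
]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(q_two, q_one)
```

Placing every node in a single community gives Q = 0, while splitting the graph along the bridge gives a positive score, which is the kind of gain the merging step looks for.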

Analytical results

Descriptive analysis of industrial characteristics

The descriptive analysis of functional keywords extracted from industrial POIs in the study area is presented here. A total of 299 different keywords were found, with an overall count of 295,302. The average number of keywords per grid is 188.81. Table 1 shows the descriptive statistics for the top eight keywords aggregated by grid. The keyword Electronic is the most frequent, appearing 25,420 times, with a mean of 20.1 per grid and a standard deviation of 33.8, indicating significant concentration with considerable variation. Hardware ranks second with 19,494 occurrences and a mean of 15.1. Mechanical and Metal have frequencies of 11,541 and 7,522, respectively, reflecting their roles in manufacturing-related activities. Keywords such as Precision, Package, Appliance, and Furniture have lower frequencies between 6,467 and 6,927, with means ranging from 6.1 to 7.7, indicating more specialized or less common industrial functions. Differences in standard deviations (e.g., 17.8 for Appliance and 19.2 for Furniture) and 95th percentile values suggest a heterogeneous spatial distribution of these functional activities across the Greater Bay Area.

Table 1.

Descriptive statistics of the top-8 ranked keywords aggregated by grids.

Rank	Keyword	Frequency	Mean	Std	95th percentile
1	Electronic	25,420	20.1	33.8	84.9
2	Hardware	19,494	15.1	22.0	59.0
3	Mechanical	11,541	9.1	12.9	31.3
4	Metal	7,522	6.9	10.9	25.0
5	Precision	6,927	7.7	15.3	27.3
6	Package	6,784	6.1	7.8	20.0
7	Appliance	6,551	6.8	17.8	25.0
8	Furniture	6,467	6.2	19.2	18.7

Figure 2 presents the word cloud visualization of functional keywords alongside the spatial distribution of selected keywords. In Figure 2(a), the keywords cover a broad range of manufacturing sectors, reflecting diverse industrial activities. For example, keywords related to electronics and electronic appliances indicate a strong presence of technology-driven industries. Keywords associated with hardware and mechanical manufacturing represent key areas of heavy industry focused on equipment and machinery production. Additionally, keywords like metal and material relate to core industrial activities, likely involving fabrication and raw material handling, while precision manufacturing points to specialized, high-accuracy production processes.

Figure 2.

(a) Word cloud visualization of functional keywords extracted from industrial POIs in the GBA, with font size and color related to the keyword frequency; (b) spatial distribution of the top-4 ranked keywords in the GBA.

Figure 2(b) shows the spatial distribution of different keywords, revealing considerable heterogeneity. Industrial activities linked to electronics are mainly clustered in the central and eastern parts of the Greater Bay Area (GBA), particularly in Shenzhen and Dongguan. Mechanical manufacturing is concentrated largely in the central region. The distributions of hardware and metal keywords show similar patterns, with high-frequency grids spread across the major cities in the GBA, indicating widespread industrial functions in the region.

To evaluate the robustness of the POI-derived industrial characteristics, we validate the top-8 functional keywords against enterprise registration data from 2023. For each keyword, we obtain the proportion of manufacturing companies in the Greater Bay Area whose registered business scope relates to that keyword. The Pearson correlation coefficient between these proportions and the POI keyword frequencies is 0.81 (p = 0.015), suggesting a strong positive association. A further city-level analysis focusing on representative cities, including Guangzhou, Foshan, and Shenzhen, yields a higher correlation of 0.84 (p < 0.001). The scatter plot of POI-based keyword frequencies versus enterprise registration-based frequencies is shown in Figure A1 of the appendix. The consistency across both regional and city scales supports the reliability of using POIs to reflect industrial activities in the Greater Bay Area.

Industrial functional network and its structural properties

The functional network of industrial activities is constructed, visualized, and analyzed in this subsection by integrating spatial and functional relationships among keywords to address the second research question.

The network properties of the industrial functional network are presented. The network consists of 72 nodes and 434 edges. A network density of 0.17 indicates a moderately connected structure. The average degree of 12 means that each node connects to about 12 other nodes, showing strong interrelationships among various industrial activities. The global clustering coefficient of 0.63 suggests that nodes tend to form tightly connected groups, highlighting cohesive functional clusters in the network.
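
The reported density and average degree can be checked against the node and edge counts. The sketch below assumes the standard definitions for a simple undirected graph, density = 2E / (N(N − 1)) and average degree = 2E / N; the source does not state the formulas in this section, so these are assumptions.

```python
def density_and_average_degree(n_nodes, n_edges):
    """Density and mean degree of a simple undirected graph."""
    density = 2 * n_edges / (n_nodes * (n_nodes - 1))
    average_degree = 2 * n_edges / n_nodes
    return density, average_degree


# Node and edge counts reported for the industrial functional network.
density, average_degree = density_and_average_degree(72, 434)
print(round(density, 2), round(average_degree, 2))
```

Under these definitions both values round to the figures given in the text.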

Figure 3 shows the industrial functional network using a spring layout based on POIs in the GBA and includes the rank-size distribution of node strength. In this distribution, larger node sizes correspond to higher strength values. A few dominant nodes, such as material, food, building material, architecture, and biology, exhibit significantly higher strength and occupy central positions. This pattern reflects the king effect, whereby a small number of nodes possess disproportionately high connectivity and influence compared to the rest of the network. These nodes can be seen as central or influential within the regional industrial system. In contrast, nodes positioned at the periphery have lower strength, indicating a less prominent role. For example, nodes 70 and 71, representing coating and electroplating, likely correspond to niche manufacturing processes or subsectors with limited interconnections, though they may still contribute to specific subnetworks or clusters.

Figure 3.

(a) Industrial functional network constructed using POIs in the GBA, with node size based on keyword frequency and edge width reflecting the strength of spatial-functional associations; (b) rank-size distribution of node strength observed in the industrial functional network. The full legend of the network nodes is provided in Figure A2 of the appendix.

Figure 3(b) shows the rank-size distribution of node strength in the industrial functional network to examine network heterogeneity. The node strength decreases sharply among the top ranks, indicating that a small number of keywords have much higher strength than others. A logarithmic function, y = a log(x) + b, is employed to fit the distribution, with coefficients a = −1.89 and b = 8.36. The negative coefficient a confirms the declining trend in node strength as rank increases, quantifying the rate of this decline. This pattern suggests that the top-ranked keywords represent key materials, sectors, or processes in the local industrial system, playing a central role in supporting industrial activities and reflecting significant economic or functional importance.
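
A fit of the form y = a log(x) + b can be obtained by ordinary least squares on the transformed variable log(x). The sketch below assumes the natural logarithm and a least-squares criterion, neither of which is stated in the text, and it uses synthetic strengths generated from the reported coefficients purely to show that the procedure recovers them; it does not use the study's data.

```python
import math


def fit_log_curve(ranks, values):
    """Least-squares estimates (a, b) for y = a * ln(x) + b."""
    xs = [math.log(r) for r in ranks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Synthetic, noise-free strengths for ranks 1..72 (illustration only).
ranks = list(range(1, 73))
strengths = [-1.89 * math.log(r) + 8.36 for r in ranks]

a, b = fit_log_curve(ranks, strengths)
print(round(a, 2), round(b, 2))
```

With real rank-size data the residuals would not vanish, and the fitted slope would summarize how quickly strength falls off with rank.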

In addition to node strength, node degree is another critical metric for assessing the importance of keywords within the industrial functional network. Figure 4 presents a scatter plot of node strength versus degree for industrial functional keywords. The plot reveals a general positive correlation between the two metrics: as the number of connections (degree) increases, a node’s total connection weight (strength) tends to increase as well. Notably, several keywords, such as electronic, mechanical, food, and material, exhibit both high degree and high strength, reflecting their central roles in major manufacturing processes and their close ties to local industrial activities. An intriguing finding is that some keywords with high degree and strength, such as food and material, appear with relatively low frequency in the Greater Bay Area, possibly due to their specialized yet critical roles in industrial sectors. Furthermore, the shell-layout network, which highlights the top 50% of edges by weight, reveals distinct relational patterns among specific keywords. For instance, the edge between node 20 (Aluminium) and node 28 (Door and Frame) has a notably high weight, indicating a strong association. Additionally, node 17 (Clothing) is strongly connected to both node 13 (Worn Accessories) and node 21 (Garment), underscoring its pivotal role in the textile-related subnetwork.
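As a minimal sketch of the two node metrics and the edge filter discussed above, degree can be counted as the number of incident edges and strength as the sum of their weights. The edge weights below are invented for illustration, the function names are hypothetical, and the sketch assumes an undirected network in which each edge is listed once and there are no self-loops.

```python
import math
from collections import defaultdict


def degree_and_strength(weighted_edges):
    """Return (degree, strength) dictionaries for an undirected edge list."""
    degree = defaultdict(int)
    strength = defaultdict(float)
    for u, v, w in weighted_edges:
        for node in (u, v):
            degree[node] += 1      # one more incident edge
            strength[node] += w    # accumulate the edge weight
    return dict(degree), dict(strength)


def top_edges_by_weight(weighted_edges, fraction=0.5):
    """Keep the heaviest `fraction` of edges (rounded up)."""
    ranked = sorted(weighted_edges, key=lambda e: e[2], reverse=True)
    keep = math.ceil(len(ranked) * fraction)
    return ranked[:keep]


# Toy keyword network with made-up weights (not values from the study).
edges = [
    ("aluminium", "door and frame", 0.9),
    ("clothing", "garment", 0.8),
    ("clothing", "worn accessories", 0.7),
    ("electronic", "material", 0.3),
    ("food", "material", 0.2),
    ("electronic", "food", 0.1),
]

degree, strength = degree_and_strength(edges)
top_half = top_edges_by_weight(edges, fraction=0.5)
```

A node can have high degree but modest strength if its many edges are weak, which is why the two metrics are plotted against each other rather than treated as interchangeable.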

Figure 4.

Scatter plot of node’s strength versus degree of industrial functional keyword, with dot size proportional to keyword frequency, and shell-layout network highlighting the top 50% edges by weight.

Functional communities

In addition to the network properties, this study identifies and investigates the network community structure, that is, groups of nodes that are more densely connected to each other than to the rest of the network. Figure 5(a) shows the communities identified in the industrial functional network, with different colors representing each community. Five communities are detected using the Louvain method. Across repeated runs, the modularity scores range from 0.400 to 0.426 (mean = 0.417 ± 0.002), and the most common community partition is obtained in 88% of the runs, indicating good stability. The modularity score of 0.42 indicates a clear community structure, suggesting that the functional connections between industrial agglomerations are not randomly distributed. Communities 1 and 4 are the largest, containing 29 and 31 nodes, respectively. The other communities have fewer nodes and can be considered small-scale.
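The Louvain method searches for a partition that maximizes modularity, so the two quantities reported above (the modularity score and the share of runs that return the most common partition) can be sketched independently of any particular Louvain implementation. The code below is an illustrative sketch only: it assumes the standard weighted modularity with resolution 1, the function names are hypothetical, and the toy network and partitions are invented.

```python
from collections import Counter, defaultdict


def modularity(weighted_edges, community_of):
    """Weighted modularity Q of a partition of an undirected network."""
    total_weight = sum(w for _, _, w in weighted_edges)
    internal = defaultdict(float)      # edge weight inside each community
    strength_sum = defaultdict(float)  # summed node strength per community
    for u, v, w in weighted_edges:
        cu, cv = community_of[u], community_of[v]
        strength_sum[cu] += w
        strength_sum[cv] += w
        if cu == cv:
            internal[cu] += w
    return sum(
        internal[c] / total_weight - (strength_sum[c] / (2 * total_weight)) ** 2
        for c in strength_sum
    )


def most_common_partition_share(partitions):
    """Fraction of runs that produced the most frequent partition."""
    canonical = [frozenset(frozenset(group) for group in p) for p in partitions]
    _, count = Counter(canonical).most_common(1)[0]
    return count / len(canonical)


# Toy network: two triangles joined by one bridge edge, unit weights.
edges = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
         ("d", "e", 1), ("e", "f", 1), ("d", "f", 1),
         ("c", "d", 1)]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_split = modularity(edges, two_groups)   # the natural two-community split
q_single = modularity(edges, one_group)   # everything in one community

# Three hypothetical runs; the first two agree up to ordering.
runs = [
    [{"a", "b", "c"}, {"d", "e", "f"}],
    [{"d", "e", "f"}, {"a", "b", "c"}],
    [{"a", "b"}, {"c", "d", "e", "f"}],
]
share = most_common_partition_share(runs)
```

Comparing partitions as sets of sets makes the stability measure insensitive to the arbitrary order and labelling of communities across runs.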

Figure 5.

(a) The partition result of community detection for the industrial functional network, in which colors represent the identified communities; (b) word clouds showing the industrial functions of the different communities; (c) the cumulative probability distribution of degree centrality for the different communities.

Figure 5(b) shows word clouds of keywords from each community, highlighting distinct industrial themes. Community 1 covers a wide range of sectors such as food, biology, electronics, and architecture, indicating diverse industrial coverage. Other communities focus on more specific industries: Community 4 is centered on manufacturing related to metal and materials, while Communities 3 and 5 are linked to specialized production of shoes and garments. Meanwhile, Figure 5(c) displays the degree centrality distribution, which reflects network connectivity. Most nodes in Communities 1 and 5 have low degree centrality, indicating weaker internal connections and less interaction. In contrast, Community 4 has strong interconnections, forming a cohesive subnetwork focused on material-related industrial activities.

The spatial distribution of keywords associated with each community within the functional network across the Greater Bay Area (GBA) is shown in Figure A3 of the appendix, revealing diverse geographic patterns that reflect the region’s industrial landscape. Keywords from Communities 1 and 4 are widely distributed across the GBA, indicating broad regional influence. Both communities show statistically significant positive spatial autocorrelation, with Global Moran’s I values of 0.45 for Community 1 and 0.56 for Community 4. Community 1 displays a distinct dual-center spatial pattern, with high keyword frequencies concentrated in the central GBA and along the eastern bank of the Pearl River estuary. This more balanced distribution reflects a diverse mix of industrial activities, including food, biology, electronics, and architecture-related sectors. In contrast, Community 4 exhibits a stronger monocentric pattern, with keyword concentrations primarily on the northern side of the Pearl River estuary, focusing heavily on metal and material industries. The higher Global Moran’s I for Community 4 suggests more pronounced spatial clustering than in Community 1, which may be attributed to stronger urbanization economies in the densely populated central megacities that support localized metal and material production. Other communities display more concentrated spatial patterns, consistent with Marshallian specialization, where high keyword frequencies appear in only a few areas. For example, Community 2 is strongly concentrated in the western GBA, with a Global Moran’s I value of 0.60, the highest among all communities. This pronounced spatial clustering reflects the presence of highly specialized manufacturing hubs in the lighting industry. Additionally, Communities 3 and 5, which contain highly specialized functional keywords, show significant industrial clusters mainly in the hinterland and peripheral regions of the GBA. These patterns indicate that niche industries, such as shoes and garments, are located in less central areas to benefit from cost advantages and regional resources.
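For reference, Global Moran’s I as used above can be sketched on a regular grid. The spatial weights are an assumption here: the sketch uses binary rook contiguity (cells sharing an edge are neighbors), which may differ from the weighting scheme used in the study, and the grids are synthetic. The function name morans_i is hypothetical.

```python
def morans_i(grid):
    """Global Moran's I for a rectangular grid with binary rook weights."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    dev = [[v - mean for v in row] for row in grid]
    denom = sum(d * d for row in dev for d in row)  # total squared deviation
    cross = 0.0      # sum over neighbor pairs of deviation products
    weight_sum = 0   # total number of (directed) neighbor links
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross += dev[r][c] * dev[rr][cc]
                    weight_sum += 1
    return (n / weight_sum) * (cross / denom)


# Synthetic keyword-frequency grids (not the study's data).
clustered = [[10, 10, 0, 0] for _ in range(4)]   # high values grouped together
checkerboard = [[10 if (r + c) % 2 == 0 else 0 for c in range(4)]
                for r in range(4)]               # high and low values alternate

i_clustered = morans_i(clustered)        # positive: similar values are adjacent
i_checkerboard = morans_i(checkerboard)  # negative: dissimilar values are adjacent
```

Values near zero would indicate no spatial structure, so the reported values of 0.45 to 0.60 correspond to the clustered end of this range. Significance testing, typically by permutation, is omitted from the sketch.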

Discussion and conclusions

A review of the current literature reveals a research gap: the interconnections among various industrial functions remain underinvestigated. This study therefore proposes a network-based approach that uses Points of Interest (POIs) to construct and explore networks of industrial functions, identify interconnected industrial functional communities, and map their spatial distributions.

The descriptive statistics and spatial distribution of functionally aggregated keywords are presented first. The analysis identified 299 distinct functional keywords from industrial POIs, with a total frequency of 295,302. The mean frequency of keywords per grid is 188.81. These results indicate a wide range of industrial activities across the Greater Bay Area (GBA), consistent with the findings of Yu et al. (2023), who reported diverse industrial activities in this region. The most frequent keyword electronic appears 25,420 times, with an average frequency of 20.1 per grid and a high standard deviation of 33.8, reflecting both a strong concentration and substantial spatial variability. The spatial distribution shows that electronics-related industrial activities are mainly clustered in the central and eastern parts of the GBA, particularly in Shenzhen and Dongguan. Fu et al. (2012) and Stevens (2019) noted that these two cities have established regional innovation systems for electronics industries, which support well-developed electronic markets. In contrast, other top keywords such as hardware, mechanical, and metal highlight the dominance of heavy manufacturing functions. These keywords show high-frequency grids mainly concentrated in the central areas of the study region (Li et al., 2018).

The industrial functional network and its structural properties are then investigated to unveil the complex interconnections among various industrial functions. The average degree and global clustering coefficient are 12 and 0.63, respectively, indicating a high tendency for nodes to form cohesive functional clusters with robust interrelationships. The network is then visualized with a spring layout, which highlights that a small number of central nodes possess higher strength values. The rank-size distribution of node strength reveals a pronounced king effect. In urban geography, the king effect refers to a phenomenon whereby the largest entity is disproportionately more dominant than the others, deviating markedly from the expected rank-size rule (Ausloos and Cerqueti, 2016); here, a small number of nodes are disproportionately more influential and better connected than the rest of the network. This effect reveals the influential roles of certain industrial activities in connecting multiple sectors within the Greater Bay Area. Indeed, Hu and Lin (2011) confirmed that the mechanical sector had a profound impact on local manufacturing economies within the Pearl River Delta (PRD). In addition, a scatter plot demonstrates the positive correlation between node degree and strength. Keywords with high degree and strength (e.g., electronic, food, and mechanical) play central roles in major manufacturing processes and are closely tied to local industrial activities. An intriguing observation is that some high-degree, high-strength keywords, such as food and material, appear with relatively low frequency. This may relate to intensive spatial co-occurrence, which can be explained by widely distributed niche industrial sectors that support local daily needs in cities, mostly related to fast-moving consumer goods (He et al., 2020; Walker, 2017). The network is also visualized in a shell layout to better indicate the relationships among different keywords. In this context, a notably high-weighted edge is observed between the keywords aluminium and door and window frame, indicating that aluminium serves as a primary material for manufacturing doors and frames in the local market. This finding aligns with the survey results of Xu et al. (2023), which identify the casement window with an aluminium (ALU) frame as the dominant window type in the Chinese market.

This study further analyses the community structure of the industrial functional network using the Louvain detection algorithm. Five communities are identified, and the modularity score of 0.42 indicates that spatial and functional relationships within the network are non-random, reflecting a key characteristic of real-world networks. Each community shows distinct patterns in terms of the number of keywords and their functional focus. Communities 1 and 4 include a large number of keywords covering various manufacturing sectors. In contrast, Communities 2, 3, and 5 contain fewer keywords, mainly associated with specialized sectors such as shoes and garments. In terms of degree centrality, these network patterns appear closely tied to sectoral economies and cannot be fully explained by linear models. Existing literature suggests that industrial networks function as complex, nonlinear systems that foster complementary relationships among activities, enabling cities to benefit from scale economies (Taylor and Derudder, 2015). The spatial patterns of Communities 1 and 4 in the Greater Bay Area reveal distinct structures that can be interpreted through the lens of core-periphery and polycentric urban region theories. Community 1 exhibits a clear dual-center pattern, with strong concentrations in the central GBA and the eastern Pearl River estuary. This dual-center structure suggests a deviation from a purely monocentric core-periphery configuration toward a more polycentric urban region (Meijers, 2005). Community 4 displays a monocentric pattern centered in the northern estuary. These findings align with the Core-Periphery Model (Capello, 2015; Fujita and Mori, 2005), which emphasizes how core areas concentrate economic activities and exert influence over peripheral regions. The other communities, by contrast, show a more concentrated Marshallian specialization pattern. Community 2 is mainly concentrated in the western GBA and centers on the lighting industry, while Communities 3 and 5 specialize in shoes and garments, with significant activity in peripheral hinterlands. These concentration patterns are linked to specialization and are attributed to localized economies in peripheral areas, which offer affordable land, shared inputs, and labor (Strange, 2008). Our empirical findings show that the strength of agglomeration economies is closely linked to their geographic externalities.

Several limitations should be noted. First, using POI counts as a proxy for industrial functional presence can introduce bias relative to the actual manufacturing activities represented in the industrial functional network. Each POI record represents one registered establishment, regardless of its scale, employment level, or output. While grid-level aggregation helps smooth out micro-level variations and captures broader spatial patterns of functional presence, it cannot fully resolve the issue of unweighted establishment counts. Incorporating alternative datasets, such as employment census data or economic statistics, can help address these discrepancies and serve as ground truth for functional attributes at the municipal level (Yu and Liu, 2021). Second, a longitudinal perspective is necessary to capture long-term changes in the network structure. Expanding the analysis to include time-series and comparative approaches could provide deeper insights into the temporal evolution and behavioral diversity of industrial functional networks. These issues remain unresolved and present important directions for future research.

Supplemental material

Supplemental Material for Unveiling industrial functional networks in megacity regions: A spatial and network-based analysis of agglomeration economies by Zidong Yu, Xintao Liu in Environment and Planning B: Urban Analytics and City Science.

Footnotes

Funding

This work was supported by the Distinguished Postdoctoral Fellowship Scheme (Grant No.YWD9) at the Hong Kong Polytechnic University.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

Data available on request from the authors.

Author biographies

Martin Zidong Yu is a Distinguished Postdoctoral Fellow at the Department of Land Surveying and Geo-Informatics, The Hong Kong PolyU. With a particular emphasis on Hong Kong and its neighboring areas, his research explores how diverse geospatial data and advanced analytical methods can reveal the hidden stories of cities to support local communities.

Xintao Liu is an Associate Professor in the Department of Land Surveying and Geo-Informatics at The Hong Kong Polytechnic University (PolyU). His research interests are centered around GIScience, transportation geography, and complex networks. His objective is to leverage state-of-the-art technologies to advance smart cities for a better urban life.

References

Attaran (1986) Industrial diversity and economic performance in US areas. The Annals of Regional Science 20(2): 44–54. https://doi.org/10.1007/bf01287240

Auerswald, Dani (2022) Entrepreneurial opportunity and related specialization in economic ecosystems. Research Policy 51(9): 104445. https://doi.org/10.1016/j.respol.2021.104445

Ausloos, Cerqueti (2016) A universal rank-size law. PLoS One 11(11): e0166011. https://doi.org/10.1371/journal.pone.0166011

Boschma, Minondo, Navarro (2013) The emergence of new industries at the regional level in Spain: a proximity approach based on product relatedness. Economic Geography 89(1): 29–51. https://doi.org/10.1111/j.1944-8287.2012.01170.x

Cai, Tan (2025) Exploring the relationship between income inequality and crime in Toronto using frequentist and Bayesian models: examining different crime types and spatial scales. Environment and Planning B: Urban Analytics and City Science 52(8): 1814–1831. https://doi.org/10.1177/23998083241311969

Capello (2015) Regional Economics. Routledge.

Chain, Santos ACD, Castro LGD, et al. (2019) Bibliometric analysis of the quantitative methods applied to the measurement of industrial clusters. Journal of Economic Surveys 33(1): 60–84. https://doi.org/10.1111/joes.12267

Chen, Yeh AGO (2022) Delineating functional urban areas in Chinese mega city regions using fine-grained population data and cellphone location data: a case of pearl river Delta. Computers, Environment and Urban Systems 93: 101771. https://doi.org/10.1016/j.compenvurbsys.2022.101771

Cheng, Xie, Zhang (2022) Industry structure optimization via the complex network of industry space: a case study of Jiangxi Province in China. Journal of Cleaner Production 338: 130602. https://doi.org/10.1016/j.jclepro.2022.130602

Davies, Maré (2021) Relatedness, complexity and local growth. Regional Studies 55(3): 479–494. https://doi.org/10.1080/00343404.2020.1802418

De Lucio, Herce, Goicolea (1996) Externalities and industrial growth: Spain 1978-1992 (Working Paper No. 96-14). FEDEA. https://ideas.repec.org/p/fda/fdaddt/9614.html

Ducruet, Beauguitte (2014) Spatial science and network science: review and outcomes of a complex relationship. Networks and Spatial Economics 14(3): 297–316. https://doi.org/10.1007/s11067-013-9222-6

Duranton, Puga (2005) From sectoral to functional urban specialisation. Journal of Urban Economics 57(2): 343–370. https://doi.org/10.1016/j.jue.2004.12.002

Ellison, Glaeser, Kerr (2010) What causes industry agglomeration? Evidence from coagglomeration patterns. The American Economic Review 100(3): 1195–1213. https://doi.org/10.1257/aer.100.3.1195

Frenken, Van Oort, Verburg (2007) Related variety, unrelated variety and regional economic growth. Regional Studies 41(5): 685–697. https://doi.org/10.1080/00343400601120296

Dong, Chai (2010) Industry specialization, diversification, churning, and unemployment in Chinese cities. China Economic Review 21(4): 508–520. https://doi.org/10.1016/j.chieco.2010.04.007

Fu, Diez, Schiller (2012) Regional innovation systems within a transitional context: evolutionary comparison of the electronics industry in Shenzhen and Dongguan since the opening of China. Journal of Economic Surveys 26(3): 534–550. https://doi.org/10.1111/j.1467-6419.2012.00721.x

Luo (2022) Does urban agglomeration promote the development of cities? An empirical analysis based on spatial econometrics. Sustainability 14(21): 14512. https://doi.org/10.3390/su142114512

Fujita, Mori (2005) Frontiers of the new economic geography. Papers in Regional Science 84(3): 377–405. https://doi.org/10.1111/j.1435-5957.2005.00021.x

Ghosh, Mallick, Chowdhury, et al. (2024) Graph theory applications for advanced geospatial modelling and decision-making. Applied Geomatics 16(4): 799–812. https://doi.org/10.1007/s12518-024-00586-3

Glaeser, Kallal, Scheinkman, et al. (1992) Growth in cities. Journal of Political Economy 100(6): 1126–1152. https://doi.org/10.1086/261856

Grieser, LeSage, Zekhnini (2022) Industry networks and the geography of firm behavior. Management Science 68(8): 6163–6183. https://doi.org/10.1287/mnsc.2021.4133

Guillain, Le Gallo (2010) Agglomeration and dispersion of economic activities in and around Paris: an exploratory spatial data analysis. Environment and Planning B: Planning and Design 37(6): 961–981. https://doi.org/10.1068/b35038

Zhou, Tang, et al. (2019) The spatial organization pattern of urban-rural integration in urban agglomerations in China: an agglomeration-diffusion analysis of the population and firms. Habitat International 87: 54–65. https://doi.org/10.1016/j.habitatint.2019.04.003

He, Chen, et al. (2020) New towns and the local agglomeration economy. Habitat International 98: 102153. https://doi.org/10.1016/j.habitatint.2020.102153

Hong Kong Trade Development Council (HKTDC) (2024) Research on the greater Bay area. https://research.hktdc.com/en/article/MzYzMDE5NzQ5

Hu, Lin (2011) Situating regional advantage in geographical political economy: transformation of the state-owned enterprises in Guangzhou, China. Geoforum 42(6): 696–707. https://doi.org/10.1016/j.geoforum.2011.06.002

Jacobs (2016) The Economy of Cities. Vintage.

Kotsiantis, Kanellopoulos (2006) Association rules mining: a recent overview. GESTS International Transactions on Computer Science and Engineering 32(1): 71–82.

Krugman (1991) Increasing returns and economic geography. Journal of Political Economy 99(3): 483–499. https://doi.org/10.1086/261763

Li, Xue, Huang (2018) The role of manufacturing in sustainable economic development: a case of Guangzhou, China. Sustainability 10(9): 3039. https://doi.org/10.3390/su10093039

Liu, Chen, et al. (2021a) Detecting industry clusters from the bottom up based on co-location patterns mining: a case study in Dongguan, China. Environment and Planning B: Urban Analytics and City Science 48(9): 2827–2841. https://doi.org/10.1177/2399808321991542

Liu, Huang, Lai, et al. (2021b) Analysis of urban agglomeration structure through spatial network and Mobile phone data. Transactions in GIS 25(4): 1949–1969. https://doi.org/10.1111/tgis.12755

Tao, et al. (2013) City–industry growth in China. China Economic Review 27: 135–147. https://doi.org/10.1016/j.chieco.2013.08.004

Meijers (2005) Polycentric urban regions and the quest for synergy: is a network of cities more than the sum of the parts? Urban Studies 42(4): 765–781. https://doi.org/10.1080/00420980500060384

Neffke, Hartog, Boschma, et al. (2018) Agents of structural change: the role of firms and entrepreneurs in regional diversification. Economic Geography 94(1): 23–48. https://doi.org/10.1080/00130095.2017.1391691

O’Donoghue, Gleave (2004) A note on methods for measuring industrial agglomeration. Regional Studies 38(4): 419–427. https://doi.org/10.1080/03434002000213932

Raikov, Ermakov, Merkulov (2019) Assessments of the economic sectors needs in digital technologies. Lobachevskii Journal of Mathematics 40(11): 1837–1847. https://doi.org/10.1134/s1995080219110246

Rosenthal, Strange (2003) Geography, industrial organization, and agglomeration. Review of Economics and Statistics 85(2): 377–393. https://doi.org/10.1162/003465303765299882

Saleh HJH, Arrang JRT, de Barros Cardoso LMO (2025) Applications of geospatial AI in human geography and spatial networks: a literature review. EDRAAK 2025: 76–84. https://doi.org/10.70470/edraak/2025/010

Sarach (2015) Analysis of cooperative relationship in industrial cluster. Procedia - Social and Behavioral Sciences 191: 250–254. https://doi.org/10.1016/j.sbspro.2015.04.279

Singh (2022) Cluster space among labor productivity, urbanization, and agglomeration of industries in Hungary. Journal of the Knowledge Economy 13(2): 1008–1027. https://doi.org/10.1007/s13132-021-00726-9

Stevens (2019) The quotidian labour of high tech: innovation and ordinary work in Shenzhen. Science Technology & Society 24(2): 218–236. https://doi.org/10.1177/0971721819841997

Strange (2008) Urban agglomeration. In: The New Palgrave Dictionary of Economics. Palgrave Macmillan, pp. 1–5.

Taylor, Derudder (2015) World City Network: A Global Urban Analysis. Routledge.

Van der Panne (2004) Agglomeration externalities: Marshall versus Jacobs. Journal of Evolutionary Economics 14(5): 593–604. https://doi.org/10.1007/s00191-004-0232-x

Verstraten, Verweij, Zwaneveld (2019) Complexities in the spatial scope of agglomeration economies. Journal of Regional Science 59(1): 29–55. https://doi.org/10.1111/jors.12391

Vogiatzoglou, Tsekeris (2013) Spatial agglomeration of manufacturing in Greece: sectoral patterns and determinants. European Planning Studies 21(12): 1853–1872. https://doi.org/10.1080/09654313.2012.722964

Walker (2017) The geography of production. In: A Companion to Economic Geography. John Wiley & Sons, pp. 111–132.

Wan YKP (2013) Sustainability of tourism development in Macao, China. International Journal of Tourism Research 15(1): 52–65. https://doi.org/10.1002/jtr.873

Wang, Zhou, Ren (2025) Spatial agglomeration patterns and co-agglomeration rules of agribusiness: from the perspective of industrial chain. Applied Geography 179: 103628. https://doi.org/10.1016/j.apgeog.2025.103628

Huang, Gao (2022) Impact of industrial agglomeration on new-type urbanization: evidence from Pearl River Delta urban agglomeration of China. International Review of Economics & Finance 77: 312–325. https://doi.org/10.1016/j.iref.2021.10.002

Xu, Xie, Zhang, et al. (2023) Optimal selection of window components in China based on energy performance modeling. Energy and Buildings 297: 113400. https://doi.org/10.1016/j.enbuild.2023.113400

Shen, Liu, et al. (2025) The coupling relationship between industrial linkage and spatial co-agglomeration of advanced manufacturing and producer services in metropolitan: a case study of Beijing, China. Chinese Geographical Science 35(3): 631–646. https://doi.org/10.1007/s11769-025-1521-6

Yu, Liu (2021) Urban agglomeration economies and their relationships to built environment and socio-demographic characteristics in Hong Kong. Habitat International 117: 102417. https://doi.org/10.1016/j.habitatint.2021.102417

et al. (2022) Spatial and functional organizations of industrial agglomerations in China’s greater Bay area. Environment and Planning B: Urban Analytics and City Science 49(7): 1995–2010. https://doi.org/10.1177/23998083221075641

Yu, Xiao, Yan, et al. (2023) The geographic disparity of agglomeration economies: evidence from industrial activities in China's emerging greater bay area. Applied Geography 161: 103128. https://doi.org/10.1016/j.apgeog.2023.103128
