Sage Journals: Discover world-class research

Abstract

The global economic landscape is currently undergoing profound structural adjustments, characterized by the ascent of emerging economies and concurrent challenges facing established economic powers. Within this context, innovation has emerged as a pivotal strategy for enhancing national competitiveness and securing strategic advantages in future development. In this article, we use patent citation counts, an internationally recognized measure, as the proxy for innovation quality, and use the accelerated genetic algorithm-optimized projection pursuit method (RAGA-PP) to measure data element agglomeration. Based on Chinese provincial panel data from 2012 to 2022, we first employ a fixed effects model to examine the direct effect of data element agglomeration on innovation quality. Second, we incorporate the regional innovation ecosystem into the research framework. A mediating effect model and a threshold regression model are used to investigate the indirect and non-linear effects of data element agglomeration on innovation quality, with technological innovation subjects, R&D funds, and market environment serving as the mediating and threshold variables. The findings are as follows. Innovation quality is positively affected by data element agglomeration, with the effect ordered “central > west > east” across regions. Data element agglomeration can indirectly improve innovation quality by boosting the number of technological innovation subjects, increasing R&D investment, and creating an open market environment. Once the thresholds of technological innovation subjects, R&D funds, and market environment are crossed, data element agglomeration exerts a more favorable impact on innovation quality. The conclusions offer governments guidance for developing regionally differentiated policies and creating a welcoming digital environment.

Plain Language Summary

This paper employs data empowerment theory and innovation ecosystem theory as its foundational frameworks. Using Chinese provincial panel data from 2012 to 2022, it employs a baseline regression model to examine the direct impact of data element agglomeration on innovation quality. The regional innovation ecosystem is also integrated into the research framework. A mediating effect model and a threshold regression model are applied to test the transmission mechanisms through which the regional innovation ecosystem shapes the relationship between data element agglomeration and innovation quality. Results indicate that data element agglomeration exerts a significant positive direct effect on innovation quality, while also generating indirect transmission effects and nonlinear effects through the regional innovation ecosystem.

Keywords

data element agglomeration, innovation quality, regional innovation ecosystem

Introduction

Innovation is the fundamental driving force of development and a crucial cornerstone for building a modernized economic system and safeguarding national security. As the world’s second-largest economy, China places high importance on enhancing innovation quality (INNO). The Third Plenary Session of the 20th CPC Central Committee emphasized the importance of establishing comprehensive institutional mechanisms to support innovation, highlighting the coordinated reform of education, science and technology, and talent systems, aiming to optimize the new national innovation system and upgrade its overall capacity. Under the incentive of innovation policies, China’s science and technology indicators have shown steady growth. In September 2025, the World Intellectual Property Organization (WIPO) released the 2025 Global Innovation Index in Hong Kong, listing the top 100 innovation clusters. China had 24 clusters on the list, ranking first globally in terms of quantity for the third consecutive year (World Intellectual Property Organization, 2025). At the micro-technology level, nanotechnology has yielded remarkable achievements. According to the China Nanotechnology Industry White Paper, the global total of granted nanotechnology patents exceeded 1.078 million between 2000 and 2025, with China accounting for 464,000 (43%) of the total, thereby consolidating its global leadership (Chinese Academy of Sciences, 2025). However, domestic innovation also faces shortcomings. Although China’s industrialization rate for invention patents had risen to 53.3% by 2024, a gap remains relative to developed nations in innovation capabilities within high-end technology sectors, and breakthrough technologies are scarce. Concerns persist regarding the “high quantity but low quality” of patents. Currently, the global competitive landscape is undergoing profound adjustments. As China’s economy advances toward high-quality development, the traditional growth model reliant on factors of production and investment is becoming unsustainable. To transform from a manufacturing powerhouse into an innovation powerhouse, China urgently needs to optimize its innovation system and overcome innovation bottlenecks.

Amidst the surging waves of technological revolution and industrial transformation, continuous breakthroughs in cutting-edge fields such as artificial intelligence, biotechnology, and quantum computing are profoundly reshaping the global industrial division of labor and the competitive landscape. Against this backdrop, data is increasingly emerging as a vital new factor of production. Based on the data empowerment theory, the deep integration and widespread application of cutting-edge technologies like digital twins and artificial intelligence have significantly expanded the coverage and penetration depth of data element within the real economy, gradually positioning it as the core driving force behind technological innovation. Data element exhibits high permeability, scalability, and low-cost sharing, providing innovation activities with a robust technological foundation and rich information resources (Calic & Ghasemaghaei, 2020) that foster an efficient, open innovation ecosystem. Facing the current predicament of innovation characterized by “high quantity but low quality,” China urgently needs to align with macroeconomic development trends, further unleash the potential of data empowerment, continuously stimulate innovation vitality, and drive the innovation system’s transition from scale expansion to quality enhancement. In this process, data element agglomeration (AGG) is likely to play a major role in resolving China’s present innovation conundrum. A practical question, then, is whether AGG can effectively promote the improvement of INNO. If the answer is yes, what is the transmission mechanism through which AGG affects INNO? Further, given China’s regional heterogeneity, does the effect of AGG on INNO show distinct regional patterns and characteristics? In-depth analysis of the impact of AGG on INNO, and clarification of the unique role AGG plays in that process, can inform targeted development policies. Such analysis holds important theoretical and practical value for enhancing China’s strength in science and technology, improving its international competitiveness, and building a strong innovation country.

While existing studies have extensively examined AGG, most remain focused on enhancing innovation quantity (Liao et al., 2024; Liu & Yang, 2025), with relatively insufficient attention paid to its impact on INNO and underlying mechanisms. Especially, there is a lack of empirical tests on the mediating transmission and nonlinear regulatory effects of AGG through optimizing the regional innovation ecosystem and thereby enhancing INNO, based on the systematic path of “AGG—regional innovation ecosystem—INNO.” This limitation not only constrains the in-depth development of data empowerment theory in China’s unique context but also fails to effectively support differentiated governance requirements for regional innovation practices. Therefore, this article adopts data empowerment theory and innovation ecosystem theory as its foundational frameworks. Using panel data from 30 Chinese provinces spanning 2012 to 2022, it employs benchmark regression, mediating effects, and threshold regression models to examine the impact of AGG on INNO and its transmission mechanisms. This article aims to provide theoretical guidance for leveraging AGG to enhance INNO, offer practical insights for regions to develop suitable innovation ecosystem, and further expand the research frontiers of data-driven innovation.

The potential marginal contribution of this article lies in the following aspects. First, from the perspective of AGG, it systematically establishes the theoretical connection between AGG and INNO, reveals the deep-seated interaction mechanism between the two, and thereby expands the research scope of INNO. Second, it incorporates the innovation ecosystem into the research framework, exploring the intrinsic mechanisms through which AGG influences INNO from the perspectives of technological innovation subjects (TIS), R&D funds (FUND), and market environment (MAR), thereby deepening and broadening the understanding of data element’s innovative efficacy. Finally, this article conducts an in-depth analysis and discussion of the non-linear effects of AGG on INNO. It aims to provide a basis for formulating regional innovation policies and offer empirical support for enhancing China’s overall INNO.

Following the introduction, the remainder of this article is organized as follows: The second part presents a literature review, synthesizing existing studies on AGG, INNO, the innovation effects of data element, and regional innovation ecosystem. The third part outlines the article’s theoretical framework and research hypotheses. The fourth part details the research design, including model construction, variable measurement and specification, data sources, and descriptive statistics. The fifth part conducts empirical testing to validate and analyze the theoretical hypotheses. The sixth part discusses findings within a global context. Finally, the article presents research conclusions and policy recommendations, while also identifying limitations and future directions.

Literature Review

Research on the Data Element Agglomeration

With the rapid advancement of digital technology, the importance of data as a key factor of production has become increasingly prominent. Research on data as a factor and its value creation has emerged as a focal point for scholars both domestically and internationally. Relevant research primarily covers four aspects: the connotation of data element, the formation mechanism of AGG, influencing factors, and post-effect studies. In exploring the connotation of data element, Thomas (2014) was among the first to argue that data itself, akin to capital and labor, constitutes a factor of production. Its “techno-economic” characteristics include low cost, immediacy, pervasiveness, mass availability, and self-organization (Cai & Ma, 2021; X. Xu et al., 2023). These attributes collectively raise corporate productivity. Regarding the formation mechanism of AGG, research indicates that AGG is driven by technological foundations such as digital infrastructure, platform scale, and algorithmic capabilities (Talukder et al., 2018), while also benefiting from government policy guidance, data openness strategies, and market demand (Han et al., 2024), exhibiting characteristics of synergistic agglomeration across physical and virtual spaces. Existing studies have identified multiple factors that significantly influence AGG, including the level of digital infrastructure, industrial structure and scale, the concentration of high-tech talent, and data property rights and transaction systems (Gao et al., 2025; Y. Sun et al., 2023). In terms of post-effect studies, scholars have primarily analyzed the perspectives of corporate decision-making and production, highlighting that deepening the extraction of data value plays a positive role in enhancing the quality and efficiency of manufacturing. Particularly in the era of Industry 4.0, data and information have become especially crucial. 
The collective intelligence perception and crowdsourcing effects triggered by their agglomeration will bring new advantages and challenges to enterprises (Pilloni, 2018). On one hand, the information embedded in data can optimize decision-making processes, accelerate resource circulation, and stimulate output growth (Ma et al., 2020). It also significantly drives the expansion of commerce and distribution (Meng et al., 2023) and enhances logistics efficiency (Lv et al., 2025). On the other hand, some scholars hold differing views (Aghion et al., 2023), arguing that data technology does not necessarily enhance data accuracy. Instead, it may dampen the impetus for innovation, slow economic growth, and pose societal challenges.

Research on the Innovation Quality

Innovation quantity typically refers to the number and frequency of innovative outputs such as patents and academic papers, emphasizing the scale and output efficiency of innovation. INNO, however, places greater emphasis on the technological sophistication, economic benefits, and sustainable impact of innovation. It manifests in the breakthrough nature of outcomes, their transformative potential, and their contribution to regional competitive advantages, serving as a deeper reflection of innovation capacity (Cai & Yu, 2017). Existing studies on INNO primarily revolve around three dimensions: connotation definition, measurement methods, and influencing factors. Regarding conceptual definition, Haner (2002) first framed INNO as a concept that permeates the entire innovation process, defining it through product or service attributes, production processes, and firm management. Some scholars contend that INNO encompasses not only technological value but also the commercial value generated (Teemu & Tommi, 2014). The framework has gradually expanded from product, operational, and process dimensions to encompass services, processes, and culture (Cai & Yu, 2017). Regarding measurement methods, since the number of patent applications and authorizations directly reflects the results of technological innovation, academics mostly use patent indicators as proxy variables for INNO. Common metrics include the number of applications or grants (Krammer, 2009), patent grant ratios and duration of payment periods (G. T. Zhang et al., 2011), the proportion of invention patents in total applications (Cai & Yu, 2017), and novel quality indicators such as knowledge breadth and patent citation frequency based on IPC classifications (T. C. Li et al., 2023). 
Regarding influencing factors, the existing literature suggests that knowledge reorganization (Yu & Yu, 2024), the company size (Deng et al., 2025), and the selection of supportive policies (Ma & Xiang, 2025) respectively exert significant influences on INNO from the micro, meso, and macro perspectives.

Research on the Innovation Effects of Data Element

Regarding the innovation effects of data element, existing studies primarily focus on how it empowers innovation activities as a new type of production factor. Through deep integration with labor, capital, and knowledge, data not only optimizes factor allocation efficiency but also significantly enhances corporate innovation performance (Maryam & Goran, 2019; Z. G. Zhang et al., 2025), driving disruptive technological innovations and fundamental transformations in business models, thereby providing sustained momentum for corporate transformation and upgrading (X. Wang et al., 2022). Mechanism studies indicate that data element, through synergistic interactions with traditional factors on both the supply and demand sides, effectively reduces information asymmetry and trial-and-error costs in the innovation process, forming a crucial foundation for enterprises to achieve high-quality innovation (Q. L. Liu et al., 2022). However, the innovative efficacy of data element is constrained by multiple external conditions and may even generate innovation compensation effects due to inadequate alignment in certain contexts. For instance, its effective integration with human capital directly impacts the establishment of innovation incentive mechanisms and the realization of innovation benefits (Tao & Ding, 2022). Data element lacking support from high-quality talent struggles to fully unleash its innovative value-added potential. Furthermore, enterprise scale significantly influences innovation strategy selection. Large enterprises often leverage abundant data resources to advance incremental and iterative innovation. However, when confronting highly uncertain disruptive innovation, their organizational inertia and limitations in intrinsic incentive structures may result in relatively insufficient motivation (Forés & Camisón, 2016).

Research on Regional Innovation Ecosystem

As a vital component of innovation ecosystem, regional innovation ecosystem has garnered significant attention within academic circles. Relevant research primarily focuses on three aspects: conceptual origins, fundamental characteristics, and constituent elements. Regarding conceptual origins, academic studies on ecosystems initially stemmed from business ecosystem (Moore, 1993). In recent years, the ecosystem concept has garnered extensive attention from scholars in economic management fields, leading to the subsequent introduction of concepts such as innovation ecosystem (Adner, 2006) and platform ecosystem (Ceccagnoli et al., 2012). Among these, regional innovation ecosystem serves as the foundation of innovation ecosystem and is crucial for understanding and analyzing regional innovation activities (Cooke et al., 1997). Drawing upon innovation systems theory, a regional innovation ecosystem can be defined as a self-organizing system characterized by symbiotic competition and dynamic evolution within a specific geographical space. It consists of diverse innovation species, populations, and communities that engage in value co-creation through material, knowledge, and information exchanges with their innovation environment, all based on shared value propositions (Granstrand & Holgersson, 2020). Regarding fundamental characteristics, regional innovation ecosystem exhibits traits similar to natural ecosystem. Simultaneously, within complex and open environments, shifts in innovation subjects and contexts give rise to new features such as network effects, symbiotic collaboration, local embeddedness, and adaptability (Broekel, 2012; Carbonara, 2018; Russell & Smorodinskaya, 2018). Regarding constituent elements, current approaches to classifying the foundational components of regional innovation ecosystem primarily follow either a “binary” or “ternary” approach. 
The binary approach posits that innovation ecosystem comprises innovation subjects and the innovation environment, while the ternary approach further separates innovation resources from the broader innovation environment, emphasizing the critical role of innovation resources within regional innovation ecosystem (Doloreux, 2002; G. N. Xu et al., 2018). The synergy among these elements facilitates technological, economic, and social development (Rong et al., 2020). Some scholars have employed the “tripartite approach” to demonstrate that regional innovation ecosystem plays a vital role in driving innovation performance (Nan & Niu, 2024), influencing green innovation (Guo et al., 2024), and promoting carbon emission reduction (H. Sun et al., 2024).

In light of the above studies, there is still room for improvement in the research on data element and innovation as follows: First, existing studies have largely focused on the quantitative impact of data element on innovation output, while relatively neglecting their deeper influence on INNO. In particular, there is a lack of systematic explanations from the perspective of element agglomeration regarding the mechanism by which AGG empowers improvements in INNO. Second, research focusing on AGG and INNO has failed to closely integrate with the evolving characteristics of regional innovation ecosystem in the digital era, thereby struggling to reveal their key mediating role in this relationship. Third, taking into account the digital economy’s network features, the complex non-linear relationship between AGG and INNO needs to be explored urgently.

Building upon the aforementioned analysis, this article integrates AGG, regional innovation ecosystem, and INNO within a unified research framework grounded in data empowerment theory. The investigation primarily focuses on examining the quality-enhancing effects of AGG on INNO, while simultaneously addressing the mediating and threshold roles played by regional innovation ecosystem in this influence process.

Mechanistic Analyses and Research Hypotheses

Direct Transmission Mechanisms and Research Hypotheses

Data element is the micro-foundation of the digital economy. AGG draws in flows of traditional factors such as capital and labor and builds a bridge connecting the virtual digital space and the real physical space. Unlike traditional factors, data exhibits non-rivalry, heterogeneity, externality, and other significant economic characteristics (M. Zhang et al., 2024), and it directly affects INNO through both the innovation process and innovation output (Li et al., 2023), thereby raising INNO.

On the one hand, in terms of the innovation process, based on the data empowerment theory, AGG enhances INNO in multiple dimensions through sustained resource investment and efficient resource utilization. Specifically, the sharing and opening of data resources enable enterprises to leverage big data for precise forecasting of talent needs, thereby achieving flexible allocation of labor resources. At the same time, AGG helps enterprises make full use of the multi-level capital market’s diversified financing channels, which eases financing constraints, improves fund-raising ability, increases R&D investment, and optimizes the elastic allocation of capital (C. M. Liu et al., 2023). Together, these channels transform the human capital advantage into the region’s high-quality innovation advantage and sustain the driving effect of human capital accumulation on INNO (Pang et al., 2023). AGG can also mitigate the negative impact of “data depreciation” and continue to promote high-quality innovation through timely data iteration (Maryam & Goran, 2019), improved data quality, enhanced data sharing, and the formation of scale effects and abundant resources. In addition, digital and intelligent transformation based on the non-rival data element accelerates the conversion of innovation momentum. This helps expand the depth and breadth of innovation, enhances corporate innovation competitiveness and the utilization efficiency of innovation resources, and provides solid support for enterprises to achieve both qualitative improvement and quantitative growth in innovation.

On the other hand, in terms of innovation output, AGG can achieve a high degree of supply-demand matching as well as key technological breakthroughs, thus enhancing INNO. Based on disruptive innovation theory, AGG forms a powerful network effect. The tools spawned by data element, such as lead-user demand insights, user demand decomposition, technology simulation experiments, internal knowledge-sharing platforms, and open innovation platforms, can precisely locate customer demand and guide the direction of enterprise innovation, achieving a high degree of supply-demand matching and further enhancing INNO. In addition, data, as a novel and crucial production factor, can break down isolated data storage and sharing barriers between business systems within an organization and strengthen the cooperation network among innovation subjects within the region. This encourages the sharing and consistent use of knowledge stock and data resources within the region, promotes breakthroughs in key core technologies, and thereby improves the INNO of enterprises (Yuan et al., 2020).

In summary, this article proposes Hypothesis 1: AGG can directly promote the improvement of INNO.

Indirect Transmission Mechanisms and Research Hypotheses

Innovation itself is an activity that combines high investment, high risk, and high cost with high benefit, high growth, and long-term characteristics. Therefore, efficient resource allocation and a favorable innovation environment are crucial safeguards for enhancing innovation efficiency and quality. Based on innovation ecosystem theory and considering the substitutability among elements within regional innovation ecosystem and its significance in influencing INNO (Hu & Hou, 2023), this article adopts a “three-part framework” for innovation ecosystem. Within this framework, innovation subjects refer to core organizations engaged in knowledge creation and technology application, serving as the direct executors of innovation activities; innovation resources refer to the aggregate of various inputs deployed in innovation activities, providing foundational support; innovation environment encompasses the totality of external factors influencing the innovation process, offering fundamental safeguards. This article examines the mediating roles of innovation subjects, innovation resources, and innovation environment in the pathway through which AGG impacts INNO, analyzing these aspects respectively from the perspectives of TIS, FUND, and MAR.

The Mediating Role of TIS in the Pathway of AGG Affecting INNO

Within regional innovation ecosystem, digital enterprises serve as pivotal TIS. They not only function as vital engines for stimulating technological demand and driving innovation (S. L. Zhao et al., 2015), but also play a central role in enhancing INNO. Based on innovation network theory, AGG significantly promotes data openness, trading, and sharing by optimizing data organization and circulation mechanisms. This attracts digital enterprises to form spatial and business clusters. These clusters accelerate digital-technology development and revitalize technological innovation, establishing the preconditions for sustained gains in INNO. Furthermore, digital enterprises implement vertical and horizontal network integration strategies to strengthen connections and collaboration among TIS. This facilitates the efficient transfer and diffusion of innovation outcomes, knowledge, and technologies within the system, promoting the commercialization of scientific and technological achievements (Clarysse et al., 2014). Through complex nonlinear interactions, this process ultimately achieves a significant leap in INNO.

The Mediating Role of FUND in the Pathway of AGG Affecting INNO

Both the resource-based theory and the knowledge-based perspective emphasize that the scarce resources and knowledge capabilities owned and allocated by an organization constitute the fundamental source for building and sustaining its long-term competitive advantage, as well as the key driver for achieving high-level innovation performance in a region. According to the new economic growth theory, human capital plays an irreplaceable role in promoting regional innovation (Zhang & Guo, 2025). Within this theoretical framework, AGG, through in-depth analysis and application of big data, significantly enhances the efficiency of matching R&D resources with funding, enabling enterprises to more accurately identify innovation opportunities and optimize decision-making processes. This mechanism strengthens innovation subjects’ capacity to allocate and utilize data element while improving risk management capabilities in responding to uncertainties. Consequently, it fully unleashes the benefits of data through innovation incentive effects (Tao & Ding, 2022). In addition, adequate FUND support facilitates the establishment of systematic, high-level R&D systems and platforms. This drives the in-depth advancement of data-centric R&D activities and accelerates the innovation iteration process, raising both the probability of success and the quality of outputs while ensuring that data element is efficiently converted into concrete innovative achievements.

The Mediating Role of MAR in the Pathway of AGG Affecting INNO

Based on collaborative innovation theory, the abundance of innovation factors within a system and their significant synergistic effects constitute the key mechanism for enhancing regional innovation capabilities and performance. As a vital external driver of regional innovation ecosystem, the openness and demand structure of MAR influence innovation efficiency and quality. According to demand-driven innovation theory, increased market openness facilitates the efficient flow and optimal allocation of key innovation factors such as talent, technology, and data. This significantly reduces the economic and time costs associated with factor acquisition, thereby laying the foundation for enhancing INNO (Wu, 2020). Simultaneously, a robust MAR facilitates the identification and response to consumer value orientations and practical needs. This not only provides stable guidance for innovation direction but also enhances the practicality and market success rate of innovation outcomes through demand alignment. Consequently, it drives the effective realization of innovation value, ultimately fostering continuous improvement in INNO.

In summary, this article proposes Hypothesis 2: TIS, FUND, and MAR in the regional innovation ecosystem all have a mediating effect on AGG affecting INNO.

Non-Linear Transmission Mechanisms and Research Hypotheses

Based on the characteristics of network externality and Metcalfe’s Law, AGG may have a non-linear effect on INNO (T. Zhao et al., 2020). (1) TIS. When TIS is high in both quality and quantity, the number and diversity of TIS promote cooperation between different functional subjects and meet the needs of innovation activities. This can significantly increase the innovation rate and success rate of the regional innovation ecosystem and play a “supportive” role for AGG in promoting INNO (Maryam & Goran, 2019). When TIS is “small and single,” the internal operating efficiency of the system is reduced and the ability to obtain external information, technology, and resources is weakened, resulting in a mismatch between supply and demand in factor and product markets, which is not conducive to the enhancement of regional INNO. (2) FUND. Innovation resources are the determinant of regional innovation capacity (M. L. Zhang et al., 2020). When capital investment in innovation activities is sufficient, it can sustain the continuous R&D investment of innovation subjects in the region, stimulate independent and breakthrough innovation, support the transformation of innovation results, and improve systematic innovation output, which improves INNO to a greater extent. When capital investment in innovation activities is severely constrained, innovation activities are crowded out to a certain extent, leading to weak innovation and an inability to fully release the positive impact of AGG on INNO. (3) MAR. The more open the market, the higher the efficiency of resource integration, the more favorable the conditions for cooperative innovation among diverse innovation subjects, and the better the adaptive cooperation between the innovation environment and innovation subjects (Li & Zhang, 2018), thus providing convenient conditions for innovation activities and incentivizing their continuation. At the same time, an open market helps to promote the circulation and sharing of data, break “information silos,” maximize the value of data, and further promote INNO. A closed MAR blocks the circulation of data element, reduces the efficiency and scale of its agglomeration, and makes it difficult for data element to realize its maximum potential value, thus inhibiting the improvement of INNO. Furthermore, a closed market easily gives rise to data monopoly, which interferes with the effective integration and utilization of data element, hinders the exchange and sharing of knowledge and technology, and makes it difficult to form a good innovation ecosystem. This will have a negative impact on INNO.
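As a rough illustration of why network externalities point toward non-linearity, Metcalfe’s Law can be written in stylized form. This is a sketch for intuition only and is not the article’s estimated model; the symbols $n$ (number of connected participants) and $V(n)$ (network value) are notation introduced here and do not appear in the source.

$$V(n) \propto \frac{n(n-1)}{2}, \qquad V(n+1) - V(n) \propto n$$

Under this stylized form, the marginal value of one additional participant grows with the size of the network, which is consistent with the expectation that the effect of AGG on INNO strengthens as supporting conditions improve.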

In summary, this article proposes Hypothesis 3: AGG exerts a positive non-linear effect on INNO, and this positive effect is more significant when the quality and quantity of TIS are high, the FUND is sufficient, and the MAR is open. The specific theoretical framework diagram is shown in Figure 1.

Figure 1.

Theoretical framework diagram of AGG affecting INNO.

Model Construction and Variables Measurement

Model Construction

Baseline Model

To test the three hypotheses, this article examines the direct, indirect, and non-linear effects of AGG on INNO using a fixed effects model, a mediating effect model, and a threshold regression model, respectively. The benchmark regression model is constructed as follows:

INNO_{it} = \alpha_{0} + \alpha_{1} AGG_{it} + \alpha_{n} X_{it} + \lambda_{i} + \varepsilon_{it}

(1)

where $INNO_{it}$ represents INNO of province i in year t; $AGG_{it}$, the core explanatory variable of this article, represents AGG of province i in year t; and $X_{it}$ represents the control variables in the model, which include the technology market turnover ($tech_{it}$), the industrial enterprises size ($size_{it}$), the urbanization level ($urb_{it}$), the educational development level ($edu_{it}$), the transport infrastructure level ($traffic_{it}$), and the industrial development level ($indus_{it}$). Furthermore, $\alpha_{0}$ is the intercept term in Equation 1, $\lambda_{i}$ is the unobserved province fixed effect, and $\varepsilon_{it}$ is the random disturbance term.
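As a minimal sketch of how the fixed effect $\lambda_{i}$ in Equation 1 is removed in estimation, the snippet below applies the within (demeaning) transformation with a single regressor. The function name `within_slope` and the toy panel are hypothetical, the control variables are omitted for brevity, and the article's actual estimation software is not stated.

```python
from collections import defaultdict

def within_slope(rows):
    """Fixed-effects (within) slope of INNO on AGG.

    rows: iterable of (province, agg, inno) tuples.
    Each variable is demeaned by province, which removes lambda_i.
    """
    groups = defaultdict(list)
    for province, agg, inno in rows:
        groups[province].append((agg, inno))
    sxy = sxx = 0.0
    for obs in groups.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    return sxy / sxx

# Hypothetical toy panel: INNO = 2 * AGG + province effect (5.0 for A, 1.0 for B).
toy = [
    ("A", 0.1, 5.2), ("A", 0.2, 5.4), ("A", 0.4, 5.8),
    ("B", 0.3, 1.6), ("B", 0.5, 2.0), ("B", 0.6, 2.2),
]
slope = within_slope(toy)
print(round(slope, 4))
```

In the toy data the province with lower AGG has the higher level of INNO, so a pooled regression would be misleading, while the within estimator recovers the common slope.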

Mediating Effect Model

Further, this article introduces three types of mediating variables, namely, technological innovation subjects ($TIS_{it}$), R&D funds ($FUND_{it}$), and market environment ($MAR_{it}$), and uses the mediating effect model to explore the indirect influence mechanism of AGG on INNO. The model is established as follows:

mediation_{it} = \beta_{0} + \beta_{1} AGG_{it} + \beta_{n} X_{it} + \lambda_{i} + \varepsilon_{it}

(2)

INNO_{it} = \omega_{0} + \omega_{1} AGG_{it} + \omega_{2} mediation_{it} + \omega_{n} X_{it} + \lambda_{i} + \varepsilon_{it}

(3)

Among them, $\beta_{1}$, $\omega_{1}$, and $\omega_{2}$ are the coefficients to be estimated; $mediation_{it}$ represents the mediating variables, namely, the technological innovation subjects ($TIS_{it}$), R&D funds ($FUND_{it}$), and the market environment ($MAR_{it}$); and the rest of the variables are the same as above.

Threshold Model

Because INNO depends on its own past values and evolves dynamically, this article draws on Hou et al. (2018), introduces the lagged term of INNO, and constructs a dynamic threshold regression model estimated by system GMM, with technological innovation subjects ($TIS_{it}$), R&D funds ($FUND_{it}$), and the market environment ($MAR_{it}$) as the threshold variables, respectively. This compensates for the shortcomings of the traditional static threshold model and addresses the estimation bias caused by the endogeneity of variables. The threshold model is constructed as follows:

INNO_{it} = \theta_{0} INNO_{it-1} + \theta_{1} AGG_{it} \cdot I(threshold_{it} \leq \gamma) + \theta_{2} AGG_{it} \cdot I(threshold_{it} > \gamma) + \theta_{n} X_{it} + \lambda_{i} + \varepsilon_{it}

(4)

$\theta_{0}$, $\theta_{1}$, and $\theta_{2}$ are the coefficients to be estimated, and $threshold_{it}$ denotes the threshold variables, namely technological innovation subjects, R&D funds, and the market environment. $\gamma$ is the threshold value, and $I(\cdot)$ is the indicator function, which takes the value 1 when the condition in parentheses holds and 0 otherwise. The rest of the variables are the same as above. The specific research model is shown in Figure 2.
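The indicator terms in Equation 4 amount to splitting AGG into two regime-specific regressors. The sketch below shows only that construction; the function name and the sample values are hypothetical, and the GMM estimation step itself is not reproduced.

```python
def split_by_threshold(agg, q, gamma):
    """Build AGG * I(q <= gamma) and AGG * I(q > gamma) for each observation."""
    low = [a if qi <= gamma else 0.0 for a, qi in zip(agg, q)]
    high = [a if qi > gamma else 0.0 for a, qi in zip(agg, q)]
    return low, high

agg = [0.20, 0.35, 0.50, 0.90]      # hypothetical AGG values
tis = [4.10, 4.7958, 5.20, 6.00]    # hypothetical threshold-variable values
low, high = split_by_threshold(agg, tis, gamma=4.7958)
print(low)
print(high)
```

Each observation contributes to exactly one of the two regressors, so their coefficients $\theta_{1}$ and $\theta_{2}$ measure the effect of AGG separately below and above the threshold.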

Figure 2.

Research model.

Variables Measurement and Description

Explained Variable

INNO. Domestic research has not yet established a unified standard for measuring INNO; existing studies focus primarily on indicators such as patent knowledge breadth, patent grant rate, and the proportion of invention patents. Notably, the forward citation count of a patent, defined as the number of times it is cited by subsequent patents, is a widely accepted indicator of its technological impact and economic value (Mann, 2018). This indicator has gained widespread international academic recognition because it reflects INNO relatively accurately. However, owing to measurement complexities, its application remains limited among domestic scholars. This article adopts the evaluation framework proposed by N. Xu et al. (2025), using patent citation counts as the core proxy for patent quality. Data mining techniques were used to systematically search the China National Intellectual Property Administration databases and the China Stock Market and Accounting Research (CSMAR) Database, with listed companies’ parent firms, subsidiaries, and joint ventures as key search terms. Searches were run year by year to ensure temporal granularity. The data were then organized by four core indicators (securities code, patent quantity, province of origin, and year) to obtain the patent citation frequency for each province.

Core Explanatory Variable

AGG. To operationalize the concept of AGG, this article follows the methodology proposed by Chao et al. (2020), employing four key indicators: Internet broadband penetration rate, enterprise website ownership, number of e-commerce transaction enterprises, and mobile phone penetration rate. The evaluation utilizes an accelerated genetic algorithm-optimized projection pursuit method (RAGA-PP) to assess AGG. The RAGA-PP algorithm optimizes the projection direction vector $a (j)_{t}$ , effectively reducing high-dimensional data into low-dimensional space while preserving structural features and critical information from the original dataset. Through global optimization of the projection direction, the numerical values of the optimal direction represent weights. When the projection index function $Q (a)$ reaches its maximum value, the one-dimensional optimal projection value $z (i)_{t}$ for AGG is obtained. The mathematical formulation is presented below.

(1) Standardized sample indicators.

Regarding positive indicators:

X^{+} (i, j)_{t} = \frac{x {(i, j)}_{t} - x_{\min} {(j)}_{t}}{x_{\max} {(j)}_{t} - x_{\min} {(j)}_{t}}

(5)

Regarding negative indicators:

X^{-} (i, j)_{t} = \frac{x_{\max} {(j)}_{t} - x {(i, j)}_{t}}{x_{\max} {(j)}_{t} - x_{\min} {(j)}_{t}}

(6)

In the equation, $x (i, j)_{t}$ denotes the value of the j-th indicator for region i in year t. $x_{\max} (j)_{t}$ and $x_{\min} (j)_{t}$ represent the maximum and minimum values of the j-th variable across 30 provinces in year t, respectively. $X^{+} (i, j)_{t}$ and $X^{-} (i, j)_{t}$ correspond to the dimensionless data after positive and negative variable standardization.
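Equations 5 and 6 can be sketched as a single min-max helper applied to one indicator in one year. The function name `minmax` and the sample values are hypothetical.

```python
def minmax(values, positive=True):
    """Min-max standardization of one indicator across provinces in one year.

    positive=True follows Equation 5; positive=False follows Equation 6.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("indicator is constant, so the range is zero")
    if positive:
        return [(v - lo) / (hi - lo) for v in values]
    return [(hi - v) / (hi - lo) for v in values]

broadband = [10.0, 25.0, 40.0]  # hypothetical indicator values for three provinces
pos = minmax(broadband)
neg = minmax(broadband, positive=False)
print(pos, neg)
```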

(2) Establish the projection index function $Q (a)$ .

z {(i)}_{t} = \sum_{j = 1}^{P} a {(j)}_{t} x {(i, j)}_{t}

(7)

Q (a) = S_{z} D_{z}

(8)

S_{z} = \sqrt{\frac{\sum_{i = 1}^{n} {(z {(i)}_{t} - E_{z})}^{2}}{n - 1}}

(9)

D_{z} = \sum_{i = 1}^{n} \sum_{j = 1}^{n} (R - r (i, j)) \times u (R - r (i, j))

(10)

where $z(i)_{t}$ represents the projected value of the AGG index, $a(j)_{t}$ represents the unit projection direction of the j-th index, P is the number of indicators (four here, as in the constraint of Equation 12), $Q(a)$ represents the projection index function, $E_{z}$ represents the average value of $z(i)_{t}$, $S_{z}$ represents the standard deviation of $z(i)_{t}$, $D_{z}$ represents the local density of $z(i)_{t}$, R represents the window radius of the local density, and $r(i, j)$ represents the distance between samples, $r(i, j) = |z(i) - z(j)|$. $u(\cdot)$ is the unit step function: it equals 1 when its argument is greater than or equal to 0 and 0 when its argument is negative.

(3) Optimize the projection index function.

max Q (a) = S_{z} D_{z}

(11)

s . t . \sum_{j = 1}^{4} a^{2} (j)_{t} = 1

(12)

where, $max Q (a)$ represents the maximization of the objective function.

(4) By substituting the optimal projection direction $a^{*}(j)_{t}$ obtained from step (3) into Equation 7, the projection value $z^{*}(i)_{t}$ for each sample can be derived, which represents the index value of AGG.
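Steps (2) to (4) can be sketched as follows. This is an illustration under stated assumptions rather than the article's implementation: a plain random search over unit directions stands in for the accelerated genetic algorithm, the window radius is set to $0.1 S_{z}$ (the text does not give R), the weights are restricted to be non-negative, and the data matrix is a hypothetical set of already standardized indicators.

```python
import math
import random

def projection_index(X, a, radius_factor=0.1):
    """Return (Q, z) where Q = S_z * D_z as in Equations 7 to 10."""
    n = len(X)
    z = [sum(aj * xij for aj, xij in zip(a, row)) for row in X]
    mean_z = sum(z) / n
    s_z = math.sqrt(sum((zi - mean_z) ** 2 for zi in z) / (n - 1))
    R = radius_factor * s_z  # assumption: the article does not state R
    d_z = 0.0
    for i in range(n):
        for j in range(n):
            gap = R - abs(z[i] - z[j])
            if gap >= 0:  # unit step function u(.)
                d_z += gap
    return s_z * d_z, z

def random_unit_direction(p, rng):
    """Draw a non-negative direction with sum of squares equal to 1 (Equation 12)."""
    raw = [abs(rng.gauss(0.0, 1.0)) for _ in range(p)]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]

def search_direction(X, trials=2000, seed=0):
    """Random search for the direction maximizing Q (a stand-in for RAGA)."""
    rng = random.Random(seed)
    best_q, best_a = -1.0, None
    for _ in range(trials):
        a = random_unit_direction(len(X[0]), rng)
        q, _ = projection_index(X, a)
        if q > best_q:
            best_q, best_a = q, a
    return best_a, best_q

# Hypothetical standardized data: 5 regions x 4 indicators.
X = [
    [0.0, 0.1, 0.0, 0.2],
    [0.3, 0.4, 0.2, 0.1],
    [0.5, 0.5, 0.6, 0.4],
    [0.8, 0.7, 0.9, 1.0],
    [1.0, 1.0, 1.0, 0.9],
]
best_a, best_q = search_direction(X)
scores = projection_index(X, best_a)[1]  # projection values, one per region
print([round(v, 3) for v in best_a], [round(s, 3) for s in scores])
```

A genetic algorithm would replace `search_direction`; the objective and the unit-norm constraint stay the same.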

Mediating and Threshold Variables

(1) TIS: TIS, represented by enterprises, is the subject of risk-taking, value creation, and distribution (Jiao et al., 2016). This article measures TIS by the natural logarithm of the number of enterprises in the high-tech industry (Hu & Hou, 2023).

(2) FUND: Taking into account the differences in the scale of economic development across regions, this article draws on the methodology of Zhou and Shen (2018) and uses the natural logarithm of the internal expenditure of R&D funds as the indicator for measuring FUND.

(3) MAR: This article adopts the comprehensive MAR indicator in the China Regional Innovation Capability Evaluation Report, which measures the MAR of a region in terms of market openness, scientific and technological services, and resident consumption level (He et al., 2012).

Control Variables

The improvement of INNO is affected by many internal and external factors in addition to AGG. To guarantee the robustness of the model results, this article includes the following control variables: the urbanization level (urb), the industrial enterprises size (size), the educational development level (edu), the technology market turnover (tech), the industrial development level (indus), and the transport infrastructure level (traffic). (1) urb: measured by the proportion of the urban population in the total population. (2) size: the scale of industrial firms is an important influence on INNO because it is closely related to the efficiency with which enterprises carry out innovation activities; it is measured by the ratio of industrial value added to the total number of industrial enterprises in the area (T. C. Li et al., 2023). (3) edu: measured by the ratio of educational expenditure to general fiscal budget expenditure. (4) tech: measured by the technology market turnover. (5) indus: measured by the value added of industrial enterprises. (6) traffic: measured by the ratio of total road mileage to total population.

Data Sources and Descriptive Statistics

This article selected 30 provincial-level regions in mainland China as samples from 2012 to 2022. Given the significant data deficiencies in Hong Kong, Macao, Taiwan, and Tibet, these regions were excluded from the analysis. Additionally, logarithmic transformation was applied to relevant variables to effectively mitigate heteroskedasticity and avoid multicollinearity issues. The specific data sources for these variables are presented in Table 1, while descriptive statistics results are shown in Table 2, and the correlation coefficient matrix is illustrated in Figure 3. Correlation tests revealed that the selected explanatory variables (AGG), control variables, and explained variable (INNO) all exhibited significant correlations at the 1% significance level, preliminarily validating the theoretical relationships among them. To ensure the reliability of regression results, variance inflation factor (VIF) tests were further conducted for multicollinearity assessment, which demonstrated that all explanatory variables had VIF values below 5, indicating no severe multicollinearity issues in the model and confirming its reasonable specification.

Table 1.

Sources of Variables Data.

Type	Variable	Measurement	Source
Explanatory variable	AGG	Calculated by the RAGA-PP, including Internet broadband access, enterprise website ownership, number of e-commerce trading enterprises, mobile phone penetration rate	China statistical yearbook
Explained variable	INNO	The citation count of a patent by province is taken as a natural logarithm	China national intellectual property administration and CSMAR
Mechanism variables	TIS	Natural logarithm of the number of enterprises in high-tech industries	China high-tech industry statistical yearbook
	FUND	The natural logarithm of the internal expenditure of R&D funds	China high-tech industry statistical yearbook
	MAR	Market openness, scientific and technological services, and resident consumption level	China regional innovation capability evaluation report
Control variables	urb	Number of urban population/total population	China statistical yearbook
	size	Value added of industry/number of industrial enterprises	China statistical yearbook
	edu	Educational expenditure/Fiscal budget expenditure	China statistical yearbook
	tech	Technology market turnover	China statistical yearbook
	indus	Value added by industry	China statistical yearbook
	traffic	Road mileage/total population	China statistical yearbook

Table 2.

Results of Descriptive Statistics of Relevant Variables.

Variable	Obs	Mean	SD	Min	Median	Max	VIF
INNO	330	7.5354	1.6940	2.3625	7.3806	11.8538
AGG	330	.3713	.3099	.0467	.2710	1.7634	2.7800
urb	330	.6017	.1175	.3641	.5838	.8960	4.0700
size	330	1.0202	.3958	.4107	.9256	3.9113	1.5400
edu	330	.1635	.0269	.0989	.1653	.2221	2.2000
tech	330	2.6504	1.1362	2.1567	2.6644	2.9009	2.4400
indus	330	2.1515	.1170	1.1821	2.1851	2.3787	2.8800
traffic	330	3.5039	.6419	1.6410	3.5926	4.9871	2.9200
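To make the VIF column easier to read, the sketch below converts each reported VIF into the auxiliary $R^{2}$ it implies, using the standard definition $VIF_{j} = 1 / (1 - R_{j}^{2})$. The values are copied from Table 2; the function name is hypothetical.

```python
def implied_aux_r2(vif):
    """Auxiliary R^2 implied by a VIF, from VIF = 1 / (1 - R^2)."""
    return 1.0 - 1.0 / vif

# VIF values as reported in Table 2.
table2_vif = {
    "AGG": 2.78, "urb": 4.07, "size": 1.54, "edu": 2.20,
    "tech": 2.44, "indus": 2.88, "traffic": 2.92,
}
aux_r2 = {name: round(implied_aux_r2(v), 3) for name, v in table2_vif.items()}
print(aux_r2)
```

A VIF below 5 corresponds to an auxiliary $R^{2}$ below .8, so even urb, the most collinear regressor, has less than 80% of its variance explained by the other explanatory variables.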

Figure 3.

Correlation coefficient matrix.

Empirical Results and Analyses

Baseline Regression Analysis

Following the Hausman test, this article concludes that the fixed-effects model is more appropriate. Table 3 presents the baseline regression results of AGG’s impact on INNO, where column (1) provides nationwide estimates for the full sample, while columns (2) to (4) focus on China’s eastern, central, and western regions, presenting region-specific model estimates. In column (1), the coefficient of AGG’s impact on INNO is 2.4075, significant at the 1% level, indicating a significantly positive effect of AGG on improving INNO overall. These results confirm Hypothesis 1: AGG can enhance INNO, consistent with C. M. Liu et al.’s (2023) findings on AGG’s impact on technological innovation. Regarding control variables, the results show that urb has a significantly negative effect on INNO. Rapid urbanization may lead to imbalances in factor allocation, with labor and capital excessively concentrated in non-productive sectors, crowding out R&D investment and weakening effective agglomeration of innovative factors. Additionally, negative externalities such as traffic congestion and environmental pollution caused by urban expansion may offset the positive effects of knowledge spillovers, creating “crowding costs” in the innovation environment, thereby exerting a negative effect on innovation quality, a result that aligns with Shi and Xu (2025). The edu also negatively affects INNO at the 1% significance level. When the education system experiences an adaptation gap with regional innovation needs, it creates a “quality-efficiency separation” phenomenon in human capital accumulation, making the education system less responsive to market demand-driven breakthrough innovations, thereby negatively impacting INNO—a finding that corroborates S. W. Wang et al.’s (2025) conclusions. The indus is similarly detrimental to INNO, consistent with Han et al.’s (2025) empirical results. 
This phenomenon may stem from enterprises’ reliance on extensive production expansion for industrial value-added, prioritizing scale expansion over technological innovation and quality improvement—a critical challenge China currently faces in innovation quality (T. C. Li et al., 2023).

Table 3.

Baseline Regression Results.

Variable	(1)	(2)	(3)	(4)
Variable	Nationwide	Eastern region	Central region	Western region
AGG	2.4075*** (.6420)	1.5046* (.7652)	7.4387*** (2.0990)	2.8477** (1.4873)
urb	−7.4141*** (2.5827)	−9.9625** (4.7725)	−9.7006 (6.8926)	−.6089 (3.6174)
size	−.3608 (.2479)	.0621 (.1735)	.6978 (1.0395)	−1.2927*** (.3810)
edu	−17.8745*** (4.0186)	−17.9970*** (4.8457)	−26.4933*** (6.9669)	−12.4832* (6.8013)
tech	−.7992 (1.1074)	1.5652 (2.2539)	−9.5430*** (3.2194)	−1.1900 (1.3213)
indus	−10.5058*** (2.5912)	−5.8814* (3.0977)	−29.8741*** (8.9512)	−11.7081** (4.6715)
traffic	.3767 (.9478)	−.0372 (1.5971)	4.3941*** (1.4456)	−2.6580* (1.5502)
_cons	38.0041*** (6.4346)	26.0684*** (8.7523)	90.7050*** (18.2618)	48.1653*** (11.6916)
F	12.6200	6.8300	4.8700	15.9300
R ²	.8468	.8725	.7463	.7690
N	330	143	66	121

Note. Robust standard errors are in parentheses. ***, **, and * denote significance at the .01, .05, and .1 levels, respectively.

The analysis results from columns (2) to (4) reveal that the impact of AGG on INNO follows a regional gradient: central region > western region > eastern region, consistent with T. C. Li et al.’s (2023) findings. This article argues that, as the frontier of reform and opening-up, the eastern region possesses first-mover advantages in data infrastructure and the digital economy. Consequently, data-factor input has already approached its saturation threshold. According to the endogenous growth theory’s technological convergence effect, excessive factor density may paradoxically lead to data redundancy and path-locking in innovation, manifesting as diminishing marginal returns in factor allocation efficiency. Moreover, innovation activities in the eastern region exhibit capital-intensive characteristics, diluting the marginal contribution of data element. For instance, while leading enterprises like Huawei and Tencent invest over 100 billion RMB annually in R&D, small and medium-sized enterprises (SME) are forced to reduce data-sharing investments due to high costs. This “capital substituting for data” model ultimately undermines innovation efficiency. In contrast, central region is undergoing a critical window for industrial upgrading, with government policies actively stimulating societal innovation vitality. Henan Province, as a national pilot for factor marketization reforms, has established comprehensive data governance through its data trading center system. The Data Element Market Cultivation Action Plan implements an “eight major initiatives” policy framework, focusing on institutional innovations in data property rights and circulation mechanisms. The six central provinces leverage national pilot cities for SME digital transformation, offering specialized subsidies for intelligent upgrading and digital transformation initiatives. 
These policy measures collectively form the institutional foundation of central China’s innovation advantage by releasing institutional dividends through factor marketization and lowering innovation thresholds via infrastructure upgrades. The region is currently in its golden age of data element development, where synergistic values from traditional industry upgrading and emerging business models are fully realized. Han et al.’s (2024) research empirically validates these policy-driven mechanisms that amplify central China’s innovation potential. Western region benefits from the compensation mechanism embedded in policy-guided unbalanced development strategies. Grounded in the latecomer advantage hypothesis, its innovation system achieves technological leapfrogging through compensatory AGG. However, this catch-up effect is constrained by dual limitations of human capital and institutional environments, resulting in weaker impact intensity compared to central China’s advantages in complete industrial ecosystems and market scale.

Mechanism Analysis

Table 4 presents the mediation mechanism test results of AGG’s impact on INNO. The regression coefficients and significance levels of the core and mediating variables show that AGG exerts indirect effects on INNO through TIS, FUND, and MAR, which supports Hypothesis 2. Specifically, columns (1) and (2) use TIS as the mediating variable in a stepwise regression analysis of AGG’s impact on INNO. The results show a significantly positive effect of AGG on TIS (coefficient: 1.1074), alongside a direct effect coefficient of 1.8283 on INNO. Holding other factors constant, a one-unit increase in TIS leads to a significant .3451-unit enhancement in INNO. Therefore, through the mediating role of TIS, AGG indirectly boosts INNO by .3822 units ( $1.1074 \times 0.3451$ ). Columns (3) and (4) present the model estimation results with FUND as the mediating variable, revealing significant mediation effects in the process of AGG influencing INNO. Specifically, a one-unit increase in AGG raises FUND by .6975, generating an indirect effect of .7727 on regional INNO through FUND ( $0.6975 \times 1.1078$ ). Finally, columns (5) and (6) display the model estimation results when MAR serves as the mediating variable, where AGG indirectly improves INNO by .3723 units ( $1.0156 \times 0.3666$ ). These findings confirm that AGG can optimize MAR, thereby indirectly enhancing INNO.
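The indirect effects are products of two coefficients: the effect of AGG on the mediator ($\beta_{1}$ in Equation 2) and the effect of the mediator on INNO ($\omega_{2}$ in Equation 3). The short check below recomputes them from the Table 4 estimates and compares them with the indirect effects reported in Table 5; the variable names are hypothetical.

```python
# (beta_1, omega_2) pairs copied from Table 4.
table4 = {
    "TIS": (1.1074, 0.3451),
    "FUND": (0.6975, 1.1078),
    "MAR": (1.0156, 0.3666),
}
# Indirect effects as reported in Table 5.
table5_indirect = {"TIS": 0.3822, "FUND": 0.7727, "MAR": 0.3724}

indirect = {name: round(b1 * w2, 4) for name, (b1, w2) in table4.items()}
for name, value in indirect.items():
    print(name, value, table5_indirect[name])
```

The products agree with Table 5 up to rounding in the last digit, with FUND giving .7727.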

Table 4.

Mechanism Test Results.

Variable	(1)	(2)	(3)	(4)	(5)	(6)
Variable	TIS	INNO	FUND	INNO	MAR	INNO
AGG	1.1074*** (.1160)	1.8283*** (.3573)	.6975*** (.0780)	1.4378*** (.3418)	1.0156*** (.0985)	1.8381*** (.3643)
urb	−.5161 (.3701)	−2.0289** (1.0096)	1.6892*** (.2489)	−4.0782*** (1.0439)	1.2920*** (.3144)	−2.6807** (1.0342)
size	−.4705*** (.0675)	−.1960 (.1971)	−.0721 (.0454)	−.2784 (.1789)	−.0171 (.0574)	−.3521* (.1840)
edu	3.7542*** (1.1890)	−9.4306*** (3.2837)	1.6021** (.7997)	−9.9097*** (3.1565)	−1.8321* (1.0102)	−7.4633** (3.2554)
tech	2.5902*** (.2471)	.6147 (.7785)	2.8922*** (.1662)	−1.6953* (.9082)	.4274** (.2100)	1.3519** (.6775)
indus	4.6869*** (.3126)	−.3816 (1.1081)	5.1757*** (.2103)	−3.7343*** (1.4002)	−.5280** (.2656)	2.1926** (.8568)
traffic	−.3572*** (.0574)	−.9427*** (.1652)	−.2644*** (.0386)	−.7730*** (.1621)	.2226*** (.0488)	−1.1475*** (.1613)
TIS		.3451** (.1516)
FUND				1.1078*** (.2186)
MAR						.3666** (.1787)
_cons	−9.8301*** (.7983)	8.5259*** (2.6334)	−4.5001*** (.5369)	10.1186*** (2.3246)	1.5104** (.6783)	4.5798** (2.1913)
R ²	.9228	.6154	.9617	.6382	.5158	.6143
N	330	330	330	330	330	330

Note. Robust standard errors are in parentheses. ***, **, and * denote significance at the .01, .05, and .1 levels, respectively.

Table 5 presents the Bootstrap test results for the mediation mechanism of AGG’s impact on INNO. At the 95% confidence level, the confidence intervals for both the mediation effects and the direct effects of the three mediating variables exclude zero, so both effects are statistically significant. This confirms the existence of partial mediation effects. Overall, AGG not only directly influences INNO but also affects it indirectly through three distinct transmission channels, thereby validating Hypothesis 2.

Table 5.

Bootstrap Test.

Bootstrap test	TIS	FUND	MAR
Indirect effect	.3822**	.7727***	.3724**
Z	2.2000	3.6300	2.3300
Confidence interval	[.0422, .7221]	[.3556, 1.1898]	[.0587, .6860]
Direct effect	1.8283***	1.4378***	1.8381***
Z	5.3300	3.6600	4.6200
Confidence interval	[1.1560, 2.5006]	[.6669, 2.2087]	[1.0590, 2.6172]

Note. Robust standard errors are in parentheses. ***, **, and * denote significance at the .01, .05, and .1 levels, respectively.
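The Z statistics and confidence intervals in Table 5 can be cross-checked against each other. The sketch assumes the intervals are normal-approximation intervals of the form estimate ± 1.96 × SE, which their symmetry around the point estimates suggests but the text does not state; the function name is hypothetical.

```python
Z95 = 1.96  # two-sided 95% normal critical value

def implied_z(estimate, ci_low, ci_high, crit=Z95):
    """Back out the standard error from a symmetric CI and return estimate / SE."""
    se = (ci_high - ci_low) / (2.0 * crit)
    return estimate / se

# (label, estimate, CI lower, CI upper, reported Z), copied from Table 5.
table5 = [
    ("TIS indirect", 0.3822, 0.0422, 0.7221, 2.20),
    ("FUND indirect", 0.7727, 0.3556, 1.1898, 3.63),
    ("MAR indirect", 0.3724, 0.0587, 0.6860, 2.33),
    ("TIS direct", 1.8283, 1.1560, 2.5006, 5.33),
    ("FUND direct", 1.4378, 0.6669, 2.2087, 3.66),
    ("MAR direct", 1.8381, 1.0590, 2.6172, 4.62),
]
gaps = []
for label, est, lo, hi, z_reported in table5:
    z = implied_z(est, lo, hi)
    gaps.append(abs(z - z_reported))
    print(f"{label}: implied Z = {z:.2f}, reported Z = {z_reported:.2f}")
```

Under that assumption, every implied Z matches the reported value to within rounding.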

Threshold Regression Analysis

Before running the threshold regression, the existence of a threshold effect must be tested. This article applies Bootstrap repeated sampling (500 replications), with the results shown in Table 6. TIS, FUND, and MAR all pass the single-threshold significance test (TIS and MAR at the 10% level, FUND at the 5% level), but none of the three passes the double-threshold test, confirming Hypothesis 3 that the effect of AGG on INNO is non-linear and moderated by these three variables. Specifically, the threshold value for TIS is 4.7958, the threshold value for FUND is 14.4638, and the threshold value for MAR is 2.7991 (as shown in Table 7).

Table 6.

Threshold Effect Tests.

Threshold variable	Threshold effect	F-value	p-value	BS number	The critical value
Threshold variable	Threshold effect	F-value	p-value	BS number	1%	5%	10%
TIS	Single threshold	21.8100	.0940	500	30.6046	24.3196	21.5561
TIS	Double threshold	4.9400	.7960	500	28.6508	20.2992	28.6508
FUND	Single threshold	34.9500	.0420	500	43.3972	34.0519	29.9953
FUND	Double threshold	12.0200	.2960	500	32.8340	22.6535	18.3900
MAR	Single threshold	13.3800	.0840	500	21.2172	15.3693	12.6209
MAR	Double threshold	4.2000	.2760	500	12.0351	8.5613	6.9180

Table 7.

Thresholds and Confidence Intervals.

Threshold variable	Test	Threshold estimates	Confidence interval
TIS	Single threshold value	4.7958	[4.7748, 4.8122]
FUND	Single threshold value	14.4638	[14.3758, 14.4812]
MAR	Single threshold value	2.7991	[2.7470, 2.8003]
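The decision rule behind Table 6 compares each F statistic with the bootstrap critical values. The sketch below applies that rule to the figures copied from Table 6 exactly as printed; the function name is hypothetical.

```python
def strictest_level_passed(f_stat, crit_1, crit_5, crit_10):
    """Return the strictest significance level at which F exceeds the critical value."""
    if f_stat >= crit_1:
        return "1%"
    if f_stat >= crit_5:
        return "5%"
    if f_stat >= crit_10:
        return "10%"
    return "none"

# (F, 1% critical, 5% critical, 10% critical), copied from Table 6 as printed.
table6 = {
    ("TIS", "single"): (21.8100, 30.6046, 24.3196, 21.5561),
    ("TIS", "double"): (4.9400, 28.6508, 20.2992, 28.6508),
    ("FUND", "single"): (34.9500, 43.3972, 34.0519, 29.9953),
    ("FUND", "double"): (12.0200, 32.8340, 22.6535, 18.3900),
    ("MAR", "single"): (13.3800, 21.2172, 15.3693, 12.6209),
    ("MAR", "double"): (4.2000, 12.0351, 8.5613, 6.9180),
}
verdicts = {key: strictest_level_passed(*vals) for key, vals in table6.items()}
for key, verdict in verdicts.items():
    print(key, verdict)
```

The verdicts agree with the reported p-values: each single threshold is significant (TIS and MAR at 10%, FUND at 5%) and no double threshold is.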

To more precisely evaluate the threshold value estimation and its confidence interval, this article employs the likelihood ratio statistic (LR) based on ordinary least squares to determine the threshold value. The LR statistic reaches zero when the threshold variable attains the threshold value. Figure 4 illustrates the likelihood ratio functions for different threshold variables.
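How the LR statistic locates a threshold and its confidence interval can be sketched in the style of Hansen's static threshold procedure. This is an assumption about the approach the text alludes to: the sketch uses simulated data, a single regressor without controls or fixed effects, and plain least squares rather than the article's dynamic model. The critical value $c(\alpha) = -2 \ln(1 - \sqrt{1 - \alpha})$ is the one from Hansen's procedure, and all names and simulated values are hypothetical.

```python
import math
import random

def ssr_for_threshold(x, y, q, gamma):
    """Sum of squared residuals when y is regressed on x separately in each regime."""
    total = 0.0
    for regime in (True, False):
        pts = [(xi, yi) for xi, yi, qi in zip(x, y, q) if (qi <= gamma) == regime]
        sxx = sum(xi * xi for xi, _ in pts)
        theta = sum(xi * yi for xi, yi in pts) / sxx if sxx > 0 else 0.0
        total += sum((yi - theta * xi) ** 2 for xi, yi in pts)
    return total

def lr_profile(x, y, q, trim=0.1):
    """Grid-search the threshold and return (gamma_hat, {gamma: LR(gamma)})."""
    ordered = sorted(q)
    k = int(len(q) * trim)  # keep at least `trim` of the sample in each regime
    candidates = ordered[k:len(q) - k]
    ssr = {g: ssr_for_threshold(x, y, q, g) for g in candidates}
    gamma_hat = min(ssr, key=ssr.get)
    s_min = ssr[gamma_hat]
    lr = {g: len(q) * (s - s_min) / s_min for g, s in ssr.items()}
    return gamma_hat, lr

def lr_critical(alpha=0.05):
    """Critical value for the LR confidence region: -2 ln(1 - sqrt(1 - alpha))."""
    return -2.0 * math.log(1.0 - math.sqrt(1.0 - alpha))

# Simulated data with a true threshold at 4.8 (hypothetical values).
rng = random.Random(1)
n = 200
q = [rng.uniform(3.0, 7.0) for _ in range(n)]
x = [rng.uniform(0.1, 1.5) for _ in range(n)]
y = [(0.2 if qi <= 4.8 else 1.0) * xi + rng.gauss(0.0, 0.1) for xi, qi in zip(x, q)]

gamma_hat, lr = lr_profile(x, y, q)
inside = [g for g, v in lr.items() if v <= lr_critical(0.05)]
print(round(gamma_hat, 3), round(min(inside), 3), round(max(inside), 3))
```

The LR value is exactly zero at the estimated threshold, and the confidence interval collects the candidate values whose LR stays below the critical line, which is what the plots in Figure 4 display.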

Figure 4.

Likelihood ratio function plots for each threshold variable: TIS (a), FUND (b), MAR (c).

Table 8 presents the results of the dynamic panel threshold regression; the outcomes in columns (1), (2), and (3) are highly consistent. Across all three models, the coefficient on the lagged explained variable (L.INNO) is positive and significant at the 1% level, which supports controlling for dynamic lag effects in this article. The results in Table 9 show Hansen test p-values above .1, so the null hypothesis that the instruments are valid is not rejected and there is no over-identification problem. The AR(1) tests reject the null of no first-order autocorrelation in the disturbance terms (p = .0000), whereas the AR(2) tests do not reject the null of no second-order autocorrelation (p-values of .8180, .6770, and .7130), which further supports the model specification.

Table 8.

Threshold Regression Results.

Variable	(1)	(2)	(3)
Variable	INNO	INNO	INNO
$L.INNO$	1.6048*** (.0505)	1.6275*** (.0517)	1.4009*** (.0448)
$AGG (TIS \leq 4.7958)$	.8467 (2.2667)
$AGG (TIS > 4.7958)$	.9086** (.4084)
$AGG (FUND \leq 14.4638)$		.4856 (.5368)
$AGG (FUND > 14.4638)$		.5991* (.3318)
$AGG (MAR \leq 2.7991)$			.0866 (.6007)
$AGG (MAR > 2.7991)$			.8026*** (.2687)
urb	−4.0389*** (.5848)	−4.1170*** (.4493)	−7.3301*** (.5316)
size	−.0855** (.0308)	−.0911* (.0504)	.0179 (.0568)
edu	−21.5848*** (3.1279)	−20.5652*** (3.0623)	−22.5954*** (1.9483)
tech	−3.8021*** (1.0484)	−3.2862*** (.7409)	−3.9833*** (.7036)
indus	1.7050** (.6686)	1.4702*** (.4157)	1.0525** (.4388)
traffic	−.5841*** (.1284)	−.6044*** (.0847)	−.9442*** (.0882)
_cons	13.1297*** (2.7151)	12.2107*** (1.7041)	17.6180*** (1.3825)
N	330	330	330

Note. Robust standard errors are in parentheses. ***, **, and * denote significance at the .01, .05, and .1 levels, respectively.

Table 9.

The Results of the AR(1) and AR(2) Tests.

Variable	TIS		FUND		MAR
Variable	Z-value	p-Value	Z-value	p-value	Z-value	p-Value
AR(1)	−3.7300	.0000	−3.9200	.0000	−3.7400	.0000
AR(2)	−.2300	.8180	−.4200	.6770	.3700	.7130
Hansen test	28.3800	.9460	28.8500	.9390	28.5100	.3340

Column (1) reports the regression results with TIS as the threshold variable. The driving effect of AGG on INNO strengthens as TIS rises, which is consistent with Hypothesis 1. When TIS is at or below the threshold value of 4.7958, the coefficient of AGG is .8467 and not statistically significant; once TIS exceeds the threshold, the coefficient rises to .9086 and is significant at the 5% level, demonstrating a stronger and more significant driving effect in the high-threshold interval than in the low-threshold interval. This phenomenon stems from the critical threshold effects inherent in TIS as core carriers of innovation activities. Below the threshold, AGG faces challenges in transforming into actual innovation outputs due to insufficient technological absorption capacity, rigid organizational structures, or inefficient innovation processes. Upon crossing the threshold, however, these entities develop enhanced data integration capabilities and more open innovation ecosystems, enabling data element to significantly boost INNO through pathways such as mitigating information asymmetry, optimizing R&D resource allocation, and accelerating knowledge spillover. Column (2) presents model estimates with FUND as the threshold variable. As FUND crosses the threshold of 14.4638, the positive effect of AGG on INNO becomes more pronounced, with the coefficient rising from .4856 (not significant) to .5991 (significant at the 10% level). This indicates that when R&D investment remains below the threshold, the potential value of data element remains unrealized due to insufficient technological capabilities. Continued R&D investment, however, strengthens the positive feedback mechanism between AGG and INNO by reducing data transaction costs and optimizing data governance structures. Column (3) displays threshold regression estimates for MAR. When MAR crosses the threshold of 2.7991, the promoting effect of AGG on INNO intensifies, with the coefficient increasing from .0866 (not significant) to .8026 (significant at the 1% level). This occurs because MAR, as institutional infrastructure, determines the allocation efficiency of data element. Below the threshold, AGG is constrained by ambiguous property rights, high transaction costs, and inefficient circulation mechanisms, resulting in weaker marginal contributions to INNO. Upon crossing the threshold, sound market mechanisms guide data element toward high-value innovation domains through price signals, eliminate data monopolies, and facilitate cross-domain data integration, thereby producing a marked improvement in INNO. Collectively, under the constraints of these three threshold variables, AGG exhibits significant nonlinear effects on innovation quality, providing robust evidence for Hypothesis 3.

Robustness Tests

To verify the robustness of the results, this article employs four methodological approaches: (1) introducing lagged explained variable, (2) reducing control variables, (3) adjusting the sample size, and (4) constructing a dummy variable (NBD) using national big data comprehensive pilot zones as the treatment group to serve as a new explanatory variable. These approaches collectively provide a comprehensive robustness check for the impact of AGG on INNO. Taking into account the time lag effect of patent effectiveness, authorization and promotion, this paper has decided to use the number of citations that occurred 1 year later as the new explained variable. The robustness test results presented in column (1) of Table 10 demonstrate that the model coefficients and significance levels remain consistent with the baseline regression outcomes, thereby validating the model’s robustness. This article further conducts regression analysis by reducing control variables, as shown in column (2) of Table 10. The resulting coefficients and significance levels exhibit minimal deviation from the core findings, providing additional evidence for the model’s robustness. To evaluate potential biases caused by outliers and reinforce the robustness of the conclusions, the article systematically adjusts the sample size. Specifically, regions exhibiting either the highest or lowest levels of AGG (approximately 1%, 5%, and 10% of the sample) are sequentially excluded, with robustness tests conducted separately for 28, 26, and 24 remaining regions. The test results demonstrate remarkable consistency between the coefficients and significance levels of explanatory variables across all specifications and the baseline outcomes, with no significant deviations observed. This finding provides robust empirical support for the conclusions (detailed results for the 28-region sample are presented in column (3) of Table 10 for conciseness). 
The article also constructs a dummy variable based on the national big data comprehensive pilot zones approved in 2016 (as officially announced by the Chinese government) as a new core explanatory variable. The empirical results in column (4) reveal a significantly positive impact of this variable on INNO, demonstrating that the agglomeration effects of data as a production factor can significantly enhance innovation quality, thereby further confirming the robustness of the model results.
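The observation counts implied by these checks can be reproduced with a short sketch. It assumes a balanced panel of 30 regions over 2012 to 2022 (inferred from N = 330 in Table 10), that regions are ranked by their mean AGG before trimming, and that the lagged specification loses one year per region. The region labels, AGG values, and function names are hypothetical.

```python
from typing import Dict, List


def trim_extremes(mean_agg: Dict[str, float], k: int) -> List[str]:
    """Drop the k regions with the lowest and the k with the highest mean AGG."""
    ranked = sorted(mean_agg, key=mean_agg.get)
    return ranked[k:len(ranked) - k]


def panel_size(n_regions: int, n_years: int, lead: int = 0) -> int:
    """Observations in a balanced panel after losing `lead` years per region."""
    return n_regions * (n_years - lead)


years = list(range(2012, 2023))  # 2012..2022 inclusive
# Hypothetical placeholder values; only the ranking matters here.
mean_agg = {f"R{i:02d}": 0.1 * i for i in range(1, 31)}

kept = {k: trim_extremes(mean_agg, k) for k in (1, 2, 3)}
n_full = panel_size(len(mean_agg), len(years))            # baseline sample
n_lagged = panel_size(len(mean_agg), len(years), lead=1)  # column (1)
n_trimmed = panel_size(len(kept[1]), len(years))          # column (3)
```

Under these assumptions the counts agree with the N row of Table 10: 330 for the full sample, 300 once one year per region is lost to the lag, and 308 for the 28-region sample.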

Table 10. Robustness Tests.

Variable	(1)	(2)	(3)	(4)
Dependent variable	L.INNO	INNO	INNO	INNO
AGG	2.3989*** (.5338)	1.5627*** (.5269)	2.8306*** (.7147)	
NBD				.2452* (.1407)
urb	.2303 (1.9589)		−9.3260*** (2.6741)	−3.6954* (2.2451)
size	−.1259 (.1415)	−.2879 (.2341)	−.3247 (.2436)	−.3753 (.2727)
edu	−8.9392** (3.5805)	−17.1817*** (4.0524)	−18.7198*** (4.0521)	−19.5259*** (3.9700)
tech	−.1806 (.8189)	−1.8485* (1.1175)	−1.0401 (1.1463)	.0900 (1.0821)
indus	−8.9615*** (1.6512)	−13.2075*** (2.6482)	−10.7676*** (2.7115)	−9.2412*** (2.7452)
traffic	.6835 (.7929)	−1.1136 (.7496)	.6370 (.9561)	.3426 (.9013)
_cons	25.7754*** (4.6044)	47.5390*** (5.8303)	39.6865*** (6.9760)	34.3108*** (6.4428)
R²	.9173	.8409	.8406	.8384
N	300	330	308	330

Note. Robust standard errors are in parentheses. ***, **, and * indicate significance at the .01, .05, and .1 levels, respectively.
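The significance markers in Table 10 can be cross-checked from the reported coefficients and standard errors. The sketch below is an approximation: it uses two-sided standard normal critical values (1.645, 1.960, 2.576), whereas the article's software may rely on a t distribution, so borderline cases could differ. The function name `stars` is hypothetical.

```python
def stars(coef: float, se: float) -> str:
    """Map a coefficient and its standard error to significance markers."""
    z = abs(coef / se)
    if z >= 2.576:  # .01 level
        return "***"
    if z >= 1.960:  # .05 level
        return "**"
    if z >= 1.645:  # .1 level
        return "*"
    return ""


# Values copied from Table 10.
agg_col1 = stars(2.3989, 0.5338)    # AGG, column (1)
nbd_col4 = stars(0.2452, 0.1407)    # NBD, column (4)
edu_col1 = stars(-8.9392, 3.5805)   # edu, column (1)
size_col1 = stars(-0.1259, 0.1415)  # size, column (1)
```

For these four cells the approximation matches the markers printed in the table. For example, the NBD ratio is about 1.74, which clears the .1 threshold but not the .05 one.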

Discussions

In summary, this article examines the impact of AGG on INNO and shows that the relationship is robust in China; the international cases discussed below suggest that it may also apply more broadly. Future research could further explore the specific practices and experiences of different countries and regions in leveraging data to drive innovation, thereby providing valuable references for global innovation and development. Through empirical analysis, this article elucidates the mechanisms through which AGG influences INNO and arrives at the following conclusions.

The Universality of the Effect of AGG on INNO

The facilitating effect of AGG on INNO has established transnational empirical support. Quantitative analyses based on Chinese contexts reveal that a one-unit increase in AGG leads to a 2.4075-unit enhancement in INNO. This conclusion not only withstands a battery of robustness checks but also demonstrates cross-regional universal applicability. Tracking studies by Stanford University on Silicon Valley enterprises confirm a significant positive correlation between spatial distribution density of corporate data-sharing platforms and patent quality indices. The European Data Space framework established under the European Data Strategy (European Commission, 2020) is accelerating the penetration of data element into innovation chains at institutional levels. This transnational consistency phenomenon suggests that data-driven innovation paradigms are fundamentally reshaping the underlying logic of global innovation systems. The operational mechanisms warrant more systematic international comparative research.

AGG Indirectly Affects INNO Through the Improvement of the Regional Innovation Ecosystem from a Global Perspective

AGG enhances regional INNO through multidimensional transmission mechanisms: it not only directly promotes innovation output but also optimizes regional innovation ecosystem via indirect pathways including expansion of TIS, intensification of FUND, and MAR liberalization. Germany’s case exemplifies this approach. Since 2015, its Data Sharing Platform Initiative has established a government-enterprise-research institution collaborative innovation network. By developing application scenarios for advanced technologies like blockchain and artificial intelligence, the initiative has significantly improved technology transfer efficiency among innovation entities. Japan’s approach through the Automotive Data Consortium has created a sharing mechanism for R&D resources and technological achievements among automakers. This model not only reduces resource waste from redundant R&D but also enhances patent output per 100 million yuan of R&D investment through data-driven precision R&D. At the European Union level, the Financial Services Action Plan has provided institutional safeguards for corporate innovation activities by establishing unified financial market regulations across member states. Heller’s (2024) research indicates this policy increased corporate patent application likelihood by 25%, demonstrating the leveraging effect of MAR openness on INNO.

Regional Heterogeneity and Threshold Effects of AGG on INNO

The results show that the driving effect of AGG on INNO exhibits significant threshold characteristics, with its non-linear mechanism originating from the synergistic influence of three key components of the regional innovation ecosystem: TIS, FUND, and MAR. Regarding TIS, when the synergistic effect between AGG and innovation entities surpasses a critical threshold, the marginal contribution to INNO rises sharply. In terms of FUND, the agglomeration effect of data element significantly amplifies innovation output only when research funding reaches a specific threshold level. For MAR, once market openness crosses a critical point, innovation entities and environmental systems develop more efficient dynamic adaptation mechanisms.

This article further reveals that AGG plays a pivotal enabling role in enhancing INNO, corroborating the findings of Maryam and Goran (2020) regarding data-enabled innovation efficiency and effectiveness in U.S. enterprises. AGG accelerates knowledge and technology spillovers, enhances inter-agent collaboration efficiency, restructures R&D expenditure toward higher-yield projects, and invigorates market-level innovation, thereby raising INNO. This finding not only highlights data’s crucial role in innovation processes but also emphasizes its key function in counteracting negative “data depreciation” effects, providing a novel theoretical explanation for sustaining high-quality innovation.

Conclusions and Recommendations

Conclusions

This article examines how AGG drives innovation development, using Chinese provincial panel data from 2012 to 2022 to analyze the effects of AGG on INNO along three dimensions: baseline regression results, transmission mechanisms, and nonlinear impacts. To analyze the relationship between AGG and INNO comprehensively, we sequentially employed fixed-effects models, mediating effect models, and dynamic threshold regression models. These analyses encompassed both nationwide investigations and more focused examinations of the eastern, central, and western regions down to the provincial level, aiming for a more detailed and comprehensive understanding. The main findings can be summarized as follows. First, AGG has a significant positive effect on INNO, with a regional pattern of “central > western > eastern.” This suggests that AGG has become a crucial factor in overcoming China’s innovation development bottlenecks. This conclusion remains valid after robustness tests that introduce a lagged explained variable, reduce the control variables, adjust the sample size, and construct a dummy variable, with the national big data comprehensive pilot zones as the treatment group, to serve as a new explanatory variable. Second, this article constructs a theoretical model to explain how AGG affects INNO through the regional innovation ecosystem. Empirical analysis reveals that the fundamental elements of the regional innovation ecosystem (TIS, FUND, and MAR) exert varying degrees of indirect influence on the relationship between AGG and INNO. Specifically, AGG enhances INNO through three pathways: increasing TIS, boosting FUND, and opening MAR. Finally, the impact of AGG on INNO is significantly nonlinear. When TIS reach sufficient quality and quantity thresholds, AGG’s promotion effect on INNO becomes more pronounced. When FUND exceeds certain threshold levels, AGG further amplifies its positive effects on INNO. Likewise, as MAR becomes more open, the positive effects of AGG on INNO strengthen correspondingly.

Theoretical Implications

First, this article systematically investigates the multidimensional effects of AGG on INNO, expanding the theoretical boundaries of data empowerment theory while deepening the research connotation of agglomeration effects in the field of INNO. Second, existing studies on the mechanisms that promote innovation quality focus primarily on digital finance, environmental regulation, or ESG performance (Dong & Shi, 2025; S. Y. Li et al., 2022; Rao et al., 2022). This article, by contrast, incorporates the regional innovation ecosystem into the analytical framework, thereby deepening the understanding of the pathways that shape innovation quality. Furthermore, by adopting China’s unique institutional context as the research setting, it provides novel explanatory dimensions to innovation ecosystem theory. Finally, leveraging China’s distinctive regional heterogeneity, this article reveals the non-linear effects of AGG in enhancing INNO, which extends Metcalfe’s Law in network economics and strengthens the applicability and reliability of AGG policies (T. Zhao et al., 2020), thereby offering more actionable managerial insights.

Practical Implications

The conclusions of this article demonstrate the significant policy implications of AGG in promoting INNO.

First, establishing data collaboration mechanisms and implementing differentiated regional strategies can systematically enhance both innovation processes and outputs through AGG. The findings confirm that AGG significantly improves INNO. Therefore, governments should adopt proactive measures to leverage this effect. Regarding innovation process optimization, cross-departmental data collaboration mechanisms should be established to break down industry data silos. This can be achieved by developing industrial data platforms and standardizing data interfaces to facilitate seamless data flow throughout the R&D chain. Strengthening data infrastructure for fundamental research is crucial, including building national scientific data centers and prioritizing open access to public research datasets (e.g., meteorological and geographical data), complemented by robust data quality evaluation systems to ensure research reliability. For innovation output enhancement, a data-driven commercialization system should be implemented. This includes establishing special funds for big data-based patent analytics, creating intelligent decision-making platforms covering technology assessment and market forecasting, and developing value distribution mechanisms that incorporate data element. Supporting policies should balance short-term incentives with long-term cultivation, such as piloting data element accounting systems, refining data-related intellectual property rights, and establishing fault-tolerant regulatory frameworks tailored to data characteristics. Finally, regional strategies should align with local industrial features. For instance, digitally advanced regions could explore cross-border data flow pilots, while traditional industrial bases may focus on digitizing production data. A dynamic policy monitoring system should track indicators like data utilization rates and element conversion efficiency to ensure sustained translation of AGG into INNO improvement.

Second, implementing systematic policy packages including multi-stakeholder collaborative innovation networks, optimized R&D funding allocation, and deepened market-oriented reforms of data element will facilitate balanced regional development. The research findings reveal that AGG exerts varying degrees of indirect effects on INNO through fundamental components of regional innovation ecosystem (TIS, FUND, and MAR). These positive effects become more pronounced when TIS achieve both quality and quantity, FUND reaches sufficient levels, and MAR maintain openness. First, regarding the cultivation of TIS, a multi-agent collaborative innovation network should be established. Policy incentives such as tax credits and R&D subsidies should encourage enterprises to increase data element inputs, with particular support for specialized and sophisticated SMEs to establish joint laboratories with universities and research institutes. Concurrently, digital talent development programs should be implemented alongside improved cross-regional talent mobility mechanisms, with special emphasis on enhancing the quality of innovation entities in central and western regions. Second, for FUND allocation, we recommend creating a diversified investment system guided by government and led by markets. The R&D expense super-deduction policy should be optimized, offering additional deduction ratios for data-intensive R&D projects. Regional innovation funds should prioritize central and western regions, while PPP models should mobilize social capital for digital infrastructure construction. Third, concerning MAR optimization, accelerating market-oriented reforms of data element is imperative. This includes establishing tiered data property rights confirmation and trading systems, while improving intellectual property protection and dispute resolution mechanisms to provide institutional safeguards for data element circulation.

Third, region-specific, dynamically adjusted AGG strategies should be precisely formulated and effectively implemented. The research findings reveal that the impact of AGG on INNO exhibits regional heterogeneity, characterized by a “central > western > eastern” gradient. Therefore, governments should adopt differentiated policy packages to maximize innovation benefits. For central regions, policymakers should leverage existing industrial foundations and data infrastructure advantages and prioritize deep integration between data element and manufacturing. Encouraging leading enterprises to establish industry-wide data-sharing platforms, coupled with tax incentives and targeted subsidies to reduce data acquisition costs for SMEs, will further unleash the innovation dividends of data agglomeration. In western regions, the priority should be addressing digital infrastructure gaps. Pilot AGG zones should be established in comparatively advanced areas, supported by computing power centers and high-speed networks. Privacy-preserving computation and blockchain technologies should be introduced to resolve data security and cross-border flow challenges, preventing the formation of a “data trough.” Eastern regions should transition toward high-quality AGG. Enterprises should be supported in participating in international data standard-setting, while regions such as the Yangtze River Delta and Beijing-Tianjin-Hebei should establish data element benefit-sharing mechanisms to extend innovation chains toward high-end applications. Additionally, a regionally coordinated data element trading market should be established to facilitate cross-regional data flow through initiatives like “East Data West Computing.” Performance indicators for AGG efficiency should be incorporated into local government evaluations, forming a multi-tiered policy system with central coordination, local implementation, and enterprise participation. Ultimately, this will achieve balanced enhancement of INNO through AGG.

Deficiency and Prospect

Although this article empirically analyzes the impact of AGG on INNO and makes suggestions based on the actual situation, some limitations remain, and these will be the focus of our team’s future research. First, the research focuses mainly on the macro provincial level and lacks analysis of the behavior and decision-making of enterprises in different industries at the micro level. The importance of data element is highlighted during enterprises’ digital transformation, so how to use data to empower enterprise innovation while preventing the disorderly expansion of enterprise data capital deserves further in-depth research. Second, the study of the internal mechanism through which AGG affects INNO is not yet systematic and comprehensive enough; more mediating or moderating variables will be introduced to open the “black box” of this mechanism more fully.

Footnotes

Acknowledgements

We are very grateful to the editors and anonymous reviewers for reviewing this article.

ORCID iDs

Yijia Yuan

Jialin Zhang

Ethical Considerations

This article does not contain any studies with human participants performed by any of the authors.

Consent to Participate

Not applicable.

Consent for Publication

Not applicable.

Author Contributions

Conceptualization: Yijia Yuan; Methodology: Yingying Zhu; Software: Yingying Zhu, Min Kang, and Jialin Zhang; Writing—original draft: Yijia Yuan and Yingying Zhu; Writing—review and editing: Yijia Yuan, Jialin Zhang, and Min Kang; Funding acquisition: Min Kang; Resources: Yijia Yuan, Yingying Zhu, and Jialin Zhang; Supervision: Min Kang.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This article was supported by the Shandong Provincial Natural Science Foundation (ZR2025QC778) and the Shandong Provincial Higher Education Philosophy and Social Sciences Research Program (2025ZSYB075).

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The data in this manuscript is from 2012 to 2022 and includes 11 variables, including one independent variable, one dependent variable, three mechanism variables, and six control variables. The data for all variables is analyzable. The disclosure of the materials analyzed during the current study is subject to the restrictions under an ongoing project. The corresponding author is willing to share the datasets upon any reasonable request under necessary confidentiality agreements.

References

1. Adner (2006). Match your innovation strategy to your innovation ecosystem. Harvard Business Review, 84(4), Article 98. https://doi.org/10.1177/1059601104273065

2. Aghion, Bergeaud, Boppart, Klenow, P. J. (2023). A theory of falling growth and rising rents. Review of Economic Studies, 90(6), 2675–2702. https://doi.org/10.1093/restud/rdad016

3. Broekel (2012). Collaboration intensity and regional innovation efficiency in Germany—a conditional efficiency approach. Industry and Innovation, 19(2), 155–179. https://doi.org/10.1080/13662716.2012.650884

4. Cai, S. H., L. P. (2017). Innovation quantity, innovation quality and firm benefit. China Soft Science, 32(5), 30–37. https://doi.org/10.3969/j.issn.1002-9753.2017.05.004

5. Cai, Y. Z., W. J. (2021). How data influence high-quality development as a factor and the restriction of data flow. Journal of Quantitative & Technological Economics, 38(3), 64–83. https://doi.org/10.13653/j.cnki.jqte.2021.03.002

6. Calic, Ghasemaghaei (2020). Big data for social benefits: Innovation as a mediator of the relationship between big data and corporate social performance. Journal of Business Research, 131, 391–401. https://doi.org/10.1016/j.jbusres.2020.11.003

7. Carbonara (2018). Competitive success of Italian industrial districts: A network-based approach. Journal of Interdisciplinary Economics, 30(1), 78–104. https://doi.org/10.1177/0260107917700470

8. Ceccagnoli, Forman, Huang, D. J. (2012). Cocreation of value in a platform ecosystem: The case of enterprise software. MIS Quarterly, 36(1), 263–290. https://doi.org/10.1016/j.ijmedinf.2011.12.001

9. Chao, X. J., Xue, Z. X., Sui, Y. M. (2020). How the new digital infrastructure affects the upgrading of foreign trade: Evidence from Chinese cities. Economic Science, 42(3), 46–59. https://doi.org/10.12088/PKU.jjkx.2020.03.04

10. Chinese Academy of Science. (2025). China Nanotechnology Industry White Paper (2025). Chinese Academy of Science. https://www.cas.cn/yx/202509/t20250901_5080959.shtml

11. Clarysse, Wright, Bruneel, Mahajan (2014). Creating value in ecosystems: Crossing the chasm between knowledge and business ecosystems. Research Policy, 43(7), 1164–1176. https://doi.org/10.1016/J.RESPOL.2014.04.014

12. Cooke, Uranga, M. G., Etxebarria (1997). Regional innovation systems: Institutional and organisational dimensions. Research Policy, 26(4–5), 475–491. https://doi.org/10.1016/S0048-7333(97)00025-5

13. Deng, Y. M., Ding, Y. L., S. C. (2025). Shareholding by industry peers, firm size, and innovation quality. Finance Research Letters, 80, Article 107343. https://doi.org/10.1016/j.frl.2025.107343

14. Doloreux (2002). What we should know about regional systems of innovation. Technology in Society, 24(3), 243–263. https://doi.org/10.1016/S0160-791X(02)00007-6

15. Dong, Shi, J. Y. (2025). Environmental regulation and the widening inequality in urban green innovation: Evidence from China. Journal of Environmental Management, 374, Article 124181. https://doi.org/10.1016/j.jenvman.2025.124181

16. European Commission. (2020). European Data Strategy. European Commission. https://digital-strategy.ec.europa.eu/en/policies/strategy-data

17. Forés, Camisón (2016). Does incremental and radical innovation performance depend on different types of knowledge accumulation capabilities and organizational size? Journal of Business Research, 69(2), 831–848. https://doi.org/10.1016/j.jbusres.2015.07.006

18. Gao, D. Y., Liu, Sun, Z. W. (2025). Data elements and corporate innovation: A discussion of corporate innovation strategy. Finance Research Letters, 76, Article 106970. https://doi.org/10.1016/j.frl.2025.106970

19. Granstrand, Holgersson (2020). Innovation ecosystems: A conceptual review and a new definition. Technovation, 90, Article 102098. https://doi.org/10.1016/j.technovation.2019.102098

20. Guo, J. H., Guo, S. F., T. D. (2024). Research on the configuration path of multi-actor interaction influencing green innovation in the innovation ecosystem. Science and Technology Management, 45(2), 100–114.

21. Han, D. R., Liu, P. S. (2025). Multiplication of number and quality rise: Data elements enable low-carbon innovation. Humanities and Social Sciences Communications, 12(1), Article 810. https://doi.org/10.1057/s41599-025-05194-z

22. Han, D. R., H. S. (2024). The effect of data element agglomeration on green innovation vitality in China. Humanities and Social Sciences Communications, 11(1), 1–10. https://doi.org/10.1057/s41599-024-03844-2

23. Haner (2002). Innovation quality—a conceptual framework. International Journal of Production Economics, 80(1), 31–37. https://doi.org/10.1016/S0925-5273(02)00240-2

24. Dan, M. Y., Qiu, J. H. (2012). A study of the impact of innovation network elements and synergy on technology innovation performance. Management Review, 24(8), 58–68. https://doi.org/10.14120/j.cnki.cn11-5057/f.2012.08.008

25. Heller (2024). Financial market integration and the effects of financing constraints on innovation. Research Policy, 53(4), Article 104988. https://doi.org/10.1016/j.respol.2024.104988

26. Hou, Teo, T. S., Zhou, F. L., Lim, M. K., Chen (2018). Does industrial green transformation successfully facilitate a decrease in carbon intensity in China? An environmental regulation perspective. Journal of Cleaner Production, 184, 1060–1071. https://doi.org/10.1016/j.jclepro.2018.02.311

27. N. N., Hou, G. Y. (2023). How does the regional innovation ecosystem drive the innovation performance of high-tech industries? The NCA and fsQCA analysis based on the cases of 30 provinces in China. Science & Technology Progress and Policy, 40(10), 100–109. https://doi.org/10.6049/kjjbydc.2022100025

28. Jiao, Zhou, J. H., Gao, T. S., Liu, X. L. (2016). The more interactions the better? The moderating effect of the interaction between local producers and users of knowledge on the relationship between R&D investment and regional innovation systems. Technological Forecasting & Social Change, 110(9), 13–20. https://doi.org/10.1016/j.techfore.2016.03.025

29. Krammer (2009). Drivers of national innovation in transition: Evidence from a panel of Eastern European countries. Research Policy, 38(5), 845–860. https://doi.org/10.1016/j.respol.2009.01.022

30. S. Y., Liu, Y. J. (2022). Does ESG performance improve the quantity and quality of innovation? The mediating role of internal control effectiveness and analyst coverage. Sustainability, 15(1), Article 104. https://doi.org/10.3390/su15010104

31. T. C., Shi, Z. Y., Han, D. R., Zeng, J. W. (2023). Digital economy development and provincial innovation quality: Evidence from the quality of the patents. Statistical Research, 40(9), 92–106. https://doi.org/10.19343/j.cnki.11-1302/c.2023.09.007

32. X. D., Zhang, X. Y. (2018). Research of the influence mechanism of regional innovation ecosystem on regional innovation performance. Frontiers of Science and Technology of Engineering Management, 37(5), 22–28.

33. Liao, K. Z., C. C., Zhang, J. X., Wang, Z. H. (2024). Does big data infrastructure development facilitate bank fintech innovation? Evidence from China. Finance Research Letters, 65, Article 105540. https://doi.org/10.1016/j.frl.2024.105540

34. Liu, C. M., Chen, Wei, X. M. (2023). Impact of data element agglomeration on scientific and technological innovation: A quasi-natural experiment based on big data comprehensive pilot areas. Journal of Shanghai University of Finance and Economics, 25(5), 107–121. https://doi.org/10.16538/j.cnki.jsufe.2023.05.008

35. Liu, L. Y., Yang, Y. F. (2025). Empowering green innovation through data: Corporate green technology innovation under data factor market development: Evidence of synergy from the “Green Finance-Green Information” double helix policy. Commercial Research, 68(2), 107–117.

36. Liu, Q. L., Zhang, Lei, Y. Y., Chen, G. J. (2022). Research on process, logic and implementation mechanism of digital enabling enterprise innovation. Studies in Science of Science, 40(1), 150–159. https://doi.org/10.16192/j.cnki.1003-2053.20210712.001

37. J. Y., Zhao, Z. X., Y. S. (2025). Can big data aggregation help businesses save energy and reduce emissions? Quasi-natural experiment in big data comprehensive test. Structural Change and Economic Dynamics, 72, 89–102. https://doi.org/10.1016/j.strueco.2024.12.003

38. C. Y., Xiang, Y. J. (2025). Can institutional opening-up enhance enterprise innovation quality? International Review of Economics and Finance, 101, Article 104200. https://doi.org/10.1016/j.iref.2025.104200

39. S. Y., Zhang, Y. F., J. X., Y. T., Yang, H. D. (2020). Big data driven predictive production planning for energy-intensive manufacturing industries. Energy, 211, Article 118320. https://doi.org/10.1016/j.cnergy.2020.118320

40. Mann (2018). Creditor rights and innovation: Evidence from patent collateral. Journal of Financial Economics, 130(1), 25–47. https://doi.org/10.1016/j.jfineco.2018.07.001

41. Maryam, Goran (2020). Assessing the impact of big data on firm innovation performance: Big data is not always better data. Journal of Business Research, 108, 147–162. https://doi.org/10.1016/j.jbusres.2019.09.062

42. Meng, T. T., D. N., L. D., Yahya, M. H., Zariyawati, M. A. (2023). Impact of digital city competitiveness on total factor productivity in the commercial circulation industry: Evidence from China’s emerging first-tier cities. Humanities and Social Sciences Communications, 10(1), Article 927. https://doi.org/10.1057/s41599-023-02390-7

43. Moore, J. F. (1993). Predators and prey: A new ecology of competition. Harvard Business Review, 71(3), 75–86. https://doi.org/10.1111/j.1744-1714.1993.tb00677.x

44. Nan, Niu, L. X. (2024). The impact of the industrial innovation ecosystem on innovation performance—Using the equipment manufacturing industry as an example. Systems, 12(12), Article 578. https://doi.org/10.3390/systems12120578

45. Pang, R. Z., Liu, Zhang (2023). How does digitalization affect business innovation? Based on the perspective of human capital and transaction cost transmission mechanism. Nankai Economic Studies, 39(2), 102–120. https://doi.org/10.14116/j.nkes.2023.02.006

46. Pilloni (2018). How data will transform industrial processes: Crowdsensing, crowdsourcing and big data as pillars of industry 4.0. Future Internet, 10(3), Article 24. https://doi.org/10.3390/fi10030024

47. Rao, S. Y., Pan, J. N., Shangguan, X. M. (2022). Digital finance and corporate green innovation: Quantity or quality? Environmental Science and Pollution Research International, 29(37), 56772–56791. https://doi.org/10.1007/s11356-022-19785-9

48. Rong, Lin, Zhang, Radziwon (2020). Exploring regional innovation ecosystems: An empirical study in China. Industry and Innovation, 28(5), 545–569. https://doi.org/10.1080/13662716.2020.1830042

49. Russell, M. G., Smorodinskaya, N. V. (2018). Leveraging complexity for ecosystemic innovation. Technological Forecasting and Social Change, 136, 114–131. https://doi.org/10.1016/j.techfore.2017.11.024

50. Shi, R. G., W. X. (2025). Does new quality productivity improve the quality and quantity of green innovation: Based on the moderating effect of industrial collaborative agglomeration. Environmental Science. Advance online publication. https://doi.org/10.13227/j.hjkx.202410135

51. Sun, Yang, Z. D., Xia, X. C., Zhu, S. S., Zhang, X. F. (2024). The impact of regional innovation ecosystems on carbon emission reduction: A configurational path analysis based on QCA. China Population, Resources and Environment, 34(10), 57–65. https://doi.org/10.12062/cpre.20240714

52. Sun, Zhang, Y. F., J. X., Zhang, S. H. (2023). Evolutionary game analysis of data resale governance in data trading. Systems, 11(7), Article 363. https://doi.org/10.3390/systems11070363

53. Talukder, S. M., Shen, Talukder, H. F. M., Bao, Y. K. (2018). Determinants of user acceptance and use of open government data (OGD): An empirical investigation in Bangladesh. Technology in Society, 56, 147–156. https://doi.org/10.1016/j.techsoc.2018.09.013

54. Tao, C. Q., Ding (2022). How data elements become innovation dividends?—Evidence from human capital matching. China Soft Science, 37(5), 45–56. https://doi.org/10.3969/j.issn.1002-9753.2022.05.005

55. Teemu, Tommi (2014). Innovation quality in knowledge cities: Empirical evidence of innovation award competitions in Finland. Expert Systems with Applications, 41(12), 5597–5604. https://doi.org/10.1016/j.eswa.2014.02.010

56. Thomas (2014). Early warning signals for war in the news. Journal of Peace Research, 51(1), 5–18. https://doi.org/10.1177/0022343313507302

57. Wang, S. W., X. Y., J. W., Zhang, H. B. (2025). A study on the green technology innovation effect of FinTech: Based on the perspective of quantitative growth and qualitative enhancement of innovation. Macroeconomics, 47(3), 36–49. https://doi.org/10.16304/j.cnki.11-3952/f.2025.03.007

58. Wang, Liu, K. C., Zeng, J. W. (2022). How dose big data capability contribute to transformation and upgrading of enterprises? A multiple mediation model of technological innovation and business model innovation. Contemporary Finance & Economics, 43(7), 76–86. https://doi.org/10.13676/j.cnki.cn36-1030/f.2022.07.008

59. World Intellectual Property Organization. (2025). 2025 Global Innovation Index. World Intellectual Property Organization. https://www.gov.cn/yaowen/liebiao/202509/content_7039335.htm

60.

Z. C.

(2020). Regional collaborative innovation performance based on stochastic frontier analysis: Innovative network structure perspective. Journal of Technology Economics, 39(4), 120–131. https://doi.org/10.3969/j.issn.1002-980X.2020.04.015

61.

G. N.

Y. C.

Minshall

Zhou

(2018). Exploring innovation ecosystems across science, technology, and business: A case of 3D printing in China. Technological Forecasting and Social Change, 136, 208–221. https://doi.org/10.1016/j.techfore.2017.06.030

62. Yan, W. X. (2025). Data factor marketization empowering enterprise innovation quality: New evidence from Chinese patent citations. International Review of Economics and Finance, 103, Article 104433. https://doi.org/10.1016/j.iref.2025.104433

63. Zhao, M. F., S. Z. (2023). Data factor and enterprise innovation: The perspective of R&D competition. Economic Research Journal, 58(2), 39–56.

64. L. R. (2024). How knowledge recombination affect technology innovation quality in Chinese high-tech firms: The role of relation dynamics and knowledge network decomposability. SAGE Open, 14(3). https://doi.org/10.1177/21582440241267786

65. Yuan, S. J., L. P., Zhong, C. B., Chen, Y. F. (2020). Does the innovation policy promote innovation quantity or innovation quality? China Soft Science, 35(3), 32–45. https://doi.org/10.3969/j.issn.1002-9753.2020.03.004

66. Zhang, G. T., Chen, X. D., H. D. (2011). Research on inequality of regional innovation quality in China. Studies in Science of Science, 29(11), 1709–1719. https://doi.org/10.16192/j.cnki.1003-2053

67. Zhang, H. F., Guo, R. T. (2025). How does the spatial agglomeration of human capital affect strategic innovation in China? Journal of Asian Economics, 96, Article 101868. https://doi.org/10.1016/j.asieco.2024.101868

68. Zhang, X. F., Y. T. (2024). Data factor economics: Characteristics, right confirmation, pricing and transaction. Economist, 4(4), 35–44. https://doi.org/10.16158/j.cnki.51-1312/f.2024.04.003

69. Zhang, M. L., B. Z., Yin (2020). Configurational paths to regional innovation performance: The interplay of innovation elements based on a fuzzy-set qualitative comparative analysis approach. Technology Analysis and Strategic Management, 32(12), 1422–1435. https://doi.org/10.1080/09537325.2020.1773423

70. Zhang, Z. G., G. L., Y. S. (2025). Big data and sustainability: Exploring the role of data element allocation in enhancing corporate performance. International Review of Economics and Finance, 103, Article 104402. https://doi.org/10.1016/j.iref.2025.104402

71. Zhao, S. L., Cacciolatti, Lee, S. H., Song (2015). Regional collaborations and indigenous innovation capabilities in China: A multivariate method for the analysis of regional innovation systems. Technological Forecasting and Social Change, 94, 202–220. https://doi.org/10.1016/j.techfore.2014.09.014

72. Zhao, Zhang, Liang, S. K. (2020). Digital economy, entrepreneurship, and high-quality economic development: Empirical evidence from urban China. Journal of Management World, 36(10), 65–76. https://doi.org/10.19744/j.cnki.11-1235/f.2020.0154

73. Zhou, Shen, W. J. (2018). Research on the mechanism of R&D investment on regional innovation capacity: Based on empirical evidence of intellectual property protection intensity. Science of Science and Management of S&T, 39(8), 26–39.

Data Element Agglomeration, Regional Innovation Ecosystem and Innovation Quality
