Abstract
The global economic landscape is undergoing profound structural adjustment, characterized by the rise of emerging economies and mounting challenges for established economic powers. In this context, innovation has become a pivotal strategy for enhancing national competitiveness and securing strategic advantages in future development. This article uses patent citation counts, an internationally recognized indicator, as the proxy variable for innovation quality, and uses the accelerated genetic algorithm-optimized projection pursuit method (RAGA-PP) to measure data element agglomeration. Based on Chinese provincial panel data from 2012 to 2022, we first employ a fixed effects model to examine the direct effect of data element agglomeration on innovation quality. Second, we incorporate the regional innovation ecosystem into the research framework and use a mediating effect model and a threshold regression model to investigate the indirect and non-linear effects of data element agglomeration on innovation quality, with technological innovation subjects, R&D funds, and market environment serving as the mediating and threshold variables. The findings are as follows. First, data element agglomeration has a positive impact on innovation quality, and the effect is ordered “central > west > east” across regions. Second, data element agglomeration indirectly improves innovation quality by increasing the number of technological innovation subjects, raising R&D investment, and fostering an open market environment. Third, once the thresholds of technological innovation subjects, R&D funds, and market environment are crossed, data element agglomeration exerts a more favorable impact on innovation quality. These conclusions offer governments guidance for designing regionally differentiated policies and creating a supportive digital environment.
Plain Language Summary
This paper takes data empowerment theory and innovation ecosystem theory as its foundational frameworks. Using Chinese provincial panel data for 2012 to 2022, it employs a baseline regression model to examine the direct impact of data element agglomeration on innovation quality. The regional innovation ecosystem is also integrated into the research framework. A mediating effect model and a threshold regression model are applied to test the transmission mechanisms through which the regional innovation ecosystem shapes the relationship between data element agglomeration and innovation quality. The results indicate that data element agglomeration has a significant positive direct effect on innovation quality, and that it also generates indirect transmission effects and non-linear effects through the regional innovation ecosystem.
Introduction
Innovation is the fundamental driving force of development and a crucial cornerstone for building a modernized economic system and safeguarding national security. As the world’s second-largest economy, China places high importance on enhancing innovation quality (INNO). The Third Plenary Session of the 20th CPC Central Committee emphasized the importance of establishing comprehensive institutional mechanisms to support innovation, highlighting the coordinated reform of the education, science and technology, and talent systems, with the aim of optimizing the new national innovation system and upgrading its overall capacity. Under the incentive of innovation policies, China’s science and technology indicators have shown steady growth. In September 2025, the World Intellectual Property Organization (WIPO) released the 2025 Global Innovation Index in Hong Kong, listing the top 100 innovation clusters. China had 24 clusters on the list, ranking first globally in terms of quantity for the third consecutive year (World Intellectual Property Organization, 2025). At the micro-technology level, nanotechnology has yielded remarkable achievements. According to the China Nanotechnology Industry White Paper (Chinese Academy of Sciences, 2025), more than 1.078 million nanotechnology patents were granted worldwide between 2000 and 2025, of which China accounted for 464,000 (43%), consolidating its global leadership. However, domestic innovation also faces shortcomings. By 2024, China’s industrialization rate for invention patents had risen to 53.3%, yet compared with developed nations a gap remains in innovation capabilities within high-end technology sectors, and breakthrough technologies are scarce. Concerns persist regarding the “high quantity but low quality” of patents. Currently, the global competitive landscape is undergoing profound adjustments.
As China’s economy advances toward high-quality development, the traditional growth model reliant on factors of production and investment is becoming unsustainable. To transform from a manufacturing powerhouse into an innovation powerhouse, China urgently needs to optimize its innovation system and overcome innovation bottlenecks.
Amid the surging waves of technological revolution and industrial transformation, continuous breakthroughs in cutting-edge fields such as artificial intelligence, biotechnology, and quantum computing are profoundly reshaping the global industrial division of labor and the competitive landscape. Against this backdrop, data is emerging as a vital new factor of production. According to data empowerment theory, the deep integration and widespread application of cutting-edge technologies such as digital twins and artificial intelligence have significantly expanded the coverage and penetration depth of data element within the real economy, gradually positioning it as the core driving force behind technological innovation. Data element exhibits high permeability, scalability, and low-cost sharing, providing innovation activities with a robust technological foundation and rich information resources (Calic & Ghasemaghaei, 2020) that foster an efficient, open innovation ecosystem. Facing the current predicament of innovation characterized by “high quantity but low quality,” China urgently needs to align with macroeconomic development trends, further unleash the potential of data empowerment, continuously stimulate innovation vitality, and drive the innovation system’s transition from scale expansion to quality enhancement. In this process, data element agglomeration (AGG) is likely to play a major role in resolving China’s present innovation conundrum. Three practical questions follow. Can AGG effectively promote the improvement of INNO? If so, through what transmission mechanism does AGG affect INNO? And, given China’s regional heterogeneity, does the effect of AGG on INNO show distinct regional patterns?
An in-depth analysis of the impact of AGG on INNO, together with a clarification of the distinctive role AGG plays in that process, can inform targeted development policies. Such an analysis has theoretical and practical value for strengthening China’s science and technology capacity, raising its international competitiveness, and building an innovation-driven country.
While existing studies have extensively examined AGG, most remain focused on enhancing innovation quantity (Liao et al., 2024; Liu & Yang, 2025), with relatively insufficient attention paid to its impact on INNO and underlying mechanisms. Especially, there is a lack of empirical tests on the mediating transmission and nonlinear regulatory effects of AGG through optimizing the regional innovation ecosystem and thereby enhancing INNO, based on the systematic path of “AGG—regional innovation ecosystem—INNO.” This limitation not only constrains the in-depth development of data empowerment theory in China’s unique context but also fails to effectively support differentiated governance requirements for regional innovation practices. Therefore, this article adopts data empowerment theory and innovation ecosystem theory as its foundational frameworks. Using panel data from 30 Chinese provinces spanning 2012 to 2022, it employs benchmark regression, mediating effects, and threshold regression models to examine the impact of AGG on INNO and its transmission mechanisms. This article aims to provide theoretical guidance for leveraging AGG to enhance INNO, offer practical insights for regions to develop suitable innovation ecosystem, and further expand the research frontiers of data-driven innovation.
The potential marginal contributions of this article are as follows. First, from the perspective of AGG, it systematically establishes the theoretical connection between AGG and INNO and reveals the interaction mechanism between the two, thereby expanding the research scope of INNO. Second, it incorporates the innovation ecosystem into the research framework, exploring the mechanisms through which AGG influences INNO from the perspectives of technological innovation subjects (TIS), R&D funds (FUND), and market environment (MAR), thereby deepening and broadening the understanding of the innovative efficacy of data element. Finally, it analyzes the non-linear effects of AGG on INNO, aiming to provide a basis for formulating regional innovation policies and empirical support for enhancing China’s overall INNO.
Following the introduction, the remainder of this article is organized as follows: The second part presents a literature review, synthesizing existing studies on AGG, INNO, the innovation effects of data element, and regional innovation ecosystem. The third part outlines the article’s theoretical framework and research hypotheses. The fourth part details the research design, including model construction, variable measurement and specification, data sources, and descriptive statistics. The fifth part conducts empirical testing to validate and analyze the theoretical hypotheses. The sixth part discusses findings within a global context. Finally, the article presents research conclusions and policy recommendations, while also identifying limitations and future directions.
Literature Review
Research on the Data Element Agglomeration
With the rapid advancement of digital technology, the importance of data as a key factor of production has become increasingly prominent. Research on data as a factor and its value creation has emerged as a focal point for scholars both domestically and internationally. Relevant research primarily covers four aspects: the connotation of data element, the formation mechanism of AGG, influencing factors, and post-effect studies. In exploring the connotation of data element, Thomas (2014) was among the first to argue that data itself, akin to capital and labor, constitutes a factor of production. Its “techno-economic” characteristics include low cost, immediacy, pervasiveness, mass availability, and self-organization (Cai & Ma, 2021; X. Xu et al., 2023), and these attributes collectively raise corporate productivity. Regarding the formation mechanism of AGG, research indicates that AGG is driven by technological foundations such as digital infrastructure, platform scale, and algorithmic capabilities (Talukder et al., 2018), while also benefiting from government policy guidance, data openness strategies, and market demand (Han et al., 2024), and that it exhibits synergistic agglomeration across physical and virtual spaces. Existing studies have identified multiple factors that significantly influence AGG, including the level of digital infrastructure, industrial structure and scale, the concentration of high-tech talent, and data property rights and transaction systems (Gao et al., 2025; Y. Sun et al., 2023). In terms of post-effect studies, scholars have primarily taken the perspectives of corporate decision-making and production, highlighting that deeper extraction of data value improves the quality and efficiency of manufacturing. In the era of Industry 4.0 in particular, data and information have become especially crucial.
The collective intelligence perception and crowdsourcing effects triggered by their agglomeration will bring new advantages and challenges to enterprises (Pilloni, 2018). On one hand, the information embedded in data can optimize decision-making processes, accelerate resource circulation, and stimulate output growth (Ma et al., 2020). It also significantly drives the expansion of commerce and distribution (Meng et al., 2023) and enhances logistics efficiency (Lv et al., 2025). On the other hand, some scholars hold differing views (Aghion et al., 2023), arguing that data technology does not necessarily enhance data accuracy. Instead, it may dampen the impetus for innovation, slow economic growth, and pose societal challenges.
Research on the Innovation Quality
Innovation quantity typically refers to the number and frequency of innovative outputs such as patents and academic papers, emphasizing the scale and output efficiency of innovation. INNO, however, places greater emphasis on the technological sophistication, economic benefits, and sustainable impact of innovation. It manifests in the breakthrough nature of outcomes, their transformative potential, and their contribution to regional competitive advantages, serving as a deeper reflection of innovation capacity (Cai & Yu, 2017). Existing studies on INNO primarily revolve around three dimensions: connotation definition, measurement methods, and influencing factors. Regarding conceptual definition, Haner (2002) first framed INNO as a concept that permeates the entire innovation process, defining it through product or service attributes, production processes, and firm management. Some scholars contend that INNO encompasses not only technological value but also the commercial value generated (Teemu & Tommi, 2014). The framework has gradually expanded from product, operational, and process dimensions to encompass services, processes, and culture (Cai & Yu, 2017). Regarding measurement methods, because the number of patent applications and grants directly reflects the results of technological innovation, academics mostly use patent indicators as proxy variables for INNO. Common metrics include the number of applications or grants (Krammer, 2009), patent grant ratios and the duration of fee payment periods (G. T. Zhang et al., 2011), the proportion of invention patents in total applications (Cai & Yu, 2017), and newer quality indicators such as knowledge breadth based on IPC classifications and patent citation frequency (T. C. Li et al., 2023).
Regarding influencing factors, the existing literature suggests that knowledge reorganization (Yu & Yu, 2024), company size (Deng et al., 2025), and the selection of supportive policies (Ma & Xiang, 2025) exert significant influences on INNO at the micro, meso, and macro levels, respectively.
Research on the Innovation Effects of Data Element
Regarding the innovation effects of data element, existing studies primarily focus on how it empowers innovation activities as a new type of production factor. Through deep integration with labor, capital, and knowledge, data not only optimizes factor allocation efficiency but also significantly enhances corporate innovation performance (Maryam & Goran, 2019; Z. G. Zhang et al., 2025), driving disruptive technological innovations and fundamental transformations in business models, thereby providing sustained momentum for corporate transformation and upgrading (X. Wang et al., 2022). Mechanism studies indicate that data element, through synergistic interactions with traditional factors on both the supply and demand sides, effectively reduces information asymmetry and trial-and-error costs in the innovation process, forming a crucial foundation for enterprises to achieve high-quality innovation (Q. L. Liu et al., 2022). However, the innovative efficacy of data element is constrained by multiple external conditions and may even generate innovation compensation effects due to inadequate alignment in certain contexts. For instance, its effective integration with human capital directly affects the establishment of innovation incentive mechanisms and the realization of innovation benefits (Tao & Ding, 2022). Data element that lacks the support of high-quality talent struggles to fully unleash its innovative value-added potential. Furthermore, enterprise scale significantly influences the choice of innovation strategy. Large enterprises often leverage abundant data resources to advance incremental and iterative innovation. When confronting highly uncertain disruptive innovation, however, their organizational inertia and the limitations of their internal incentive structures may leave them with relatively insufficient motivation (Forés & Camisón, 2016).
Research on Regional Innovation Ecosystem
As a vital component of innovation ecosystem, regional innovation ecosystem has garnered significant attention within academic circles. Relevant research primarily focuses on three aspects: conceptual origins, fundamental characteristics, and constituent elements. Regarding conceptual origins, academic studies on ecosystems initially stemmed from business ecosystem (Moore, 1993). In recent years, the ecosystem concept has garnered extensive attention from scholars in economic management fields, leading to the subsequent introduction of concepts such as innovation ecosystem (Adner, 2006) and platform ecosystem (Ceccagnoli et al., 2012). Among these, regional innovation ecosystem serves as the foundation of innovation ecosystem and is crucial for understanding and analyzing regional innovation activities (Cooke et al., 1997). Drawing upon innovation systems theory, a regional innovation ecosystem can be defined as a self-organizing system characterized by symbiotic competition and dynamic evolution within a specific geographical space. It consists of diverse innovation species, populations, and communities that engage in value co-creation through material, knowledge, and information exchanges with their innovation environment, all based on shared value propositions (Granstrand & Holgersson, 2020). Regarding fundamental characteristics, regional innovation ecosystem exhibits traits similar to natural ecosystem. Simultaneously, within complex and open environments, shifts in innovation subjects and contexts give rise to new features such as network effects, symbiotic collaboration, local embeddedness, and adaptability (Broekel, 2012; Carbonara, 2018; Russell & Smorodinskaya, 2018). Regarding constituent elements, current approaches to classifying the foundational components of regional innovation ecosystem primarily follow either a “binary” or “ternary” approach. 
The binary approach posits that the innovation ecosystem comprises innovation subjects and the innovation environment, while the ternary approach further separates innovation resources from the broader innovation environment, emphasizing the critical role of innovation resources within the regional innovation ecosystem (Doloreux, 2002; G. N. Xu et al., 2018). The synergy among these elements facilitates technological, economic, and social development (Rong et al., 2020). Some scholars have employed the ternary approach to demonstrate that the regional innovation ecosystem plays a vital role in driving innovation performance (Nan & Niu, 2024), influencing green innovation (Guo et al., 2024), and promoting carbon emission reduction (H. Sun et al., 2024).
In light of the above studies, there is still room for improvement in the research on data element and innovation as follows: First, existing studies have largely focused on the quantitative impact of data element on innovation output, while relatively neglecting their deeper influence on INNO. In particular, there is a lack of systematic explanations from the perspective of element agglomeration regarding the mechanism by which AGG empowers improvements in INNO. Second, research focusing on AGG and INNO has failed to closely integrate with the evolving characteristics of regional innovation ecosystem in the digital era, thereby struggling to reveal their key mediating role in this relationship. Third, taking into account the digital economy’s network features, the complex non-linear relationship between AGG and INNO needs to be explored urgently.
Building upon the aforementioned analysis, this article integrates AGG, regional innovation ecosystem, and INNO within a unified research framework grounded in data empowerment theory. The investigation primarily focuses on examining the quality-enhancing effects of AGG on INNO, while simultaneously addressing the mediating and threshold roles played by regional innovation ecosystem in this influence process.
Mechanistic Analyses and Research Hypotheses
Direct Transmission Mechanisms and Research Hypotheses
Data element is the micro-foundation of the digital economy. AGG draws in flows of traditional factors such as capital and labor and builds a bridge between virtual digital space and real physical space. Unlike traditional factors, data exhibits non-rivalry, heterogeneity, externality, and other distinctive economic characteristics (M. Zhang et al., 2024), and it directly affects INNO through both the innovation process and innovation output (Li et al., 2023), thereby raising INNO.
On the one hand, in terms of the innovation process, data empowerment theory suggests that AGG enhances INNO in multiple dimensions through sustained resource investment and efficient resource utilization. Specifically, the sharing and opening of data resources enable enterprises to use big data to forecast talent needs precisely, thereby achieving flexible allocation of labor resources. At the same time, AGG helps enterprises make full use of the diversified financing channels of the multi-level capital market, which eases financing constraints, increases R&D investment, improves fund-raising ability, and optimizes the flexible allocation of capital (C. M. Liu et al., 2023). It also helps transform human capital advantages into regional high-quality innovation advantages, so that accumulated human capital exerts a sustained driving effect on INNO (Pang et al., 2023). AGG can mitigate the negative impact of “data depreciation” and continue to promote high-quality innovation through timely data iteration (Maryam & Goran, 2019), improved data quality, enhanced data sharing, and the formation of scale effects and abundant resources. In addition, because data is non-rival, digital and intelligent transformation based on data element accelerates the conversion of innovation momentum. This helps expand the depth and breadth of innovation, enhances corporate innovation competitiveness and the utilization efficiency of innovation resources, and provides solid support for enterprises to achieve both qualitative improvement and quantitative growth in innovation.
On the other hand, in terms of innovation output, AGG can achieve close matching of supply and demand as well as key technological breakthroughs, thus enhancing INNO. According to disruptive innovation theory, AGG forms a powerful network effect. The tools that data element gives rise to, such as lead-user demand insights, demand decomposition, technology simulation experiments, internal knowledge sharing platforms, and open innovation platforms, can precisely locate customer demand and guide the direction of enterprise innovation, which improves the match between supply and demand and further enhances INNO. In addition, data, as a new and crucial production factor, can break down the isolated data storage and sharing barriers between business systems within an organization and strengthen the cooperation network of innovation subjects within the region. It further encourages the sharing and consistent use of knowledge stock and data resources within the region, which promotes breakthroughs in key core technologies and thereby improves the INNO of enterprises (Yuan et al., 2020).
In summary, this article proposes Hypothesis 1: AGG can directly promote the improvement of INNO.
Indirect Transmission Mechanisms and Research Hypotheses
Innovation is an activity that combines high investment, high risk, and high cost with high benefit, high growth, and a long time horizon. Therefore, efficient resource allocation and a favorable innovation environment are crucial safeguards for enhancing innovation efficiency and quality. Based on innovation ecosystem theory and considering the substitutability among elements within the regional innovation ecosystem and its significance in influencing INNO (Hu & Hou, 2023), this article adopts the ternary (three-part) framework for the innovation ecosystem. Within this framework, innovation subjects are the core organizations engaged in knowledge creation and technology application, serving as the direct executors of innovation activities; innovation resources are the aggregate of inputs deployed in innovation activities, providing foundational support; and the innovation environment encompasses the totality of external factors influencing the innovation process, offering fundamental safeguards. This article examines the mediating roles of innovation subjects, innovation resources, and the innovation environment in the pathway through which AGG affects INNO, analyzing them from the perspectives of TIS, FUND, and MAR, respectively.
The Mediating Role of TIS in the Pathway of AGG Affecting INNO
Within regional innovation ecosystem, digital enterprises serve as pivotal TIS. They not only function as vital engines for stimulating technological demand and driving innovation (S. L. Zhao et al., 2015), but also play a central role in enhancing INNO. Based on innovation network theory, AGG significantly promotes data openness, trading, and sharing by optimizing data organization and circulation mechanisms. This attracts digital enterprises to form spatial and business clusters. These clusters accelerate digital-technology development and revitalize technological innovation, establishing the preconditions for sustained gains in INNO. Furthermore, digital enterprises implement vertical and horizontal network integration strategies to strengthen connections and collaboration among TIS. This facilitates the efficient transfer and diffusion of innovation outcomes, knowledge, and technologies within the system, promoting the commercialization of scientific and technological achievements (Clarysse et al., 2014). Through complex nonlinear interactions, this process ultimately achieves a significant leap in INNO.
The Mediating Role of FUND in the Pathway of AGG Affecting INNO
Both the resource-based theory and the knowledge-based perspective emphasize that the scarce resources and knowledge capabilities owned and allocated by an organization constitute the fundamental source of its long-term competitive advantage and the key driver of high-level innovation performance in a region. According to the new economic growth theory, human capital plays an irreplaceable role in promoting regional innovation (Zhang & Guo, 2025). Within this theoretical framework, AGG, through in-depth analysis and application of big data, significantly enhances the efficiency of matching R&D resources with funding, enabling enterprises to identify innovation opportunities more accurately and to optimize decision-making processes. This mechanism strengthens innovation subjects’ capacity to allocate and utilize data element while improving their risk management capabilities in responding to uncertainties. Consequently, it fully unleashes the benefits of data through innovation incentive effects (Tao & Ding, 2022). In addition, adequate FUND support facilitates the establishment of systematic, high-level R&D systems and platforms. This drives the in-depth advancement of data-centric R&D activities and accelerates the innovation iteration process, raising both the probability of success and the quality of outputs while ensuring that data element is efficiently converted into concrete innovative achievements.
The Mediating Role of MAR in the Pathway of AGG Affecting INNO
Based on collaborative innovation theory, the abundance of innovation factors within a system and their significant synergistic effects constitute the key mechanism for enhancing regional innovation capabilities and performance. As a vital external driver of regional innovation ecosystem, the openness and demand structure of MAR influence innovation efficiency and quality. According to demand-driven innovation theory, increased market openness facilitates the efficient flow and optimal allocation of key innovation factors such as talent, technology, and data. This significantly reduces the economic and time costs associated with factor acquisition, thereby laying the foundation for enhancing INNO (Wu, 2020). Simultaneously, a robust MAR facilitates the identification and response to consumer value orientations and practical needs. This not only provides stable guidance for innovation direction but also enhances the practicality and market success rate of innovation outcomes through demand alignment. Consequently, it drives the effective realization of innovation value, ultimately fostering continuous improvement in INNO.
In summary, this article proposes Hypothesis 2: TIS, FUND, and MAR in the regional innovation ecosystem all have a mediating effect on AGG affecting INNO.
Non-Linear Transmission Mechanisms and Research Hypotheses
Based on the characteristics of network externality and Metcalfe’s Law, AGG may have a non-linear effect on INNO (T. Zhao et al., 2020). (1) TIS. When TIS is of high quality and high quantity, the number and diversity of TIS promote cooperation between different functional subjects and meet the needs of innovation activities, which can significantly increase the innovation rate and success rate of the regional innovation ecosystem and play a “supportive” role for AGG in promoting INNO (Maryam & Goran, 2019). When TIS is “small and single,” its internal operating efficiency is reduced and its ability to obtain external information, technology, and resources is weakened, resulting in a mismatch between supply and demand in factor and product markets, which is not conducive to the enhancement of regional INNO. (2) FUND. Innovation resources are a determinant of regional innovation capacity (M. L. Zhang et al., 2020). When capital investment in innovation activities is sufficient, it can sustain the continuous R&D investment of innovation subjects in the region, stimulate independent and breakthrough innovation, help transform innovation results, and improve systematic innovation output, all of which improve INNO to a greater extent. When capital investment in innovation activities is severely constrained, innovation activities are crowded out to a certain extent, leading to weak innovation and an inability to fully release the positive impact of AGG on INNO. (3) MAR. 
The more open the market, the higher the efficiency of resource integration, the more favorable the conditions for cooperative innovation among diverse innovation subjects, and the better the adaptive fit between the innovation environment and innovation subjects (Li & Zhang, 2018), all of which facilitate innovation activities and provide incentives to continue them. At the same time, an open market helps promote the circulation and sharing of data, break down “information silos,” maximize the value of data, and further promote INNO. A closed MAR blocks the circulation of data element, reduces the efficiency and scale of its agglomeration, and makes it difficult for data element to realize its full potential value, thus inhibiting the improvement of INNO. Furthermore, a closed market easily gives rise to data monopolies, which interfere with the effective integration and utilization of data element, hinder the exchange and sharing of knowledge and technology, and make it difficult to form a healthy innovation ecosystem. This has a negative impact on INNO.
In summary, this article proposes Hypothesis 3: AGG exerts a positive non-linear effect on INNO, and this positive effect is more significant when the quality and quantity of TIS are high, the FUND is sufficient, and the MAR is open. The specific theoretical framework diagram is shown in Figure 1.

Figure 1. Theoretical framework diagram of AGG affecting INNO.
Model Construction and Variables Measurement
Model Construction
Baseline Model
To evaluate the three hypotheses put forward in this article, this article respectively examines the direct, indirect, and non-linear effects of AGG on the INNO using the fixed effects model, the mediating effects model, and the threshold regression model, and constructs the following benchmark regression model:
where,
Mediating Effect Model
Further, this article introduces three types of mediating variables, namely, technological innovation subjects (
Among them,
Threshold Model
Taking into account the path dependence and dynamic features of INNO, this article draws on Hou et al. (2018), introduces the lagged term of INNO, and constructs a dynamic threshold regression model estimated by system GMM, with technological innovation subjects (

Figure 2. Research model.
Variables Measurement and Description
Explained Variable
INNO. Domestic research has not yet established a unified standard for measuring INNO and relies mainly on indicators such as patent knowledge breadth, patent grant rate, and the proportion of invention patents. The forward citation count of a patent, defined as the number of times it is cited by subsequent patents, is a widely accepted indicator of its technological impact and economic value (Mann, 2018). This indicator has gained wide international academic recognition because it reflects INNO relatively accurately, but measurement complexities have limited its use among domestic scholars. This article adopts the evaluation framework proposed by N. Xu et al. (2025) and uses patent citation counts as the core proxy for patent quality. Data mining techniques were used to query the China National Intellectual Property Administration databases and the China Stock Market and Accounting Research Database, with listed companies’ parent firms, subsidiaries, and joint ventures as the key search terms. Searches were run year by year to preserve annual granularity. The results were then organized and analyzed by four core fields (securities code, patent quantity, province of origin, and year) to obtain the patent citation counts for each province.
Core Explanatory Variable
AGG. To operationalize the concept of AGG, this article follows the methodology proposed by Chao et al. (2020), employing four key indicators: Internet broadband penetration rate, enterprise website ownership, number of e-commerce transaction enterprises, and mobile phone penetration rate. The evaluation utilizes an accelerated genetic algorithm-optimized projection pursuit method (RAGA-PP) to assess AGG. The RAGA-PP algorithm optimizes the projection direction vector
(1) Standardized sample indicators.
Regarding positive indicators:
Regarding negative indicators:
In the equation,
(2) Establish the projection index function
where,
(3) Optimize the projection index function.
where,
(4) By substituting the optimal projection direction
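The four steps above can be sketched in a few lines. The sketch assumes the commonly used projection pursuit formulation, in which the projection index is Q(a) = S_z · D_z (the standard deviation of the projected values times their local density, with window radius 0.1 · S_z) and the direction a has unit length. A plain random search stands in for the accelerated genetic algorithm, and all indicator values are hypothetical, so this is an illustration of the technique rather than the procedure used to produce the AGG scores in this article.

```python
import math
import random

# Hypothetical raw indicators for five regions (illustrative values, not study data).
raw = [
    [55.0, 62.0, 70.0, 81.0, 90.0],      # Internet broadband penetration rate
    [40.0, 48.0, 52.0, 66.0, 75.0],      # enterprise website ownership
    [3.0, 5.0, 4.0, 9.0, 12.0],          # e-commerce transaction enterprises
    [80.0, 95.0, 100.0, 110.0, 125.0],   # mobile phone penetration rate
]
positive = [True, True, True, True]      # assumption: all four are positive indicators


def standardize(columns, is_positive):
    """Step 1: min-max scale each indicator to [0, 1]; flip negative indicators."""
    scaled = []
    for col, pos in zip(columns, is_positive):
        lo, hi = min(col), max(col)
        if pos:
            scaled.append([(v - lo) / (hi - lo) for v in col])
        else:
            scaled.append([(hi - v) / (hi - lo) for v in col])
    return scaled


def projection_index(a, scaled, radius_factor=0.1):
    """Step 2: Q(a) = S_z * D_z for a unit-length direction a."""
    n = len(scaled[0])
    z = [sum(a[j] * scaled[j][i] for j in range(len(a))) for i in range(n)]
    mean_z = sum(z) / n
    s_z = math.sqrt(sum((v - mean_z) ** 2 for v in z) / (n - 1))
    window = radius_factor * s_z
    d_z = 0.0
    for zi in z:
        for zk in z:
            gap = window - abs(zi - zk)
            if gap > 0:          # unit step function: only close pairs count
                d_z += gap
    return s_z * d_z, z


def random_search(scaled, trials=2000, seed=0):
    """Step 3 (simplified): random search over non-negative unit directions."""
    rng = random.Random(seed)
    best_q, best_a, best_z = -1.0, None, None
    for _ in range(trials):
        draw = [rng.random() for _ in scaled]
        norm = math.sqrt(sum(v * v for v in draw))
        a = [v / norm for v in draw]
        q, z = projection_index(a, scaled)
        if q > best_q:
            best_q, best_a, best_z = q, a, z
    return best_a, best_z, best_q


scaled = standardize(raw, positive)
# Step 4: the projected values under the best direction serve as composite scores.
direction, scores, q_best = random_search(scaled)
```

In this toy data the first region has the lowest value on every indicator and the last region the highest, so their scores bracket the others regardless of the direction found.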
Mediating and Threshold Variables
(1) TIS: TIS, represented by enterprises, are the subjects of risk-taking, value creation, and distribution (Jiao et al., 2016). This article measures TIS by the natural logarithm of the number of enterprises in the high-tech industry (Hu & Hou, 2023).
(2) FUND: Taking into account differences in the scale of economic development across regions, this article follows Zhou and Shen (2018) and measures FUND by the natural logarithm of internal R&D expenditure.
(3) MAR: This article adopts the composite MAR indicator in the China Regional Innovation Capability Evaluation Report, which measures a region’s MAR in terms of market openness, scientific and technological services, and resident consumption level (He et al., 2012).
Control Variables
INNO is affected by many internal and external factors in addition to AGG. To support the robustness of the model results, this article controls for the urbanization level (urb), the size of industrial enterprises (size), the educational development level (edu), the technology market turnover (tech), the industrial development level (indus), and the transport infrastructure level (traffic). (1) urb: measured by the share of the urban population in the total population. (2) size: The scale of industrial firms is closely related to how efficiently enterprises carry out innovation activities and is therefore an important influence on INNO; it is measured by the ratio of industrial value added to the total number of enterprises in the region (T. C. Li et al., 2023). (3) edu: measured by the ratio of educational expenditure to general fiscal budget expenditure. (4) tech: measured by technology market turnover. (5) indus: measured by the value added of industrial enterprises. (6) traffic: measured by the ratio of total road mileage to total population.
Data Sources and Descriptive Statistics
This article uses a sample of 30 provincial-level regions in mainland China from 2012 to 2022. Hong Kong, Macao, Taiwan, and Tibet were excluded because of significant data gaps. Logarithmic transformation was applied to the relevant variables to mitigate heteroskedasticity and limit multicollinearity. The data sources for the variables are presented in Table 1, descriptive statistics in Table 2, and the correlation coefficient matrix in Figure 3. Correlation tests show that the core explanatory variable (AGG), the control variables, and the explained variable (INNO) are all significantly correlated at the 1% level, which gives preliminary support to the theoretical relationships among them. To assess multicollinearity, variance inflation factor (VIF) tests were also conducted. All explanatory variables have VIF values below 5, indicating no severe multicollinearity in the model and supporting its specification.
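The VIF check described above follows a simple rule: regress each explanatory variable on all the others and compute VIF_j = 1 / (1 − R_j²). A minimal pure-Python sketch is given below. The three columns are hypothetical values chosen so that the answer can be verified by hand; they are not the variables of this study, and the helper names are illustrative.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def r_squared(y, X):
    """R^2 from an OLS regression of y on the columns in X plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + X
    A = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    b = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve(A, b)
    fitted = [sum(beta[j] * cols[j][i] for j in range(len(cols))) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on all other columns."""
    return [1.0 / (1.0 - r_squared(X[j], X[:j] + X[j + 1:])) for j in range(len(X))]


# Hypothetical regressors: x1 and x2 move together, x3 is unrelated to both.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
x3 = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
vifs = vif([x1, x2, x3])
```

In this toy example the two correlated columns each have a VIF of 5.5125, which would exceed the cutoff of 5 used above, while the unrelated column has a VIF of 1.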
Table 1. Sources of Variables Data.
Table 2. Results of Descriptive Statistics of Relevant Variables.

Figure 3. Correlation coefficient matrix.
Empirical Results and Analyses
Baseline Regression Analysis
Following the Hausman test, this article concludes that the fixed-effects model is more appropriate. Table 3 presents the baseline regression results of AGG’s impact on INNO, where column (1) provides nationwide estimates for the full sample, while columns (2) to (4) focus on China’s eastern, central, and western regions, presenting region-specific model estimates. In column (1), the coefficient of AGG’s impact on INNO is 2.4075, significant at the 1% level, indicating a significantly positive effect of AGG on improving INNO overall. These results confirm Hypothesis 1: AGG can enhance INNO, consistent with C. M. Liu et al.’s (2023) findings on AGG’s impact on technological innovation. Regarding control variables, the results show that urb has a significantly negative effect on INNO. Rapid urbanization may lead to imbalances in factor allocation, with labor and capital excessively concentrated in non-productive sectors, crowding out R&D investment and weakening effective agglomeration of innovative factors. Additionally, negative externalities such as traffic congestion and environmental pollution caused by urban expansion may offset the positive effects of knowledge spillovers, creating “crowding costs” in the innovation environment, thereby exerting a negative effect on innovation quality, a result that aligns with Shi and Xu (2025). The edu also negatively affects INNO at the 1% significance level. When the education system experiences an adaptation gap with regional innovation needs, it creates a “quality-efficiency separation” phenomenon in human capital accumulation, making the education system less responsive to market demand-driven breakthrough innovations, thereby negatively impacting INNO—a finding that corroborates S. W. Wang et al.’s (2025) conclusions. The indus is similarly detrimental to INNO, consistent with Han et al.’s (2025) empirical results. 
This phenomenon may stem from enterprises’ reliance on extensive production expansion for industrial value-added, prioritizing scale expansion over technological innovation and quality improvement—a critical challenge China currently faces in innovation quality (T. C. Li et al., 2023).
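The logic of the fixed-effects estimator behind these baseline results can be illustrated with a within transformation. The sketch below assumes a balanced panel, a single regressor, and both province and year effects (the two-way form is an assumption, since the text states only that a fixed-effects model is used). The data are synthetic and the coefficient of 2.0 is a hypothetical value, unrelated to the estimates in Table 3.

```python
def two_way_demean(panel):
    """Return x_it - mean_i - mean_t + grand mean for a balanced panel[i][t]."""
    n, periods = len(panel), len(panel[0])
    unit_mean = [sum(row) / periods for row in panel]
    time_mean = [sum(panel[i][t] for i in range(n)) / n for t in range(periods)]
    grand = sum(unit_mean) / n
    return [
        [panel[i][t] - unit_mean[i] - time_mean[t] + grand for t in range(periods)]
        for i in range(n)
    ]


def within_slope(y, x):
    """OLS slope of demeaned y on demeaned x (the within estimator)."""
    yd, xd = two_way_demean(y), two_way_demean(x)
    num = sum(a * b for ry, rx in zip(yd, xd) for a, b in zip(ry, rx))
    den = sum(b * b for rx in xd for b in rx)
    return num / den


# Synthetic panel: 3 units observed over 3 periods (hypothetical values).
x = [[1.0, 2.0, 4.0], [2.0, 5.0, 3.0], [0.0, 1.0, 7.0]]
unit_effect = [10.0, -3.0, 5.0]      # unobserved, time-invariant unit effects
time_effect = [0.0, 1.5, -2.0]       # unobserved common shocks per period
beta_true = 2.0                      # hypothetical slope used to build y
y = [
    [beta_true * x[i][t] + unit_effect[i] + time_effect[t] for t in range(3)]
    for i in range(3)
]

beta_hat = within_slope(y, x)        # recovers beta_true despite the fixed effects
```

Because the unit and period effects are additive, the demeaning step removes them exactly, which is why the slope is recovered without estimating any of the effects directly.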
Table 3. Baseline Regression Results.
Note. Robust standard errors are in parentheses.
*** Significance at the .01 level.
** Significance at the .05 level.
* Significance at the .1 level.
The results in columns (2) to (4) show that the impact of AGG on INNO follows a regional gradient: central region > western region > eastern region, consistent with T. C. Li et al.’s (2023) findings. This article argues that, as the frontier of reform and opening-up, the eastern region possesses first-mover advantages in data infrastructure and the digital economy, so data-factor input there has already approached its saturation threshold. According to the technological convergence effect in endogenous growth theory, excessive factor density may paradoxically lead to data redundancy and path lock-in in innovation, manifesting as diminishing marginal returns in factor allocation efficiency. Moreover, innovation activities in the eastern region are capital-intensive, which dilutes the marginal contribution of data elements. For instance, while leading enterprises like Huawei and Tencent invest over 100 billion RMB annually in R&D, small and medium-sized enterprises (SMEs) are forced to reduce data-sharing investments due to high costs. This “capital substituting for data” model ultimately undermines innovation efficiency. In contrast, the central region is in a critical window for industrial upgrading, with government policies actively stimulating societal innovation vitality. Henan Province, as a national pilot for factor marketization reforms, has established comprehensive data governance through its data trading center system. The Data Element Market Cultivation Action Plan implements an “eight major initiatives” policy framework, focusing on institutional innovations in data property rights and circulation mechanisms. The six central provinces leverage national pilot cities for SME digital transformation, offering specialized subsidies for intelligent upgrading and digital transformation initiatives. These policy measures collectively form the institutional foundation of central China’s innovation advantage by releasing institutional dividends through factor marketization and lowering innovation thresholds via infrastructure upgrades. The region is currently in its golden age of data element development, in which the synergistic value of traditional industry upgrading and emerging business models is being realized. Han et al.’s (2024) research empirically validates these policy-driven mechanisms that amplify central China’s innovation potential. The western region benefits from the compensation mechanism embedded in policy-guided unbalanced development strategies. Consistent with the latecomer advantage hypothesis, its innovation system achieves technological leapfrogging through compensatory AGG. However, this catch-up effect is constrained by the dual limitations of human capital and the institutional environment, so its impact is weaker than in central China, which has the advantages of complete industrial ecosystems and market scale.
Mechanism Analysis
Table 4 presents the mediation mechanism test results of AGG’s impact on INNO. Based on the regression coefficients and significance levels of core variables and mediating variables in the model, we can conclude that under the influence of TIS, FUND, and MAR factors, AGG exerts indirect effects on INNO to a certain extent, providing robust empirical support for validating Hypothesis 2 proposed in this article. Specifically, columns (1) and (2) employ TIS as mediating variables in a stepwise regression analysis of AGG’s impact on INNO. The results show a significantly positive effect of AGG on TIS (coefficient: 1.1074), alongside a direct effect coefficient of 1.8283 on INNO. Holding other factors constant, a one-unit increase in TIS leads to a significant .3451-unit enhancement in INNO. Therefore, through the mediating role of TIS, AGG indirectly boosts INNO by .3822 units (
Table 4. Mechanism Test Results.
Note. Robust standard errors are in parentheses.
*** Significance at the .01 level.
** Significance at the .05 level.
* Significance at the .1 level.
Table 5 presents the bootstrap test results for the mediation mechanism of AGG’s impact on INNO. At the 95% confidence level, the confidence intervals for both the mediation effects and the direct effects of the three mediating variables exclude zero, so both effects are statistically significant. This confirms the existence of partial mediation effects. Overall, AGG exerts indirect effects on INNO through three distinct pathways. These results suggest that AGG not only directly influences INNO but also significantly affects it through multiple transmission channels, thereby validating Hypothesis 2.
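The indirect effect tested here is a product of two path coefficients, and the bootstrap checks whether the confidence interval of that product excludes zero. The sketch below first reproduces the TIS pathway from the coefficients reported above, then runs a percentile bootstrap on synthetic data. The synthetic sample, the helper names, and the simple observation-level resampling are assumptions made for illustration; a provincial panel would normally call for resampling whole provinces and including the control variables.

```python
import random

# Product of coefficients for the TIS pathway, using the reported estimates.
a_path = 1.1074                      # effect of AGG on TIS
b_path = 0.3451                      # effect of TIS on INNO, holding AGG fixed
reported_indirect = a_path * b_path  # about 0.3822


def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def partial_slope(m, x, y):
    """Coefficient on m when y is regressed on m and x (with intercept)."""
    n = len(y)
    mm, mx, my = sum(m) / n, sum(x) / n, sum(y) / n
    dm = [v - mm for v in m]
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    s_mm = sum(a * a for a in dm)
    s_xx = sum(a * a for a in dx)
    s_mx = sum(a * b for a, b in zip(dm, dx))
    s_my = sum(a * b for a, b in zip(dm, dy))
    s_xy = sum(a * b for a, b in zip(dx, dy))
    return (s_my * s_xx - s_xy * s_mx) / (s_mm * s_xx - s_mx ** 2)


def indirect_effect(x, m, y):
    """Indirect effect a * b from the two mediation regressions."""
    return slope(x, m) * partial_slope(m, x, y)


def bootstrap_ci(x, m, y, reps=1000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(indirect_effect(
            [x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]))
    draws.sort()
    return draws[int(reps * alpha / 2)], draws[int(reps * (1 - alpha / 2)) - 1]


# Synthetic data with a true indirect effect of 1.0 * 0.5 = 0.5 (hypothetical).
rng = random.Random(0)
x = [rng.uniform(0.0, 10.0) for _ in range(60)]
m = [1.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
y = [0.5 * mi + 1.0 * xi + rng.gauss(0.0, 0.1) for mi, xi in zip(m, x)]

point = indirect_effect(x, m, y)
lo, hi = bootstrap_ci(x, m, y)
mediation_supported = lo > 0 or hi < 0   # interval excludes zero
```

The percentile interval is used because the sampling distribution of a product of coefficients is generally not symmetric, which makes a normal-theory interval less reliable.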
Table 5. Bootstrap Test.
Note. Robust standard errors are in parentheses.
*** Significance at the .01 level.
** Significance at the .05 level.
* Significance at the .1 level.
Threshold Regression Analysis
Before running the threshold regression, the existence of a threshold effect must be tested. This article uses bootstrap resampling for this test, with results shown in Table 6. TIS, FUND, and MAR each pass the significance test for a single threshold, but none passes the double-threshold test. This supports Hypothesis 3 that the effect of AGG on INNO is non-linear and moderated by these three variables. Specifically, the threshold value is 4.7958 for TIS, 14.4638 for FUND, and 2.7991 for MAR (as shown in Table 7).
Table 6. Threshold Effect Tests.
Table 7. Thresholds and Confidence Intervals.
To evaluate the threshold estimates and their confidence intervals more precisely, this article uses the likelihood ratio (LR) statistic based on ordinary least squares. The LR statistic equals zero at the estimated threshold value. Figure 4 plots the likelihood ratio functions for the three threshold variables.
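The LR construction described above can be sketched as a grid search: for each candidate threshold the sample is split into two regimes, the sum of squared residuals (SSR) is computed, the estimate is the candidate with the smallest SSR, and LR(γ) = (SSR(γ) − SSR(γ̂)) / σ̂². The sketch assumes a static, single-regressor model without intercept and uses the critical value −2 ln(1 − √(1 − α)) from Hansen's threshold framework, which is about 7.35 at the 5% level. It does not reproduce the dynamic system GMM estimation used in this article, and the data and the threshold of 5.0 are hypothetical.

```python
import math
import random


def ssr_at(gamma, y, x, q):
    """SSR with regime-specific slopes on x, splitting the sample at q <= gamma."""
    total = 0.0
    for low_regime in (True, False):
        pts = [(xi, yi) for xi, yi, qi in zip(x, y, q) if (qi <= gamma) == low_regime]
        beta = sum(xi * yi for xi, yi in pts) / sum(xi * xi for xi, _ in pts)
        total += sum((yi - beta * xi) ** 2 for xi, yi in pts)
    return total


def estimate_threshold(y, x, q, trim=0.15):
    """Grid search over trimmed candidate thresholds; returns estimate and LR values."""
    n = len(y)
    k = int(n * trim)
    candidates = sorted(q)[k:n - k]          # keep enough points in each regime
    ssr = {g: ssr_at(g, y, x, q) for g in candidates}
    gamma_hat = min(ssr, key=ssr.get)
    sigma2 = ssr[gamma_hat] / n
    lr = {g: (ssr[g] - ssr[gamma_hat]) / sigma2 for g in candidates}
    return gamma_hat, lr


def lr_critical(alpha=0.05):
    """Asymptotic critical value for the LR statistic."""
    return -2.0 * math.log(1.0 - math.sqrt(1.0 - alpha))


# Synthetic data: slope 0.5 below the threshold and 1.5 above it (hypothetical).
rng = random.Random(42)
q = [0.25 * i for i in range(40)]                     # threshold variable
x = [1.0 + 0.5 * (i % 5) for i in range(40)]          # regressor
y = [(0.5 if qi <= 5.0 else 1.5) * xi + rng.gauss(0.0, 0.05) for qi, xi in zip(q, x)]

gamma_hat, lr = estimate_threshold(y, x, q)
crit = lr_critical()
# Confidence set: all candidates whose LR value lies below the critical value.
confidence_set = sorted(g for g in lr if lr[g] <= crit)
```

Plotting the LR values against the candidates, with a horizontal line at the critical value, gives the kind of likelihood ratio plot shown in Figure 4.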

Figure 4. Likelihood ratio function plots for each threshold variable: TIS (a), FUND (b), MAR (c).
Table 8 presents the results of the dynamic panel threshold regression; the outcomes in columns (1), (2), and (3) are highly consistent. In all three models, the coefficient on lagged INNO is positive and significant at the 1% level, which supports controlling for dynamic lag effects. The results in Table 9 indicate that the Hansen test yields a p-value exceeding .1, so the null hypothesis that the instruments are valid (no over-identification problem) cannot be rejected. The p-values from the autocorrelation tests of the disturbance terms (AR(1) and AR(2)) further support the model specification.
Table 8. Threshold Regression Results.
Note. Robust standard errors are in parentheses.
*** Significance at the .01 level.
** Significance at the .05 level.
* Significance at the .1 level.
Table 9. The Results of the AR(1) and AR(2) Tests.
Column (1) reports regression results with TIS as the threshold variable, revealing a monotonically increasing driving effect of AGG on INNO—providing robust evidence for Hypothesis 1. Specifically, AGG exerts significantly positive impacts on INNO. When surpassing the first threshold value of 4.7958, the influence coefficient increases from .8467 to .9086, demonstrating stronger and more significant driving effects in the high-threshold region compared to the low-threshold interval. This phenomenon stems from the critical threshold effects inherent in TIS as core carriers of innovation activities. Below the threshold, AGG faces challenges in transforming into actual innovation outputs due to insufficient technological absorption capacity, rigid organizational structures, or inefficient innovation processes. Upon crossing the threshold, however, these entities develop enhanced data integration capabilities and more open innovation ecosystems, enabling data element to significantly boost INNO through pathways such as mitigating information asymmetry, optimizing R&D resource allocation, and accelerating knowledge spillover. Column (2) presents model estimates with FUND as the threshold variable. As FUND crosses the threshold of 14.4638, the positive effect of AGG on INNO becomes increasingly pronounced, with the influence coefficient rising from .4856 to .5991. This indicates that when R&D investment remains below the threshold, the potential value of data element remains unrealized due to insufficient technological capabilities. Continued R&D investment, however, strengthens the positive feedback mechanism between AGG and INNO by reducing data transaction costs and optimizing data governance structures. Column (3) displays threshold regression estimates for MAR. When MAR crosses the threshold of 2.7991, the promoting effect of AGG on INNO intensifies, with the influence coefficient increasing from .0866 to .8026. 
This occurs because MAR, as institutional infrastructure, determines the allocation efficiency of data element. Below the threshold, AGG is constrained by ambiguous property rights, high transaction costs, and inefficient circulation mechanisms, resulting in weaker marginal contributions to INNO. Upon crossing the threshold, sound market mechanisms guide data element toward high-value innovation domains through price signals, eliminate data monopolies, and facilitate cross-domain data integration, thereby triggering a quantum leap in INNO. Collectively, under the constraints of these three threshold variables, AGG exhibits significant nonlinear effects on innovation quality, providing robust evidence for Hypothesis 3.
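The regime-dependent coefficients reported above can be collected into a small lookup that returns the estimated effect of AGG on INNO for a given value of a threshold variable. The thresholds and coefficients are those reported in the text. Assigning an observation exactly at the threshold to the lower regime is an assumption, and the function name is illustrative.

```python
# (threshold, coefficient at or below threshold, coefficient above threshold)
REGIMES = {
    "TIS": (4.7958, 0.8467, 0.9086),
    "FUND": (14.4638, 0.4856, 0.5991),
    "MAR": (2.7991, 0.0866, 0.8026),
}


def agg_coefficient(variable, value):
    """Estimated coefficient of AGG on INNO in the regime that `value` falls into."""
    threshold, low, high = REGIMES[variable]
    # Assumption: values equal to the threshold belong to the lower regime.
    return low if value <= threshold else high


# Increase in the AGG coefficient after each threshold is crossed.
gains = {name: round(high - low, 4) for name, (_, low, high) in REGIMES.items()}
largest_gain = max(gains, key=gains.get)
```

Comparing the gains shows that the jump is by far the largest for MAR (from .0866 to .8026), while the increases for TIS and FUND are comparatively modest.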
Robustness Tests
To verify the robustness of the results, this article uses four approaches: (1) introducing a lagged explained variable, (2) reducing the control variables, (3) adjusting the sample size, and (4) constructing a dummy variable (NBD) that uses the national big data comprehensive pilot zones as the treatment group and serves as a new explanatory variable. Together, these approaches provide a comprehensive robustness check for the impact of AGG on INNO. Taking into account the time lags in patent validity, authorization, and promotion, this article uses the number of citations received 1 year later as the new explained variable. The results in column (1) of Table 10 show that the coefficients and significance levels remain consistent with the baseline regression, supporting the model’s robustness. This article then re-estimates the model with fewer control variables, as shown in column (2) of Table 10. The resulting coefficients and significance levels deviate minimally from the core findings, providing additional evidence of robustness. To evaluate potential bias from outliers, the article also adjusts the sample size. Specifically, the regions with the highest and lowest levels of AGG (approximately 1%, 5%, and 10% of the sample) are sequentially excluded, and robustness tests are run separately on the remaining 28, 26, and 24 regions. Across all specifications, the coefficients and significance levels of the explanatory variables are consistent with the baseline outcomes, with no significant deviations (for conciseness, only the results for the 28-region sample are presented in column (3) of Table 10). 
The article also constructs a dummy variable based on the national big data comprehensive pilot zones approved in 2016 (as officially announced by the Chinese government) as a new core explanatory variable. The empirical results in column (4) reveal a significantly positive impact of this variable on INNO, demonstrating that the agglomeration effects of data as a production factor can significantly enhance innovation quality, thereby further confirming the robustness of the model results.
Table 10. Robustness Tests.
Note. Robust standard errors are in parentheses.
*** Significance at the .01 level.
** Significance at the .05 level.
* Significance at the .1 level.
Discussions
In summary, this article examines the impact of AGG on INNO, demonstrating that these relationships are not only robust in China but also exhibit broad applicability worldwide. Future research could further explore specific practices and experiences of different countries and regions in leveraging data to drive innovation, thereby providing valuable references for global innovation and development. Through empirical analysis, this article elucidates the mechanisms through which AGG influences INNO and arrives at the following conclusions.
The Universality of the Effect of AGG on INNO
The facilitating effect of AGG on INNO has established transnational empirical support. Quantitative analyses based on Chinese contexts reveal that a one-unit increase in AGG leads to a 2.4075-unit enhancement in INNO. This conclusion not only withstands a battery of robustness checks but also demonstrates cross-regional universal applicability. Tracking studies by Stanford University on Silicon Valley enterprises confirm a significant positive correlation between spatial distribution density of corporate data-sharing platforms and patent quality indices. The European Data Space framework established under the European Data Strategy (European Commission, 2020) is accelerating the penetration of data element into innovation chains at institutional levels. This transnational consistency phenomenon suggests that data-driven innovation paradigms are fundamentally reshaping the underlying logic of global innovation systems. The operational mechanisms warrant more systematic international comparative research.
AGG Indirectly Affects INNO Through the Improvement of the Regional Innovation Ecosystem from a Global Perspective
AGG enhances regional INNO through multidimensional transmission mechanisms: it not only directly promotes innovation output but also optimizes regional innovation ecosystem via indirect pathways including expansion of TIS, intensification of FUND, and MAR liberalization. Germany’s case exemplifies this approach. Since 2015, its Data Sharing Platform Initiative has established a government-enterprise-research institution collaborative innovation network. By developing application scenarios for advanced technologies like blockchain and artificial intelligence, the initiative has significantly improved technology transfer efficiency among innovation entities. Japan’s approach through the Automotive Data Consortium has created a sharing mechanism for R&D resources and technological achievements among automakers. This model not only reduces resource waste from redundant R&D but also enhances patent output per 100 million yuan of R&D investment through data-driven precision R&D. At the European Union level, the Financial Services Action Plan has provided institutional safeguards for corporate innovation activities by establishing unified financial market regulations across member states. Heller’s (2024) research indicates this policy increased corporate patent application likelihood by 25%, demonstrating the leveraging effect of MAR openness on INNO.
Regional Heterogeneity and Threshold Effects of AGG on INNO
The results show that the driving effect of AGG on INNO exhibits significant threshold characteristics, with its non-linear mechanism originating from the synergistic influence of three key components of the regional innovation ecosystem: TIS, FUND, and MAR. Regarding TIS, when the synergy between AGG and innovation entities surpasses a critical threshold, the marginal contribution to INNO improves sharply. In terms of FUND, the agglomeration effect of data elements significantly amplifies innovation output only when research funding reaches a specific threshold level. For MAR, once market openness crosses a critical point, innovation entities and environmental systems develop more efficient dynamic adaptation mechanisms.
This article further reveals that AGG plays a pivotal enabling role in enhancing INNO, strongly corroborating the findings of Maryam and Goran (2019) regarding data-enabled innovation efficiency and effectiveness in U.S. enterprises. AGG accelerates knowledge and technology spillovers, enhances inter-agent collaboration efficiency, restructures R&D expenditure toward higher-yield projects, and invigorates market-level innovation, thereby raising INNO. This phenomenon not only highlights data’s crucial role in innovation processes but also emphasizes its key function in counteracting negative “data depreciation” effects, providing novel theoretical explanations for sustaining high-quality innovation.
Conclusions and Recommendations
Conclusions
This article examines how AGG drives innovation development, utilizing Chinese provincial panel data from 2012 to 2022 to analyze the specific effects of AGG on INNO through three dimensions: baseline regression results, transmission mechanisms, and nonlinear impacts. To comprehensively analyze the intricate relationship network between AGG and INNO, we sequentially employed fixed-effects models, mediating effect models, and dynamic threshold regression models. These analyses encompassed both nationwide investigations and more focused examinations of eastern, central, and western regions down to provincial levels, aiming for more detailed and comprehensive understanding. The main findings can be summarized as follows: First, AGG demonstrates significant positive effects on INNO improvement, exhibiting a regional distribution characteristic of “central > western > eastern” regions. This suggests AGG has become a crucial factor in overcoming China’s innovation development bottlenecks. This conclusion remains valid after robustness tests including lagged explained variable, reducing control variables, adjusting sample size and constructing a dummy variable using national big data comprehensive pilot zones as the treatment group to serve as a new explanatory variable. Second, this article constructs a theoretical model to explain how AGG affects INNO through regional innovation ecosystem. Empirical analysis reveals that fundamental elements of regional innovation ecosystem (TIS, FUND, and MAR) exert varying degrees of indirect influence on the relationship between AGG and INNO. Specifically, they enhance INNO through three pathways: increasing TIS, boosting FUND, and opening MAR. Finally, the impact of AGG on INNO demonstrates significant nonlinear characteristics. When TIS reach sufficient quality and quantity thresholds, AGG’s promotion effect on INNO becomes more pronounced. 
When FUND exceeds a certain threshold level, AGG further amplifies its positive effects on INNO. Similarly, as MAR becomes more open, the positive effects of AGG on INNO strengthen.
Theoretical Implications
First, this article systematically investigates the multidimensional effects of AGG on INNO, expanding the theoretical boundaries of data empowerment theory while deepening research on agglomeration effects in the field of INNO. Second, existing studies on mechanisms promoting innovation quality primarily focus on digital finance, environmental regulation, or ESG performance (Dong & Shi, 2025; S. Y. Li et al., 2022; Rao et al., 2022). This article instead incorporates the regional innovation ecosystem into the analytical framework, thereby deepening the understanding of the pathways that shape innovation quality. Furthermore, by adopting China’s unique institutional context as the research setting, it adds novel explanatory dimensions to innovation ecosystem theory. Finally, leveraging China’s distinctive regional heterogeneity, this article reveals the non-linear effects of AGG in enhancing INNO, which extends Metcalfe’s Law in network economics and strengthens the applicability and reliability of AGG policies (T. Zhao et al., 2020), thereby offering more actionable managerial insights.
Practical Implications
The conclusions of this article demonstrate the significant policy implications of AGG in promoting INNO.
First, establishing data collaboration mechanisms and implementing differentiated regional strategies can systematically enhance both innovation processes and outputs through AGG. The findings confirm that AGG significantly improves INNO. Therefore, governments should adopt proactive measures to leverage this effect. Regarding innovation process optimization, cross-departmental data collaboration mechanisms should be established to break down industry data silos. This can be achieved by developing industrial data platforms and standardizing data interfaces to facilitate seamless data flow throughout the R&D chain. Strengthening data infrastructure for fundamental research is crucial, including building national scientific data centers and prioritizing open access to public research datasets (e.g., meteorological and geographical data), complemented by robust data quality evaluation systems to ensure research reliability. For innovation output enhancement, a data-driven commercialization system should be implemented. This includes establishing special funds for big data-based patent analytics, creating intelligent decision-making platforms covering technology assessment and market forecasting, and developing value distribution mechanisms that incorporate data element. Supporting policies should balance short-term incentives with long-term cultivation, such as piloting data element accounting systems, refining data-related intellectual property rights, and establishing fault-tolerant regulatory frameworks tailored to data characteristics. Finally, regional strategies should align with local industrial features. For instance, digitally advanced regions could explore cross-border data flow pilots, while traditional industrial bases may focus on digitizing production data. A dynamic policy monitoring system should track indicators like data utilization rates and element conversion efficiency to ensure sustained translation of AGG into INNO improvement.
Second, implementing systematic policy packages including multi-stakeholder collaborative innovation networks, optimized R&D funding allocation, and deepened market-oriented reforms of data element will facilitate balanced regional development. The research findings reveal that AGG exerts varying degrees of indirect effects on INNO through fundamental components of regional innovation ecosystem (TIS, FUND, and MAR). These positive effects become more pronounced when TIS achieve both quality and quantity, FUND reaches sufficient levels, and MAR maintain openness. First, regarding the cultivation of TIS, a multi-agent collaborative innovation network should be established. Policy incentives such as tax credits and R&D subsidies should encourage enterprises to increase data element inputs, with particular support for specialized and sophisticated SMEs to establish joint laboratories with universities and research institutes. Concurrently, digital talent development programs should be implemented alongside improved cross-regional talent mobility mechanisms, with special emphasis on enhancing the quality of innovation entities in central and western regions. Second, for FUND allocation, we recommend creating a diversified investment system guided by government and led by markets. The R&D expense super-deduction policy should be optimized, offering additional deduction ratios for data-intensive R&D projects. Regional innovation funds should prioritize central and western regions, while PPP models should mobilize social capital for digital infrastructure construction. Third, concerning MAR optimization, accelerating market-oriented reforms of data element is imperative. This includes establishing tiered data property rights confirmation and trading systems, while improving intellectual property protection and dispute resolution mechanisms to provide institutional safeguards for data element circulation.
Third, region-specific, dynamically adjusted AGG strategies should be precisely formulated and effectively implemented. The research findings reveal that the impact of AGG on INNO exhibits regional heterogeneity, characterized by a “central > western > eastern” gradient. Therefore, governments should adopt differentiated policy packages to maximize innovation benefits. In central regions, policymakers should leverage existing industrial foundations and data infrastructure advantages and prioritize deep integration between data elements and manufacturing. Encouraging leading enterprises to establish industry-wide data-sharing platforms, coupled with tax incentives and targeted subsidies to reduce data acquisition costs for SMEs, will further unleash the innovation dividends of data agglomeration. In western regions, the priority should be closing digital infrastructure gaps. Pilot AGG zones should be established in comparatively advanced areas, supported by computing power centers and high-speed networks. Privacy-preserving computation and blockchain technologies should be introduced to resolve data security and cross-border flow challenges, preventing the formation of “data troughs.” Eastern regions should transition toward high-quality AGG. Enterprises should be supported in participating in international data standard-setting, while regions such as the Yangtze River Delta and Beijing-Tianjin-Hebei should establish data element benefit-sharing mechanisms to extend innovation chains toward high-end applications. Additionally, a regionally coordinated data element trading market should be established to facilitate cross-regional data flow through initiatives like “East Data West Computing.” Performance indicators for AGG efficiency should be incorporated into local government evaluations, forming a multi-tiered policy system with central coordination, local implementation, and enterprise participation. Ultimately, this will help achieve a balanced enhancement of INNO through AGG.
Limitations and Prospects
Although this article empirically analyzes the impact of AGG on INNO and offers suggestions grounded in the actual situation, it has limitations that will be the focus of our team’s future research. First, the analysis focuses mainly on the macro, provincial level and does not examine the behavior and decision-making of enterprises in different industries at the micro level. As enterprises undergo digital transformation, the importance of data elements becomes more prominent; how to use data to empower enterprise innovation while preventing the disorderly expansion of enterprise data capital deserves further in-depth research. Second, this article’s examination of the internal mechanism through which AGG affects INNO is not yet systematic or comprehensive; future work will introduce more mediating or moderating variables to open the “black box” of this mechanism more fully.
Footnotes
Acknowledgements
We are very grateful to the editors and anonymous reviewers for reviewing this article.
Ethical Considerations
This article does not contain any studies with human participants performed by any of the authors.
Consent to Participate
Not applicable.
Consent for Publication
Not applicable.
Author Contributions
Conceptualization: Yijia Yuan; Methodology: Yingying Zhu; Software: Yingying Zhu, Min Kang, and Jialin Zhang; Writing—original draft: Yijia Yuan and Yingying Zhu; Writing—review and editing: Yijia Yuan, Jialin Zhang, and Min Kang; Funding acquisition: Min Kang; Resources: Yijia Yuan, Yingying Zhu, and Jialin Zhang; Supervision: Min Kang.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This article was supported by the Shandong Provincial Natural Science Foundation (ZR2025QC778) and the Shandong Provincial Higher Education Philosophy and Social Sciences Research Program (2025ZSYB075).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The data in this manuscript cover 2012 to 2022 and comprise 11 variables: one independent variable, one dependent variable, three mechanism variables, and six control variables. The data for all variables are analyzable. Disclosure of the materials analyzed during the current study is subject to restrictions under an ongoing project. The corresponding author is willing to share the datasets upon reasonable request under the necessary confidentiality agreements.
