Abstract
Digital transformation has accelerated corporate innovation and strengthened competitive advantage, yet its impact on one of the most critical innovation inputs—corporate R&D expenditure—remains underexplored. This study addresses this gap by employing machine learning-based text analysis on 27,163 observations of Chinese A-share listed firms from 2012 to 2021. By integrating the perspectives of agency theory and dynamic capabilities, I find a positive relationship between digital transformation and corporate R&D expenditure. Additionally, five key dimensions of digital transformation—artificial intelligence, blockchain, cloud computing, big data, and digital technology applications—are shown to significantly increase R&D expenditure. The study further provides evidence that digital transformation enhances the effectiveness of R&D expenditure in producing innovative outcomes. These findings not only advance the theoretical frameworks of agency theory and dynamic capabilities but also offer practical insights for firms seeking to optimize digital strategies to maximize innovation outcomes. While focused on Chinese firms, the results have broader implications for other emerging markets where digital transformation is evolving rapidly.
Keywords
Introduction
Technological advances in artificial intelligence, blockchain, cloud computing, big data, and digital technology applications have catalyzed a paradigm shift toward digital transformation on a global scale, fundamentally altering industrial structures and competitive dynamics (Guo et al., 2023; Peng & Tao, 2022; Tian et al., 2022; Tu & He, 2023). For firms today, digital transformation has become essential for maintaining competitiveness, reshaping traditional business models, and enabling new approaches to value creation (Loonam et al., 2018; Zhuo & Chen, 2023). The economic implications of digital transformation—through its stimulation of product innovation, improvement in operational efficiency, and enhancement of financial performance—are now widely acknowledged, and this trend has attracted growing interest from economists and policymakers alike (Hess et al., 2020; Petrů et al., 2020; Reuschl et al., 2022; Verhoef et al., 2021).
Within this digital transformation landscape, firms increasingly leverage their digital capabilities to foster innovative products and business models, fundamentally shifting how they allocate resources, especially research and development (R&D). Innovation remains a critical driver of economic growth, with R&D expenditure as a primary input that fuels sustainable innovation (Kogan et al., 2017; Radonić et al., 2021). Yet, despite the essential role of R&D in driving growth, firms often face challenges in securing sufficient funding, limiting their ability to sustain long-term innovation efforts (Du et al., 2022; Foray et al., 2012). Given the transformative shifts introduced by digital technologies across production and governance structures, examining the specific impact of digital transformation on corporate R&D expenditures presents a crucial, though relatively underexplored, research area with significant implications for both corporate strategy and economic policy.
While existing studies emphasize the transformative potential of digital technologies, few have disaggregated digital transformation into its distinct technological components (artificial intelligence, blockchain, cloud computing, big data, and digital technology applications) and examined their separate effects on corporate R&D expenditure. This disaggregation matters for firms because different digital capabilities inform strategic choices and guide the allocation of resources, ultimately determining the direction and effectiveness of their digital transformation efforts (Zhuo & Chen, 2023). Additionally, R&D expenditure is widely recognized as a critical determinant of innovation performance, which underpins both sustained economic growth and competitive advantage. As digital transformation accelerates the adoption of new technologies globally, a clearer understanding of whether, and how, these digital investments facilitate the conversion of R&D efforts into measurable innovation outcomes holds considerable relevance, particularly in emerging economies where digital adoption and innovation are pivotal to economic resilience and growth.
Against this background, this study addresses these gaps by exploring the relationship between digital transformation and corporate R&D expenditure, with a focus on China, whose digital economy ranks second globally in scale. According to the “White Paper on China’s Digital Economy Development (2021)” by the China Academy of Information and Communications Technology, China’s digital economy reached 39.2 trillion yuan (roughly 5.4 trillion USD), accounting for 38.6% of its GDP (Tian et al., 2022). Investigating Chinese firms provides valuable insights into how digital transformation might drive innovation outcomes, especially within emerging markets undergoing rapid digital advancement. Specifically, this study aims to answer three core questions: (1) Does digital transformation increase corporate R&D expenditure? (2) If so, which dimensions of digital transformation play a key role in driving this increase? (3) Does digital transformation moderate the relationship between corporate R&D expenditure and innovation performance, potentially enhancing innovation outcomes?
This study makes three key contributions. First, I construct a comprehensive digital transformation index by applying machine learning-based text mining to annual reports from listed firms. I employ the unsupervised learning algorithm word2vec, which generates vector representations of words based on their contextual relationships across large text corpora. This technique is particularly suited for capturing nuanced insights, as it identifies word associations and semantic similarities, thereby enabling a refined measure of digital transformation across firms. By calculating word vector similarities within our dataset, I move beyond general metrics to gain firm-specific insights into digital transformation activities, providing an in-depth perspective on its impact.
Second, I disaggregate digital transformation into five primary dimensions—artificial intelligence, blockchain, cloud computing, big data, and digital technology applications—and investigate each dimension’s influence on R&D expenditure. By opening the “black box” of digital transformation, this approach deepens our understanding of how specific digital strategies affect economic outcomes, shedding light on the mechanisms by which distinct digital tools and technologies drive corporate innovation investments.
Third, I examine whether digital transformation enhances the translation of R&D investments into innovation performance, thereby informing a broader economic perspective on digital transformation’s role in fostering productive innovation. By focusing on this moderating effect, I contribute to the broader discourse on how digital initiatives can amplify R&D’s impact on innovation, a topic of critical interest to economists and policymakers striving to support innovation-driven growth, particularly in digitally evolving markets.
The paper is organized as follows: Section 1 reviews relevant literature on digital transformation and its influence on corporate innovation and R&D, establishing the research hypotheses. Section 2 describes the research design, including data sources, variable definitions, and empirical models. Section 3 presents the empirical findings, robustness checks, and extended analyses. Finally, Section 4 concludes by summarizing the study’s key insights and discussing implications for future research and policy.
Literature Review and Hypothesis Development
Literature Review
Many recent studies have examined digital transformation and have reached a general consensus on its definition. Specifically, digital transformation refers to an organization’s strategic adoption and utilization of digital technologies to innovate and reshape its products, services, and operational processes. Vial (2019) describes digital transformation as an adaptive response to environmental changes, achieved by integrating digital tools like mobile computing, artificial intelligence, and cloud computing to overhaul value creation. Likewise, Hess et al. (2020) define digital transformation as incorporating advanced technologies to foster novel product lines, innovative value creation methods, and restructured organizational frameworks. Gong and Ribiere (2021) expand on this by framing digital transformation as a significant shift driven by technology, leading to the generation of advanced value. Additionally, Verhoef et al. (2021) outline digital transformation as a three-stage process: digitization of analog information, digital adaptation of business processes, and strategic transformation of the business model.
Most research investigating the economic impact of digital transformation emphasizes its role in enhancing production and operational aspects within organizations. One central viewpoint suggests that digital technologies enable firms to transcend internal resource constraints, allowing access to external resources for strategic growth in new markets and product development (Çera et al., 2020; Chan et al., 2019; Jum’a & Alkhodary, 2024; Rakovic et al., 2024). Some scholars argue that businesses can improve production efficiency by leveraging digital transformation. This enhancement is achievable through the collection and utilization of consumer data (Earley, 2014), adaptation of organizational structures (Kretschmer & Khashabi, 2020), and adjustments in value creation methods (Rachinger et al., 2019). Furthermore, corporate digital change extends organizational reach by facilitating connectivity and collaboration within supply chains (Raj et al., 2024; Reuschl et al., 2022; Zheng & Chen, 2024), while concurrently improving operational efficiency, value generation, and innovation through ecosystems and platforms (Chan et al., 2019; Gong & Ribiere, 2021; Rachinger et al., 2019). Recent studies have also shown how digital platforms contribute to optimizing innovation ecosystems, driving sustainable corporate growth (e.g., Pu, 2025; Secundo et al., 2024).
Previous studies highlight that sustained R&D expenditures enhance product, service, and technological optimization within the innovation process, contributing to corporate productivity (e.g., Doraszelski & Jaumandreu, 2013; Parisi et al., 2006). Moreover, R&D spending allows firms to secure long-term competitive advantages, transforming R&D practices and strategies into profitable technological innovations that yield substantial returns and increased market valuation (Falk, 2012; Polívka & Dvořáková, 2023; Stacho et al., 2023; Toader et al., 2023). For instance, J. Zhang and Liang (2012) examined the Chinese new energy sector, finding that firm-level R&D expenditures positively impact eco-innovation performance. Their research also highlighted the importance of external resources, including ownership structures and government subsidies, in enhancing R&D-driven performance. At the corporate governance level, Rodrigues et al. (2020) investigated the influence of governance mechanisms on corporate investment, particularly in R&D. Their findings suggest that ownership concentration alleviates agency issues linked to R&D investments, underscoring the importance of governance and external support in maximizing R&D’s benefits for innovation outcomes. Recent literature similarly notes the significance of external financing and collaborative innovation efforts in sustaining R&D-driven growth, particularly in high-tech industries (Leung & Sharma, 2021; Li et al., 2024).
In exploring the relationship between digital transformation and corporate innovation activities, initial studies primarily focused on the role of the Internet. The rapid adoption of the Internet has been instrumental in accelerating the diffusion of knowledge and information, thus fostering corporate innovation (Peng & Tao, 2022; Vial, 2019). Furthermore, Ferreira et al. (2019) analyzed how digital transformation influences innovation, particularly in service and process innovation, shedding light on its positive impact on specific corporate innovation dimensions. More recent research has expanded this view, showing that digital tools like AI and cloud computing are now core to enhancing innovation by enabling data-driven decision-making, automation, and adaptive business processes (Bag et al., 2021; Duan et al., 2019).
In summary, the existing literature on digital transformation primarily focuses on characterizing its features and exploring its influence on business models, organizational structures, and management strategies. However, there remains limited research on the direct effects of digital transformation on corporate R&D expenditures, signifying a need for deeper investigation and analysis in this area.
Hypothesis Development
The Relationship Between Digital Transformation and Corporate R&D Expenditure
Digital transformation has emerged as a critical factor in enhancing corporate R&D capabilities, as it significantly reshapes the way firms manage information, resources, and strategic decision-making. From the perspective of agency theory, one of the core advantages of digital transformation is its ability to reduce information asymmetry between managers and shareholders, thereby addressing long-standing agency conflicts. In typical corporate structures, managers are responsible for operational oversight, including the management of innovation activities and financial disclosures (Iqbal et al., 2022). However, shareholders often face challenges in accessing complete and accurate insights into these operations, resulting in a potential misalignment of interests and decision-making priorities (O’Connor & Rafferty, 2012). Moreover, the inherently risky and long-term nature of R&D investments may encourage managers to prioritize short-term financial performance over strategic, long-term innovation, potentially compromising shareholder interests (Y. Chen & Huang, 2023; Yuan & Wen, 2018).
Digital technologies, such as big data, cloud computing, and blockchain, have transformed corporate information landscapes by enhancing transparency and reducing the scope for managerial discretion in financial reporting (Cong & He, 2019). For instance, these technologies enable firms to automate the processing and analysis of financial data, significantly reducing the risk of information distortion and ensuring more accurate and timely R&D disclosures (Zhuo & Chen, 2023). This increased transparency can effectively minimize the information gaps between managers and shareholders, thereby reducing the risk of misinformation and enhancing overall governance (Ferreira et al., 2019; Iqbal et al., 2022; Peng & Tao, 2022).
Additionally, digital transformation facilitates the integration and synchronization of dispersed resources, creating a more cohesive and efficient innovation ecosystem. Automated financial systems can streamline resource allocation, reduce manual errors, and improve decision-making accuracy, which collectively strengthen the financial sustainability of R&D activities (Llopis-Albert et al., 2021; Urbinati et al., 2020). This approach not only reduces the likelihood of managerial opportunism but also aligns short-term operational goals with long-term innovation objectives, promoting a more stable and transparent financial environment. Based on these considerations, I propose the following hypothesis:
The Relationship Between the Sub-dimensions of Digital Transformation and Corporate R&D Expenditure
Dynamic capabilities, as derived from the resource-based view, were introduced by Teece et al. (1997), emphasizing the ability of firms to build, integrate, and reconfigure internal and external competences to address rapidly changing environments. According to Teece (2007), dynamic capabilities primarily involve three functions: sensing opportunities and threats, seizing them, and reconfiguring resources. These capabilities are critical for fostering innovation and R&D investments, as they enable firms to respond swiftly to technological shifts, market demand changes, and competitive pressures (Chakrabarty & Wang, 2012). The five dimensions of digital transformation—artificial intelligence, blockchain, cloud computing, big data, and digital technology applications—serve as key drivers of firms’ dynamic capabilities, thereby enhancing R&D expenditure through their respective attributes.
Based on the discussion above, the five dimensions of digital transformation significantly enhance a firm’s dynamic capabilities and strengthen its competitiveness in the rapidly evolving R&D landscape. These dimensions likely encourage increased R&D expenditure by improving the firm’s ability to sense, seize, and reconfigure resources. Therefore, I posit:
The Role of Digital Transformation on the Conversion of Corporate R&D Expenditure Into Innovation Outcomes
The Agency Theory Perspective
According to agency theory (Jensen & Meckling, 1976), there is often a misalignment between the interests of managers and shareholders, particularly when it comes to investments in R&D (X. Wang, 2024). Managers may be reluctant to engage in high-risk, long-term R&D projects due to the uncertainty involved, potentially resulting in inefficiencies and underutilization of R&D resources (Muhammad et al., 2024).
Digital transformation, through technologies such as AI, big data analytics, and blockchain, improves managerial oversight, transparency, and accountability (Li et al., 2024). These technologies reduce agency costs by providing better monitoring mechanisms and clearer data on R&D performance, which aligns managerial actions more closely with shareholder interests. As a result, the positive effects of R&D expenditure are amplified, leading to greater innovation outcomes. For instance, AI and data analytics can significantly improve project selection by using predictive modeling to assess the potential success of R&D projects (Sharma & Chanda, 2017). This reduces uncertainty and ensures that capital is allocated toward projects with higher expected returns, thereby increasing the efficiency of R&D expenditures (Lăzăroiu et al., 2023). Blockchain technology enhances trust in multi-party R&D collaborations, enabling better resource utilization and ultimately contributing to higher innovation performance (Queiroz & Fosso Wamba, 2019).
The Dynamic Capabilities Perspective
Dynamic capabilities emphasize the ability of firms to integrate, build, and reconfigure internal and external competencies to address rapidly changing environments (Teece, 2007; Teece et al., 1997). Digital transformation enhances a firm’s dynamic capabilities by enabling more agile responses to market shifts and technological changes (Agrawal et al., 2019; Chakrabarty & Wang, 2012). Specifically, digital tools like cloud computing, AI, and big data improve a firm’s ability to sense new innovation opportunities, seize them through strategic decision-making, and reconfigure resources to maximize R&D efficiency (Bag et al., 2021; Duan et al., 2019; Heo et al., 2024).
By utilizing digital technologies, firms can better identify emerging market trends and shifts in consumer demand, which improves the alignment between R&D efforts and external opportunities. For example, cloud computing provides the flexibility to scale R&D operations according to evolving needs, ensuring that resources are deployed efficiently and in real time (H. Chen et al., 2015). This heightened agility in resource allocation allows firms to extract greater value from R&D expenditure, translating it more effectively into innovation performance (Liu et al., 2018).
Moreover, digital transformation supports the reconfiguration of R&D processes to optimize knowledge flows, reduce redundancies, and improve collaboration both within and across organizations (Camarinha-Matos et al., 2019; Warner & Wäger, 2019), all of which contribute to enhanced innovation outcomes. This dynamic adaptability ensures that R&D expenditures are not only maintained but also strategically enhanced, leading to superior innovation performance.
In summary, the above discussion leads me to hypothesize that:
Data and Methodology
Data and Sample
This study is based on data from Chinese A-share listed companies from 2012 to 2021. To ensure the sample’s relevance and quality, a purposive sampling technique was employed, focusing on companies that meet specific criteria aligned with the research objectives. The data used in the analysis includes R&D expenditure and patent application data sourced from the CNRDS database, digital transformation information extracted from corporate annual reports, and additional financial and firm-specific data obtained from the CSMAR database. The data preprocessing involved the following steps: (1) excluding financial firms, given their distinct governance structures and financial performance characteristics compared to non-financial firms in China; (2) removing companies under Special Treatment (ST) status due to consecutive losses over 2 years, as per Chinese securities regulations, to eliminate the potential impact of abnormal financial conditions and possible delisting risks; (3) excluding firms with missing data, to enhance the robustness and usability of the dataset. After data preprocessing, a total of 27,163 firm-year observations were retained for the analysis.
Variables
Dependent Variable
The dependent variable is R&D expenditure (RD), measured as the natural logarithm of the R&D expenditure disclosed by firms. Drawing on relevant literature in the R&D field (e.g., Hu et al., 2020; Iqbal et al., 2022; Mao et al., 2020; Yuan & Wen, 2018), I employ four additional metrics for robustness checks: the ratio of current year R&D expenditure to current year total assets (RD1), the ratio of current year R&D expenditure to current year total revenue (RD2), the ratio of current year R&D expenditure to previous year total assets (RD3), and the ratio of current year R&D expenditure to previous year total revenue (RD4).
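The five R&D measures defined above can be illustrated with a minimal Python sketch. The function and argument names are hypothetical. Mapping zero expenditure to 0 is an assumption made only so the logarithm is defined; the text does not state how zero values are treated.

```python
import math


def rd_measures(rd, assets, revenue, lag_assets, lag_revenue):
    """Return RD and the four robustness ratios for one firm-year.

    All argument names are hypothetical; amounts share one currency unit.
    """
    return {
        # Natural log of disclosed R&D; the zero-handling rule is an assumption.
        "RD": math.log(rd) if rd > 0 else 0.0,
        "RD1": rd / assets,        # current R&D / current total assets
        "RD2": rd / revenue,       # current R&D / current total revenue
        "RD3": rd / lag_assets,    # current R&D / previous-year total assets
        "RD4": rd / lag_revenue,   # current R&D / previous-year total revenue
    }


# Toy firm-year with made-up figures
example = rd_measures(rd=5e7, assets=2e9, revenue=1e9,
                      lag_assets=1.6e9, lag_revenue=8e8)
```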
Independent Variable
The independent variable in this study is digital transformation (DT), which captures the extent to which firms adopt and integrate digital technologies. DT reflects a company’s strategic focus on leveraging digital tools to enhance operations and innovation capabilities, often articulated through corporate disclosures such as annual reports (Kindermann et al., 2021). Given the strategic importance of these documents, text mining methods were applied to systematically assess firms’ digitalization levels.
Following the approach of Tu and He (2023) and Pu and Zulkafli (2024a), this study utilized Python to gather annual reports from the selected sample firms. The digital transformation measure was constructed based on five core dimensions: Artificial Intelligence (AI), Blockchain (BC), Cloud Computing (CC), Big Data (BD), and Digital Technology Application (DTA). The construction of this variable involved five steps. (1) Seed word identification. Initially, a set of foundational keywords, including “artificial intelligence,” “cloud computing,” “blockchain,” “big data,” and “digital technology application,” was identified. These terms represent the core technologies associated with digital transformation. (2) Text preprocessing. Using Python’s Jieba library for Chinese word segmentation and the re module for regular expression matching, all extracted textual content underwent a rigorous preprocessing phase. This included the removal of stop words, punctuation, and irrelevant characters to ensure a clean, analyzable text corpus. (3) Word embedding model. The preprocessed text was then analyzed using the word2vec model, a widely adopted machine learning algorithm for generating word vectors. This model effectively captures the contextual relationships between words, enabling the identification of semantically similar terms beyond the initial seed words. (4) Seed dictionary construction and frequency calculation. The trained word2vec model produced a seed dictionary encompassing the five digital transformation dimensions. This dictionary was subsequently employed to identify and count the occurrences of digital transformation-related keywords within the annual reports. (5) Frequency normalization. Given the typical right-skewed distribution of keyword frequencies in corporate disclosures, the resulting counts were transformed using the natural logarithm function, specifically LN(keyword occurrence frequency + 1). This approach mitigates the impact of extreme outliers and allows for a more balanced representation of firms’ digital transformation intensity.
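Steps (3) to (5) can be illustrated with a small, standard-library-only sketch. It is not the study's implementation: the toy vectors stand in for a trained word2vec model, the English tokens stand in for Jieba-segmented Chinese text, and the similarity threshold of 0.8 and all function names are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_seeds(seeds, vectors, threshold=0.8):
    """Add every vocabulary word whose vector is close to some seed word."""
    dictionary = set(seeds)
    for word, vec in vectors.items():
        if any(s in vectors and cosine(vec, vectors[s]) >= threshold
               for s in seeds):
            dictionary.add(word)
    return dictionary


def dt_score(tokens, dictionary):
    """LN(keyword occurrence frequency + 1) for one tokenized report."""
    counts = Counter(tokens)
    hits = sum(counts[w] for w in dictionary)
    return math.log(hits + 1)


# Toy word vectors (assumption: a real model would supply these)
vectors = {
    "big_data": [0.9, 0.1, 0.0],
    "data_mining": [0.85, 0.2, 0.05],
    "blockchain": [0.0, 0.9, 0.1],
    "dividend": [0.0, 0.1, 0.95],
}
seeds = ["big_data", "blockchain"]
dictionary = expand_seeds(seeds, vectors)

# Toy tokenized report
tokens = ["big_data", "dividend", "data_mining", "blockchain", "big_data"]
score = dt_score(tokens, dictionary)
```

In this toy example the dictionary gains "data_mining" because its vector lies close to the seed "big_data", while "dividend" is excluded. The log transform then compresses the raw keyword count.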
Control Variables
This study controlled for several factors closely linked to R&D expenditure, drawing on prior research (e.g., Choi et al., 2011; Hou et al., 2017; Hu et al., 2020; Yuan & Wen, 2018; R. Zhang et al., 2022). Firm Size (FS) is measured as the logarithm of total assets, as larger firms typically have more financial resources and a greater capacity for R&D investment. Firm Age (FA), defined as the natural logarithm of the years since establishment plus one, often correlates with accumulated experience and stability, which may impact R&D spending. Financial Leverage (LEV), calculated as the ratio of total debt to total assets, indicates debt reliance, where higher leverage may limit funds available for R&D due to debt repayment obligations. Return on Assets (ROA), the ratio of net income to total assets, reflects profitability, with more profitable firms potentially having greater resources to allocate toward R&D. Sales Growth (SG), the percentage change in operating income from the previous year, indicates revenue expansion, as firms with rapid sales growth may increase R&D investments to sustain their competitive advantage. Board Size (BS), defined as the logarithm of the total number of directors, influences corporate governance and strategic decisions, including R&D commitments. Ownership Concentration (OC), or the percentage of shares held by the largest shareholder, acts as a proxy for concentrated ownership, potentially impacting strategic decisions that either limit or foster R&D. Finally, Institutional Ownership (IO), represented by the ratio of shares held by institutional investors to total shares, reflects investor influence, as institutional investors often prefer long-term value creation, including R&D investment.
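As a rough illustration, the computed control variables can be written as a short Python function. All names are hypothetical. The use of the natural logarithm for FS and BS and the expression of SG as a fraction rather than in percentage points are assumptions. OC and IO are taken directly from the data and are therefore omitted.

```python
import math


def control_variables(total_assets, years_since_founding, total_debt,
                      net_income, operating_income, lag_operating_income,
                      n_directors):
    """Compute the constructed controls for one firm-year (hypothetical names)."""
    return {
        "FS": math.log(total_assets),              # firm size (natural log assumed)
        "FA": math.log(years_since_founding + 1),  # firm age
        "LEV": total_debt / total_assets,          # financial leverage
        "ROA": net_income / total_assets,          # return on assets
        # Sales growth as a fraction of last year's operating income
        "SG": (operating_income - lag_operating_income) / lag_operating_income,
        "BS": math.log(n_directors),               # board size (natural log assumed)
    }


# Toy firm-year with made-up figures
controls = control_variables(total_assets=4e9, years_since_founding=18,
                             total_debt=1.6e9, net_income=2e8,
                             operating_income=1.2e9, lag_operating_income=1.0e9,
                             n_directors=9)
```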
Empirical Specification and Data Analysis Techniques
Following prior studies (e.g., Iqbal et al., 2022; Oliver & Gujarati, 1993; Yuan & Wen, 2018; R. Zhang et al., 2022), I employ a fixed effects model (FEM) to examine our hypotheses concerning the impact of digital transformation on corporate R&D expenditure (H1 and H2). Among the three primary approaches for panel data estimation—Pooled Ordinary Least Squares (POLS), Fixed Effects Model (FEM), and Random Effects Model (REM)—the FEM was selected based on diagnostic tests including the
where
To address heteroscedasticity and potential autocorrelation issues, I use firm-level clustered robust standard errors (Cameron & Miller, 2015). I also perform a series of diagnostic tests to confirm the model’s suitability. First, I check for multicollinearity using Variance Inflation Factors (VIF) to ensure no high intercorrelation among variables distorts our estimates. The model framework and selected control variables align with established literature (e.g., Choi et al., 2011; Hou et al., 2017; Hu et al., 2020; Yuan & Wen, 2018; R. Zhang et al., 2022), which highlights that those specific firm characteristics are significant factors influencing corporate R&D activities. Additionally, all continuous variables are winsorized at the 1st and 99th percentiles to reduce the influence of outliers.
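The winsorization step mentioned above can be sketched in plain Python. The linear-interpolation percentile rule used here is an assumption, since statistical packages differ in how they define percentiles, and the function name is hypothetical.

```python
def winsorize(values, lower=0.01, upper=0.99):
    """Clip values to the [lower, upper] percentile range of the sample."""
    if not values:
        return []
    ordered = sorted(values)
    n = len(ordered)

    def quantile(q):
        # Linear interpolation between adjacent order statistics (assumed rule)
        pos = q * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    lo_cut, hi_cut = quantile(lower), quantile(upper)
    return [min(max(v, lo_cut), hi_cut) for v in values]


# Toy series with one extreme value in each tail
data = [float(i) for i in range(101)]
data[0], data[100] = -1000.0, 1000.0
clipped = winsorize(data)
```

Values inside the percentile bounds are left unchanged. Only the two extreme observations are pulled in to the 1st and 99th percentile cut-offs.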
Empirical Results and Discussion
Descriptive Statistics
The descriptive statistics for the main variables in our study are presented in Table 1, including the number of observations, mean, standard deviation, minimum, and maximum values for each variable.
Table 1. Descriptive Statistics.
Corporate R&D Expenditure (RD) has a mean of 15.643 and a standard deviation of 6.125, with values ranging from 0 to 21.669, which shows substantial variation in R&D investment levels across firms. RD1, RD2, RD3, and RD4 are the alternative ratio-based measures of R&D expenditure used in the robustness tests. Their descriptive statistics also indicate significant variability, which is consistent with the diversity of corporate R&D expenditure strategies documented in previous literature (e.g., Pu & Zulkafli, 2024b).
Digital Transformation (DT) displays a mean of 1.415 and a standard deviation of 1.388, with values ranging from 0 to 5.056. This spread implies that while digital transformation adoption is advancing among Chinese firms, the majority are at relatively early stages. The binary indicator, DT_Dummy, has a mean of 0.663, showing that approximately 66.3% of the firm-year observations involve some degree of digital transformation. Additionally, digital transformation is captured through five specific dimensions: Artificial Intelligence (AI), Blockchain (BC), Cloud Computing (CC), Big Data (BD), and Digital Technology Application (DTA). AI has a mean of 0.348, BC a mean of 0.015, CC a mean of 0.543, BD a mean of 0.514, and DTA a mean of 0.944, each showing distinct distributions across firms. These dimensions reflect the multi-faceted nature of digital transformation and highlight substantial variability in firms’ engagement across different digital technologies.
Innovation Performance Indicators are captured through two measures, IP1 and IP2. Following Yuan and Wen (2018), IP1 represents the natural logarithm of 1 plus the total number of the company’s patents (including invention patents, design patents, and utility model patents), with a mean of 2.642. IP2 represents the natural logarithm of 1 plus the company’s invention patents specifically, with a mean of 1.863. These indicators enable us to capture different aspects of firms’ innovation performance.
The control variables are as follows: Firm Size (FS), with a mean of 22.267, reflects the typical scale of firms in our sample, while Firm Age (FA), with a mean of 2.921, represents a balanced sample in terms of the firms’ operational maturity. Financial Leverage (LEV) has an average of 0.422, showing moderate debt levels across firms. Return on Assets (ROA), a measure of profitability, ranges from −0.239 to 0.222, suggesting that while some firms operate at a loss, others are profitable, with variability in performance across the sample. Sales Growth (SG) also shows positive and negative values, indicating the presence of both expanding and contracting firms in the sample. Board Size (BS) has a mean of 2.122, indicating that most firms have relatively standard board sizes. Ownership Concentration (OC) and Institutional Ownership (IO) have means of 34.206 and 44.383, respectively, indicating a moderately high concentration of ownership and institutional investor presence in the sample.
Overall, the descriptive statistics provide a foundation for understanding the range and distribution of the key variables and are in line with the established literature (e.g., Yuan & Wen, 2018; R. Zhang et al., 2022).
Correlation and Variance Inflation Factor Analysis
The Pearson correlation matrix of the major variables is shown in Table 2. The correlation coefficients between the independent and control variables are almost all below 0.50. This study further conducts a multicollinearity diagnostic test among the continuous variables. Each control variable has a low VIF (less than 2), which indicates no serious multicollinearity issue in the model.
Pearson Correlation and Variance Inflation Factor Analysis.
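The VIF diagnostic described above can be illustrated with a minimal sketch. It relies on the standard identity that the VIF of regressor j, 1 / (1 - R_j^2), equals the j-th diagonal element of the inverse correlation matrix of the regressors. The function names and the toy values are assumptions made for illustration and are not the study's data or code.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [sum(x * x for x in c) ** 0.5 for c in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("perfectly collinear regressors")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """Return one VIF per column: the diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Toy regressors (illustrative values only, not the paper's data).
fs = [1.0, 2.0, 3.0, 4.0, 5.0]
lev = [2.0, 1.0, 4.0, 3.0, 5.0]
vif_values = vif([fs, lev])
```

In this toy case the two columns have a correlation of 0.8, so both VIFs equal 1 / (1 - 0.64). Values below 2, as reported for the control variables, correspond to an R_j^2 below 0.5.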
Univariate Analysis
For the univariate analysis, this study introduces a dummy variable, denoted as Dummy (DT). The variable takes a value of 1 when the machine learning-based text analysis detects digital transformation keywords or their associated synonyms in a firm’s annual report, and a value of 0 otherwise.
Table 3 presents the results of the t-test for the difference between firms with DT and firms without DT in our sample. The mean of RD is 16.191 for firms with DT and 14.566 for firms without DT, and the difference is statistically significant at the 1% level. This means that firms undergoing digital transformation have higher R&D expenditure than firms that do not. Additionally, the positively sloped linear regression line in Figure 1 indicates a positive correlation between DT and RD.
Univariate Analysis.
*** denotes statistical significance at the 1% level.

The linear fit of DT and RD.
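The group comparison in Table 3 can be sketched as a two-sample test of means. The version below computes Welch's t statistic, which does not assume equal variances; whether the study used this variant or the pooled-variance Student test is not stated, so the choice is an assumption, as are the function name and the toy values.

```python
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return Welch's t statistic and approximate degrees of freedom for mean(a) - mean(b)."""
    na, nb = len(group_a), len(group_b)
    va = variance(group_a) / na  # squared standard error of mean(a)
    vb = variance(group_b) / nb  # squared standard error of mean(b)
    t_stat = (mean(group_a) - mean(group_b)) / (va + vb) ** 0.5
    # Welch-Satterthwaite approximation to the degrees of freedom.
    dof = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t_stat, dof


# Toy RD values (illustrative only): firms with and without DT.
rd_with_dt = [16.0, 16.5, 15.9, 16.4]
rd_without_dt = [14.2, 14.9, 14.6, 14.5]
t_stat, dof = welch_t(rd_with_dt, rd_without_dt)
```

A large positive `t_stat` indicates that mean RD is higher in the DT group; the p-value would then come from a t distribution with `dof` degrees of freedom.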
Multivariate Results
In this section, I present the baseline regression results examining the relationship between DT and RD to test Hypothesis 1 (H1). All regression models incorporate both year and industry fixed effects to control for industry-specific heterogeneity and the influence of temporal shocks or policies. The FE estimation employs corrected standard errors clustered at the firm level for robust statistical inference.
Column (1) of Table 4 details the regression outcomes testing the impact of DT on RD. The coefficient for DT is significantly positive at the 1% level (α1 = .618), indicating a robust positive effect of DT on RD without control variables. As I progress to Columns (2) and (3), which sequentially incorporate control variables, I observe that the positive relationship between DT and RD persists and remains significant at the 1% level, even with the addition of these controls. This consistency underscores that digital transformation significantly boosts R&D expenditure, independent of other variables.
Main Results.
* and *** indicate statistical significance at the 10% and 1% levels, respectively; robust standard errors are clustered at the firm level, and the related t-statistics are reported in brackets.
The results also provide further insights into the control variables. FS, with a significant positive coefficient at the 1% level, indicates that larger firms tend to invest more in RD, likely due to their broader resource base and capacity for innovation. ROA also shows a positive association with RD, implying that firms with higher profitability are more inclined to allocate funds for R&D, possibly to sustain long-term growth. On the other hand, FA, LEV, and IO exhibit significantly negative coefficients, suggesting that older firms, those with higher debt obligations, and those with larger institutional ownership might face constraints or strategic orientations that limit R&D spending. These results align with previous studies (e.g., Hou et al., 2017; Pu & Zulkafli, 2024b; Sunder et al., 2017) that highlight these characteristics as possible inhibitors to R&D investment. It’s notable that other factors such as SG, BS, and OC did not show significant effects on R&D expenditure, suggesting that their influence may not be as critical for RD decisions within this sample.
Overall, the result confirms a positive relationship between the digital transformation indicators obtained through machine learning and R&D expenditure, emphasizing the significance of digital technologies in driving innovation. This finding aligns with the work of Lukić (2017), who argues that organizations leveraging digital technologies are better positioned to enhance their innovation capabilities, thereby fostering a more dynamic R&D environment. While the measure of RD in this study differs from that used by Ferreira and Teixeira (2019), who employed new product development as a metric of innovation, the conclusions drawn are similar. Notably, Kane et al. (2015) highlight that firms investing in digital transformation initiatives tend to experience a greater alignment between their R&D strategies and market demands, resulting in improved innovation outcomes. This evidence further substantiates the role of digital transformation as a driving force in enhancing innovation inputs by breaking organizational boundaries and bridging information gaps.
Robustness Test
So far, the estimation reveals a positive relationship between digital transformation and corporate R&D expenditure. This section performs a variety of additional tests to check the robustness of the baseline results.
(1) Alternative dependent variables. I utilize R1, R2, R3, and R4 as dependent variables to re-examine our primary baseline model (1). The regression results are presented in columns (1) to (4) of Table 5. The estimated coefficients of the key variables exhibit similar magnitudes and consistent signs compared to the previous analysis, further confirming the robustness of our baseline conclusions.
(2) Alternative independent variable. To ensure the robustness of our results, I replaced the continuous variable DT with a binary variable, DT_Dummy, which takes the value of 1 if the firm has undergone digital transformation and 0 otherwise. The regression analysis in column (5) of Table 5 shows that the coefficients for DT_Dummy have the same signs and consistent significance as those of DT.
(3) Subsample regression test. The COVID-19 pandemic led to the rapid adoption of remote work, compelling companies to quickly implement digital tools to support their employees’ R&D activities (Bachmann & Frutos-Bencze, 2022). Additionally, high-tech firms inherently possess advanced technological capabilities, making it easier for them to implement and leverage emerging digital technologies (Sergei et al., 2023). Considering the unique nature of the sample period and the specific characteristics of high-tech firms, I accordingly excluded the samples from 2020 and 2021 as well as the high-tech firm samples. The DT results in columns (6) and (7) of Table 5 support the previous regression results.
(4) Clustering the robust standard errors at the industry level instead of the firm level. To address potential industry-level correlations that firm-level clustering may overlook, I re-clustered the robust standard errors at the industry level to mitigate heteroscedasticity and autocorrelation issues. The estimates in column (8) of Table 5 show consistent magnitude and significance, confirming the robustness of our findings.
(5) Extra fixed effects. To further control for geographical variations and potential local-specific factors, I introduced province and city-level fixed effects based on the location of company registration. The findings in column (9) of Table 5 remain consistent with our previous conclusions, reinforcing the robustness of the results.
Robustness Test.
*** indicates statistical significance at the 1% level; robust standard errors are clustered at the firm or industry level, and the related t-statistics are reported in brackets.
Endogenous Treatment
The potential endogeneity issues in the DT-RD relationship can lead to a bias in the OLS estimates. Therefore, I employed the propensity score matching (PSM) method and generalized method of moments (GMM) estimation techniques to address sample selection biases, omitted variable concerns, and reverse causality issues effectively.
On the one hand, employing the PSM method, I strive to ensure comparability between firms with DT (treatment group) and those without DT (control group) across other dimensions. Specifically, I use firm-level control variables as covariates and establish 1:1 and 1:3 nearest neighbor matches (see Figure 2a and 2b). The first and second columns of Table 6 report the corresponding re-estimations of the model using different matching ratios. The results indicate that the impact of DT on RD remains consistent irrespective of the matching ratio used.

(a) Balance of treatment and control (PSM 1:1) and (b) balance of treatment and control (PSM 1:3).
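The 1:1 and 1:3 nearest-neighbor matching step can be sketched as follows. The sketch assumes that propensity scores have already been estimated (for example, from a logit of the DT dummy on the firm-level controls) and that matching is done with replacement; neither detail is stated in the text, and the function name, unit labels, and scores are hypothetical.

```python
def nearest_neighbor_match(treated, controls, k=1):
    """Match each treated unit to its k nearest controls on the propensity score.

    treated, controls: dicts mapping a unit id to an estimated propensity score.
    Matching is with replacement, so one control may serve several treated units.
    """
    matches = {}
    for unit, score in treated.items():
        # Rank controls by absolute score distance; ties are broken by unit id.
        ranked = sorted(controls, key=lambda c: (abs(controls[c] - score), c))
        matches[unit] = ranked[:k]
    return matches


# Toy propensity scores (illustrative only).
treated = {"T1": 0.62, "T2": 0.35}
controls = {"C1": 0.60, "C2": 0.40, "C3": 0.33, "C4": 0.70}

one_to_one = nearest_neighbor_match(treated, controls, k=1)
one_to_three = nearest_neighbor_match(treated, controls, k=3)
```

The outcome model is then re-estimated on the treated firms and their matched controls, which is what the first two columns of Table 6 report for the two matching ratios.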
Endogenous Treatment.
* and *** indicate statistical significance at the 10% and 1% levels, respectively; robust standard errors are clustered at the firm or industry level, and the related t-statistics are reported in brackets.
On the other hand, while controlling for a range of factors influencing RD, alongside industry and time fixed effects, I acknowledge potential issues such as omitted variable bias and endogeneity arising from bidirectional causality. GMM is particularly suitable for mitigating these issues (Bond, 2002). Therefore, I employ two-step Difference GMM (DIF-GMM) and two-step System GMM (SYS-GMM) estimation methods to address endogeneity concerns affecting the relationship between DT and RD. Columns (3) and (4) of Table 6 display the outcomes from the dynamic regression of DT on RD. The coefficients associated with DT exhibit statistical significance at the 1% or 5% level following rigorous Arellano-Bond and Hansen overidentification tests, underscoring the robustness of our regression findings.
Placebo Test
Following previous studies (e.g., Z. Chen & Jiang, 2024), I estimated the impact of randomly generated DT on RD using a placebo approach to further address concerns regarding unobserved variables. I created a counterfactual set by randomly selecting numbers and running model (1) 500 and 1,000 times. Figures 3a and 3b visually depict the estimated values and

(a) Coefficients of DT randomly assigned on RD (500) and (b) coefficients of DT randomly assigned on RD (1,000).
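The logic of the placebo test can be illustrated with a simplified sketch. It permutes the observed DT values across observations, which is one common way to build a counterfactual assignment and is an assumption here, and it re-estimates a univariate slope rather than the full model (1) with controls and fixed effects. The data are simulated and all names are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope of a univariate OLS regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def placebo_slopes(dt, rd, draws=500, seed=0):
    """Re-estimate the slope after randomly reassigning DT across observations."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(draws):
        fake_dt = dt[:]  # copy so the real assignment is left untouched
        rng.shuffle(fake_dt)
        slopes.append(ols_slope(fake_dt, rd))
    return slopes


# Simulated data (not from the paper): RD rises with DT plus noise.
data_rng = random.Random(42)
dt = [data_rng.uniform(0, 5) for _ in range(200)]
rd = [14.0 + 0.6 * d + data_rng.gauss(0, 1) for d in dt]

actual = ols_slope(dt, rd)
placebo = placebo_slopes(dt, rd, draws=500)
# Share of placebo estimates at least as large in magnitude as the actual one.
share_as_large = sum(abs(s) >= abs(actual) for s in placebo) / len(placebo)
```

If the baseline estimate reflects a genuine relationship, the placebo coefficients should cluster around zero and the actual estimate should lie far in the tail of their distribution.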
Extended Analysis
Dimensions of Digital Transformation
The empirical results previously presented indicate that DT enhances RD. An intriguing question arises regarding the dimensions driving this effect.
As discussed in the hypothesis development, numerous studies highlight the transformative potential of artificial intelligence (AI), blockchain (BC), cloud computing (CC), big data (BD), and digital technology applications (DTA) in reshaping organizational operations and fostering innovation (e.g., Guo et al., 2023; Simsek et al., 2019; Srinivasan & Venkatraman, 2018). These perspectives generally support the view that advanced digital technologies provide unprecedented opportunities for firms to optimize innovation processes, enhance R&D decision-making capabilities, promote R&D sustainability, and mitigate agency problems. However, there is currently no clear answer as to which dimension of digital transformation can navigate the complex technological landscape and drive firms to increase RD.
To explore the relationship between the different dimensions of DT and RD, I decompose the indicators, based on digital transformation keywords, into five distinct dimensions: AI, BC, CC, BD, and DTA. Columns (1) to (5) of Table 7 present the results for AI, BC, CC, BD, and DTA, respectively. The overall empirical findings indicate that the five dimensions of digital transformation—AI, BC, CC, BD, and DTA—significantly promote R&D expenditures. These findings are broadly consistent with Wang et al. (2022), who argue that digital technologies not only provide essential tools and infrastructure to support innovative activities but also create a favorable paradigm for continuous improvement and exploration of new R&D support.
Dimensions of digital transformation and corporate R&D expenditures.
*** indicates statistical significance at the 1% level; robust standard errors are clustered at the firm or industry level, and the related t-statistics are reported in brackets.
Moderating Effect of Digital Transformation on Innovation Consequences
Our previous research results indicate that digital transformation promotes R&D across multiple dimensions, providing insights into the dynamics of this relationship. Evidence suggests that continuous R&D investments support firms in developing new technologies and products, fostering technological progress, and thereby improving innovation performance (Leung & Sharma, 2021). In other words, stable and increased R&D not only helps maintain a firm’s competitive advantage in rapidly changing markets but also ensures sustainable innovation returns in the long term.
The moderating role of DT is introduced here for several reasons. First, DT has the potential to accelerate knowledge absorption, cross-departmental communication, and process optimization, all of which enhance the effectiveness of R&D expenditures in producing innovation outputs (Z. Chen & Jiang, 2024; Kretschmer & Khashabi, 2020; Sergei et al., 2023). DT can also facilitate real-time data analytics, which helps firms swiftly respond to market demands, streamline R&D processes, and refine product development efforts (Llopis-Albert et al., 2021; Peng & Tao, 2022; Reuschl et al., 2022). Moreover, DT may play a key role in reducing uncertainty in the R&D-to-innovation pipeline by offering tools for more accurate project monitoring, risk assessment, and resource allocation, thus improving R&D’s impact on innovation performance (Sergei et al., 2023; Tang et al., 2023; J. Zhang & Liang, 2012).
Given these factors, DT can be hypothesized to enhance the R&D-innovation performance link by optimizing the way R&D investments are translated into valuable innovation outputs. DT’s moderating role offers a deeper understanding of how firms can leverage digital advancements to maximize the returns on their R&D investments. Therefore, I employ the following models to analyze these economic effects.
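A sketch of the two specifications is given below. It assumes that they carry over the control variables and the year and industry fixed effects of the baseline model; the coefficient labels β1 and γ3 follow the text, while the remaining symbols and the exact form of the control and fixed-effect terms are assumptions made for illustration.

```latex
\begin{align}
IP_{i,t} &= \beta_0 + \beta_1 RD_{i,t}
          + \sum_{k \ge 2} \beta_k \, Controls_{k,i,t}
          + Year_t + Industry_j + \varepsilon_{i,t} \tag{2} \\
IP_{i,t} &= \gamma_0 + \gamma_1 RD_{i,t} + \gamma_2 DT_{i,t}
          + \gamma_3 \left( RD_{i,t} \times DT_{i,t} \right)
          + \sum_{k \ge 4} \gamma_k \, Controls_{k,i,t}
          + Year_t + Industry_j + \varepsilon_{i,t} \tag{3}
\end{align}
```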
Here, model (2) examines the direct effect of R&D expenditures (RD) on innovation performance (IP), while model (3) tests the moderating effect of digital transformation (DT) on the relationship between RD and IP. Following Yuan and Wen (2018), I measure IP in two ways. The first indicator, IP1, is the natural logarithm of 1 plus the total number of the company’s patents, including invention patents, design patents, and utility model patents. The second indicator, IP2, is the natural logarithm of 1 plus the company’s invention patents. β1 represents the impact of RD on IP; γ3 denotes the moderating effect of DT on the relationship between RD and IP.
The results in columns (1) and (3) of Table 8 show that RD is significantly positively correlated with innovation performance (IP1 and IP2), indicating that R&D expenditures can enhance corporate innovation performance. The moderating effect of DT is shown in columns (2) and (4), indicating that digital transformation positively and significantly strengthens the positive relationship between RD and IP. Additionally, I executed a marginal effects plot and a kernel density plot of the moderating variable in one graph, providing additional information to analyze the moderating effect of DT on innovation consequences. As seen in Figure 4a and 4b, as DT increases, the marginal effect of RD on innovation performance (IP1 and IP2) gradually increases and is statistically significant. This further reveals the potential of DT in optimizing innovation processes, enhancing R&D investment efficiency, and facilitating the conversion of R&D expenditures into innovation outcomes.
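The quantity plotted in Figure 4 can be written out explicitly. Assuming the moderated model has the usual interaction form, with γ1 the coefficient on RD (a label assumed here) and γ3 the coefficient on the RD × DT interaction, the marginal effect of RD and its sampling variance at a given level of DT are:

```latex
\begin{align}
\frac{\partial IP_{i,t}}{\partial RD_{i,t}} &= \gamma_1 + \gamma_3 \, DT_{i,t} \\
\operatorname{Var}\!\left( \hat{\gamma}_1 + \hat{\gamma}_3 \, DT \right)
  &= \operatorname{Var}(\hat{\gamma}_1)
   + DT^2 \operatorname{Var}(\hat{\gamma}_3)
   + 2 \, DT \operatorname{Cov}(\hat{\gamma}_1, \hat{\gamma}_3)
\end{align}
```

A positive γ3 makes the marginal effect rise linearly in DT, and the variance expression determines the confidence band around that line at each value of DT.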
Moderating effect of digital transformation on the relationship between R&D expenditures and innovation performance.
*** indicates statistical significance at the 1% level; robust standard errors are clustered at the firm or industry level, and the related t-statistics are reported in brackets.

(a) Effects of DT on the relationship between RD and IP1 and (b) effects of DT on the relationship between RD and IP2.
Discussion
Our findings demonstrate a positive relationship between corporate R&D expenditure and digital transformation, highlighting how digital capabilities have become central to innovation strategies. As the digital revolution progresses, the integration of technologies such as artificial intelligence, cloud computing, and big data continues to reshape traditional business models. Digital transformation’s multidimensional nature allows it to reformulate economic production and service processes, altering organizational structures, operational efficiency, and value creation paradigms (Guo et al., 2023; Peng & Tao, 2022; Reuschl et al., 2022). Our results underscore the importance of digital transformation as a primary driver of R&D expenditures, in alignment with Vial (2019), who suggests that digital technologies elicit adaptive strategic responses within firms, strengthening digital infrastructures’ dynamic capabilities, and, consequently, positively impacting innovation outcomes.
In terms of potential mechanisms, our findings indicate that digital transformation plays a dual role in fostering corporate R&D expenditures. From an agency theory perspective, the integration of digital technologies reduces information asymmetry between managers and shareholders, thereby lowering agency costs and enhancing managerial confidence in R&D investments, a result consistent with Ferreira et al. (2019), Gong and Ribiere (2021), and Rachinger et al. (2019). This reduced information gap smooths decision-making channels, encouraging proactive investments in innovation. Through the lens of the dynamic capabilities perspective, originally posited by Teece et al. (1997), digital transformation fosters the core capabilities needed to sense, seize, and reconfigure resources in response to rapid changes. Specifically, the five dimensions of digital transformation (artificial intelligence, blockchain, cloud computing, big data, and digital technology applications) serve as key drivers for these capabilities. Each dimension uniquely enhances firms’ abilities to respond flexibly to new opportunities, allocate resources efficiently, and scale R&D efforts in dynamic environments, aligning with prior research (e.g., Chakrabarty & Wang, 2012; Teece, 2007).
Furthermore, our analysis provides new insights into the moderating effects of digital transformation. Specifically, it appears that digital technologies reduce time inefficiencies in R&D processes, which significantly boosts innovation output. This finding supports Srinivasan and Venkatraman (2018), who emphasized that digital platforms streamline R&D operations, resulting in faster innovation cycles. Our results also extend the existing literature by demonstrating that digital transformation’s influence on innovation is multifaceted, operating through both direct R&D expenditure increases and efficiency improvements in the innovation process.
Conclusions
This paper investigates the influence of digital transformation on corporate R&D expenditure, providing novel insights into how digital transformation drives innovation. The findings reveal that digital transformation significantly increases corporate R&D expenditure levels. To address endogeneity concerns, I utilized propensity score matching and generalized method of moments estimation. Additionally, I conducted a placebo test using two counterfactual datasets constructed with the ordinary least squares method, confirming that our results are robust across various econometric treatments and additional robustness checks.
Building on these findings, I further analyzed the five dimensions of digital transformation by constructing independent variables with the word2vec deep learning module. This analysis demonstrated that all five dimensions are significantly associated with changes in R&D expenditure. Given the pivotal role of R&D in corporate innovation processes, I also explored the moderating role of digital transformation in the relationship between R&D expenditure and innovation performance. These results highlight digital transformation’s potential to optimize the innovation process, enhance R&D investment efficiency, and improve the conversion of R&D expenditure into innovative outcomes.
From a theoretical perspective, the study contributes to the understanding of digital transformation’s impact on R&D expenditure by applying both agency theory and the dynamic capabilities framework. The agency theory lens elucidates how digital transformation aligns managerial incentives with long-term innovation goals, reducing agency problems in corporate R&D investment. This helps to clarify the underlying economic and strategic mechanisms at play. Additionally, the dynamic capabilities framework underscores how digital transformation empowers firms to adapt resources swiftly and respond to external pressures, fostering continuous innovation. Integrating these two perspectives offers a comprehensive view of the strategic and adaptive mechanisms through which digital transformation influences R&D, advancing our understanding of its role in driving sustained innovation.
This research also provides critical policy and managerial implications. For policymakers, the finding that digital transformation enhances corporate R&D expenditure suggests that priority should be given to advancing digital infrastructure, strengthening digital property rights protections, and providing subsidies for digital technologies. Such measures can encourage firms to leverage digital tools to boost R&D efficiency and innovation outcomes. Additionally, implementing differentiated policies that cater to the specific digitalization needs of various industries can comprehensively elevate the digital innovation capacity of the economy. For economies beyond China undergoing similar digital transformations, it is essential to support companies in developing digital skills and ensuring access to digital resources. Encouraging the use of artificial intelligence, big data, cloud computing, and blockchain can further improve R&D efficiency. For managers, this research highlights that strategically integrating digital transformation into R&D processes can enhance innovation efficiency and sustain long-term innovation. Digital integration empowers firms to translate R&D investment into tangible innovation, strengthening their competitive position in the global market.
However, this study has certain limitations. First, the sample is limited to Chinese listed companies, suggesting that future research could benefit from comparative studies in other emerging and developed economies. Additionally, I tested only the moderating effect of digital transformation on the relationship between R&D expenditure and innovation performance. Future research could delve into the interplay between these factors and other relevant variables, such as regional digitalization levels, R&D subsidies, and innovation sustainability, to yield deeper insights.
Footnotes
Acknowledgements
The initial inspiration for this research stemmed from my academic exchanges with the late Dr. Yuanba Li at Universiti Sains Malaysia. Although I completed this work independently over the past year, his unwavering support, insightful guidance, and generous encouragement during our time together continue to serve as a source of strength and motivation. Despite the 3,300 kilometers that eventually separated us, his intellectual spirit and kindness remained ever close. I am deeply grateful for the many moments of academic exchange and mutual support we shared at USM. On many quiet nights of writing and reflection, I found myself missing those days when we encouraged one another through our academic journeys. This work is dedicated to his memory.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
