Abstract
This study investigates how inventors’ collaboration network structures are associated with breakthrough innovation performance in the new energy vehicle industry from a role-configuration perspective. Drawing on a dataset of 2,372 inventors derived from granted invention patents, the research integrates social network analysis with machine-learning techniques to uncover heterogeneous innovation patterns at the individual level. First, a K-means clustering approach based on inventors’ patent output and number of collaborators identifies three distinct inventor roles: peripheral inventors, collaborator inventors, and star inventors. Second, classification and regression tree models are constructed separately for each group to explore how different configurations of network features, including degree centrality, closeness centrality, structural holes, clustering coefficient, and collaboration intensity, are associated with high breakthrough innovation performance. The results reveal substantial role heterogeneity in the network–innovation relationship. For collaborator inventors, closeness centrality plays a dominant role, indicating the importance of efficient knowledge access within moderately connected structures. Peripheral inventors rely primarily on the joint conditions of closeness centrality and structural holes, suggesting that selective brokerage positions compensate for limited collaboration breadth. In contrast, star inventors achieve superior breakthrough performance under a configuration characterized by cohesive local structures combined with relatively low collaboration intensity, highlighting the value of focused and less redundant teamwork.
By linking role differentiation with network configurations, this study provides new micro-level evidence on the heterogeneous pathways through which collaboration networks are associated with breakthrough innovation and demonstrates the effectiveness of machine-learning methods for uncovering complex innovation mechanisms.
Plain Language Summary
Breakthrough innovations are those outputs that have a major impact on future technology, yet how collaboration patterns influence an inventor’s ability to produce such high-impact work remains poorly understood. This study examines collaboration networks among 2,372 inventors in the new energy vehicle industry to identify how different network configurations relate to breakthrough innovation. The research first categorizes inventors into three distinct types based on patent output and number of collaborators. Peripheral inventors produce few patents and have few collaborators. Collaborator inventors maintain moderate levels of both output and collaboration. Star inventors generate many patents and work with numerous collaborators. For each group, the study then analyzes which network characteristics are associated with producing breakthrough innovations. The findings reveal that each inventor type follows a distinct pathway. Peripheral inventors achieve breakthroughs when they can bridge different groups while maintaining efficient knowledge access through the network. Collaborator inventors benefit most from quick access to information across the network. Star inventors succeed when embedded in tight-knit teams with focused, repeated collaborations rather than dispersing their efforts across too many connections. These results demonstrate that no single formula exists for breakthrough innovation. Different types of inventors succeed through different network configurations, offering valuable insights for understanding and supporting innovation in the new energy vehicle industry.
Keywords
Introduction
Breakthrough innovation has become a central driver of industrial transformation, long-term competitive advantage, and sustainable economic growth (Capponi et al., 2022; Panait et al., 2022). In technology-intensive sectors, it enables firms to transcend existing technological trajectories and establish new knowledge domains, thereby reshaping market structures and redefining industry leadership. This is particularly evident in the new energy vehicle industry, where radical advances in battery systems, powertrain architectures, intelligent control technologies, and integrated platforms are rapidly redefining the technological frontier (S. Zhao et al., 2025). In such a context, breakthrough innovation is no longer the outcome of isolated R&D efforts but increasingly emerges from collaborative knowledge creation among geographically and organizationally distributed inventors. As a result, collaboration networks have become a critical micro-level infrastructure through which inventors access heterogeneous knowledge, recombine technological components, and generate novel solutions (Phelps et al., 2012; Y. Zhao et al., 2025).
A growing body of literature has examined the relationship between collaboration networks and innovation performance through the lens of social network theory (C. Wang et al., 2014; Xiao et al., 2025; Xie & Yu, 2025). Most existing studies, however, have focused on the firm or team level (T. Wang et al., 2025) and treated individual inventors as relatively homogeneous carriers of knowledge (Audretsch & Belitski, 2024; Zhou & Li, 2025). While this stream of research has shown that central network positions facilitate access to diverse information, structural holes create brokerage opportunities, and cohesive local structures enhance trust and coordination (Mariotti & Haider, 2020), it provides limited insight into the micro-foundations of breakthrough innovation. In reality, breakthrough inventions are ultimately generated by individual inventors who differ substantially in their cognitive capabilities, attention allocation, prior knowledge base, and collaborative strategies. These differences shape not only their capacity to recombine distant knowledge domains but also the way they utilize network resources.
From this perspective, collaboration networks do not exert uniform effects; rather, their innovation value is contingent upon the roles that inventors play within them. The same structural position may generate radically different outcomes depending on whether an inventor acts as a knowledge integrator, a peripheral participant, or a technological leader. This role heterogeneity is particularly critical for breakthrough innovation, which requires not only exposure to heterogeneous knowledge but also the ability to filter, integrate, and recombine it in non-redundant ways. Moreover, the creation of radical knowledge involves an inherent tension between exploration and coordination: open network structures facilitate access to novel ideas, whereas cohesive ties support the deep collaboration required for solving complex technological problems. These competing mechanisms suggest that no single network attribute is sufficient to explain breakthrough innovation. Instead, different types of inventors are likely to rely on distinct configurations of network conditions to achieve high-impact outcomes.
Although prior research has increasingly acknowledged the importance of collaboration networks for innovation, the understanding of how these mechanisms operate at the inventor level remains fragmented (Ji, 2024). Existing studies largely adopt a net-effect logic and implicitly assume that network advantages operate uniformly across actors, thereby overlooking the possibility that different inventor roles may depend on fundamentally different network configurations (Zhou & Li, 2025). This limitation becomes more pronounced in emerging industries characterized by rapid technological convergence, where breakthrough innovation is driven by the recombination of diverse and often distant knowledge domains (Buchmann & Wolf, 2024). Furthermore, the empirical evidence on micro-level breakthrough innovation in such contexts remains scarce. Methodologically, most prior studies rely on regression-based approaches that are not well suited to capturing the non-linear, asymmetric, and equifinal relationships implied by social network theory. These observations point to the need for an analytical framework capable of simultaneously identifying heterogeneous inventor roles and uncovering multiple network configurations associated with breakthrough innovation.
To address these gaps, this study adopts a role-configuration perspective to examine how collaboration networks are associated with breakthrough innovation at the inventor level. Specifically, it seeks to answer two interrelated research questions: how can distinct inventor roles be empirically identified based on their collaborative output and interaction scale, and how do different configurations of network features relate to breakthrough innovation performance across these roles? The empirical analysis is conducted in the new energy vehicle industry, a setting characterized by intensive technological competition and a strong orientation toward radical innovation. Using patent data retrieved from the Innojoy database, this study constructs a co-invention network and develops an inventor-level dataset. Methodologically, the study integrates social network analysis with machine-learning techniques. It first employs K-means clustering to classify inventors into three role types: star inventors, collaborator inventors, and peripheral inventors. The study then applies classification and regression tree models to each group separately to identify the combinations of network features associated with high breakthrough innovation performance. This classification-comparison design makes it possible to reveal heterogeneous and non-linear innovation mechanisms that would be obscured in pooled analyses. The results show that the relationship between collaboration networks and breakthrough innovation is highly role-specific. For collaborator inventors, breakthrough outcomes are primarily associated with configurations centered on closeness centrality, highlighting the importance of efficient knowledge access within moderately connected structures. Peripheral inventors rely on selective brokerage positions that compensate for their limited collaboration breadth. In contrast, star inventors achieve superior breakthrough performance under a configuration characterized by cohesive local structures combined with relatively low collaboration intensity, suggesting that focused and non-redundant teamwork is more conducive to radical knowledge creation.
By linking role differentiation with network configurations, this study makes three main contributions. First, it introduces a micro-level, role-sensitive perspective to the study of collaboration networks and breakthrough innovation, extending the social network literature beyond the homogeneous actor assumption. Second, it demonstrates the value of machine-learning-based configurational analysis for uncovering complex and heterogeneous innovation pathways. Third, it provides a more nuanced analytical basis for designing differentiated network governance and talent development strategies in technology-intensive industries.
Literature Review and Theoretical Framework
Breakthrough innovation is widely regarded as a knowledge recombination process that depends on access to heterogeneous resources, the efficiency of knowledge diffusion (Wu & Zhou, 2025), and the governance of collaborative relationships. Social network theory provides a powerful lens for explaining these mechanisms by emphasizing how an actor’s structural position shapes opportunity recognition and resource mobilization. Prior studies have shown that central actors are more likely to access diverse knowledge, brokerage positions facilitate novel recombination, and cohesive local structures enhance trust and coordination in collaborative innovation (Lin et al., 2024; X. Rong et al., 2024). However, these mechanisms do not operate independently. Instead, they jointly influence the balance between knowledge diversity and knowledge integration, which is critical for breakthrough innovation.
Degree centrality reflects the number of direct collaborative ties an inventor maintains and represents the breadth of knowledge access (Coffano et al., 2017). Actors with higher degree centrality are embedded in richer knowledge exchange channels and are more likely to obtain non-redundant technological information, thereby increasing the probability of novel recombination. Recent studies on innovation collaboration networks confirm that degree centrality enhances the inflow of heterogeneous knowledge and improves innovation outcomes by expanding the scope of direct connections (X. Li et al., 2025).
Closeness centrality, in contrast, captures how efficiently an actor can reach others in the network. A shorter path reduces knowledge transmission costs and accelerates the integration of distributed technological information. In the context of breakthrough innovation, rapid access to distant knowledge domains is particularly important because it enables inventors to combine previously unconnected technological trajectories. Therefore, closeness centrality reflects the efficiency of knowledge diffusion and the speed of opportunity recognition.
From a resource-based perspective, these two dimensions represent different but complementary knowledge access mechanisms. Degree centrality emphasizes relational breadth, whereas closeness centrality emphasizes structural efficiency. The innovation outcome is therefore likely to depend not on the absolute level of either dimension, but on whether sufficient knowledge diversity can be accessed and integrated within an effective diffusion structure. This implies that their effects are inherently interdependent rather than additive.
In addition, structural hole theory emphasizes the benefits of bridging disconnected partners. Actors occupying brokerage positions can access heterogeneous and non-redundant knowledge pools and control information flows across subgroups (Becker & Bodin, 2022). This position provides a significant advantage for generating novel technological combinations, which is the core mechanism of breakthrough innovation. Y. Wang et al. (2025) found that key inventors and industrious inventors are more likely than inventors categorized as talents to occupy structural holes. Empirical evidence from global innovation collaboration networks shows that actors rich in structural holes are more capable of matching complementary knowledge and improving their innovation performance (Xu et al., 2024). For inventors, structural holes not only provide exposure to diverse knowledge domains but also enhance autonomy in knowledge recombination. This autonomy reduces cognitive lock-in and increases the likelihood of producing technologically discontinuous outcomes. However, the value of structural holes is contingent on the presence of adequate knowledge processing capacity. Without efficient communication channels or stable collaborative relationships, the heterogeneous knowledge accessed through brokerage positions may remain fragmented and difficult to integrate. The innovation advantage of structural holes is therefore likely to be realized only when combined with network conditions that support coordination and knowledge absorption.
The clustering coefficient reflects the extent to which an inventor’s collaborators are interconnected. A highly cohesive local network fosters trust, shared norms, and efficient coordination, which reduces the uncertainty associated with complex R&D activities. Collaboration intensity captures the strength and frequency of repeated collaborations. Strong ties facilitate the transfer of tacit knowledge and support the implementation of complex technological solutions. Together, these two dimensions represent the governance mechanism of collaborative relationships: cohesion reflects the structure of the local collaborative environment, whereas collaboration intensity reflects the depth of relational embedding. Both, however, carry costs. Excessive cohesion may lead to redundant knowledge and cognitive convergence, and overly intense collaboration may lock inventors into stable but cognitively homogeneous teams (W. Wang et al., 2020), constraining exploratory search. Their innovation implications therefore depend on whether they are combined with structural conditions that introduce sufficient knowledge diversity, such as brokerage positions or broad external connections.
Although each network attribute has been shown to influence innovation, breakthrough innovation is inherently a combinational process that requires both knowledge diversity and knowledge integration. Different structural conditions may substitute for or reinforce each other, leading to multiple pathways toward the same outcome. This logic implies equifinality and causal asymmetry, which cannot be captured by net-effect models.
By integrating social network theory with a configurational perspective, this study argues that breakthrough innovation is shaped by specific combinations of network centrality, structural holes, local cohesion, and collaboration intensity. These configurations reflect different strategies for balancing exploration and exploitation, as well as diversity and coordination, at the individual inventor level. Accordingly, the theoretical framework shown in Figure 1 does not treat network characteristics as independent predictors. Instead, it conceptualizes five network dimensions (degree centrality, closeness centrality, structural holes, clustering coefficient, and collaboration intensity) as jointly forming role-dependent network configurations that shape an inventor’s capability to balance knowledge exploration and knowledge integration, and thereby influence breakthrough innovation performance. By adopting this configurational perspective, the study responds to the limitations of net-effect models and provides a theoretically grounded explanation for why similar levels of innovation performance may emerge from different collaboration structures.

Figure 1. The framework of the study.
Research Design
Data and Sources
The data used in this study were retrieved from the Innojoy Global Patent Database, a comprehensive repository of patent documents worldwide. The study focused on granted invention patents in the new energy vehicle (NEV) industry, with application years ranging from 2017 to 2023. The NEV industry was delineated using International Patent Classification codes and keywords commonly associated with electric vehicles, hybrid vehicles, fuel cells, battery technologies, and related components. The search was conducted in December 2024, and the retrieved records include key bibliographic information such as grant date, applicants, inventors, claims, and cited patents.
The initial dataset comprised 36,847 patents. To ensure data quality and the validity of the subsequent network analysis, two filtering criteria were applied. First, patents with incomplete information, such as missing inventor names or applicant details, were removed. Second, patents with only a single inventor were excluded because they generate no collaboration ties and would add only isolated nodes to the network. After applying these filters, a final sample of 26,431 patents was obtained, which formed the basis for constructing the co-invention network. Inventor name disambiguation is a critical step in network construction, as homonyms can lead to spurious linkages and distort network measures. Initially, inventor names were standardized by removing titles, suffixes, and extra spaces, and by splitting names into first and last components where possible. Next, applicant information was used as the primary key for disambiguation: identical standardized names listed under the same applicant organization were treated as the same individual. In cases where the same name appeared with multiple applicants, a random sample of such occurrences was manually verified; these cases were found to be rare, and most represented genuine name coincidences across different companies. After disambiguation, 2,372 unique inventors were identified in the dataset.
Based on the disambiguated inventor list, an undirected co-invention network was constructed in which nodes represent inventors and edges represent collaborative relationships. An edge was established between two inventors if they were listed as co-inventors on at least one patent in the dataset. If a pair of inventors collaborated on multiple patents, the edge weight was set to the number of their joint patents, thereby capturing the strength of the tie. The network was constructed using the Python library NetworkX, and the resulting network contained 2,372 nodes and 8,416 edges.
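The edge rule above can be illustrated with a minimal NetworkX sketch. It assumes each patent has already been reduced to a list of cleaned, disambiguated inventor names; the function name `build_coinvention_network` and the toy inventors are hypothetical, and this is not the authors' script.

```python
from itertools import combinations

import networkx as nx


def build_coinvention_network(patents):
    """Build an undirected, weighted co-invention network (illustrative sketch).

    `patents` is assumed to be an iterable of inventor-name lists, one per patent.
    """
    G = nx.Graph()
    for inventors in patents:
        unique = sorted(set(inventors))
        if len(unique) < 2:
            continue  # single-inventor patents create no collaboration ties
        for u, v in combinations(unique, 2):
            if G.has_edge(u, v):
                G[u][v]["weight"] += 1  # repeated collaboration strengthens the tie
            else:
                G.add_edge(u, v, weight=1)  # first joint patent creates the tie
    return G


# Hypothetical toy data: three multi-inventor patents and one solo patent
patents = [["A", "B", "C"], ["A", "B"], ["D"], ["C", "D"]]
G = build_coinvention_network(patents)
print(G.number_of_nodes(), G.number_of_edges(), G["A"]["B"]["weight"])
```

Inventor D enters the network only through the joint patent with C; the solo patent contributes nothing, mirroring the exclusion of single-inventor patents.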
Variables
Breakthrough Innovation Performance
Breakthrough innovation performance (BIP) is characterized by its capacity to disrupt established technological trajectories and exert broad influence. To measure this concept, the study employs forward patent citations, a well-established and objective indicator of a patent’s technological significance and impact (Buchmann & Wolf, 2024). Specifically, an inventor’s breakthrough innovation performance is operationalized as the average number of forward citations received by all patents in which the inventor is listed. This measure effectively captures the breadth and depth of an invention’s subsequent influence on the knowledge network, thereby aligning closely with the core notion of breakthrough innovation as a form of trajectory-shaping technological change.
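The inventor-level averaging described above can be sketched as follows. The input dictionaries and the helper name are hypothetical, and treating a patent with no recorded forward citations as having zero citations is an assumption of this sketch.

```python
from collections import defaultdict


def breakthrough_performance(patent_inventors, forward_citations):
    """Average forward citations over all patents that list each inventor.

    patent_inventors: dict patent_id -> list of inventor names (hypothetical input)
    forward_citations: dict patent_id -> forward citation count
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for pid, inventors in patent_inventors.items():
        cites = forward_citations.get(pid, 0)  # assumption: missing count -> 0
        for inventor in set(inventors):
            totals[inventor] += cites
            counts[inventor] += 1
    return {inv: totals[inv] / counts[inv] for inv in counts}


bip = breakthrough_performance(
    {"P1": ["A", "B"], "P2": ["A", "C"], "P3": ["B", "C"]},
    {"P1": 3, "P2": 1},  # P3 has no recorded citations
)
print(bip)
```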
Collaboration Network Characteristics
Drawing on social capital and network theory, this study incorporates a set of key network metrics to capture the multifaceted positions and embeddedness of inventors within collaboration networks (Pinto et al., 2019). These metrics, calculated using the NetworkX package in Python, span both structural and relational dimensions. Their operational definitions and measurement approaches are detailed in Table 1.
Table 1. Definitions and Measurements of Collaboration Network Characteristics.
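Because the contents of Table 1 are not reproduced here, the sketch below shows one plausible way to compute the five features with NetworkX; it is not the authors' code. Two operationalizations are assumptions: structural holes as 1 minus Burt's constraint (`nx.constraint`), so that higher values indicate richer brokerage, and collaboration intensity as the mean weight of an inventor's ties. Degree centrality, closeness centrality, and the clustering coefficient use NetworkX defaults, which ignore edge weights.

```python
import networkx as nx


def network_features(G):
    """Inventor-level network features; two operationalizations are assumed."""
    degree = nx.degree_centrality(G)
    closeness = nx.closeness_centrality(G)  # Wasserman-Faust scaling if disconnected
    constraint = nx.constraint(G)           # Burt's constraint, unweighted here
    clustering = nx.clustering(G)           # unweighted local clustering coefficient
    features = {}
    for n in G.nodes:
        k = G.degree(n)
        features[n] = {
            "degree_centrality": degree[n],
            "closeness_centrality": closeness[n],
            # Assumed: structural holes = 1 - constraint (higher = more brokerage)
            "structural_holes": 1.0 - constraint[n],
            "clustering_coefficient": clustering[n],
            # Assumed: intensity = mean edge weight over the inventor's ties
            "collaboration_intensity": G.degree(n, weight="weight") / k if k else 0.0,
        }
    return features


# Toy star: hub H bridges two partners who never co-invent with each other
G = nx.Graph()
G.add_edge("H", "X", weight=3)
G.add_edge("H", "Y", weight=1)
features = network_features(G)
print(features["H"])
```

In this toy graph the hub scores 0.5 on structural holes and the two leaves score 0, consistent with the intuition that only H spans a gap between unconnected partners.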
Methods
K-Means Clustering
K-means clustering is an unsupervised learning algorithm that partitions observations into K groups by minimizing the within-cluster sum of squared distances between each observation and its corresponding cluster centroid (Yıldız & Aykanat, 2015). By iteratively reassigning observations to the nearest centroid and updating the centroids, the algorithm converges to a (possibly local) optimum in which observations within the same cluster are highly similar and those in different clusters are well separated. Because it does not rely on predefined labels, K-means is widely used to uncover latent heterogeneity and identify behavioral patterns in large-scale datasets.
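For reference, the objective described above can be written as follows. The notation is ours rather than the paper's: x_i is inventor i's preprocessed feature vector, C_k is cluster k, and μ_k is its centroid. The second display gives the two alternating steps (assignment, then centroid update) of the standard Lloyd iteration.

```latex
\min_{C_1,\dots,C_K} \; \sum_{k=1}^{K} \sum_{\mathbf{x}_i \in C_k} \lVert \mathbf{x}_i - \boldsymbol{\mu}_k \rVert^2,
\qquad
\boldsymbol{\mu}_k = \frac{1}{|C_k|} \sum_{\mathbf{x}_i \in C_k} \mathbf{x}_i

c_i \leftarrow \arg\min_{k} \lVert \mathbf{x}_i - \boldsymbol{\mu}_k \rVert^2 \quad \text{(assignment)},
\qquad
\boldsymbol{\mu}_k \leftarrow \frac{1}{|C_k|} \sum_{i:\, c_i = k} \mathbf{x}_i \quad \text{(update)}
```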
This method is particularly suitable for the present study because the roles of inventors in collaborative innovation are not directly observable but are instead reflected in their patterns of knowledge production and collaboration. By grouping inventors according to their patent output and collaboration breadth, K-means enables the identification of structurally meaningful inventor types in a data-driven manner. Such a role-classification approach provides the analytical foundation for subsequent comparisons of network characteristics and for exploring how different inventor roles are associated with breakthrough innovation performance. In this way, the clustering procedure captures the heterogeneous micro-level actors that underpin the configurational analysis.
CART Decision Tree
To investigate how different configurations of network characteristics are associated with breakthrough innovation across inventor roles, this study employs the classification and regression tree (CART) method. CART is a non-parametric supervised learning technique that recursively partitions the data into increasingly homogeneous subsets by selecting explanatory variables and split points that maximize the reduction in impurity. The final output is a tree-structured model that represents a set of hierarchical “IF–THEN” decision rules linking combinations of predictors to the outcome variable.
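To make "reduction in impurity" concrete, the sketch below computes the Gini impurity decrease of a single threshold split and scans candidate thresholds for one feature, the core step that CART repeats recursively. Gini is the default criterion of scikit-learn's `DecisionTreeClassifier`; the paper does not state which impurity measure it used, and the toy data are invented.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum over classes of p_c squared."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(x, y, threshold):
    """Parent impurity minus size-weighted child impurity for the split x <= threshold."""
    left = [label for value, label in zip(x, y) if value <= threshold]
    right = [label for value, label in zip(x, y) if value > threshold]
    n = len(y)
    child = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(y) - child


def best_split(x, y):
    """Try midpoints between sorted unique values; return (threshold, decrease)."""
    values = sorted(set(x))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(((t, impurity_decrease(x, y, t)) for t in candidates),
               key=lambda pair: pair[1])


# Toy example: closeness centrality versus high (1) / low (0) BIP
closeness = [0.10, 0.15, 0.20, 0.40, 0.45, 0.50]
high_bip = [0, 0, 0, 1, 1, 1]
print(best_split(closeness, high_bip))
```

The best threshold falls at about 0.30, where both child nodes are pure and the impurity drops from 0.5 to 0.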
Unlike traditional regression models that focus on the net effect of individual variables, CART captures complex, non-linear, and interactive relationships among multiple conditions, allowing for the identification of equifinal pathways leading to breakthrough innovation. In addition, the tree structure provides an intuitive and interpretable representation of how different network features jointly shape innovation outcomes, which is consistent with the role-configuration logic. CART does not impose strict distributional assumptions, and its predictions are robust to multicollinearity among network measures, making it appropriate for analyzing structural characteristics derived from collaboration networks; when predictors are highly correlated, however, one may substitute for another at a given split, so the specific variable appearing in a rule should be interpreted with some caution. By applying CART to each inventor group identified through the clustering analysis, this study is able to reveal heterogeneous decision pathways and to compare the key network configurations associated with breakthrough innovation across different roles.
Data Analysis Results
Descriptive Statistics and Correlation Analysis
Table 2 presents the descriptive statistics and Pearson correlation coefficients for all variables. The sample includes 2,372 inventors in the new energy vehicle industry. Breakthrough innovation performance shows substantial variation (Mean = 0.641, SD = 0.557), indicating considerable heterogeneity in innovative outcomes across inventors. The network structural variables also exhibit noticeable dispersion, reflecting the unequal distribution of collaboration opportunities and positional advantages in the co-invention network. In particular, the high average clustering coefficient suggests that inventors are embedded in locally cohesive collaboration structures, whereas the relatively high structural hole values indicate the simultaneous presence of brokerage positions that connect otherwise disconnected partners. These patterns highlight the coexistence of closed and open network structures in the industry’s collaborative innovation system.
Table 2. Descriptive Statistics and Pearson Correlation Results.
***p < .001, **p < .01, *p < .05.
The correlation results indicate that BIP is positively and significantly correlated with degree centrality and structural holes, implying that inventors occupying central and brokerage positions tend to achieve higher breakthrough outcomes. In contrast, the clustering coefficient (local closure) and collaboration intensity (strong repeated collaboration) show weak or negative associations with BIP when examined in isolation, suggesting that their effects may depend on how they are combined with other structural conditions. Moreover, although several independent variables are significantly correlated with one another, the coefficients remain below commonly used cut-offs, indicating no severe multicollinearity. Overall, the modest and mixed pairwise relationships support the core argument of this study that breakthrough innovation is not driven by a single network attribute but by specific configurations of multiple structural conditions. This provides an empirical basis for the subsequent role classification and the use of the CART model to uncover heterogeneous decision pathways across different inventor groups.
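A lower-triangular correlation table with significance stars (***p < .001, **p < .01, *p < .05) can be sketched with pandas and SciPy as below. The function names and toy columns are hypothetical; the real inputs would be BIP and the five network features.

```python
import itertools

import pandas as pd
from scipy.stats import pearsonr


def stars(p):
    """Conventional significance markers."""
    return "***" if p < .001 else "**" if p < .01 else "*" if p < .05 else ""


def correlation_table(df):
    """Lower-triangular Pearson correlations annotated with significance stars."""
    cols = list(df.columns)
    table = pd.DataFrame("", index=cols, columns=cols)
    for i, j in itertools.combinations(range(len(cols)), 2):
        r, p = pearsonr(df[cols[i]], df[cols[j]])
        table.iloc[j, i] = f"{r:.3f}{stars(p)}"  # row j > column i: lower triangle
    for c in cols:
        table.loc[c, c] = "1"
    return table


demo = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6],
                     "y": [2, 4, 6, 8, 10, 12],
                     "z": [3, 1, 4, 1, 5, 9]})
table = correlation_table(demo)
print(table)
```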
Identification of Inventor Groups
To uncover the heterogeneous roles of inventors in the co-invention network, this study employed the K-means clustering algorithm to classify inventors into distinct groups. The clustering was conducted based on two inventor-level indicators that capture both innovative productivity and collaborative embeddedness: the number of patent applications (patent number) and the number of unique collaborators (partner number). Given the highly skewed distribution of these indicators, both variables were first log-transformed to reduce the influence of extreme values and then normalized to ensure comparability in the clustering space. This preprocessing procedure prevents the clustering results from being dominated by scale differences and improves the robustness of the distance-based algorithm.
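The preprocessing step above can be sketched with NumPy and scikit-learn. The paper does not specify the exact transforms, so two choices here are assumptions: `log1p` (that is, log(1 + x)) for the log transform, and z-score standardization via `StandardScaler` as one reading of "normalized" (min-max scaling would be another).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


def preprocess(patent_number, partner_number):
    """Log-transform skewed counts, then z-score each column (assumed choices)."""
    X = np.column_stack([patent_number, partner_number]).astype(float)
    X = np.log1p(X)  # compresses the long right tail of both count variables
    return StandardScaler().fit_transform(X)  # mean 0, unit variance per column


# Hypothetical counts; the last inventor is an extreme value
X = preprocess([1, 2, 3, 50], [1, 4, 6, 120])
print(X.round(3))
```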
To determine the optimal number of clusters, the number of clusters K was systematically varied from 2 to 7, and multiple cluster validity indices were jointly considered, including the within-cluster sum of squared errors (SSE), the Silhouette coefficient, the Calinski-Harabasz index, and the Davies-Bouldin index. As shown in Figure 2, the rate of decline in SSE slows noticeably after K = 3, indicating diminishing marginal gains in model fit. The detailed performance of each clustering solution is reported in Table 3.
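The model-selection loop described above can be sketched with scikit-learn's `KMeans` and its clustering metrics. The `n_init` and `random_state` values are assumptions, and the three synthetic blobs only stand in for the preprocessed inventor features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)


def evaluate_k(X, k_values=range(2, 8), seed=42):
    """SSE, Silhouette, Calinski-Harabasz and Davies-Bouldin for each K."""
    rows = []
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        labels = km.labels_
        rows.append({
            "k": k,
            "sse": km.inertia_,                                       # lower is better
            "silhouette": silhouette_score(X, labels),                # higher is better
            "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
            "davies_bouldin": davies_bouldin_score(X, labels),        # lower is better
        })
    return rows


# Synthetic stand-in for the preprocessed (patent number, partner number) space
rng = np.random.default_rng(0)
centers = np.array([[-2.0, -2.0], [0.0, 0.0], [2.0, 2.0]])
X = np.vstack([c + 0.2 * rng.standard_normal((100, 2)) for c in centers])
rows = evaluate_k(X)
for row in rows:
    print(row)
```

On data like these, SSE keeps falling as K grows, while the Silhouette coefficient peaks at the true number of groups. This is why the elbow in SSE is read together with the other indices rather than on its own.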

Figure 2. The determination of the optimal cluster number.
Table 3. Clustering Performance Metrics Results.
The results show that the Silhouette coefficient reaches its highest value at K = 3, suggesting the best balance between intra-cluster cohesion and inter-cluster separation. Meanwhile, the Calinski–Harabasz index remains at a relatively high level and the Davies–Bouldin index at an acceptable level for this solution. Although larger K values improve some individual indicators, they lead to increased fragmentation and reduced interpretability of inventor roles. Therefore, considering both statistical performance and theoretical interpretability, K = 3 was selected as the optimal number of clusters.
To assess the robustness of the clustering results, a stability analysis was conducted. First, the K-means algorithm was repeated 50 times with different random initializations, yielding an average adjusted Rand index (ARI) of 0.6827 (SD = 0.2134), which indicates a moderate-to-high level of consistency across runs. Second, a bootstrap-based subsampling procedure yielded an average ARI of 0.7064 (SD = 0.2054). These results suggest that the three-cluster solution is reasonably robust to initialization and sampling variation, although the sizeable standard deviations indicate that individual runs can depart noticeably from the typical solution.
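The two stability checks can be sketched as below. The paper does not state what each run's ARI was computed against, so comparing every run with a single reference solution is an assumption, as are `n_init=1` for the initialization check (so that runs can actually differ) and relabeling the full sample with each bootstrap model. The synthetic blobs are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score


def initialization_stability(X, k=3, runs=50):
    """ARI of single-initialization runs against a reference run (assumed design)."""
    reference = KMeans(n_clusters=k, n_init=1, random_state=0).fit_predict(X)
    scores = [
        adjusted_rand_score(
            reference, KMeans(n_clusters=k, n_init=1, random_state=s).fit_predict(X))
        for s in range(1, runs + 1)
    ]
    return float(np.mean(scores)), float(np.std(scores))


def bootstrap_stability(X, k=3, runs=50, seed=0):
    """Fit on bootstrap resamples, relabel the full sample, compare with reference."""
    rng = np.random.default_rng(seed)
    reference = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores = []
    for s in range(runs):
        idx = rng.integers(0, len(X), size=len(X))  # resample with replacement
        km = KMeans(n_clusters=k, n_init=10, random_state=s).fit(X[idx])
        scores.append(adjusted_rand_score(reference, km.predict(X)))
    return float(np.mean(scores)), float(np.std(scores))


rng = np.random.default_rng(1)
centers = np.array([[-2.0, -2.0], [0.0, 0.0], [2.0, 2.0]])
X = np.vstack([c + 0.2 * rng.standard_normal((100, 2)) for c in centers])
m_init, s_init = initialization_stability(X, runs=20)
m_boot, s_boot = bootstrap_stability(X, runs=20)
print(round(m_init, 3), round(s_init, 3), round(m_boot, 3), round(s_boot, 3))
```

ARI ignores how cluster labels are numbered, so runs that find the same partition under permuted labels still score 1.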
The spatial distribution of the three clusters in the two-dimensional feature space is visualized in Figure 3, where each point represents an inventor and the centroids of the clusters are clearly separated. The figure reveals substantial heterogeneity in both innovative output and collaboration breadth across inventor groups, providing an intuitive representation of the clustering results. The relative positions of the cluster centroids in this two-dimensional space provided the rationale for naming the groups. Inventors located in the upper-right region, with both high patent output and a broad collaboration network, were labeled as star inventors, reflecting their dual advantage in innovative productivity and network engagement. Inventors positioned in the intermediate region, characterized by extensive collaboration but only moderate patent output, were designated as collaborator inventors, highlighting their role in sustaining cooperative relationships and facilitating knowledge exchange. Inventors in the lower-left region, with limited patent output and few collaborators, were classified as peripheral inventors, indicating weaker engagement in collaborative innovation activities.

Figure 3. Cluster results of inventors.
To further interpret these roles from a structural perspective, the study compares the average breakthrough innovation performance and network characteristics of the three clusters, as reported in Table 4. Star inventors (N = 439) demonstrate the highest average BIP, relatively high degree centrality, and the strongest structural-hole richness, suggesting that their high productivity is accompanied by an advantageous network position that spans disconnected collaboration communities. Collaborator inventors (N = 884) show moderate BIP but the highest clustering coefficient and strong collaboration intensity, indicating deep embeddedness within cohesive subgroups and frequent joint activities, which support collective innovation rather than breakthrough output. Peripheral inventors (N = 1,049) exhibit the lowest BIP, low degree centrality and structural-hole richness, and limited collaboration intensity, consistent with a weakly embedded position that restricts access to diverse knowledge sources. Closeness centrality remains relatively similar across groups, reflecting general network reachability rather than differential structural advantage.
Table 4. Network Characteristics of Diverse Inventor Groups.
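The network measures compared in Table 4 can be computed from a co-inventor graph. The sketch below makes several illustrative assumptions:

- The graph is unweighted and undirected, stored as a dict of neighbor sets.
- Degree centrality is normalized.
- Closeness is computed over reachable nodes only.
- Clustering is the local clustering coefficient.
- Burt's constraint serves as one common structural-hole index, where lower constraint means richer structural holes.

This section gives neither the study's exact definitions nor any tie weighting. Collaboration intensity depends on tie weights and is omitted.

```python
from collections import deque
from itertools import combinations


def bfs_distances(graph, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree_centrality(graph, node):
    return len(graph[node]) / (len(graph) - 1)


def closeness_centrality(graph, node):
    # Uses reachable nodes only (an assumption for disconnected graphs)
    dist = bfs_distances(graph, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


def clustering_coefficient(graph, node):
    """Share of pairs of neighbours that are themselves connected."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
    return 2 * links / (k * (k - 1))


def burt_constraint(graph, node):
    """Burt's constraint, unweighted version: p_ij = 1 / degree(i)."""
    def p(i, j):
        return 1 / len(graph[i]) if j in graph[i] else 0.0

    c = 0.0
    for j in graph[node]:
        indirect = sum(p(node, q) * p(q, j) for q in graph[node] if q != j)
        c += (p(node, j) + indirect) ** 2
    return c


# Invented toy graph: A sits between a triangle (A, B, C) and a chain (A, D, E)
g = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}
for node in sorted(g):
    print(node, round(degree_centrality(g, node), 2),
          round(closeness_centrality(g, node), 2),
          round(clustering_coefficient(g, node), 2),
          round(burt_constraint(g, node), 2))
```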
Overall, the role names are grounded in both the behavioral clustering dimensions and the subsequent network structural patterns. Star inventors combine high patent output with broad collaboration, collaborator inventors maintain intensive ties within cohesive local structures, and peripheral inventors participate only modestly in collaborative activities. This joint evidence from the clustering indicators and network measures establishes a theoretically meaningful basis for the role-specific configurational analysis of breakthrough innovation in the subsequent sections.
Construction of Decision Tree Model
To examine how different configurations of collaboration network characteristics are associated with breakthrough innovation across heterogeneous inventor roles, this study constructs classification decision tree models for each inventor group identified through the K-means clustering. The decision tree is implemented using the CART algorithm in Python, which recursively partitions the sample into increasingly homogeneous subgroups according to the values of the explanatory variables and produces a set of hierarchical decision rules for predicting the outcome. Following prior studies that adopt a classification framework to distinguish heterogeneous innovation outcomes, the dependent variable BIP is dichotomized into high and low levels using the sample mean as the threshold. Observations with values above the mean are coded as high breakthrough innovation, while those below the mean are coded as low breakthrough innovation. This treatment enables the identification of configurational conditions associated with different innovation states and facilitates the interpretation of the resulting decision pathways.
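Two ingredients of this step can be sketched in a few lines: dichotomizing BIP at the sample mean, and the CART criterion for choosing a split. The sketch codes values exactly equal to the mean as low, which is an assumption because the text only specifies above and below. It uses Gini impurity, the usual CART criterion for classification. The toy BIP and closeness values are invented.

```python
from statistics import mean


def dichotomize_by_mean(values):
    """Code BIP as 1 (high) if above the sample mean, else 0 (low).

    Values exactly equal to the mean are coded 0 here, an assumption.
    """
    threshold = mean(values)
    return [1 if v > threshold else 0 for v in values], threshold


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p ** 2 - (1.0 - p) ** 2


def best_split(x, y):
    """Threshold on one feature that minimises the weighted Gini impurity of
    the two child nodes (the CART split criterion for classification)."""
    best_t, best_imp = None, gini(y)
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate cut halfway between adjacent values
        left = [lab for xi, lab in zip(x, y) if xi <= t]
        right = [lab for xi, lab in zip(x, y) if xi > t]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if imp < best_imp:
            best_t, best_imp = t, imp
    return best_t, best_imp


# Invented toy values for six inventors
bip = [0.1, 0.2, 0.3, 0.9, 1.0, 1.1]
closeness = [0.20, 0.25, 0.30, 0.50, 0.55, 0.60]
labels, cut = dichotomize_by_mean(bip)
print(labels, round(cut, 2))
print(best_split(closeness, labels))
```

A full tree applies `best_split` to every explanatory variable at each node, keeps the best cut, and recurses on the two children until a stopping rule (such as maximum depth or minimum leaf size) is met.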
Given the substantial heterogeneity across inventor roles, separate decision tree models are constructed for collaborator inventors, peripheral inventors, and star inventors. For each group, the sample is randomly divided into a 70% training set and a 30% test set. The training set is used for model estimation and hyperparameter tuning. The test set is held out for performance assessment, so that overfitting can be detected and generalizability evaluated on data not used during training. Within the training set, a 10-fold cross-validation procedure estimates how each candidate model performs on unseen data, which makes model selection less sensitive to a single random partition. Hyperparameter optimization uses a grid search strategy that systematically explores combinations of key model parameters: the maximum tree depth, the minimum number of samples required to split an internal node, the minimum number of samples required to form a leaf node, and the cost-complexity pruning parameter. The combination with the best cross-validated performance is selected, balancing model complexity against predictive accuracy.
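The sampling and tuning protocol can be outlined without any modeling library. This sketch makes several assumptions. The 70/30 split is simple random (non-stratified), and the training indices are partitioned into interleaved folds for 10-fold cross-validation. Fitting and scoring are delegated to a hypothetical `score_fn` callback. The grid keys follow the four hyperparameters named above, and they match scikit-learn's `DecisionTreeClassifier` argument names. The candidate values are placeholders, not the study's settings.

```python
import random
from itertools import product


def train_test_split_indices(n, test_frac=0.3, seed=42):
    """Shuffle indices 0..n-1 and hold out test_frac of them as the test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]  # (train, test)


def kfold_indices(indices, k=10):
    """Yield (fit, validation) lists; each index is validated exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield fit, folds[i]


# Placeholder candidate values; only the parameter names come from the text.
param_grid = {
    "max_depth": [2, 3, 4, 5],
    "min_samples_split": [10, 20],
    "min_samples_leaf": [5, 10],
    "ccp_alpha": [0.0, 0.001, 0.01],
}


def grid_search(grid, score_fn, train_idx, k=10):
    """Return the combination with the highest mean k-fold CV score.

    score_fn(params, fit_idx, val_idx) is a hypothetical callback that fits a
    tree on fit_idx and returns a validation score (e.g. AUC) on val_idx.
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [score_fn(params, fit, val) for fit, val in kfold_indices(train_idx, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# With N = 884 collaborator inventors, the split gives 619 training and 265 test cases
train, test = train_test_split_indices(884)
print(len(train), len(test))
```

The test indices never enter `grid_search`. The selected parameters are refit on the full training set and scored once on the held-out test set.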
After determining the optimal hyperparameters, the final decision tree for each inventor group is trained using the full training set and then evaluated on the corresponding test set. Model performance is assessed using multiple metrics, including the area under the ROC curve (AUC), accuracy, recall, and F1 score, in order to provide a comprehensive evaluation of classification quality.
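AUC has a convenient rank interpretation, which the following sketch uses directly. It is the probability that a randomly chosen high-breakthrough inventor receives a higher predicted score than a randomly chosen low-breakthrough inventor, with ties counted as one half. For a decision tree, the scores would typically be the class proportions in each leaf, so ties are common. The quadratic pairwise loop is chosen for clarity, not speed, and the toy inputs are invented.

```python
def roc_auc(y_true, scores):
    """AUC via its rank (Mann-Whitney) interpretation; ties count one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented toy labels and predicted probabilities of the "high" class
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```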
As reported in Table 5, all three models exhibit satisfactory predictive performance, with AUC values exceeding 0.77, indicating good discriminatory power. The collaborator-inventor model achieves the highest overall performance in terms of AUC and accuracy, while the peripheral-inventor model shows the highest recall, suggesting a stronger ability to identify high breakthrough cases within this group. The star-inventor model demonstrates a balanced performance across the different evaluation metrics.
Table 5. Test-set Performance Metrics.
Overall, these test-set results suggest that the decision pathways identified for each inventor group are not only interpretable but also generalize reasonably well to data not used in model training.
To further examine classification quality, the confusion matrices for the test sets are presented in Figure 4. For collaborator inventors, the model correctly classifies 160 low-breakthrough and 50 high-breakthrough cases, with relatively few misclassifications, indicating balanced predictive ability for both classes. For peripheral inventors, the model correctly identifies 138 low-breakthrough and 58 high-breakthrough cases. Although this group has a relatively large number of false positives, the model achieves strong recall for high-breakthrough inventors, consistent with the performance reported in Table 5. For star inventors, the model correctly classifies 44 low-breakthrough and 55 high-breakthrough cases, indicating that predictive accuracy is distributed relatively evenly across the two classes.

Figure 4. Confusion matrices for the different inventor types.
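Accuracy, recall, and F1 follow directly from the four confusion-matrix cells. The section reports only the correctly classified counts (true negatives and true positives), so the sketch below works on invented toy labels rather than the study's matrices. False positives lower precision, and hence F1, but leave recall unchanged. This is why a model can combine a relatively large number of false positives with high recall, as reported for peripheral inventors.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, 1 = high breakthrough."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def classification_metrics(tn, fp, fn, tp):
    """Accuracy, precision, recall and F1 for the positive (high) class."""
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented toy labels, not the study's test sets
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts, classification_metrics(*counts))
```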
Overall, the confusion matrices confirm that the decision tree models do not rely on a single dominant class and are capable of distinguishing between high and low breakthrough innovation in different inventor groups. By combining role-based sample segmentation, cross-validation, hyperparameter optimization, and out-of-sample evaluation, the modeling procedure ensures that the identified configurational pathways are both interpretable and empirically robust. This provides a reliable basis for uncovering the heterogeneous mechanisms through which collaboration network structures are associated with breakthrough innovation in the new energy vehicle industry.
Extraction of Key Decision Rules
Based on the trained decision tree models, this study extracts and visualizes the key decision rules for each inventor group (Figures 5 to 7) to uncover the configurational mechanisms through which collaboration network structures are associated with breakthrough innovation performance. To enhance interpretability and analytical robustness, only decision rules with relatively high confidence and a clear hierarchical structure are retained. These rules represent the most stable combinations of network conditions associated with high or low breakthrough innovation and thus provide a role-specific explanation of innovation pathways.
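Extracting rules from a fitted tree amounts to enumerating root-to-leaf paths and keeping those whose leaf confidence clears a cutoff. The sketch below uses a hypothetical nested-dict tree format with placeholder thresholds `t1` and `t2` and invented leaf confidences. This section does not specify the study's actual confidence criterion.

```python
def extract_rules(node, conditions=(), min_confidence=0.0):
    """Return (rule, predicted class, confidence) for every root-to-leaf path
    whose leaf confidence is at least min_confidence."""
    if "label" in node:  # leaf node
        if node["confidence"] >= min_confidence:
            return [(" AND ".join(conditions), node["label"], node["confidence"])]
        return []
    feat, t = node["feature"], node["threshold"]
    left = extract_rules(node["left"], conditions + (f"{feat} <= {t}",), min_confidence)
    right = extract_rules(node["right"], conditions + (f"{feat} > {t}",), min_confidence)
    return left + right


# Toy tree: shape, thresholds t1/t2 and confidences are invented placeholders
tree = {
    "feature": "CC", "threshold": "t1",
    "left": {"label": "high", "confidence": 0.80},
    "right": {
        "feature": "SH", "threshold": "t2",
        "left": {"label": "low", "confidence": 0.70},
        "right": {"label": "high", "confidence": 0.55},
    },
}
for rule in extract_rules(tree, min_confidence=0.6):
    print(rule)
```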

Figure 5. Decision rules of collaborator inventors.

Figure 6. Decision rules of peripheral inventors.

Figure 7. Decision rules of star inventors.
For collaborator inventors, four key decision rules are extracted from the decision tree, indicating distinct pathways to high or low breakthrough innovation performance. Specifically, very low closeness centrality combined with a moderate structural hole advantage corresponds to high breakthrough innovation, whereas excessive structural holes in the same closeness range correspond to low performance. At intermediate levels of closeness centrality, breakthrough performance tends to be low, whereas closeness centrality above an upper threshold again increases the likelihood of high breakthrough innovation. Collectively, these rules capture the heterogeneity in role-network configurations among collaborator inventors, illustrating that even within the same group, small differences in closeness centrality and structural holes can produce markedly different innovation outcomes.
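The four collaborator-inventor rules can be written as a small nested condition. The thresholds `cc_low`, `cc_high`, and `sh_max` are hypothetical placeholders for the cut points in Figure 5. A single upper cut on structural holes stands in for the "moderate versus excessive" distinction, as a binary tree split would.

```python
def collaborator_rule(cc, sh, cc_low, cc_high, sh_max):
    """Hypothetical encoding of the four collaborator-inventor rules.

    cc = closeness centrality, sh = structural-hole index.
    cc_low < cc_high and sh_max are placeholder thresholds, not fitted values.
    """
    if cc <= cc_low:
        # Very low closeness: moderate structural holes -> high, excessive -> low
        return "high" if sh <= sh_max else "low"
    if cc <= cc_high:
        return "low"   # intermediate closeness
    return "high"      # closeness above the upper threshold


# Placeholder thresholds, one toy inventor per rule
for cc, sh in [(0.1, 0.3), (0.1, 0.6), (0.3, 0.1), (0.7, 0.9)]:
    print(cc, sh, collaborator_rule(cc, sh, cc_low=0.2, cc_high=0.5, sh_max=0.4))
```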
For peripheral inventors, closeness centrality plays a decisive role, as shown in Figure 6. When closeness centrality remains low, peripheral inventors are more likely to achieve high breakthrough innovation performance. When closeness centrality exceeds the threshold, the effect of structural holes becomes contingent: a low structural hole advantage is associated with low breakthrough innovation, whereas a higher structural hole advantage reverses this relationship and is associated with high breakthrough performance.
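The contingent role of structural holes for peripheral inventors can be expressed as follows. Here `cc_cut` and `sh_cut` are hypothetical placeholders for the thresholds in Figure 6. The loop at the end shows that, under these rules, structural holes change the prediction only above the closeness threshold.

```python
def peripheral_rule(cc, sh, cc_cut, sh_cut):
    """Hypothetical encoding of the peripheral-inventor rules.

    cc_cut and sh_cut are placeholder thresholds, not the fitted values.
    """
    if cc <= cc_cut:
        return "high"  # low closeness: high BIP regardless of structural holes
    # Above the closeness threshold the outcome depends on structural holes
    return "high" if sh > sh_cut else "low"


# Toy inputs: structural holes matter only for the higher closeness value
for cc in (0.1, 0.6):
    print(cc, [peripheral_rule(cc, sh, cc_cut=0.3, sh_cut=0.5) for sh in (0.2, 0.8)])
```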
This configuration indicates that peripheral inventors rely heavily on specific structural opportunities to overcome their disadvantaged network positions. Their peripheral status implies that they are loosely embedded in the network and have limited access to core knowledge flows. Once their closeness centrality rises above the threshold, occupying structural holes becomes a crucial mechanism for accessing heterogeneous technological information and enhancing knowledge recombination capability. This finding is consistent with the structural hole argument in social capital theory, which emphasizes that actors located at the intersection of disconnected groups can generate novel ideas by integrating diverse knowledge domains. At the individual level, it also reflects the importance of boundary-spanning search behavior for less central inventors. For peripheral inventors, breakthrough innovation is therefore associated less with intensive collaboration than with strategically leveraging brokerage positions to compensate for limited network embeddedness.
The configurational mechanism for star inventors differs fundamentally from those of the other two groups. Their breakthrough innovation performance is primarily associated with the interaction between the clustering coefficient and collaboration intensity. When the clustering coefficient is low, breakthrough innovation performance tends to be low, suggesting that a certain level of cohesive local structure is necessary for knowledge integration. When the clustering coefficient exceeds a critical threshold, however, collaboration intensity becomes the key differentiating factor: low collaboration intensity is associated with high breakthrough innovation, whereas high collaboration intensity is associated with low breakthrough innovation.
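The star-inventor rules reduce to a two-level condition in which collaboration intensity is consulted only when the clustering coefficient exceeds its threshold. Here `cl_cut` and `ci_cut` are hypothetical placeholders for the thresholds in Figure 7.

```python
def star_rule(cl, ci, cl_cut, ci_cut):
    """Hypothetical encoding of the star-inventor rules.

    cl = clustering coefficient, ci = collaboration intensity;
    cl_cut and ci_cut are placeholder thresholds, not the fitted values.
    """
    if cl <= cl_cut:
        return "low"  # weak local cohesion: low BIP regardless of intensity
    # Cohesive local structure: low intensity -> high BIP, high intensity -> low
    return "high" if ci <= ci_cut else "low"


# Toy inputs: intensity matters only once cohesion exceeds cl_cut
for cl in (0.1, 0.7):
    print(cl, [star_rule(cl, ci, cl_cut=0.4, ci_cut=0.5) for ci in (0.2, 0.9)])
```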
This finding reveals a distinctive innovation logic for star inventors. A higher clustering coefficient indicates that these inventors are embedded in cohesive, trust-based local networks, which facilitate the efficient transfer and integration of complex knowledge. However, high collaboration intensity within such cohesive structures may lead to overembeddedness, knowledge redundancy, and increased coordination costs. In contrast, maintaining relatively low collaboration intensity may allow star inventors to preserve autonomy in knowledge recombination while still benefiting from the trust and absorptive capacity provided by cohesive teams. This configuration may therefore reflect a balance between network cohesion and individual creativity.
A comparison across the three inventor groups indicates that the network mechanisms associated with breakthrough innovation are highly role-specific and configurational. Collaborator inventors rely on balanced bridging structures, peripheral inventors depend on structural hole compensation mechanisms, and star inventors benefit from cohesive local networks combined with low-intensity collaboration. These heterogeneous pathways illustrate the principle of equifinality: multiple distinct network configurations can lead to high breakthrough innovation.
More importantly, the results indicate that the same network feature may play different roles for different inventor groups. Structural holes are beneficial for peripheral inventors once their closeness centrality exceeds a threshold, but must be kept at a moderate level for collaborator inventors with very low closeness centrality. Collaboration intensity, in turn, differentiates star inventors' outcomes only when local cohesion is high, in which case lower intensity is associated with high breakthrough innovation. This highlights the necessity of adopting a role-configuration perspective rather than examining the net effects of individual network variables.
Overall, the extracted decision rules reveal that breakthrough innovation in the new energy vehicle industry is not driven by a single optimal network position but by different combinations of structural conditions tailored to the specific roles of inventors. These findings provide a micro-level explanation of how heterogeneous actors orchestrate their collaboration networks to achieve breakthrough innovation and offer role-specific strategies for enhancing innovation performance (X. Y. Rong et al., 2020).
Robustness Check
To assess the robustness of the extracted decision rules, two alternative specifications of the dependent variable were considered. First, breakthrough innovation performance was dichotomized using the 75th percentile, instead of the sample mean employed in the main analysis, as the threshold between high and low performance. Second, the dependent variable was treated as a continuous measure, and regression tree models were estimated on the original continuous BIP values rather than on a binary classification. These alternative approaches examine whether the role-specific decision rules identified in the main analysis are sensitive to how breakthrough innovation is operationalized.
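Both robustness specifications can be sketched with the standard library. The percentile cut uses `statistics.quantiles(..., method="inclusive")`, which interpolates linearly between order statistics. Other percentile conventions can shift the cut slightly, and the study's convention is not stated. The regression-tree split uses the standard CART criterion for continuous outcomes, which minimizes the summed squared error of the two child nodes. The toy data are invented.

```python
from statistics import mean, quantiles


def dichotomize_by_percentile(values, q=75):
    """Code values above the q-th percentile as 1 (high), others as 0."""
    # quantiles(..., n=100) returns the 99 cut points for percentiles 1..99
    cut = quantiles(values, n=100, method="inclusive")[q - 1]
    return [1 if v > cut else 0 for v in values], cut


def sse(ys):
    """Sum of squared deviations from the node mean."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def best_regression_split(x, y):
    """Threshold on one feature minimising the children's summed squared error."""
    best_t, best_err = None, sse(y)
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        err = sse(left) + sse(right)
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err


# Invented toy data
labels, cut = dichotomize_by_percentile(list(range(1, 101)))
print(cut, sum(labels))
print(best_regression_split([1, 2, 3, 4], [1.0, 1.0, 5.0, 5.0]))
```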
The robustness analyses indicate that the key configurational pathways remain largely consistent across specifications. For all inventor groups, the combinations of network features associated with high breakthrough innovation identified in the main analysis continue to emerge, with only minor adjustments to the numerical thresholds of network indicators. In particular, the dominance of closeness centrality for collaborator inventors, the structural hole dependence for peripheral inventors, and the interaction between clustering coefficient and collaboration intensity for star inventors are preserved. These findings suggest that the main results are not an artifact of the specific dichotomization threshold and that the identified role-specific mechanisms of collaboration network influence on breakthrough innovation are stable and reliable.
Overall, the robustness checks reinforce the validity of the configurational insights derived from the decision tree models and support the generalizability of the extracted decision rules under alternative operationalizations of the dependent variable.
Conclusions and Discussion
Conclusions
Focusing on the new energy vehicle industry, this study examines how inventors’ collaboration network structures shape breakthrough innovation performance. By integrating social network analysis with machine learning–based configurational modeling on a sample of 2,372 inventors, three main conclusions are drawn.
First, inventors can be systematically classified into three distinct role-based groups (collaborator inventors, peripheral inventors, and star inventors) according to their patent output and collaboration breadth. These groups exhibit heterogeneous network positions and interaction patterns, reflected in their differing centrality measures, structural hole indices, clustering coefficients, and collaboration intensities. The classification not only reveals the diversity of inventor roles within the industry but also provides a foundation for understanding the differentiated mechanisms through which network structures influence breakthrough innovation, and it extends theoretical research on classifying inventors based on social capital (X. Y. Rong et al., 2020). This highlights the importance of adopting a role-configuration perspective rather than treating inventors as a homogeneous group, echoing the findings of prior research (Y. Wang et al., 2025).
Second, the analysis shows that breakthrough innovation performance is associated with distinct combinations of network features for each inventor group. For collaborator inventors, closeness centrality emerges as the dominant factor, with moderate structural hole positions further modulating performance outcomes. Peripheral inventors rely heavily on structural hole advantage to compensate for their relatively marginal network positions. Star inventors achieve high breakthrough innovation when they are embedded in cohesive local networks, with a clustering coefficient above a critical threshold, while maintaining relatively low collaboration intensity, highlighting the nuanced balance between integration and autonomy. These findings deepen research on inventor network configurations (Kim et al., 2021) and reveal that no single network feature universally drives innovation (Luo et al., 2025); rather, breakthrough performance results from specific, role-dependent configurational patterns. This configurational insight aligns with the principle of equifinality in organizational and innovation research, whereby multiple distinct pathways can lead to similarly high innovation outcomes (Zhou & Li, 2025).
Third, machine learning–based configurational methods prove effective in capturing micro-level differences in inventors’ innovation behavior. By combining K-means clustering with classification decision trees, this study identifies how heterogeneous network positions interact with collaboration patterns to produce distinct breakthrough outcomes. The approach yields clear, interpretable rules that link network configurations to innovation performance at the individual inventor level. This demonstrates the value of integrating data-driven machine learning techniques with network analysis to uncover subtle heterogeneity in innovation behavior that traditional linear models might overlook (H. Li et al., 2024; Zhou & Li, 2024).
Managerial Implications
The findings of this study provide several important managerial implications for innovation management in the new energy vehicle industry, particularly in terms of how firms and R&D organizations can design differentiated collaboration strategies based on inventors’ heterogeneous network roles.
First, firms should adopt a role-based collaboration management approach rather than implementing uniform innovation policies for all inventors. The results show that collaborator inventors, peripheral inventors, and star inventors rely on fundamentally different network configurations to achieve breakthrough innovation. This implies that innovation performance can be enhanced when collaboration strategies are aligned with inventors’ structural positions. For collaborator inventors, whose breakthrough performance is primarily associated with closeness centrality, managers should facilitate access to diverse knowledge sources by enhancing their reachability within the network, for example by assigning them boundary-spanning tasks or positioning them as coordinators in cross-team projects. In contrast, for peripheral inventors, the key lies in helping them build structural hole advantages. This can be achieved by encouraging their participation in inter-organizational or cross-domain projects, which allows them to connect otherwise disconnected knowledge pools and compensate for their relatively weak embeddedness.
Second, the collaboration governance of star inventors should emphasize a balance between network cohesion and autonomy. The results indicate that high breakthrough performance for star inventors is associated with cohesive collaboration structures combined with relatively low collaboration intensity. This suggests that excessive collaboration may generate coordination costs and reduce the cognitive space necessary for radical knowledge recombination. Therefore, firms should avoid over-embedding star inventors in dense collaborative tasks or administrative coordination roles. Instead, they should provide them with stable core teams while preserving sufficient independent research time and decision-making autonomy. Such an approach helps maintain both knowledge integration efficiency and creative freedom, which are essential for breakthrough innovation.
Third, the study highlights the importance of configuring collaboration networks as strategic innovation assets. From a managerial perspective, collaboration networks should not be viewed merely as by-products of R&D activities but as organizational resources that can be deliberately designed and optimized. Firms can use data-driven network analytics to identify key inventors’ structural positions and dynamically adjust team composition, partner selection, and project allocation. This enables the formation of multiple effective pathways to breakthrough innovation, consistent with the configurational nature of innovation performance.
Limitations and Future Directions
Despite its contributions, this study has several limitations that open avenues for future research. First, this study primarily focuses on collaboration network characteristics and does not explicitly incorporate non-network factors that may also shape breakthrough innovation performance, such as individual knowledge base, technological diversity, and organizational support. Although isolating network configurations helps clarify the structural mechanisms of collaboration, innovation outcomes are inherently the result of multiple interacting dimensions. Future research could integrate network features with individual-, organizational-, and technological-level variables to construct a more comprehensive configurational framework and to examine how structural embeddedness interacts with resource endowments. Second, the potential issue of reverse causality cannot be entirely ruled out. Inventors who achieve higher breakthrough innovation performance may subsequently attract more collaboration opportunities and occupy more advantageous network positions, which in turn reinforces their structural advantages. While the present study emphasizes the configurational effects of network roles on innovation outcomes, future research could employ longitudinal network data, instrumental-variable strategies, or causal inference techniques to better disentangle the co-evolutionary relationship between collaboration structures and breakthrough innovation performance.
Footnotes
Ethical Considerations
This article does not contain any studies with human participants performed by any of the authors.
Author Contributions
Wenhao Zhou: Conceptualization; Formal analysis; Writing - Original Draft. Zhiwei Zhang: Methodology; Validation.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by Startup Fund for Advanced Talents of Putian University (No. 2026008).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Data will be available on request from the authors.
