Abstract
In cross-project software defect detection, traditional methods struggle to capture the complex structural relationships and dynamic evolution of code. This work hypothesizes that a defect detection system based on a Graph Neural Network (GNN) can significantly improve detection accuracy and cross-project generalization when it is built on a dynamic heterogeneous graph model that integrates multidimensional information. To test this hypothesis, a model named Dynamic Heterogeneous Graph Defect Detection (DGDefect) is designed and implemented. First, a dual-layer heterogeneous graph structure is built from the Abstract Syntax Tree (AST) and the Program Dependence Graph (PDG), and a Dynamic Edge Weight Assignment (DEWA) algorithm computes edge weights dynamically from node attributes and contextual similarity. Next, a Gated Graph Attention Network performs gated fusion of syntactic features from AST nodes, control flow features from PDG nodes, and developer commit behavior features. A hierarchical attention mechanism (node-level, subgraph-level, and global-level attention) is integrated within the GNN framework, together with a subgraph pattern matching strategy that identifies defect propagation paths. Finally, a resilient incremental learning framework is developed that significantly improves model update efficiency through parameter freezing and knowledge distillation. Experiments on the NASA software defect prediction dataset and three large-scale open-source industrial projects show that DGDefect achieves an average F1 score of 85.5% in cross-project detection, and 89.7% on Java projects. The false positive rate (FPR) is reduced to 5.8%, and recall reaches 88.1%. In industrial-scale codebase detection, the model achieves an average true defect detection rate of 93.1%. With only 52.7 million parameters, the model is significantly smaller than CodeBERT.
These results confirm the proposed method's advantages in accuracy, efficiency, generalization, and practical applicability. This work offers a theoretically grounded and engineering-feasible solution for GNN-based software defect detection.
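The abstract describes DEWA only as computing edge weights from node attributes and contextual similarity, so the following is a minimal illustrative sketch rather than the authors' algorithm. It assumes that each node carries an attribute vector and a context vector, that similarity is cosine similarity, that the two similarities are blended by a hypothetical mixing factor `alpha`, and that scores are softmax-normalized over each node's outgoing edges. All function and parameter names are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def dynamic_edge_weights(edges, attrs, context, alpha=0.5):
    """Assign a weight to each directed edge (src, dst).

    Assumption (not from the paper): the raw score blends attribute
    similarity and contextual similarity with a mixing factor alpha,
    then scores are softmax-normalized per source node.
    """
    outgoing = {}
    for src, dst in edges:
        score = (alpha * cosine(attrs[src], attrs[dst])
                 + (1 - alpha) * cosine(context[src], context[dst]))
        outgoing.setdefault(src, []).append((dst, score))

    weights = {}
    for src, scored in outgoing.items():
        total = sum(math.exp(s) for _, s in scored)
        for dst, s in scored:
            weights[(src, dst)] = math.exp(s) / total
    return weights


# Toy graph: node "a" is similar to "b" and dissimilar to "c".
attrs = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
context = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
weights = dynamic_edge_weights([("a", "b"), ("a", "c")], attrs, context)
```

In this toy graph the edge to the more similar neighbor receives the larger weight, and the weights leaving each node sum to one. Recomputing the weights whenever node attributes change is what would make them dynamic in this sketch.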
