Abstract
Traditional software defect prediction models generalize poorly and depend on difficult feature selection. To address these problems, this paper uses deep neural networks to build an automated, efficient software defect prediction system that updates in real time and processes large-scale data. Code features, historical defect records, and developer activity data are collected from the version control system Git and the defect tracking system JIRA. The raw data are then cleaned and features are extracted: the interquartile range (IQR) method handles outliers, and mean imputation and forward filling fill in missing values. A deep neural network (DNN) is used for the model architecture and training. For real-time prediction, Apache Kafka and Spark Streaming acquire, process, and analyze software data as it arrives. In repeated experiments on several open-source projects, the model achieves a prediction accuracy of 92.5%, a recall of 88.3%, a precision of 90.2%, and an F1 score of 89.2%. The results show that the system maintains high prediction performance on large-scale data in complex environments and can help improve the efficiency and quality of software development.
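The data cleaning steps named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the choice to clip outliers to the fences rather than drop them, the conventional multiplier of 1.5, and the use of `None` to mark a missing value are all assumptions made for illustration.

```python
from statistics import mean, quantiles


def clip_outliers_iqr(values, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR]; None entries pass through.

    Clipping (rather than removing) outliers and k=1.5 are assumptions.
    """
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v if v is None else min(max(v, low), high) for v in values]


def fill_mean(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed_mean = mean(v for v in values if v is not None)
    return [observed_mean if v is None else v for v in values]


def fill_forward(values):
    """Replace each missing value (None) with the most recent observed value.

    Leading gaps stay None because there is nothing earlier to carry forward.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled
```

Under these assumptions, mean imputation suits metrics with no meaningful ordering between records, while forward filling suits time-ordered series such as per-commit activity counts, where the previous observation is a reasonable stand-in for a gap.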
