Abstract
Incorrectly labeled issue reports stored in issue tracking systems degrade the quality of such repositories. Many experimental studies use issue reports labeled as ‘bugs’ to train machine learning models. Mislabeled issue reports in issue tracking systems therefore pose a serious threat to the validity of these studies and their results. Hence, an accurate and efficient approach is required for labeling issue reports as ‘bugs’ or ‘non-bugs’. Supervised learning approaches proposed for automated labeling of issue reports need a large number of pre-labeled issue reports. Unsupervised learning approaches overcome this constraint because they do not require pre-labeled issue reports, but they fail to perform as well as supervised approaches. This paper proposes a semi-supervised approach as an improved solution for automated labeling of issue reports. The objective of the proposed approach is to remove the dependency on a large set of pre-labeled reports for training while giving better performance than unsupervised approaches. To test the validity of the proposed semi-supervised approach, experiments are conducted on issue reports of three widely used open source systems. The results obtained using the semi-supervised approach show considerable improvement in F-measure score compared to unsupervised approaches.
