Automated labeling of issue reports using semi supervised approach

Abstract

Incorrectly labeled issue reports degrade the quality of the issue tracking systems that store them. Many experimental studies use issue reports labeled as ‘bugs’ to train machine learning models, so mislabeled reports pose a serious threat to the validity of these studies and their results. Hence, an accurate and efficient approach is required for labeling issue reports as ‘bugs’ or ‘non-bugs’. Supervised learning approaches proposed for automated labeling of issue reports need a large number of pre-labeled issue reports. Unsupervised learning approaches overcome this constraint because they do not require pre-labeled issue reports, but they fail to perform as well as supervised approaches. This paper proposes a semi supervised approach as an improved solution for automated labeling of issue reports. The objective of the proposed approach is to remove the dependency on a large set of pre-labeled reports for training while giving better performance than unsupervised approaches. To test the validity of the proposed semi supervised approach, experiments are conducted on issue reports of three widely used open source systems. The results obtained with the semi supervised approach show considerable improvement in F-measure score compared to unsupervised approaches.

Keywords

Issue labeling, issue tracking system, semi supervised approach, fuzzy clustering, semi supervised fuzzy clustering
