Abstract
Breakthrough classification performance has been achieved by ensemble techniques in machine learning and data mining. Bagging is one such ensemble technique that has outperformed single models in predictive performance. This paper proposes an ensemble technique that applies the basic bootstrap aggregating (bagging) technique to a hybrid of two base learners, namely Naïve Bayes (NB) and Decision Tree (DT). Before induction of the DT, the NB algorithm is employed to eliminate mislabeled or contradictory instances from the training set. The bagging approach is then applied with this hybrid NBDT as the base learner. The resulting Bagged Naïve Bayes-Decision Tree (BNBDT) algorithm is used to improve the classification accuracy of various multi-class problems. The algorithm iteratively trains the base learner on random samples of the training set and combines their predictions by majority voting. The proposed algorithm is compared with both ensemble and single classification techniques: Random Forest, Bagged NB, Bagged DT, NB, and DT. Experimental results over 52 UCI data sets with bag size 100 demonstrate that the proposed algorithm significantly outperforms the existing algorithms.
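As a rough sketch of the pipeline the abstract describes, the BNBDT procedure can be expressed with scikit-learn building blocks; the NBDTClassifier class name, the choice of GaussianNB as the filtering model, and the default tree settings below are illustrative assumptions, not the paper's exact implementation.

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.ensemble import BaggingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


class NBDTClassifier(BaseEstimator, ClassifierMixin):
    """Hybrid base learner: NB filters suspect instances, then a DT is induced."""

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        # Train NB on the (bootstrap) sample and keep only the instances
        # whose NB prediction agrees with the given label, i.e. drop the
        # instances NB flags as mislabeled or contradictory.
        nb = GaussianNB().fit(X, y)
        keep = nb.predict(X) == y
        if not keep.any():  # degenerate sample: fall back to the full sample
            keep[:] = True
        self.tree_ = DecisionTreeClassifier().fit(X[keep], y[keep])
        self.classes_ = self.tree_.classes_
        return self

    def predict(self, X):
        return self.tree_.predict(X)


# Bagging with 100 bootstrap replicates, matching the bag size in the
# experiments; BaggingClassifier combines the hard predictions of the 100
# hybrid learners by majority voting. (The estimator= keyword assumes
# scikit-learn >= 1.2; older versions use base_estimator=.)
bnbdt = BaggingClassifier(estimator=NBDTClassifier(), n_estimators=100)

Fitting with bnbdt.fit(X_train, y_train) and predicting with bnbdt.predict(X_test) follows the usual scikit-learn interface. Because filtering happens inside each bootstrap replicate, every tree is induced from a slightly different cleaned sample, which is what distinguishes BNBDT from plain Bagged DT.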
