Flow graphs as data structures for inducing classifiers

Abstract

This paper describes an empirical study based on a data structure, named Flow Graph (FG), that can be induced from a supervised training data set. An FG can be viewed as a weighted and labeled digraph that summarizes a given supervised training set to support its analysis. FGs can also serve as a repository of the information embedded in training sets, supporting the extraction of classification rules and hence the definition of classifiers. The paper reviews FGs and related concepts as originally proposed, i.e., as a structure for modeling discrete data, and proposes a customization for dealing with continuous data. The customization consists of a pre-processing step in which discretization is carried out in a two-step hybrid approach named HFG (Hybrid Flow Graph). Several experiments focusing on the classifiers extracted from HFGs were conducted, and their results were analyzed with respect to both the values of some metrics associated with the induced digraph-based structure and the performance of the classifier extracted from that structure. The experiments used 19 diverse datasets, and the classification results were compared with those obtained by classifiers induced using four other algorithms, namely J48, Naïve Bayes, k-Nearest-Neighbor and Support Vector Machine.
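To make the notion of an FG as a weighted and labeled digraph concrete, the following minimal Python sketch builds one from a small discrete training set. It is an illustration only, not the implementation used in this work. It assumes the commonly described layered formulation in which each node is an (attribute, value) pair, each edge links nodes of consecutive attributes, and each edge weight (flow) is the number of training instances passing through it. The strength, certainty and coverage factors are the usual ratios of flows. The function names, the attribute ordering and the toy data are hypothetical.

```python
from collections import Counter


def build_flow_graph(rows, attributes):
    """Count node and edge flows for a layered flow graph.

    rows: list of dicts mapping attribute name -> discrete value.
    attributes: ordered attribute names (assumed here: class attribute last).
    """
    node_flow = Counter()  # (attribute, value) -> number of instances
    edge_flow = Counter()  # (source node, target node) -> number of instances
    for row in rows:
        path = [(a, row[a]) for a in attributes]
        for node in path:
            node_flow[node] += 1
        # Edges connect nodes of consecutive layers only.
        for src, dst in zip(path, path[1:]):
            edge_flow[(src, dst)] += 1
    return node_flow, edge_flow


def edge_measures(node_flow, edge_flow, total):
    """Normalize each edge flow into strength, certainty and coverage."""
    measures = {}
    for (src, dst), flow in edge_flow.items():
        measures[(src, dst)] = {
            "strength": flow / total,            # share of all instances
            "certainty": flow / node_flow[src],  # share of the source node's flow
            "coverage": flow / node_flow[dst],   # share of the target node's flow
        }
    return measures


# Hypothetical toy training set; "play" is the class attribute.
TOY_ROWS = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
]

if __name__ == "__main__":
    nodes, edges = build_flow_graph(TOY_ROWS, ["outlook", "windy", "play"])
    for edge, m in edge_measures(nodes, edges, len(TOY_ROWS)).items():
        print(edge, m)
```

Under these assumptions, an edge whose certainty is high can be read as a candidate rule fragment of the form "if source value then target value", which is the sense in which an FG can act as a repository for rule extraction.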

Keywords

Flow graphs, extended flow graphs, data structures, supervised machine learning algorithms, data discretization, hybrid systems

