Evaluation of data analytics based clustering algorithms for knowledge mining in a student engagement data

Abstract

The application of data analytics algorithms to knowledge mining in student datasets is an important strategy for improving learning outcomes and student success, and for supporting strategic decision making in higher education institutions. However, widely used data analytics based clustering algorithms are highly data dependent, so it is important to identify the most effective algorithm for knowledge mining in a student engagement dataset. In this study, the performance of five well-known clustering algorithms was evaluated for this purpose. The k-means algorithm was benchmarked with 22 distance functions using three internal validity metrics: the Silhouette index, Dunn's index and partition entropy. The hierarchical clustering algorithm was benchmarked with the cophenetic correlation coefficient, computed for different combinations of distance and linkage functions. The fuzzy c-means algorithm was benchmarked with the partition entropy, partition coefficient, Silhouette index and modified partition coefficient. The k-nearest neighbor algorithm was applied to determine the optimum epsilon value for density-based spatial clustering of applications with noise (DBSCAN). Default parameter settings were used for the expectation-maximization algorithm. The overall ranking of the clustering algorithms was based on cluster potentiality, using median deviation statistics. The results of the evaluation show that the well-known k-means algorithm has the highest cluster potentiality, demonstrating its effectiveness for knowledge mining in a student engagement dataset.

Keywords

Algorithm evaluation, data analytics, data clustering, knowledge mining, student engagement
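
The abstract benchmarks k-means against many distance functions using internal validity indices. The following is a minimal sketch of that kind of loop, assuming NumPy, SciPy and scikit-learn and synthetic two-blob data (not the study's dataset). The names `kmeans_with_metric` and `dunn_index` are hypothetical, the farthest-first seeding is an assumption, and only four SciPy metrics are tried rather than 22. Partition entropy is left out because it needs fuzzy memberships (for a crisp partition it is zero).

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform
from sklearn.metrics import silhouette_score


def farthest_first_centres(X, k, metric, rng):
    """Seed k centres by farthest-first traversal under `metric` (assumed seeding)."""
    idx = [int(rng.integers(len(X)))]
    for _ in range(k - 1):
        d = cdist(X, X[idx], metric=metric).min(axis=1)
        idx.append(int(d.argmax()))
    return X[idx].astype(float)


def kmeans_with_metric(X, k, metric="euclidean", n_iter=100, seed=0):
    """Lloyd-style k-means whose assignment step uses a SciPy `cdist` metric.

    The update step still takes the arithmetic mean, which is the exact
    minimiser only for squared Euclidean distance (assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    centres = farthest_first_centres(X, k, metric, rng)
    for _ in range(n_iter):
        labels = cdist(X, centres, metric=metric).argmin(axis=1)
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres


def dunn_index(X, labels, metric="euclidean"):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    D = squareform(pdist(X, metric=metric))
    ids = np.unique(labels)
    diameter = max(D[np.ix_(labels == a, labels == a)].max() for a in ids)
    separation = min(
        D[np.ix_(labels == a, labels == b)].min()
        for i, a in enumerate(ids) for b in ids[i + 1:]
    )
    return separation / diameter


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic stand-in data: two well-separated 2-D blobs.
    X = np.vstack([rng.normal(1.0, 0.3, (30, 2)), rng.normal(5.0, 0.3, (30, 2))])
    for metric in ["euclidean", "cityblock", "chebyshev", "canberra"]:
        labels, _ = kmeans_with_metric(X, k=2, metric=metric)
        sil = silhouette_score(X, labels, metric=metric)
        print(f"{metric:10s} silhouette={sil:.3f} dunn={dunn_index(X, labels, metric):.3f}")
```

Each distance function then gets one row of index values, and higher Silhouette and Dunn values indicate better separated clusters.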
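
The hierarchical benchmark in the abstract scores each combination of distance and linkage function by its cophenetic correlation coefficient, that is, how well the dendrogram's merge heights preserve the original pairwise distances. A short sketch using SciPy's `linkage` and `cophenet` on synthetic data follows. The function name `cophenetic_table` and the metric and linkage lists are illustrative assumptions, not the study's settings.

```python
import itertools

import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist

# SciPy documents centroid, median and Ward linkage as correct only for Euclidean distances.
EUCLIDEAN_ONLY = {"centroid", "median", "ward"}


def cophenetic_table(X, metrics, methods):
    """Cophenetic correlation for every valid (distance, linkage) pair."""
    scores = {}
    for metric, method in itertools.product(metrics, methods):
        if method in EUCLIDEAN_ONLY and metric != "euclidean":
            continue  # skip combinations the linkage formula does not support
        d = pdist(X, metric=metric)      # condensed pairwise distance matrix
        Z = linkage(d, method=method)    # agglomerative merge tree
        c, _ = cophenet(Z, d)            # correlation of tree heights vs. original distances
        scores[(metric, method)] = float(c)
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.5, (25, 3)), rng.normal(3.0, 0.5, (25, 3))])
    table = cophenetic_table(
        X,
        metrics=["euclidean", "cityblock", "chebyshev"],
        methods=["single", "complete", "average", "ward"],
    )
    for (metric, method), c in sorted(table.items(), key=lambda kv: -kv[1]):
        print(f"{metric:10s} {method:9s} {c:.3f}")
```

Sorting the table by coefficient gives a ranking of distance/linkage pairs, with values closer to 1 indicating a more faithful dendrogram.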
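
For fuzzy c-means, the abstract lists the partition coefficient (PC), partition entropy (PE), modified partition coefficient (MPC) and Silhouette index. The sketch below implements standard fuzzy c-means with Euclidean distance and these indices, using PC = (1/n) sum of u_ij^2, PE = -(1/n) sum of u_ij log u_ij and MPC = 1 - c/(c - 1) (1 - PC). It assumes NumPy, SciPy and scikit-learn, a fuzzifier m = 2 and synthetic data. Applying the crisp Silhouette index to the hardened (argmax) partition is an assumption, and the function names are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.metrics import silhouette_score


def fuzzy_cmeans(X, c, m=2.0, n_iter=500, tol=1e-6, seed=0):
    """Fuzzy c-means with Euclidean distance; returns memberships U (n x c) and centres V."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)                 # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]        # membership-weighted cluster centres
        D = np.fmax(cdist(X, V), 1e-12)               # guard against zero distances
        inv = D ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)  # u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V


def partition_coefficient(U):
    return float((U ** 2).sum() / len(U))


def partition_entropy(U):
    return float(-(U * np.log(np.clip(U, 1e-12, None))).sum() / len(U))


def modified_partition_coefficient(U):
    c = U.shape[1]
    return 1.0 - c / (c - 1.0) * (1.0 - partition_coefficient(U))


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    X = np.vstack([rng.normal(0.0, 0.4, (40, 2)), rng.normal(4.0, 0.4, (40, 2))])
    for c in (2, 3, 4):
        U, _ = fuzzy_cmeans(X, c)
        hard = U.argmax(axis=1)  # harden memberships for the crisp Silhouette index
        print(c, round(partition_coefficient(U), 3), round(partition_entropy(U), 3),
              round(modified_partition_coefficient(U), 3),
              round(silhouette_score(X, hard), 3))
```

Higher PC, MPC and Silhouette values and lower PE values point to a crisper, better separated fuzzy partition.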
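
The abstract uses the k-nearest neighbor algorithm to choose the epsilon value for DBSCAN. One common way to do this is to sort every point's distance to its k-th nearest neighbour and take eps at the knee of that curve. The sketch below assumes scikit-learn's `NearestNeighbors` and `DBSCAN`, k = 4, synthetic data with two planted outliers, and a simple chord-distance knee rule. All of these are assumptions rather than the study's documented settings, and `k_distance_eps` is a hypothetical helper.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors


def k_distance_eps(X, k=4):
    """Pick DBSCAN's eps from the knee of the sorted k-distance curve.

    The knee rule (largest gap below the chord joining the curve's end points)
    is an assumption of this sketch.
    """
    # k + 1 neighbours because each training point is returned as its own nearest neighbour.
    dist, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    kdist = np.sort(dist[:, k])
    x = np.linspace(0.0, 1.0, len(kdist))
    y = (kdist - kdist[0]) / (kdist[-1] - kdist[0])
    knee = int(np.argmax(x - y))  # an ascending convex curve lies below the chord y = x
    return float(kdist[knee]), kdist


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([
        rng.normal(0.0, 0.3, (50, 2)),
        rng.normal(4.0, 0.3, (50, 2)),
        [[10.0, -6.0], [-6.0, 10.0]],  # two obvious outliers
    ])
    k = 4
    eps, _ = k_distance_eps(X, k)
    # scikit-learn's min_samples counts the point itself, so k other neighbours -> k + 1.
    labels = DBSCAN(eps=eps, min_samples=k + 1).fit_predict(X)
    print("eps =", round(eps, 3), "clusters =", len(set(labels) - {-1}),
          "noise =", int((labels == -1).sum()))
```

Pairing `min_samples = k + 1` with the k-distance that excludes the point itself keeps the two settings consistent: a point is a core point exactly when its k-distance is at most eps.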
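
For the expectation-maximization algorithm, the abstract states only that default parameter settings were used. As a stand-in, the sketch below fits scikit-learn's `GaussianMixture` by EM with its library defaults. These are not necessarily the defaults of the tool the authors used. The random seed is fixed only for reproducibility, and `em_cluster` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def em_cluster(X, n_components, seed=0):
    """Fit a Gaussian mixture by EM, leaving scikit-learn's other defaults untouched
    (full covariances, k-means initialisation, max_iter=100, tol=1e-3)."""
    gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(X)
    return gmm.predict(X), gmm


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.5, (60, 2)), rng.normal(3.0, 0.5, (60, 2))])
    labels, gmm = em_cluster(X, 2)
    print("converged:", gmm.converged_, "sizes:", np.bincount(labels))
```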
