A hybrid clustering algorithm and web information foraging

Abstract

Clustering techniques have proven useful in many real-world applications. In this article, we design a new clustering algorithm and adapt it to Web information foraging. The algorithm, named k-MM, takes advantage of both k-means and PAM to satisfy clustering criteria such as effectiveness, efficiency, scalability, and the ability to handle noise and outliers. We evaluated k-MM on several UCI datasets and show that, compared to k-means, PAM, CLARA and CLARANS, it is very effective and efficient. We also tested it on COIL-100 to show its applicability to concrete domains and demonstrate that it outperforms a recent image clustering algorithm from the literature. In a second step, we present an application to Web Information Foraging and compare k-MM with a recent agent-based method. Experiments in this case were performed on a real dynamic website, MedlinePlus, in contrast to earlier work, which was traditionally carried out on web logs. We show that k-MM, integrated into Web Information Foraging, discovers authorities more effectively and more efficiently.
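The abstract does not spell out how k-MM combines k-means and PAM, so the following Python sketch only illustrates the general idea of a centroid/medoid hybrid and should not be read as the published algorithm. The function name `hybrid_cluster`, its parameters (`iters`, `seed`, `init`), and the rule of snapping each cluster mean to its nearest member are all assumptions made for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, reps):
    """Group points by the index of their nearest representative."""
    clusters = [[] for _ in reps]
    for p in points:
        idx = min(range(len(reps)), key=lambda i: dist(p, reps[i]))
        clusters[idx].append(p)
    return clusters


def hybrid_cluster(points, k, iters=20, seed=0, init=None):
    """Illustrative centroid/medoid hybrid (an assumption, NOT the published k-MM).

    Each round assigns points to the nearest representative and computes
    the cluster mean (the k-means step), then replaces the mean by the
    cluster member closest to it, so that representatives are always real
    data points (the medoid idea behind PAM).
    """
    rng = random.Random(seed)
    reps = list(init) if init is not None else rng.sample(points, k)
    for _ in range(iters):
        new_reps = []
        for rep, members in zip(reps, assign(points, reps)):
            if not members:
                # Keep the old representative if its cluster emptied out.
                new_reps.append(rep)
                continue
            mean = tuple(sum(c) / len(members) for c in zip(*members))
            new_reps.append(min(members, key=lambda m: dist(m, mean)))
        if new_reps == reps:
            break  # representatives are stable
        reps = new_reps
    return reps, assign(points, reps)
```

Restricting representatives to actual data points is what makes medoid-based methods less sensitive to outliers than plain means, while the mean computation keeps each update cheap. Like k-means, this sketch can stop at a poor local optimum for unlucky starting points, so the optional `init` argument is provided to make runs reproducible.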

Keywords

Data mining, clustering, k-means, PAM, web information foraging, MedlinePlus
