Sage Journals

Abstract

We study the cluster ensemble problem and propose a cluster ensemble approach based on subspace similarity (CEASS). From a subspace similarity perspective, we seek the optimal subspace, namely the one most similar to the given subspaces that correspond to the cluster solutions to be combined. We formulate the cluster ensemble problem as an optimization problem that minimizes the sum of squared Euclidean distances between the orthonormal basis vectors of the target subspace and those of the given subspaces. We derive an explicit solution to this problem in terms of the singular value decomposition. Moreover, the solution consists of low-dimensional embeddings of the instances. Finally, the K-means algorithm with the minimum-maximum principle is used to cluster the instances according to their coordinates in the embedding space. In particular, we circumvent the initialization problem of K-means by using CEASS to combine different K-means clustering solutions, obtained from random initializations, into a stable clustering result. We evaluate CEASS constructed in this way against several other state-of-the-art cluster ensemble algorithms on nine real-world datasets. Experimental results demonstrate that CEASS generally outperforms the other algorithms in terms of normalized mutual information and F1 measure. In addition, CEASS is extremely efficient compared to hierarchical clustering algorithms.
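The pipeline described in the abstract (subspaces from base clusterings, an SVD-based embedding, then K-means) can be illustrated with a minimal sketch. The abstract does not give the exact objective or the details of the minimum-maximum principle, so everything below is an assumption made for illustration: each base clustering is represented by its column-normalized cluster-indicator matrix, the consensus subspace is taken as the span of the top-k left singular vectors of the stacked indicator bases, and K-means is seeded with a farthest-first (maximin) rule. The function names `indicator_basis`, `consensus_embedding`, `maximin_init`, and `kmeans` are hypothetical and are not taken from the paper.

```python
import numpy as np

def indicator_basis(labels):
    """Orthonormal basis for the subspace spanned by one clustering's indicators."""
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    H = (labels[:, None] == clusters[None, :]).astype(float)  # n x k indicator matrix
    # Indicator columns have disjoint support, so scaling each to unit norm
    # yields orthonormal columns.
    return H / np.sqrt(H.sum(axis=0))

def consensus_embedding(labelings, k):
    """Assumed consensus step: top-k left singular vectors of the stacked bases."""
    B = np.hstack([indicator_basis(l) for l in labelings])
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    return U[:, :k]  # one k-dimensional coordinate row per instance

def maximin_init(X, k):
    """Assumed seeding rule: farthest-first selection of k initial centers."""
    centers = [X[np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1))]]
    while len(centers) < k:
        # Distance from every point to its nearest already-chosen center.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    return np.array(centers)

def kmeans(X, k, n_iter=100):
    """Plain Lloyd iterations started from the maximin seeds."""
    centers = maximin_init(X, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        new_centers = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign

# Three toy base clusterings of six instances: the second permutes the labels
# of the first, and the third places instance 2 in the other group.
labelings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
consensus = kmeans(consensus_embedding(labelings, 2), 2)
```

In this toy run the consensus groups the first three instances together and the last three together. Because the subspace representation ignores label names, the permuted second clustering reinforces the first instead of contradicting it.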

Keywords

Machine learning, data mining, cluster analysis, cluster ensemble, subspace similarity



A novel cluster ensemble approach effected by subspace similarity
