Abstract
We study the cluster ensemble problem and propose a cluster ensemble approach based on subspace similarity (CEASS). From the subspace similarity perspective, we seek the subspace that is most similar to the given subspaces, each of which corresponds to one of the cluster solutions to be combined. We formulate the cluster ensemble problem as an optimization problem that minimizes the sum of squared Euclidean distances between the orthonormal basis vectors of the target subspace and those of the given subspaces. We derive an explicit solution to this problem in terms of the singular value decomposition. Moreover, the solution gives the low-dimensional embeddings of the instances. Finally, the K-means algorithm with the minimum-maximum principle is used to cluster the instances according to their coordinates in the embedding space. In particular, we circumvent the initialization problem of K-means by using CEASS to combine different K-means clustering solutions, obtained from random initializations, into a stable clustering result. We evaluate CEASS constructed in this way against several other state-of-the-art cluster ensemble algorithms on nine real-world datasets. Experimental results demonstrate that CEASS generally outperforms the other algorithms in terms of normalized mutual information and F1 measure. In addition, CEASS is extremely efficient compared with hierarchical clustering algorithms.
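The pipeline described in the abstract can be illustrated with a minimal sketch. It is one plausible reading, not the authors' implementation, and rests on several assumptions the abstract does not confirm: each cluster solution is represented by its column-normalized cluster indicator matrix (an orthonormal basis of its subspace), the consensus subspace is taken from the leading left singular vectors of the stacked bases, and the "minimum-maximum principle" is interpreted as farthest-first seeding of K-means. All function names are hypothetical.

```python
import numpy as np

def indicator_basis(labels):
    """Orthonormal basis spanned by one partition's cluster indicators (assumed representation)."""
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    H = (labels[:, None] == clusters[None, :]).astype(float)
    # Columns have disjoint supports, so dividing by sqrt(cluster size) makes them orthonormal.
    return H / np.sqrt(H.sum(axis=0))

def consensus_embedding(partitions, k):
    """Top-k left singular vectors of the stacked bases (assumed closed-form solution)."""
    B = np.hstack([indicator_basis(p) for p in partitions])
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    return U[:, :k]  # row i is the k-dimensional embedding of instance i

def maxmin_init(X, k):
    """Farthest-first seeding: an assumed reading of the 'minimum-maximum principle'."""
    start = np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1))
    centers = [X[start]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen center.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    return np.array(centers)

def kmeans(X, k, n_iter=100):
    """Plain Lloyd iterations started from the deterministic max-min seeds."""
    centers = maxmin_init(X, k)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def ensemble(partitions, k):
    """Combine several base clusterings into one consensus labeling."""
    return kmeans(consensus_embedding(partitions, k), k)

# Toy usage: three base clusterings of six instances that mostly agree.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping, different label names
    [0, 0, 1, 1, 1, 1],  # disagrees on the third instance
]
consensus = ensemble(partitions, k=2)
```

In this toy case the majority grouping is recovered: the first three instances share one consensus label and the last three share the other. Because the seeding is deterministic, repeated runs on the same base clusterings give the same result, which is the kind of stability the abstract attributes to combining randomly initialized K-means solutions.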
