Novel multi-centroid, multi-run sampling schemes for K-medoids-based algorithms

Abstract

Clustering in data mining groups similar objects based on their distance, connectivity, relative density, or other specific characteristics. Data clustering has become an important task for discovering significant patterns and characteristics in large spatial databases. The k-medoids-based algorithms have been shown to be effective for spherical-shaped clusters with outliers. However, they are not efficient for large databases. In this paper, we propose two novel algorithms: the Multi-Centroid with Multi-Run Sampling Scheme, which we term MCMRS, and a more advanced sampling scheme, the Incremental Multi-Centroid, Multi-Run Sampling Scheme, hereafter called simply IMCMRS. Both aim to improve the performance of many k-medoids-based algorithms, including PAM, CLARA, and CLARANS. Experimental results demonstrate that the proposed schemes not only reduce computation time by more than 80% but also reduce the average distance per object compared with CLARA and CLARANS. IMCMRS is also superior to MCMRS.
