Abstract
Some clustering algorithms, such as DBSCAN, do not assign all data to clusters but identify some instances as noise and exclude them from the clusters. The literature offers no dedicated validity measures for such noise-aware clusterings. Applying the standard measures blindly, which seems to happen in the literature, yields misleading results. We revise top-performing, established validity measures to cope with the results of this kind of clustering algorithm and demonstrate that such clusterings may require an additional type of validity check, one that assesses not only cluster validity (separation and compactness) but also the validity of the distinction between noise and cluster instances. Additionally, we propose a balanced score that captures both types of validity to obtain a holistic validity score. All proposed measures are evaluated on artificial data, mimicking the experiments of the extensive review [Arbelaitz O, Gurrutxaga I, Muguerza J et al. 2013]. The encouraging results demonstrate that the noise-aware extensions of the Silhouette coefficient and the Score function are least influenced by the noise level.
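To make the problem of blind application concrete, the following minimal sketch computes the standard Silhouette coefficient on a toy noise-aware clustering in two naive ways: once treating the noise label as an ordinary cluster, and once silently dropping the noise instances. It is not the noise-aware extension proposed in the article; the function names, the toy data, and the use of -1 as the noise label (the convention of scikit-learn's DBSCAN) are assumptions made for illustration only.

```python
import math

NOISE = -1  # assumed noise label, following scikit-learn's DBSCAN convention


def mean_silhouette(points, labels):
    """Standard mean Silhouette; every distinct label counts as a cluster."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        raise ValueError("Silhouette needs at least two clusters")
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the own cluster
        # (the point itself contributes distance 0 to the sum)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != l
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


def silhouette_noise_as_cluster(points, labels):
    """Blind application: the noise label is scored like a regular cluster."""
    return mean_silhouette(points, labels)


def silhouette_ignoring_noise(points, labels):
    """Naive alternative: drop noise instances, score only the clusters."""
    kept = [(p, l) for p, l in zip(points, labels) if l != NOISE]
    return mean_silhouette([p for p, _ in kept], [l for _, l in kept])


def noise_ratio(labels):
    """Fraction of instances labelled as noise."""
    return sum(l == NOISE for l in labels) / len(labels)


# Toy data: two compact clusters plus three scattered noise instances.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),          # cluster 0
    (10, 10), (10, 11), (11, 10), (11, 11),  # cluster 1
    (5, 5), (0, 10), (10, 0),                # noise
]
labels = [0, 0, 0, 0, 1, 1, 1, 1, NOISE, NOISE, NOISE]

if __name__ == "__main__":
    print("noise as cluster:", silhouette_noise_as_cluster(points, labels))
    print("noise ignored:   ", silhouette_ignoring_noise(points, labels))
    print("noise ratio:     ", noise_ratio(labels))
```

On this toy data the two naive variants disagree: scoring the scattered noise as if it were a cluster lowers the value, while dropping the noise gives a high value that says nothing about whether the noise/cluster distinction itself is sound. This gap is the kind of effect that motivates a separate check of the noise assignment.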
