Abstract
Many stability measures, such as Normalized Mutual Information (NMI), have been proposed to validate a set of partitionings. It is highly possible that a set of partitionings may contain one (or more) high quality cluster(s) but is still adjudged a bad cluster by a stability measure, and as a result, is completely neglected. Inspired by evaluation approaches measuring the efficacy of a set of partitionings, researchers have tried to define new measures for evaluating a cluster. Thus far, the measures defined for assessing a cluster are mostly based on the well-known NMI measure. The drawback of this commonly used approach is discussed in this paper, after which a new asymmetric criterion, called the Alizadeh–Parvin–Moshki–Minaei criterion (APMM), is proposed to assess the association between a cluster and a set of partitionings. We show that the APMM criterion overcomes the deficiency in the conventional NMI measure. We also propose a clustering ensemble framework that incorporates the APMM's capabilities in order to find the best performing clusters. The framework uses Average APMM (AAPMM) as a fitness measure to select a number of clusters instead of using all of the results. Any cluster that satisfies a predefined threshold of the mentioned measure is selected to participate in an elite ensemble. To combine the chosen clusters, a co-association matrix-based consensus function (by which the set of resultant partitionings are obtained) is used. Because Evidence Accumulation Clustering (EAC) can not derive the co-association matrix from a subset of clusters appropriately, a new EAC-based method, called Extended EAC (EEAC), is employed to construct the co-association matrix from the chosen subset of clusters. Empirical studies show that our proposed approach outperforms other cluster ensemble approaches.
Keywords
Get full access to this article
View all access options for this article.
