Abstract
Methods for feature selection in cluster analysis are not yet well established, although research has clearly demonstrated that extraneous descriptors can mask natural clusters in data. The goal of this work is to use each variable’s contribution to clustering tendency to distinguish variables that support clustering from those that do not. A further aim is to choose the smallest subsets of variables that still support clustering.
A modified version of Hopkins’ statistic is used to evaluate how much each variable in a pool of measured or calculated variables contributes to the clustering tendency of a data set. The value of clustering tendency for choosing reasonable sets of variables is demonstrated with examples using real and artificial data sets. Because clustering is exploratory, more than one set of variables may prove useful.
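The abstract does not give the modified form of the statistic. The sketch below therefore computes only the standard Hopkins statistic, to show what "clustering tendency" measures. It is not the authors' modification. The following choices are assumptions made for illustration: unweighted Euclidean distances, the convention H = Σu / (Σu + Σw) (so H ≈ 0.5 for random data and H → 1 for clustered data), probe points drawn uniformly from the data's bounding box, and a default of m = n/10 probes.

```python
import math
import random


def _nearest_distance(point, data, skip_index=None):
    """Euclidean distance from `point` to its nearest row in `data`.

    If `skip_index` is given, that row is ignored (used when `point`
    is itself a member of `data`).
    """
    best = math.inf
    for j, row in enumerate(data):
        if j == skip_index:
            continue
        best = min(best, math.dist(point, row))
    return best


def hopkins(data, m=None, seed=0):
    """Standard (unmodified) Hopkins statistic for a list of numeric rows.

    Assumed convention: H = sum(u) / (sum(u) + sum(w)), where
      u_i = distance from a uniform random point in the bounding box
            to its nearest data point,
      w_i = distance from a randomly sampled data point to its nearest
            other data point.
    H is near 0.5 for spatially random data and approaches 1 when the
    data are strongly clustered.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    if m is None:
        m = max(1, n // 10)  # assumed default sample size
    if not 1 <= m <= n:
        raise ValueError("m must be between 1 and the number of observations")

    rng = random.Random(seed)
    dims = len(data[0])
    lows = [min(row[k] for row in data) for k in range(dims)]
    highs = [max(row[k] for row in data) for k in range(dims)]

    # u: distances from artificial, uniformly placed probe points
    u = 0.0
    for _ in range(m):
        probe = [rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
        u += _nearest_distance(probe, data)

    # w: nearest-neighbour distances for real points sampled without replacement
    w = 0.0
    for i in rng.sample(range(n), m):
        w += _nearest_distance(data[i], data, skip_index=i)

    if u + w == 0:
        raise ValueError("degenerate data: all points coincide")
    return u / (u + w)
```

For example, two tight, well-separated groups of points give a value close to 1, while points scattered uniformly give a value near 0.5. Assessing one variable's contribution would require comparing such values across subsets of variables. How the authors do that is not stated in the abstract, and the sketch does not attempt it.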
