Abstract
Similarity measures quantify resemblance between pairs of items when each item consists of a pattern of two-state (e.g., presence versus absence) variables. Numerous similarity measures, many of which are straightforward to calculate and interpret, have been developed and characterized. Methods for testing whether items within a specified cluster are significantly more similar to each other than to items outside the cluster have not been extensively developed for binary responses, but a permutation test procedure using a measure of distinctness is available for this purpose. We compare three well-known similarity measures, the Dice, Jaccard, and simple matching coefficients, with the more complex tripartite T similarity index recently proposed by Tulloss. Each measure is used in significance tests of whether hypothesized subsets of items are legitimately grouped for resemblance. Theoretically derived measures reflecting diverse scenarios found in medical research, together with data from neuropsychological research, illustrate the methods. Results for the tripartite T measure were comparable to those of the other methods in some settings, and essentially the same as those of the Dice coefficient overall, both when compared theoretically and on the same clinical data. Some shortcomings of the Tulloss algorithm were found that limit the usefulness of the tripartite T index in medical applications.
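For readers unfamiliar with the three standard coefficients named above, the following is a minimal sketch using their textbook definitions for two binary patterns. The function names, the counts a, b, c, d, and the choice to return NaN when a denominator is zero are illustrative assumptions, not taken from the article; the tripartite T index and the permutation test are not reproduced here.

```python
import math
from typing import Sequence, Tuple


def contingency(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d): both present, x only, y only, both absent."""
    if len(x) != len(y):
        raise ValueError("patterns must have the same length")
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return a, b, c, d


def dice(x: Sequence[int], y: Sequence[int]) -> float:
    """Dice coefficient: 2a / (2a + b + c); joint absences are ignored."""
    a, b, c, _ = contingency(x, y)
    denom = 2 * a + b + c
    # Assumption: return NaN when neither item has any feature present.
    return 2 * a / denom if denom else math.nan


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Jaccard coefficient: a / (a + b + c); joint absences are ignored."""
    a, b, c, _ = contingency(x, y)
    denom = a + b + c
    return a / denom if denom else math.nan


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Simple matching coefficient: (a + d) / (a + b + c + d)."""
    a, b, c, d = contingency(x, y)
    n = a + b + c + d
    return (a + d) / n if n else math.nan


# Two hypothetical items scored on six presence/absence variables.
item_1 = [1, 1, 0, 0, 1, 0]
item_2 = [1, 0, 0, 1, 1, 0]
print(dice(item_1, item_2), jaccard(item_1, item_2), simple_matching(item_1, item_2))
```

In this hypothetical example a = 2, b = 1, c = 1, and d = 2, so the Dice and simple matching coefficients both equal 2/3 while the Jaccard coefficient is 0.5. The simple matching coefficient counts joint absences as agreement, whereas Dice and Jaccard do not.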
