Significance testing of a cluster of multivariate binary variables: comparison of the tripartite T index to three common similarity measures

Abstract

Similarity measures quantify resemblance between pairs of items when each consists of a pattern of two-state (e.g., presence versus absence) variables. Numerous similarity measures, many of which are straightforward to calculate and interpret, have been developed and characterized. Methods for testing whether items within a specified cluster are significantly more similar to each other than to items outside the cluster have not been extensively developed for binary responses, but a permutation test procedure using a measure of distinctness is available for this purpose. We compare three well-known similarity measures, the Dice, Jaccard and simple matching coefficients, with the more complex tripartite T similarity index recently proposed by Tulloss. Each measure is used in significance tests of whether hypothesized subsets of items are legitimately grouped for resemblance. Theoretically derived measures reflecting diverse scenarios found in medical research, together with data from neuropsychological research, illustrate the methods. Results for the tripartite T measure were comparable to those of the other methods in some settings, and essentially the same as those of the Dice coefficient overall, both when compared theoretically and on the same clinical data. Some shortcomings of the Tulloss algorithm were found that limit the usefulness of the tripartite T index in medical applications.
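For illustration, the three common coefficients and a permutation test of cluster distinctness can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure used in the article: the abstract does not define the distinctness measure, so it is taken here to be the mean within-cluster similarity minus the mean similarity between cluster members and outside items. The function names and the convention of returning 0 when two items share no present attributes are likewise assumptions. The tripartite T index is not reproduced.

```python
import random


def counts(x, y):
    """Return (a, b, c, d) for two equal-length 0/1 patterns:
    a = both present, b = only x present, c = only y present, d = both absent."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)
    return a, b, c, d


def dice(x, y):
    """Dice coefficient: 2a / (2a + b + c); joint absences are ignored."""
    a, b, c, _ = counts(x, y)
    return 2 * a / (2 * a + b + c) if (a + b + c) else 0.0  # 0.0 is an assumed convention


def jaccard(x, y):
    """Jaccard coefficient: a / (a + b + c); joint absences are ignored."""
    a, b, c, _ = counts(x, y)
    return a / (a + b + c) if (a + b + c) else 0.0  # 0.0 is an assumed convention


def simple_matching(x, y):
    """Simple matching coefficient: (a + d) / (a + b + c + d)."""
    a, b, c, d = counts(x, y)
    return (a + d) / (a + b + c + d)


def distinctness(items, in_cluster, sim):
    """ASSUMED distinctness: mean within-cluster similarity minus mean
    similarity between cluster members and outside items.
    Needs at least two items inside the cluster and one outside."""
    within, between = [], []
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if in_cluster[i] and in_cluster[j]:
                within.append(sim(items[i], items[j]))
            elif in_cluster[i] != in_cluster[j]:
                between.append(sim(items[i], items[j]))
    return sum(within) / len(within) - sum(between) / len(between)


def permutation_p_value(items, in_cluster, sim, n_perm=999, seed=0):
    """One-sided Monte Carlo permutation p-value: the share of random
    relabelings whose distinctness is at least the observed value."""
    rng = random.Random(seed)
    observed = distinctness(items, in_cluster, sim)
    labels = list(in_cluster)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # keeps the cluster size fixed
        if distinctness(items, labels, sim) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # count the observed labeling itself


# Hypothetical data: three identical items hypothesized to form a cluster,
# and three items outside it with the opposite pattern.
items = [[1, 1, 1, 0, 0, 0]] * 3 + [[0, 0, 0, 1, 1, 1]] * 3
in_cluster = [True, True, True, False, False, False]
p_value = permutation_p_value(items, in_cluster, jaccard)
```

Passing `dice` or `simple_matching` in place of `jaccard` runs the same test under a different coefficient, which is the kind of comparison the abstract describes. Shuffling the labels rather than redrawing them keeps the hypothesized cluster size constant across permutations.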

