A K-Means Cluster Analysis Computer Program With Cross-Tabulations and Next-Nearest-Neighbor Analysis

Abstract

A cluster analysis computer program is presented that uses the K-Means algorithm to partition multivariate data into classes with low within-class variance. The program also provides a somewhat novel form of next-nearest-neighbor analysis, convenient cross-tabulations for variables other than those used in the clustering, an efficient form of variable clustering, and approximate randomization tests for a variety of relationships.
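
To make the clustering criterion concrete, here is a minimal sketch of a batch (Lloyd-style) K-Means pass and the within-class sum of squares it tries to reduce. It is not the program described in the abstract: the function names `sq_dist`, `kmeans`, and `within_ss`, the random choice of starting centers, and the stopping rule are all assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Batch K-Means on a list of equal-length numeric tuples.

    Illustrative only: starting centers are k distinct input points
    chosen at random, which is an assumption, not the source's method.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # partition is stable, so the centers are too
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty class keeps its previous center
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centers


def within_ss(points, labels, centers):
    """Total within-class sum of squared distances to class centers."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(data, k=2)
print(labels, within_ss(data, labels, centers))
```

On this toy data the two well-separated groups of three points end up in separate classes. In general the result depends on the starting centers, and the procedure only finds a local minimum of the within-class criterion.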
