Selecting Predictor Variables in Two-Group Classification Problems

Abstract

One approach to selecting predictor variables in the two-group classification situation is to compare the cross-validated classification accuracies of all possible subsets of predictors. Because p predictors form 2^p - 1 nonempty subsets (1,023 for p = 10), this approach is too time-consuming to carry out without automation for even a moderate number of predictors, so a computer program was written to perform the comparisons. The program computes the cross-validated classification hit rate of every possible subset of predictor variables and identifies a user-specified number of best subsets, both overall and for each subset size.
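
The all-subsets search described in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's program. It assumes leave-one-out (holdout) cross-validation and a two-group linear classification rule built on a pooled within-group covariance matrix with equal prior probabilities. It also refits the rule for every held-out case instead of using any faster updating formula. All function names (`solve`, `pooled_cov`, `classify`, `loo_hit_rate`, `best_subsets`) are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Scored = Tuple[float, Tuple[int, ...]]  # (hit rate, predictor indices)


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("pooled covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mean(rows: Matrix) -> List[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def pooled_cov(g1: Matrix, g2: Matrix) -> Matrix:
    """Pooled within-group covariance (assumed common to both groups)."""
    p = len(g1[0])
    s = [[0.0] * p for _ in range(p)]
    for rows in (g1, g2):
        mu = mean(rows)
        for r in rows:
            d = [r[j] - mu[j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    s[i][j] += d[i] * d[j]
    df = len(g1) + len(g2) - 2
    return [[v / df for v in row] for row in s]


def classify(x: List[float], g1: Matrix, g2: Matrix) -> int:
    """Assign x to the group whose mean is nearer in Mahalanobis distance.

    With a pooled covariance and equal priors (assumptions of this sketch)
    this is the usual two-group linear classification rule.
    """
    s = pooled_cov(g1, g2)

    def d2(mu: List[float]) -> float:
        d = [xi - mi for xi, mi in zip(x, mu)]
        w = solve(s, d)
        return sum(di * wi for di, wi in zip(d, w))

    return 1 if d2(mean(g1)) <= d2(mean(g2)) else 2


def loo_hit_rate(g1: Matrix, g2: Matrix, cols: Sequence[int]) -> float:
    """Leave-one-out proportion of all cases correctly classified using cols."""
    a = [[r[j] for j in cols] for r in g1]
    b = [[r[j] for j in cols] for r in g2]
    hits = 0
    for i in range(len(a)):  # hold out each group-1 case in turn
        hits += classify(a[i], a[:i] + a[i + 1:], b) == 1
    for i in range(len(b)):  # hold out each group-2 case in turn
        hits += classify(b[i], a, b[:i] + b[i + 1:]) == 2
    return hits / (len(a) + len(b))


def best_subsets(
    g1: Matrix, g2: Matrix, k: int
) -> Tuple[List[Scored], Dict[int, List[Scored]]]:
    """Score every nonempty subset; keep the k best overall and per size."""
    p = len(g1[0])
    overall: List[Scored] = []
    by_size: Dict[int, List[Scored]] = {}
    for size in range(1, p + 1):
        scored = [(loo_hit_rate(g1, g2, cols), cols)
                  for cols in combinations(range(p), size)]
        scored.sort(key=lambda t: -t[0])  # stable: ties keep enumeration order
        by_size[size] = scored[:k]
        overall.extend(scored)
    overall.sort(key=lambda t: -t[0])
    return overall[:k], by_size
```

Each group is passed as a list of cases, each case a list of predictor scores in the same order. For example, `best_subsets(group1, group2, 5)` returns the five highest-scoring subsets over all sizes together with the five best of each size. Because every subset is scored, running time doubles with each added predictor.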
