Some considerations of classification for high dimension low-sample size data

Abstract

In this article we review several classification methods, with emphasis on high-dimensional, low-sample-size data. We discuss several desirable properties for classifiers in such settings: predictability, consistency, generality, stability, robustness and sparsity. Specifically, a good classifier should have a small prediction error (predictability); converge asymptotically to the Bayes-rule classifier (consistency); remain stable when an observation is added or removed (generality); remain stable across different data sets of the same kind (stochastic stability); remain stable when a small number of observations are contaminated (robustness); and use a small number of variables (interpretability or sparsity). Several simulation examples and real applications illustrate the usefulness of popular existing classifiers and compare their performance.
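As a rough illustration of how three of these properties might be measured in a simulation, the sketch below fits a thresholded mean-difference classifier to synthetic two-class Gaussian data with many more variables than observations. It is not taken from the article: the data model, the classifier, the threshold value, and all function and variable names (`simulate`, `fit_centroid`, `loo_instability`, and so on) are assumptions made for illustration only. It reports a test misclassification rate (predictability), the number of variables retained (sparsity), and the average fraction of test predictions that change when one training observation is removed (one possible reading of generality).

```python
import numpy as np


def simulate(n_per_class, d, shift, rng):
    """Two Gaussian classes in d dimensions; only the first 10 coordinates carry signal."""
    mu = np.zeros(d)
    mu[:10] = shift
    x_pos = rng.standard_normal((n_per_class, d)) + mu
    x_neg = rng.standard_normal((n_per_class, d)) - mu
    X = np.vstack([x_pos, x_neg])
    y = np.concatenate([np.ones(n_per_class), -np.ones(n_per_class)])
    return X, y


def fit_centroid(X, y, threshold=0.0):
    """Mean-difference direction; coordinates with |difference| <= threshold are set to zero."""
    m_pos = X[y == 1].mean(axis=0)
    m_neg = X[y == -1].mean(axis=0)
    w = m_pos - m_neg
    w = np.where(np.abs(w) > threshold, w, 0.0)
    b = -w @ (m_pos + m_neg) / 2.0  # decision boundary passes through the midpoint of the centroids
    return w, b


def predict(w, b, X):
    return np.where(X @ w + b >= 0, 1.0, -1.0)


rng = np.random.default_rng(0)
X_train, y_train = simulate(10, 500, 1.0, rng)   # 20 observations, 500 variables
X_test, y_test = simulate(500, 500, 1.0, rng)    # large test set to estimate the error

w, b = fit_centroid(X_train, y_train, threshold=1.0)
base_pred = predict(w, b, X_test)

# Predictability: misclassification rate on independent test data.
test_error = float(np.mean(base_pred != y_test))

# Sparsity: number of variables with a nonzero coefficient.
n_selected = int(np.count_nonzero(w))

# Generality (illustrative measure): refit with each training observation removed
# and record the fraction of test predictions that differ from the full fit.
flips = []
for i in range(len(y_train)):
    keep = np.arange(len(y_train)) != i
    w_i, b_i = fit_centroid(X_train[keep], y_train[keep], threshold=1.0)
    flips.append(np.mean(predict(w_i, b_i, X_test) != base_pred))
loo_instability = float(np.mean(flips))

print(f"test error: {test_error:.3f}")
print(f"variables selected: {n_selected} of {X_train.shape[1]}")
print(f"leave-one-out instability: {loo_instability:.3f}")
```

Consistency, stochastic stability and robustness could be probed in the same way, for example by increasing the sample size, redrawing the training set, or contaminating a few training observations, but those checks are left out here to keep the sketch short.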

Keywords

classification; consistency; discriminant analysis; machine learning; misclassification error; sparsity
