Sage Journals: Discover world-class research

Abstract

This paper presents a novel supervised dimensionality reduction approach for facial feature extraction called (2D)²LDALPP. The proposed method combines alternative 2DLDA with alternative 2DLPP. Feature extraction proceeds in two steps: first, the column-directional information is extracted by applying alternative 2DLDA; second, the resulting feature matrix is transposed and alternative 2DLPP is used to extract the row-directional information. The method compresses the facial image in two different directions, so the resulting feature matrix has a low dimension. Moreover, because 2DLDA is a supervised learning method, the proposed method not only preserves the manifold structure of the samples but also incorporates the class label information. Experimental results on the Feret, ORL, and Yale databases show that the proposed method is effective.

Keywords

Face Recognition; Linear Discriminant Analysis; Locality Preserving Projection

1. Introduction

Feature extraction is a key problem in face recognition, and extracting effective and stable features is an active research topic. Principal Component Analysis (PCA) [1] and Linear Discriminant Analysis (LDA) [2] are the two best-known methods, and many other successful approaches have been developed alongside them. PCA seeks a set of mutually orthogonal basis vectors that capture the directions of maximum variance in the data. LDA seeks an optimal projection space that maximizes the ratio of the between-class scatter to the within-class scatter of the training samples. Because LDA is a supervised learning method, it is well suited to classification tasks. However, LDA applied to face recognition suffers from the small sample size (SSS) problem, because the image vectors are high-dimensional while the number of training samples is relatively small. To overcome this problem, the Fisherface method first projects the samples into a PCA subspace so that the within-class scatter matrix becomes non-singular. Many studies have indicated that face images may reside on a low-dimensional, non-linear sub-manifold embedded in the original high-dimensional data space, which has motivated the use of manifold learning methods in face recognition. He [3] proposed the Laplacianfaces method, based on Locality Preserving Projection (LPP), which finds an embedding that preserves the local structure of the image space. Like LDA, LPP suffers from the SSS problem when applied directly to face recognition, so Laplacianfaces first projects the samples into a PCA subspace before extracting low-dimensional features.
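The Fisherface strategy described above (PCA first, then LDA in the reduced space) can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic vectors, not the implementation used in [2] or in this paper; the function name `fisherfaces` is hypothetical, and keeping N − c principal components is assumed here as the usual Fisherface setting, which makes the within-class scatter matrix non-singular in general.

```python
import numpy as np

def fisherfaces(X, labels, n_components):
    """Fisherface sketch: PCA down to N - c dimensions, then LDA.

    X: (N, D) array, one vectorized image per row; labels: (N,) class ids.
    Returns the training mean and a (D, n_components) projection matrix.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    N, c = X.shape[0], classes.size
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA via SVD; keeping N - c components makes S_w non-singular in general
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W_pca = Vt[: N - c].T                       # (D, N - c)
    P = Xc @ W_pca                              # samples in the PCA space
    k = P.shape[1]
    Sb, Sw = np.zeros((k, k)), np.zeros((k, k))
    for cl in classes:
        Pi = P[labels == cl]
        mi = Pi.mean(axis=0)                    # overall mean of P is zero
        Sb += len(Pi) * np.outer(mi, mi)
        Sw += (Pi - mi).T @ (Pi - mi)
    # LDA: leading eigenvectors of S_w^{-1} S_b (at most c - 1 are informative)
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)[:n_components]
    return mean, W_pca @ evecs[:, order].real

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))                   # 20 synthetic 50-pixel "images"
y = np.repeat(np.arange(4), 5)                  # 4 classes, 5 samples each
mean, W = fisherfaces(X, y, n_components=3)
features = (X - mean) @ W                       # (20, 3) discriminant features
```

With real face images, D would be the number of pixels, so the PCA step is what keeps the LDA scatter matrices small and invertible.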

PCA, LDA and LPP are 1D linear projection methods. Their common disadvantage is that each image must be represented as a vector, which not only causes the SSS problem but also discards the structural information contained in the 2D image. Two kinds of techniques have been introduced to address this problem. One is the kernel technique, reported in [4], [5], [6] and [7]. The other is the 2D linear projection method: Yang [8] developed Two-Dimensional Principal Component Analysis (2DPCA), Li [9] developed Two-Dimensional Linear Discriminant Analysis (2DLDA), and Hu [10] proposed Two-Dimensional Locality Preserving Projection (2DLPP). 2D techniques compute the eigenvectors of the so-called image covariance matrix directly, without matrix-to-vector conversion, which makes them more efficient than their 1D counterparts.

In 2D techniques the feature is a matrix, whose dimension is much greater than that of the feature vector produced by 1D techniques. Moreover, 2DPCA, 2DLDA and 2DLPP extract image features in the row (or column) direction only, so the features are decorrelated in the row (or column) direction but remain correlated in the column (or row) direction. Several methods have been proposed to deal with this issue. The first is the so-called two-directional two-dimensional method, in which a 2D technique is applied twice. Zhang [11] proposed Two-Directional Two-Dimensional Principal Component Analysis ((2D)²PCA), S. Noushath [12] developed Two-Directional Two-Dimensional Linear Discriminant Analysis ((2D)²LDA), and Guo [13] proposed Two-Directional Two-Dimensional Locality Preserving Projection ((2D)²LPP). Two-directional two-dimensional methods extract features in both the row and the column directions, so the feature matrix has a much lower dimension than in the 2D techniques. However, (2D)²PCA and (2D)²LPP are unsupervised learning methods that do not use class labels, which makes them less suited to classification tasks such as face recognition. (2D)²LDA is a supervised learning method, but some research indicates that its computation is ambiguous, because the covariance matrix in the (2D)²LDA approach can be constructed in different ways. The second type of method modifies the 2D algorithms themselves by combining them with other optimization methods, as in [14] and [15]. These methods improve recognition performance, but at the cost of more complex algorithms. Recently, methods that hybridize different feature extraction techniques have become popular. These methods combine multiple kinds of feature extraction techniques to enhance the recognition accuracy on the ORL, Yale and other well-known databases, as reported in [16] and [17]. Qi [18] proposed another two-directional two-dimensional method, called (2D)²PCALDA, which combines 2DPCA and 2DLDA, and reported that (2D)²PCALDA performs better than (2D)²PCA and (2D)²LDA on the ORL and Yale databases.
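To illustrate how a 2D technique avoids vectorization and how the two-directional variant shrinks the feature matrix, the following minimal NumPy sketch computes the n × n and m × m image covariance matrices and projects an image on both sides, in the spirit of (2D)²PCA [11] (feature = Zᵀ A X). The function name, the random data and the 56 × 48 image size (borrowed from the experiments in Section 4) are illustrative assumptions.

```python
import numpy as np

def two_directional_2dpca(images, q, d):
    """(2D)^2PCA sketch: 2DPCA in the row direction plus alternative 2DPCA
    in the column direction. images: (N, m, n). Returns Z (m x q), X (n x d)."""
    A = np.asarray(images, dtype=float)
    C = A - A.mean(axis=0)                              # centred images
    G_row = np.einsum('kij,kil->jl', C, C) / len(A)      # n x n image covariance
    G_col = np.einsum('kji,kli->jl', C, C) / len(A)      # m x m image covariance
    # eigh returns ascending eigenvalues, so reverse to take the leading ones
    X = np.linalg.eigh(G_row)[1][:, ::-1][:, :d]
    Z = np.linalg.eigh(G_col)[1][:, ::-1][:, :q]
    return Z, X

rng = np.random.default_rng(0)
faces = rng.normal(size=(10, 56, 48))                    # 10 synthetic 56 x 48 images
Z, X = two_directional_2dpca(faces, q=7, d=7)
feature = Z.T @ faces[0] @ X                            # 7 x 7 instead of 56 x 48
```

The covariance matrices here are only 48 × 48 and 56 × 56, whereas vectorizing the same images would require a 2688 × 2688 covariance matrix.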

The main disadvantage of existing two-directional two-dimensional compression methods is that the feature matrix captures only a single kind of information. For example, (2D)²LPP preserves the local structure of the training samples but does not include the class label information, whereas (2D)²LDA is a supervised learning method that includes the class label information but loses the local structure of the training samples. Inspired by (2D)²PCALDA, in this paper we propose a similar method called (2D)²LDALPP, in which the feature matrix is obtained by projecting an image onto the column-directional alternative 2DLDA subspace and the row-directional alternative 2DLPP subspace in sequence. Experimental results on subsets of the Feret, ORL and Yale databases show that the proposed method is effective. The competence of 2D subspace analysis methods for face recognition remains controversial. Wang [19] compared 2DPCA and 2DLDA on the ORL and Cas-peal databases; Rao [20] discussed the performance of several subspace analysis methods under five noise conditions; and Lu [21] investigated dimensionality reduction approaches for direct feature extraction from tensor data. However, no definitive conclusion has yet been reached. In this paper, we therefore also examine the performance of the existing two-directional two-dimensional methods reported in [11], [12], [13] and [18], as well as that of our proposed method. We compare their recognition accuracy and computational complexity and discuss their advantages and disadvantages.

The rest of this paper is organized as follows. Section 2 briefly reviews the alternative 2DLDA and 2DLPP methods; Section 3 describes the proposed method; Section 4 presents the experimental results and analysis; and Section 5 concludes the paper.

2. Overview of alternative 2DLDA and 2DLPP approaches

2.1 Alternative 2DLDA

Suppose the training images {X_ij} belong to c classes, where the ith class ω_i has N_i training samples, so that the total number of training samples is N = Σ_{i=1}^{c} N_i. The jth sample of the ith class is the m × n matrix X_ij. The between-class scatter matrix S_b, the within-class scatter matrix S_w and the total scatter matrix S_t are defined as follows [9]:

S_{b} = \sum_{i = 1}^{c} N_{i} ({\bar{X}}_{i} - \bar{X}) {({\bar{X}}_{i} - \bar{X})}^{T}

(1)

S_{w} = \sum_{i = 1}^{c} \sum_{j = 1}^{N_{i}} (X_{i j} - {\bar{X}}_{i}) {(X_{i j} - {\bar{X}}_{i})}^{T}

(2)

S_{t} = S_{b} + S_{w} = \sum_{i = 1}^{c} \sum_{j = 1}^{N_{i}} (X_{i j} - \bar{X}) {(X_{i j} - \bar{X})}^{T}

(3)
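The decomposition S_t = S_b + S_w in Eq. (3) holds because the cross terms Σ_j (X_ij − X̄_i)(X̄_i − X̄)ᵀ vanish within each class. The sketch below, using a hypothetical helper `scatter_matrices`, computes the m × m matrices of Eqs. (1) to (3) from a stack of images and checks the identity numerically on random data.

```python
import numpy as np

def scatter_matrices(images, labels):
    """Return the m x m matrices S_b, S_w, S_t of Eqs. (1)-(3).

    images: (N, m, n) array of training images X_ij; labels: (N,) class ids.
    """
    X = np.asarray(images, dtype=float)
    labels = np.asarray(labels)
    m = X.shape[1]
    mean_all = X.mean(axis=0)                        # overall mean image
    Sb, Sw = np.zeros((m, m)), np.zeros((m, m))
    for cl in np.unique(labels):
        Xi = X[labels == cl]
        mean_i = Xi.mean(axis=0)                     # class mean image
        diff = mean_i - mean_all
        Sb += len(Xi) * (diff @ diff.T)              # Eq. (1)
        for Xij in Xi:
            Sw += (Xij - mean_i) @ (Xij - mean_i).T  # Eq. (2)
    St = sum((Xij - mean_all) @ (Xij - mean_all).T for Xij in X)  # Eq. (3)
    return Sb, Sw, St

rng = np.random.default_rng(1)
images = rng.normal(size=(12, 6, 5))                 # 12 random 6 x 5 "images"
labels = np.repeat([0, 1, 2], 4)                     # 3 classes, 4 samples each
Sb, Sw, St = scatter_matrices(images, labels)
print(np.allclose(St, Sb + Sw))                      # expected: True
```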

In Eqs. (1), (2) and (3), X̄ and X̄_i denote the mean of all samples and the mean of the ith class, respectively. Alternative 2DLDA seeks a set of projection vectors A_d = (a₁, a₂, …, a_d) that best discriminates between face classes by maximizing the criterion function

J (a) = \frac{a^{T} S_{b} a}{a^{T} S_{w} a}

(4)

The vector a_opt that maximizes J(a) is called the optimal discriminant vector. Maximizing J(a) means maximizing the ratio of the between-class scatter S_b to the within-class scatter S_w of the projected samples: in this projection direction, samples of different classes are spread far apart while samples of the same class stay close together. Using Lagrange multipliers with the constraints a_iᵀ S_w a_i = 1, maximizing J(a) is equivalent to maximizing the function f:

f = \sum_{i = 1}^{d} a_{i}^{T} S_{b} a_{i} - \sum_{i = 1}^{d} λ_{i} (a_{i}^{T} S_{w} a_{i} - 1)

(5)

Setting ∂f/∂a_i = 0 reduces the problem to the generalized eigenvalue equation

S_{b} a = λ S_{w} a

(6)

If S_w is non-singular, the optimal projection vector of alternative 2DLDA is the eigenvector of S_w⁻¹S_b corresponding to the largest eigenvalue. One projection vector is generally not enough, so the discriminant matrix A_d is composed of the eigenvectors a₁, a₂, …, a_d of S_w⁻¹S_b corresponding to the d largest eigenvalues. Projecting X_ij onto the subspace A_d gives the feature matrix Y_ij = A_dᵀX_ij, of size d × n.
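A minimal NumPy sketch of the alternative 2DLDA training step follows. It assumes S_w is non-singular, which requires enough training samples relative to m; the function name is hypothetical, and `np.linalg.eig` on S_w⁻¹S_b is used for brevity rather than a specialized generalized eigensolver.

```python
import numpy as np

def alternative_2dlda(images, labels, d):
    """Return A_d (m x d): eigenvectors of S_w^{-1} S_b for the d largest
    eigenvalues, with S_b and S_w built as in Eqs. (1) and (2)."""
    X = np.asarray(images, dtype=float)
    labels = np.asarray(labels)
    m = X.shape[1]
    mean_all = X.mean(axis=0)
    Sb, Sw = np.zeros((m, m)), np.zeros((m, m))
    for cl in np.unique(labels):
        Xi = X[labels == cl]
        diff = Xi.mean(axis=0) - mean_all
        Sb += len(Xi) * (diff @ diff.T)
        C = Xi - Xi.mean(axis=0)                     # within-class deviations
        Sw += np.einsum('kij,klj->il', C, C)         # sum_j C_j C_j^T
    # Eq. (6): S_b a = lambda S_w a, i.e. eigenvectors of S_w^{-1} S_b
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)[:d]
    return evecs[:, order].real

rng = np.random.default_rng(2)
images = rng.normal(size=(12, 10, 8))                # 3 classes x 4 images of size 10 x 8
labels = np.repeat([0, 1, 2], 4)
A_d = alternative_2dlda(images, labels, d=3)
Y = A_d.T @ images[0]                                # feature matrix, d x n = 3 x 8
```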

2.2 2DLPP

Given N training samples, the objective function of 2DLPP [10] is defined as:

J = \sum_{i=1}^{N} \sum_{j=1}^{N} \|Y_{i} - Y_{j}\|^{2} W_{ij}

(7)

where ‖·‖ denotes the Frobenius (matrix L₂) norm, X_i denotes the ith sample and Y_i denotes the feature matrix of X_i. W_ij is the similarity between samples X_i and X_j, which can be defined as

W_{ij} = \begin{cases} \exp\left(-\frac{\|X_{i} - X_{j}\|}{t}\right), & X_{i} \text{ is close to } X_{j} \\ 0, & \text{otherwise} \end{cases}

Minimizing J ensures that if X_i and X_j are close, their feature matrices Y_i and Y_j are also close. In this way, the samples in the low-dimensional space preserve the local manifold structure of the high-dimensional space.
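The text leaves the notion of "X_i is close to X_j" open. A common choice, assumed here, is a symmetric k-nearest-neighbour graph under the Frobenius distance; the sketch below builds W with the heat kernel defined above. The function name and the values of k and t are illustrative.

```python
import numpy as np

def heat_kernel_knn_weights(images, k, t):
    """W_ij = exp(-||X_i - X_j|| / t) if X_j is among the k nearest
    neighbours of X_i (or vice versa), otherwise 0."""
    X = np.asarray(images, dtype=float)
    N = len(X)
    flat = X.reshape(N, -1)
    # pairwise Frobenius distances between images
    dist = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=2)
    W = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(dist[i])[1:k + 1]          # index 0 is the sample itself
        W[i, nbrs] = np.exp(-dist[i, nbrs] / t)
    return np.maximum(W, W.T)                        # symmetrize the graph

rng = np.random.default_rng(3)
W = heat_kernel_knn_weights(rng.normal(size=(8, 5, 4)), k=2, t=10.0)
```

Symmetrizing with the element-wise maximum keeps an edge whenever either sample lists the other as a neighbour, so W is symmetric as the Laplacian L = D − W requires.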

The minimization is carried out subject to the constraint

a^{T} X (D \otimes I_{n}) X^{T} a = 1

(8)

Using a Lagrange multiplier, minimizing J under this constraint reduces to the generalized eigenproblem

X (L \otimes I_{n}) X^{T} a = λ X (D \otimes I_{n}) X^{T} a

(9)

where X = [X₁ X₂ … X_N] is the m × Nn matrix formed by all the training samples, D is a diagonal matrix with entries D_ii = Σ_j W_ij (the row sums of W, which equal its column sums because W is symmetric), L = D − W is the Laplacian matrix, I_n is the identity matrix of order n, and ⊗ denotes the Kronecker product. The optimal vector a is the eigenvector of Eq. (9) corresponding to the smallest eigenvalue. If λ₁ < λ₂ < … < λ_d are the d smallest eigenvalues of Eq. (9) and a₁, a₂, …, a_d are the corresponding eigenvectors, they form the projection matrix A_d = [a₁ a₂ … a_d], and the feature matrix of sample X_i is

Y_{i} = A_{d}^{T} X_{i}

(10)

Since A_d is an m × d matrix, the size of Y is d × n.
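Because (L ⊗ I_n) has blocks L_ij I_n, the product X(L ⊗ I_n)Xᵀ in Eq. (9) equals Σ_i Σ_j L_ij X_i X_jᵀ, so the Kronecker product never needs to be formed explicitly. The sketch below uses this identity and solves Eq. (9) as a symmetric-definite generalized eigenproblem via a Cholesky factor of X(D ⊗ I_n)Xᵀ, which it assumes to be positive definite. The function names and the random weight matrix are hypothetical, and the comparison against `np.kron` is only a check of the identity.

```python
import numpy as np

def weighted_outer_sum(images, M):
    """sum_ij M_ij X_i X_j^T, which equals X (M kron I_n) X^T for X = [X_1 ... X_N]."""
    X = np.asarray(images, dtype=float)
    return np.einsum('ij,iab,jcb->ac', M, X, X)

def twod_lpp(images, W, d):
    """Return A_d (m x d) from Eq. (9): eigenvectors with the d smallest eigenvalues."""
    D = np.diag(W.sum(axis=1))
    SL = weighted_outer_sum(images, D - W)           # X (L kron I_n) X^T
    SD = weighted_outer_sum(images, D)               # X (D kron I_n) X^T
    Cinv = np.linalg.inv(np.linalg.cholesky(SD))     # SD = C C^T
    _, V = np.linalg.eigh(Cinv @ SL @ Cinv.T)        # ascending eigenvalues
    return Cinv.T @ V[:, :d]                         # columns satisfy a^T SD a = 1, Eq. (8)

rng = np.random.default_rng(4)
images = rng.normal(size=(5, 4, 3))                  # N = 5 images of size m x n = 4 x 3
W = rng.random((5, 5))
W = (W + W.T) / 2                                    # symmetric similarity matrix
np.fill_diagonal(W, 0.0)
X_cat = np.hstack(list(images))                      # X = [X_1 ... X_N], m x Nn
D = np.diag(W.sum(axis=1))
print(np.allclose(X_cat @ np.kron(D - W, np.eye(3)) @ X_cat.T,
                  weighted_outer_sum(images, D - W)))   # expected: True
A_d = twod_lpp(images, W, d=2)
```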

3. The proposed approach

3.1 Alternative 2DLPP approach

The essence of 2DLPP is row-based LPP, which treats each row of an image as a single sample and applies LPP to extract features. A natural extension is to treat each column of an image as a single sample and apply LPP; we call this method alternative 2DLPP. Thus 2DLPP compresses the image in the row direction, while alternative 2DLPP compresses it in the column direction.

Suppose the training set X = [X₁ X₂ … X_N] contains N training samples, each of size m × n. Alternative 2DLPP seeks an n-dimensional column vector b such that the linear transformation y_i = X_i b yields an m-dimensional feature vector. Because one projection vector is generally not enough, the projection subspace B_r is composed of r eigenvectors b₁, b₂, …, b_r. The feature matrix of X_i in the subspace B_r is Y_i = X_i B_r, of size m × r.

Analogously to Eq. (9), b is obtained from the generalized eigenproblem in Eq. (11):

X^{T} (L \otimes I_{m}) X b = λ X^{T} (D \otimes I_{m}) X b

(11)

where X = [X₁ᵀ X₂ᵀ … X_Nᵀ]ᵀ is an Nm × n matrix, D is a diagonal matrix with D_ii = Σ_j W_ij, L = D − W is the Laplacian matrix, I_m is the identity matrix of order m, and ⊗ denotes the Kronecker product. The optimal vector b is the eigenvector of Eq. (11) corresponding to the smallest eigenvalue. If λ₁ < λ₂ < … < λ_r are the r smallest eigenvalues of Eq. (11), the corresponding eigenvectors b₁, b₂, …, b_r form the projection matrix B_r = [b₁ b₂ … b_r].
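Alternative 2DLPP can be sketched the same way. With X = [X₁ᵀ … X_Nᵀ]ᵀ, the product Xᵀ(L ⊗ I_m)X in Eq. (11) equals Σ_i Σ_j L_ij X_iᵀX_j, so applying alternative 2DLPP to the images X_i is the same as applying 2DLPP (Eq. (9)) to their transposes X_iᵀ. The function name and the random weights are hypothetical, and Xᵀ(D ⊗ I_m)X is assumed to be positive definite.

```python
import numpy as np

def alternative_2dlpp(images, W, r):
    """Return B_r (n x r) from Eq. (11): eigenvectors with the r smallest eigenvalues."""
    X = np.asarray(images, dtype=float)
    D = np.diag(W.sum(axis=1))
    SL = np.einsum('ij,iab,jac->bc', D - W, X, X)    # sum_ij L_ij X_i^T X_j
    SD = np.einsum('ij,iab,jac->bc', D, X, X)        # sum_ij D_ij X_i^T X_j
    Cinv = np.linalg.inv(np.linalg.cholesky(SD))     # SD = C C^T
    _, V = np.linalg.eigh(Cinv @ SL @ Cinv.T)        # ascending eigenvalues
    return Cinv.T @ V[:, :r]

rng = np.random.default_rng(5)
images = rng.normal(size=(6, 4, 3))                  # N = 6 images, m x n = 4 x 3
W = rng.random((6, 6))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
B_r = alternative_2dlpp(images, W, r=2)
Y_0 = images[0] @ B_r                                # feature matrix, m x r = 4 x 2
```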

3.2 (2D)²LDALPP

Suppose we have obtained the projection matrices A_d (as in Section 2.1) and B_r (as in Section 3.1). Projecting the m × n image X_ij onto A_d and B_r in sequence yields the d × r feature matrix Y_ij:

Y_{i j} = A_{d}^{T} X_{ij} B_{r}

(12)

We call the transformation in Eq. (12) (2D)²LDALPP. Because d ≪ m and r ≪ n, the d × r feature matrix Y_ij is much smaller than the original image, so the image dimension is compressed significantly.
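The projection of Eq. (12) applied to a whole stack of images is a single tensor contraction. In the sketch below, A_d and B_r are random matrices with orthonormal columns standing in for trained projections (an assumption for illustration only); with the 56 × 48 Feret images and d = r = 7, each image shrinks from 2688 values to 49.

```python
import numpy as np

def ldalpp_project(images, A_d, B_r):
    """Eq. (12): Y = A_d^T X B_r for every image in an (N, m, n) stack -> (N, d, r)."""
    return np.einsum('ad,kab,br->kdr', A_d, np.asarray(images, dtype=float), B_r)

rng = np.random.default_rng(6)
images = rng.normal(size=(4, 56, 48))                # four 56 x 48 images
A_d = np.linalg.qr(rng.normal(size=(56, 7)))[0]      # stand-in for the 2DLDA projection
B_r = np.linalg.qr(rng.normal(size=(48, 7)))[0]      # stand-in for the 2DLPP projection
Y = ldalpp_project(images, A_d, B_r)
print(Y.shape, 56 * 48, 7 * 7)                       # (4, 7, 7) 2688 49
```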

Eq. (12) shows the main idea of (2D)²LDALPP: the original image X_ij is projected onto the alternative 2DLDA subspace to extract the vertical-direction feature A_dᵀX_ij; A_dᵀX_ij is then transposed to yield (A_dᵀX_ij)ᵀ, and alternative 2DLPP is used to extract the horizontal-direction feature. The feature matrix Y_ij therefore contains both the vertical-direction and the horizontal-direction features of the original image. The vertical-direction feature carries the discriminant information, and the horizontal-direction feature preserves the manifold structure of the samples. Because 2DLDA is a supervised learning method, (2D)²LDALPP also uses the class labels of the training samples, which we believe is the reason it outperforms (2D)²LPP and (2D)²LDA in recognition.

Given a test sample X_T, Eq. (12) is used to obtain its feature matrix Y, and a nearest-neighbour classifier is then used for classification. The distance between the two feature matrices Y and Y_ij is defined by

d(Y, Y_{ij}) = \|Y - Y_{ij}\|, \quad i = 1, 2, \dots, c; \; j = 1, 2, \dots, N_{i}

(13)

where ‖Y − Y_ij‖ denotes the Euclidean distance between the feature matrices Y and Y_ij. Suppose the training set {X_ij} contains c classes ω₁, ω₂, …, ω_c, where X_ij denotes the jth sample of the ith class. If d(Y, Y_kj) = min_{i,j} d(Y, Y_ij) and X_kj ∈ ω_k, the test sample X_T is assigned to ω_k. The framework of (2D)²LDALPP is illustrated in Fig. 1.
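A minimal sketch of the decision rule in Eq. (13), assuming that the Euclidean distance between matrices is the Frobenius norm of their difference; the function name and the toy class labels "w1" to "w3" are hypothetical.

```python
import numpy as np

def nearest_neighbour_label(Y, train_features, train_labels):
    """Eq. (13): return the class of the training feature matrix closest to Y
    in Frobenius (matrix Euclidean) distance."""
    F = np.asarray(train_features, dtype=float)
    dists = np.linalg.norm(F - Y, axis=(1, 2))        # one distance per training sample
    return np.asarray(train_labels)[np.argmin(dists)]

train = np.stack([np.zeros((2, 2)), np.full((2, 2), 5.0), np.full((2, 2), -5.0)])
print(nearest_neighbour_label(np.full((2, 2), 4.0), train, ["w1", "w2", "w3"]))  # w2
```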

Figure 1.

The Framework of (2D)²LDALPP

To summarize the preceding description, the (2D)²LDALPP algorithm is as follows:

Step 1. Compute the projection in the 2DLDA space: compute the between-class scatter matrix S_b and the within-class scatter matrix S_w using Eqs. (1) and (2). The discriminant matrix A_d is composed of the eigenvectors a₁, a₂, …, a_d of S_w⁻¹S_b corresponding to the d largest eigenvalues. Compute the projection of each training sample, Z_ij = A_dᵀX_ij.

Step 2. Transpose the feature matrices: after projection onto the 2DLDA space, each training feature matrix Z_ij = A_dᵀX_ij is transposed to yield Z_ijᵀ = (A_dᵀX_ij)ᵀ.

Step 3. Compute the projection in the 2DLPP space: form the set Z = {Z_ijᵀ}, i = 1, 2, …, c, j = 1, 2, …, N_i, and solve the generalized eigenproblem of Eq. (11) on this set to obtain B_r.

Step 4. Extract features: the feature matrices Y_ij of the training samples X_ij and Y of the test sample X_T are obtained by projecting X_ij and X_T onto A_d and B_r in sequence, as in Eq. (12).

Step 5. Classify the test samples using the distance defined in Eq. (13) and the nearest-neighbour classifier.
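Putting Steps 1 to 5 together, the following NumPy sketch trains and applies (2D)²LDALPP end to end. It is a sketch under stated assumptions, not the authors' MATLAB implementation: S_b and S_w follow Eqs. (1) and (2); the graph connects samples of the same class with the heat kernel described in Section 4, with distances computed on the original images; and Eq. (11) is applied to the d × n matrices Z_ij = A_dᵀX_ij, which gives the same matrices as applying Eq. (9) to their transposes Z_ijᵀ (Step 2) and yields an n × r matrix B_r compatible with Eq. (12). The function names, the synthetic data and t = 1.0 (chosen for data of unit scale rather than the t = 8 × 10⁶ used on pixel images) are illustrative assumptions.

```python
import numpy as np

def gen_eigh(SA, SB):
    """Solve SA v = lam SB v for symmetric SA and positive definite SB (ascending lam)."""
    Cinv = np.linalg.inv(np.linalg.cholesky(SB))
    lam, V = np.linalg.eigh(Cinv @ SA @ Cinv.T)
    return lam, Cinv.T @ V

def fit_ldalpp(images, labels, d, r, t):
    X, y = np.asarray(images, dtype=float), np.asarray(labels)
    m, N = X.shape[1], len(X)
    # Step 1: alternative 2DLDA, Eqs. (1), (2), (6) -> A_d (m x d)
    mean_all = X.mean(axis=0)
    Sb, Sw = np.zeros((m, m)), np.zeros((m, m))
    for cl in np.unique(y):
        Xi = X[y == cl]
        diff = Xi.mean(axis=0) - mean_all
        Sb += len(Xi) * (diff @ diff.T)
        C = Xi - Xi.mean(axis=0)
        Sw += np.einsum('kij,klj->il', C, C)
    A_d = gen_eigh(Sb, Sw)[1][:, ::-1][:, :d]         # d largest eigenvalues
    # Step 2: Z_ij = A_d^T X_ij (d x n)
    Z = np.einsum('ad,kab->kdb', A_d, X)
    # Step 3: same-class heat-kernel graph, then Eq. (11) on Z -> B_r (n x r)
    flat = X.reshape(N, -1)
    dist = np.linalg.norm(flat[:, None] - flat[None, :], axis=2)
    same = (y[:, None] == y[None, :]) & ~np.eye(N, dtype=bool)
    W = np.where(same, np.exp(-dist / t), 0.0)
    D = np.diag(W.sum(axis=1))
    SL = np.einsum('ij,iab,jac->bc', D - W, Z, Z)
    SD = np.einsum('ij,iab,jac->bc', D, Z, Z)
    B_r = gen_eigh(SL, SD)[1][:, :r]                  # r smallest eigenvalues
    return A_d, B_r

def transform(images, A_d, B_r):
    # Step 4: Eq. (12), Y = A_d^T X B_r for every image
    return np.einsum('ad,kab,br->kdr', A_d, np.asarray(images, dtype=float), B_r)

def predict(Y_test, Y_train, train_labels):
    # Step 5: nearest neighbour under the Frobenius distance, Eq. (13)
    dists = np.linalg.norm(Y_test[:, None] - Y_train[None], axis=(2, 3))
    return np.asarray(train_labels)[dists.argmin(axis=1)]

# Usage on synthetic data: 3 classes of noisy 12 x 8 "faces"
rng = np.random.default_rng(7)
means = rng.normal(size=(3, 12, 8))

def draw(k):
    return (means[:, None] + 0.05 * rng.normal(size=(3, k, 12, 8))).reshape(-1, 12, 8)

X_train, X_test = draw(5), draw(3)
y_train, y_test = np.repeat(np.arange(3), 5), np.repeat(np.arange(3), 3)
A_d, B_r = fit_ldalpp(X_train, y_train, d=4, r=3, t=1.0)
Y_train, Y_test = transform(X_train, A_d, B_r), transform(X_test, A_d, B_r)
accuracy = np.mean(predict(Y_test, Y_train, y_train) == y_test)
```

The Cholesky-based helper is one way to solve the symmetric-definite eigenproblems of Eqs. (6) and (11); it assumes S_w and the D-weighted matrix are positive definite, which holds here because there are enough samples per class.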

3.3 Algorithm analysis

The separability of the extracted features is an important factor in evaluating the performance of an algorithm. Next, we experimentally examine the separability of five two-directional two-dimensional methods: four existing methods and the proposed one.

We select the first five individuals in the Yale B database [22], each of which has 64 samples. Fig. 2 shows these samples in the five two-directional two-dimensional subspaces: Figs. 2(a), (b), (c), (d) and (e) show the separability of (2D)²PCA, (2D)²LDA, (2D)²LPP, (2D)²PCALDA and (2D)²LDALPP, respectively. From left to right, the panels are scatter diagrams of each method using the first and second, the first and third, and the first and fourth projection vectors. Fig. 2 shows that (2D)²LDALPP separates the classes markedly better than the other four methods, because its classes have larger between-class distances and smaller within-class scatter. (2D)²LPP also appears to separate the classes better than (2D)²PCA, (2D)²LDA and (2D)²PCALDA; we believe the key factor is that it preserves the manifold structure of the samples when the data are projected from the high-dimensional space to the low-dimensional space.

Figure 2.

Distribution of some samples using the two features in two-directional two-dimensional subspace

The five methods discussed above can be divided into two types: single feature extraction methods ((2D)²PCA, (2D)²LDA and (2D)²LPP) and hybrid feature extraction methods ((2D)²PCALDA and (2D)²LDALPP). The proposed method (2D)²LDALPP has two advantages. First, compared with (2D)²PCA, (2D)²LDA and (2D)²LPP, (2D)²LDALPP extracts the vertical feature of images by alternative 2DLDA and the horizontal feature by alternative 2DLPP; such hybrid features have been reported to improve recognition performance [23]. Second, (2D)²LDALPP differs from (2D)²PCALDA in that it uses 2DLPP instead of 2DPCA to extract the row-directional feature of images. Fig. 2 shows that (2D)²LDALPP performs better than (2D)²PCALDA, and we attribute this mainly to the following difference. 2DPCA, the 2D version of PCA, aims to preserve the global structure of the data, whereas 2DLPP, the 2D version of LPP, aims to preserve the local structure of the original data by explicitly considering its manifold structure. The basis functions obtained by LPP are the eigenvectors of the local covariance matrix [24]. It is widely believed that the intrinsic features of face images reside on a low-dimensional sub-manifold embedded in the original high-dimensional data space [10], [16], [17] and [18]. The features extracted by 2DLPP can therefore represent this sub-manifold and improve recognition performance.

3.4 (2D)²LPPLDA

Note that 2DLDA and 2DLPP can also be combined the other way round: the vertical feature of images is extracted by 2DLPP and the horizontal feature by 2DLDA. That is, the column-directional information is first extracted by applying 2DLPP, then the feature matrix is transposed and 2DLDA is used to extract the row-directional information; we call this the (2D)²LPPLDA approach. Compared with (2D)²LPPLDA, (2D)²LDALPP has lower computational complexity. Let the size of each training image be m × n, and let d₁, d₂ and N denote the number of row-projection vectors, the number of column-projection vectors and the number of training samples, respectively. The training complexity of (2D)²LDALPP is O(n²d₁ + m²d₂ + nd₁N²), while that of (2D)²LPPLDA is O(n²d₁ + m²d₂ + nmN²). Since d₁ ≪ m, the complexity of (2D)²LPPLDA is much larger than that of (2D)²LDALPP, so we select (2D)²LDALPP as our proposed method and compare it with the existing two-directional two-dimensional methods.
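Plugging the Feret setting of Section 4 (56 × 48 images, N = 144 training samples) into the two expressions, with d₁ = d₂ = 7 as an illustrative choice, shows the size of the gap. Only the leading-order terms stated above are evaluated and constant factors are ignored, so the numbers are rough operation counts, not running times; the helper name is hypothetical.

```python
def training_cost(m, n, d1, d2, N):
    """Leading-order operation counts of Section 3.4 (constant factors ignored)."""
    ldalpp = n**2 * d1 + m**2 * d2 + n * d1 * N**2
    lpplda = n**2 * d1 + m**2 * d2 + n * m * N**2
    return ldalpp, lpplda

# Feret setting of Section 4: 56 x 48 images, N = 144 training samples, d1 = d2 = 7
ldalpp, lpplda = training_cost(m=56, n=48, d1=7, d2=7, N=144)
print(ldalpp, lpplda, round(lpplda / ldalpp, 2))   # 7005376 55776448 7.96
```

The N² term dominates both counts, so the ratio is close to m / d₁ = 8.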

4. Experimental results and analysis

In this section, the proposed method (2D)²LDALPP is applied to face recognition and tested on three well-known databases: Feret [25], ORL [26] and Yale [2]. Feret and Yale are used to test the performance of face recognition methods under varied facial expressions and illumination, and ORL is used to examine their performance under minor variations in scale and rotation. We compare (2D)²LDALPP with (2D)²PCA, (2D)²LDA, (2D)²LPP and (2D)²PCALDA in terms of recognition accuracy and computing time. For (2D)²LPP and (2D)²LDALPP, if samples X_i and X_j are connected (that is, X_i and X_j belong to the same class), we use W_ij = exp(−‖X_i − X_j‖/t); otherwise, W_ij = 0. The choice of the constant t is an open problem; here, t was selected empirically through repeated experiments, and t = 8 × 10⁶ performed best among the values 8 × 10ⁿ, n = 1, 2, …, 7. The experiments were run in MATLAB 7.0 on a PC with an Intel N270 1.6 GHz CPU and 1 GB of RAM.
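The supervised graph described above can be sketched as follows, together with the grid of candidate widths t = 8 × 10ⁿ, n = 1, …, 7. Whether self-pairs (W_ii) are included is not stated in the text; they are excluded here as an assumption. The function name and the random pixel data are illustrative.

```python
import numpy as np

def supervised_heat_weights(images, labels, t):
    """W_ij = exp(-||X_i - X_j|| / t) if X_i and X_j are in the same class, else 0.
    Self-pairs (i == j) are excluded, which is an assumption."""
    X, y = np.asarray(images, dtype=float), np.asarray(labels)
    flat = X.reshape(len(X), -1)
    dist = np.linalg.norm(flat[:, None] - flat[None, :], axis=2)   # Frobenius distances
    same = (y[:, None] == y[None, :]) & ~np.eye(len(X), dtype=bool)
    return np.where(same, np.exp(-dist / t), 0.0)

# The candidate widths searched in the text: t = 8 x 10^n, n = 1, ..., 7
t_grid = [8 * 10**n for n in range(1, 8)]
rng = np.random.default_rng(8)
W = supervised_heat_weights(rng.uniform(0, 255, size=(6, 56, 48)), [0, 0, 0, 1, 1, 1], t=8e6)
```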

We use a subset of the Feret database consisting of 432 images of 72 individuals; each individual has six images, including a frontal image and its variations in facial expression and illumination. For computational efficiency, the facial portion of each original image is cropped and resized to 56 × 48 using nearest-neighbour interpolation. The ORL database contains 40 different people with 10 different images each, 400 images in total, each of size 112 × 92. The images were taken at different times, from different angles, with different facial expressions (open or closed eyes, smiling or not smiling, surprised, annoyed, angry, excited) and with different facial details (with or without glasses, with or without a beard, different hairstyles). The facial portion of each original image is cropped and resized to 56 × 48 using nearest-neighbour interpolation. The Yale database contains 165 greyscale images of 15 individuals, each with 11 images showing various facial expressions under a range of lighting conditions. The facial portion of each original image is cropped and resized from the original size of 320 × 243 to 50 × 50 using nearest-neighbour interpolation.
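Nearest-neighbour resizing picks, for each output pixel, the input pixel whose position is closest after scaling. The sketch below uses pixel-centre sampling, one common convention (MATLAB and other libraries may round slightly differently); the face cropping that precedes it in the text is not shown, and the function name is hypothetical.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize: output pixel (i, j) copies the input pixel whose
    centre is nearest to the scaled centre of (i, j)."""
    img = np.asarray(img)
    in_h, in_w = img.shape
    rows = np.floor((np.arange(out_h) + 0.5) * in_h / out_h).astype(int)
    cols = np.floor((np.arange(out_w) + 0.5) * in_w / out_w).astype(int)
    return img[np.ix_(np.minimum(rows, in_h - 1), np.minimum(cols, in_w - 1))]

img = np.arange(112 * 92).reshape(112, 92)          # stand-in for a 112 x 92 ORL image
small = resize_nearest(img, 56, 48)
print(small.shape, small[0, 0])                     # (56, 48) 92
```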

The experiments address three aspects. First, we examine the relationship between recognition accuracy and feature dimension, especially for low-dimensional feature matrices. Second, we examine how recognition accuracy depends on the number of training samples. Finally, we compare the running times of the five methods and discuss their complexity.

4.1 Results for varying feature dimensions

The relationship between recognition performance and feature dimension, especially at low dimensions, is a crucial issue in face recognition systems. Because face recognition systems generally operate on-line, the feature dimension plays an important role in achieving real-time performance: the lower the feature dimension, the faster the system. In this experiment, we study the effect of low feature dimensions on recognition performance.

Each database is split into two parts: a training set and a testing set. On the Feret database, two images per individual are randomly selected as training samples and the remaining four images are used as testing samples, so the training set consists of 144 samples and the testing set of 288 samples. On the ORL database, six images per individual are used as training samples and the remaining four as testing samples, so the training set consists of 240 samples and the testing set of 160 samples. On the Yale database, six images per individual are randomly selected as training samples and the remaining five as testing samples, so the training set consists of 90 samples and the testing set of 75 samples.
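The per-class split can be sketched as follows. For ORL the text does not say how the six training images are chosen, so random selection is assumed for all three databases; the function name is hypothetical. The counts follow from the database sizes: 72 × 2 = 144 and 72 × 4 = 288 for Feret, 40 × 6 = 240 and 40 × 4 = 160 for ORL, and 15 × 6 = 90 and 15 × 5 = 75 for Yale.

```python
import numpy as np

def split_per_class(labels, n_train, rng):
    """Randomly choose n_train samples of each class for training; the rest are for testing."""
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for cl in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cl))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)

rng = np.random.default_rng(9)
for name, people, per_person, n_train in [("Feret", 72, 6, 2), ("ORL", 40, 10, 6), ("Yale", 15, 11, 6)]:
    tr, te = split_per_class(np.repeat(np.arange(people), per_person), n_train, rng)
    print(name, len(tr), len(te))   # Feret 144 288, ORL 240 160, Yale 90 75
```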

To evaluate recognition accuracy as a function of the number of projection vectors, a classification experiment is conducted over a range of dimensions. For (2D)²LDALPP, (2D)²PCA, (2D)²LDA, (2D)²LPP and (2D)²PCALDA, the feature matrix has size d × d, where d is the number of projection vectors and ranges from 1 to 7. Fig. 3(a) shows the recognition accuracy versus the number of projection vectors on the Feret database. The recognition accuracy of (2D)²LDALPP, (2D)²PCALDA and (2D)²LDA increases rapidly with the number of projection vectors, and when d > 3 these three methods are more accurate than (2D)²PCA and (2D)²LPP. The best recognition accuracies of (2D)²LDALPP, (2D)²PCA, (2D)²LDA, (2D)²LPP and (2D)²PCALDA are 78.13%, 49.65%, 73.26%, 55.78% and 77.78%, respectively; the first three are reached at dimensions of 6, 7 and 7.

Figure 3.

Recognition performance of different approaches with varying dimension of feature matrices of the Feret, ORL and Yale face databases

Fig. 3(b) reports the results on the ORL database. On this database, (2D)²LDALPP, (2D)²PCA and (2D)²LPP outperform (2D)²LDA and (2D)²PCALDA when d < 4, but the performance of all methods becomes similar when d > 4. The top recognition accuracies of (2D)²LDALPP, (2D)²PCA, (2D)²LDA, (2D)²LPP and (2D)²PCALDA are 98.75%, 97.5%, 98.75%, 99.38% and 97.5%, reached at d = 5, 7, 7, 6 and 7, respectively.

Fig. 3(c) shows the results on the Yale database. Here too, (2D)²LDALPP, (2D)²PCA and (2D)²LPP outperform (2D)²LDA and (2D)²PCALDA when d < 4, but the performance of all methods becomes similar when d > 4. The best recognition accuracies of (2D)²LDALPP, (2D)²PCA, (2D)²LDA, (2D)²LPP and (2D)²PCALDA are 89.17%, 80%, 77.5%, 80% and 74.17%, reached at d = 7, 6, 5, 6 and 6, respectively.

These results show that, when the number of projection vectors is small, the relative performance of the methods depends on the database. On the Feret database, (2D)²LDALPP, (2D)²LDA and (2D)²PCALDA outperform (2D)²LPP and (2D)²PCA, whereas on the ORL and Yale databases (2D)²LDALPP, (2D)²LPP and (2D)²PCA are superior to (2D)²LDA and (2D)²PCALDA. In our opinion, the reason is the number of classes in the training set: the more classes there are, the better suited the data are to supervised learning. In our experiments the Feret subset has the most classes (72), so the supervised methods (2D)²LDALPP, (2D)²LDA and (2D)²PCALDA perform better than the unsupervised methods (2D)²LPP and (2D)²PCA. The ORL and Yale databases have fewer classes (40 and 15), and there (2D)²LPP and (2D)²PCA are more accurate than (2D)²LDA and (2D)²PCALDA, which suggests that unsupervised methods can be more effective than supervised ones in this setting. (2D)²LDALPP also performs satisfactorily on these databases, which we attribute to its 2DLPP component. These observations suggest that (2D)²LDALPP combines the main strengths of 2DLDA and 2DLPP, giving it a powerful and robust ability to extract significant features from the training samples. Note, however, that these conclusions apply only to low-dimensional feature matrices.

4.2 Results for varying numbers of training samples

Section 4.1 discussed the relationship between recognition performance and low feature dimensions. To obtain more detailed results, in this experiment we compare the proposed method (2D)²LDALPP with the four other methods for varying numbers of training samples. On the Feret database, a random subset of l (= 2, 3, 4, 5) labelled samples per individual forms the training set, and the rest of the database is used as the testing set. The performance of (2D)²LDALPP is compared with that of (2D)²PCA, (2D)²LDA, (2D)²LPP and (2D)²PCALDA. The top recognition accuracies are shown in Table 1, where the values in parentheses give the dimension of the feature matrix at which the top accuracy is reached.

Table 1.

Comparison of different approaches in terms of top recognition accuracy (%) on FERET database

Method	2 per class	3 per class	4 per class	5 per class
(2D)²PCA	53.13(17×27)	79.17(21×31)	90.28(14×34)	97.22(28×34)
(2D)²LDA	79.51(12×25)	89.81(13×12)	94.44(12×11)	95.83(17×7)
(2D)²LPP	54.17(10×4)	81.94(15×15)	92.36(11×6)	95.83(11×8)
(2D)²PCALDA	80.93(8×12)	82.13(12×9)	97.22(12×17)	98.61(8×18)
(2D)²LDALPP	81.60(8×15)	94.44(12×10)	96.53(11×7)	97.22(5×10)

Table 1 shows that (2D)²LDALPP achieves the highest top recognition accuracy with 2 or 3 training samples per class (81.60% and 94.44%, respectively), while (2D)²PCALDA achieves the highest with 4 and 5 training samples per class (97.22% and 98.61%, respectively). Thus, on the Feret database, no single method achieves the best accuracy for every number of training samples per class. For example, with 2 training samples per class, (2D)²LDALPP is the most accurate and (2D)²PCALDA the second most accurate, both ahead of the other three methods; however, with 4 training samples per class, (2D)²PCALDA is the most accurate and (2D)²LDALPP the second.

Table 2 shows how many times each of the five methods attains the top recognition accuracy on the Feret database. (2D)²LDALPP and (2D)²PCALDA each attain it twice, (2D)²PCA once, and the other two methods never. In terms of recognition accuracy, therefore, (2D)²LDALPP and (2D)²PCALDA outperform the other three methods on this database. Since the Feret database is used here to test face recognition under varied facial expressions and illumination, we conclude that (2D)²LDALPP and (2D)²PCALDA adapt better to such variations. Comparing the two, (2D)²LDALPP performs better with fewer training samples, while (2D)²PCALDA overtakes it as the number of training samples increases. (2D)²LDALPP and (2D)²PCALDA share two characteristics: both are hybrids of two different subspace methods, and both include a supervised learning method.

Table 2.

Comparison of different approaches in terms of top recognition accuracy counts on three databases

Database	Method	Top recognition count	Training samples per class at which the top accuracy is attained
ORL	(2D)²PCA	1	8
ORL	(2D)²LDA	4	2, 3, 4, 5
ORL	(2D)²LPP	3	2, 6, 8
ORL	(2D)²PCALDA	0	–
ORL	(2D)²LDALPP	5	3, 6, 7, 8, 9
Yale	(2D)²PCA	5	4, 7, 8, 9, 10
Yale	(2D)²LDA	2	8, 9
Yale	(2D)²LPP	2	9, 10
Yale	(2D)²PCALDA	0	–
Yale	(2D)²LDALPP	5	2, 3, 5, 6, 8
Feret	(2D)²PCA	1	5
Feret	(2D)²LDA	0	–
Feret	(2D)²LPP	0	–
Feret	(2D)²PCALDA	2	4, 5
Feret	(2D)²LDALPP	2	2, 3

On the ORL database, a random subset of l (= 2, 3, 4, 5, 6, 7, 8, 9) labelled samples per individual forms the training set, and the rest of the database is used as the testing set. The top recognition accuracies are shown in Table 3. The results are mixed: the method that attains the top recognition accuracy changes with the number of training samples per class, and no method attains it in every case. According to Table 2, (2D)²LDALPP attains the top recognition accuracy five times (with 3, 6, 7, 8 and 9 training samples per class), (2D)²LDA four times (with 2, 3, 4 and 5), (2D)²LPP three times and (2D)²PCA once, while (2D)²PCALDA never attains it on this database. Because the ORL database is used to examine performance under minor variations in scale and rotation, we conclude that (2D)²LDALPP adapts well to these conditions, whereas (2D)²PCALDA is the weakest. On this database, the methods that contain 2DPCA perform worse than those that contain 2DLDA and 2DLPP.

Table 3.

Comparison of different approaches in terms of top recognition accuracy (%) on ORL database

Method	Number of training samples per class
	2	3	4	5
(2D)²PCA	87.50(5×15)	86.07(4×5)	92.08(4×12)	92.5(8×13)
(2D)²LDA	88.44(35×6)	90.00(29×15)	94.17(35×7)	94.5(15×9)
(2D)²LPP	86.25(18×4)	90.00(12×3)	93.33(7×5)	94.00(8×5)
(2D)²PCALDA	81.25(23×8)	88.21(13×16)	91.25(6×12)	92.00(6×10)
(2D)²LDALPP	86.25(18×6)	90.00(20×4)	93.33(11×12)	92.50(8×5)

Method	Number of training samples per class
	6	7	8	9
(2D)²PCA	97.50(5×8)	98.33(5×4)	98.75(5×5)	97.50(5×3)
(2D)²LDA	98.13(5×5)	98.33(7×8)	97.50(11×3)	97.50(4×4)
(2D)²LPP	99.38(6×6)	98.33(4×7)	98.75(4×10)	97.50(5×3)
(2D)²PCALDA	98.13(5×10)	97.50(6×6)	97.50(5×7)	97.50(14×4)
(2D)²LDALPP	99.38(7×13)	99.16(7×29)	98.75(7×6)	100.00(4×7)
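The ORL counts in Table 2 can be derived from Table 3 by crediting, at each training size, every method that ties for the highest accuracy. The following is a minimal Python sketch of that tally, assuming ties are credited to all tied methods; the accuracies are copied from Table 3 (feature sizes omitted) and the function name `top_counts` is hypothetical.

```python
# Top recognition accuracies (%) on ORL, copied from Table 3 (feature sizes omitted).
ORL = {
    "(2D)²PCA":    [87.50, 86.07, 92.08, 92.50, 97.50, 98.33, 98.75, 97.50],
    "(2D)²LDA":    [88.44, 90.00, 94.17, 94.50, 98.13, 98.33, 97.50, 97.50],
    "(2D)²LPP":    [86.25, 90.00, 93.33, 94.00, 99.38, 98.33, 98.75, 97.50],
    "(2D)²PCALDA": [81.25, 88.21, 91.25, 92.00, 98.13, 97.50, 97.50, 97.50],
    "(2D)²LDALPP": [86.25, 90.00, 93.33, 92.50, 99.38, 99.16, 98.75, 100.00],
}
TRAIN_SIZES = [2, 3, 4, 5, 6, 7, 8, 9]


def top_counts(table, sizes, tol=1e-9):
    """For each method, list the training sizes at which it ties for the best accuracy."""
    wins = {method: [] for method in table}
    for i, size in enumerate(sizes):
        best = max(accs[i] for accs in table.values())
        for method, accs in table.items():
            if abs(accs[i] - best) <= tol:  # ties credit every tied method
                wins[method].append(size)
    return wins


for method, won_at in top_counts(ORL, TRAIN_SIZES).items():
    print(f"{method:12s} {len(won_at)}  {won_at}")
```

Under this tie rule the tally reproduces the ORL counts in Table 2: 5 for (2D)²LDALPP, 4 for (2D)²LDA, 3 for (2D)²LPP, 1 for (2D)²PCA and 0 for (2D)²PCALDA.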

On the Yale database, a random subset of l (= 2, 3, 4, 5, 6, 7, 8, 9, 10) labeled samples per individual forms the training set, and the rest of the database forms the testing set. Table 4 shows the experimental results. Again, no method attains the top recognition accuracy in every case; however, Table 2 shows that (2D)²LDALPP and (2D)²PCA each attain it five times, (2D)²LDA and (2D)²LPP twice each, and (2D)²PCALDA never. This experiment tests the methods under varied facial expressions and illuminations, and (2D)²LDALPP and (2D)²PCA clearly outperform the other three methods. Compared with (2D)²PCA, (2D)²LDALPP attains the top recognition accuracy with fewer training samples per class (as few as 2 and 3), whereas four of the five top results of (2D)²PCA come with 7 or more; small training sets are very common in real face recognition systems. Note also that (2D)²PCA behaves very differently on the Yale and Feret databases: it attains the top recognition accuracy five times on Yale but only once on Feret. Although both databases test performance under varied facial expressions and illuminations, this suggests that (2D)²PCA, while possibly suited to such variations, is not stable across databases.

Table 4.

Comparison of different approaches in terms of top recognition accuracy (%) on Yale database

Method	Number of training samples per class
	2	3	4	5
(2D)²PCA	79.26(3×13)	83.33(14×6)	91.43(9×23)	88.89(11×11)
(2D)²LDA	78.52(14×12)	83.33(18×10)	87.62(20×5)	91.11(25×12)
(2D)²LPP	79.26(14×6)	87.50(12×5)	87.62(21×5)	87.78(15×11)
(2D)²PCALDA	77.04(13×12)	87.50(11×23)	87.62(6×17)	90.00(11×12)
(2D)²LDALPP	82.22(10×25)	90.83(6×7)	87.62(7×13)	92.22(7×15)

Method	Number of training samples per class
	6	7	8	9	10
(2D)²PCA	88.00(9×8)	100.00(6×11)	100.00(4×11)	96.67(4×11)	100.00(2×11)
(2D)²LDA	93.33(17×7)	98.33(11×14)	100.00(7×5)	96.67(13×3)	93.33(5×2)
(2D)²LPP	89.33(18×7)	95.00(18×6)	97.78(4×7)	96.67(4×5)	100.00(2×8)
(2D)²PCALDA	93.33(9×8)	88.33(13×1)	91.11(8×10)	86.67(17×1)	93.33(4×2)
(2D)²LDALPP	94.67(10×14)	88.33(9×8)	100.00(9×17)	93.33(18×7)	93.33(8×3)

Taking the three databases together, (2D)²LDALPP attains the top recognition accuracy 12 times, (2D)²PCA seven times, (2D)²LDA six times, (2D)²LPP five times and (2D)²PCALDA only twice. This suggests that (2D)²LDALPP is the most effective of the five methods: in terms of recognition accuracy it is the best option, having attained the top recognition accuracy most often across the three experimental databases. (2D)²PCALDA performs well on Feret but is not stable, given its weak results on the ORL and Yale databases; it does not appear to adapt to variations in scale and rotation. Notably, (2D)²LDALPP performs well with few training samples and low-dimensional features, which makes it feasible in practical applications.
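The combined figures are plain sums of the per-database counts in Table 2. A short Python check, with the counts copied from that table:

```python
from collections import Counter

# Per-database top-accuracy counts, copied from Table 2.
COUNTS = {
    "ORL":   {"(2D)²PCA": 1, "(2D)²LDA": 4, "(2D)²LPP": 3, "(2D)²PCALDA": 0, "(2D)²LDALPP": 5},
    "Yale":  {"(2D)²PCA": 5, "(2D)²LDA": 2, "(2D)²LPP": 2, "(2D)²PCALDA": 0, "(2D)²LDALPP": 5},
    "Feret": {"(2D)²PCA": 1, "(2D)²LDA": 0, "(2D)²LPP": 0, "(2D)²PCALDA": 2, "(2D)²LDALPP": 2},
}

total = Counter()
for per_database in COUNTS.values():
    total.update(per_database)  # Counter.update adds counts rather than replacing them

for method, count in total.most_common():
    print(f"{method:12s} {count}")
```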

4.3 Results on running time

We also compare the running times of the five methods at the settings where they attain top recognition accuracy on the Feret, ORL and Yale databases; the results are shown in Table 5. In this experiment, we use 6 training samples per class on the ORL and Yale databases and 2 per class on the Feret database. For each method, we measure the training time and the testing time, shown in the fourth and fifth columns of Table 5; the total time is their sum. On each database, the testing times of the five methods are similar, whereas the training times differ considerably. Testing time depends mainly on the size of the feature matrix, and because the feature matrices of the five methods are of similar size, their testing times are close. The outlier in training time is (2D)²LPP, which takes far longer to train than the other methods.

Table 5.

Comparison of different approaches in terms of running time on three databases

Database	Method	Number of training samples per class	Training time(s)	Testing time(s)	Total time(s)
ORL	(2D)²PCA	6	3.688	3.000	6.688
ORL	(2D)²LDA	6	3.937	3.109	7.046
ORL	(2D)²LPP	6	18.438	2.875	21.313
ORL	(2D)²PCALDA	6	4.589	3.484	8.073
ORL	(2D)²LDALPP	6	6.281	3.203	9.484
Yale	(2D)²PCA	6	1.562	1.062	2.624
Yale	(2D)²LDA	6	1.641	1.000	2.641
Yale	(2D)²LPP	6	18.500	1.094	19.594
Yale	(2D)²PCALDA	6	1.625	0.938	2.563
Yale	(2D)²LDALPP	6	2.297	1.031	3.328
Feret	(2D)²PCA	2	2.313	5.453	7.766
Feret	(2D)²LDA	2	1.906	4.734	6.640
Feret	(2D)²LPP	2	8.219	3.141	11.360
Feret	(2D)²PCALDA	2	1.781	3.500	5.281
Feret	(2D)²LDALPP	2	4.609	3.656	8.265
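To put the training-time gaps in Table 5 on a common scale, the following Python sketch divides training times copied from the table. The two ratios (the training time of (2D)²LPP over that of (2D)²LDALPP, and the training time of (2D)²LDALPP over the fastest of the other three methods) are our own illustration, and the name `training_ratios` is hypothetical.

```python
# Training and testing times in seconds, copied from Table 5 as (training, testing).
TIMES = {
    "ORL": {"(2D)²PCA": (3.688, 3.000), "(2D)²LDA": (3.937, 3.109),
            "(2D)²LPP": (18.438, 2.875), "(2D)²PCALDA": (4.589, 3.484),
            "(2D)²LDALPP": (6.281, 3.203)},
    "Yale": {"(2D)²PCA": (1.562, 1.062), "(2D)²LDA": (1.641, 1.000),
             "(2D)²LPP": (18.500, 1.094), "(2D)²PCALDA": (1.625, 0.938),
             "(2D)²LDALPP": (2.297, 1.031)},
    "Feret": {"(2D)²PCA": (2.313, 5.453), "(2D)²LDA": (1.906, 4.734),
              "(2D)²LPP": (8.219, 3.141), "(2D)²PCALDA": (1.781, 3.500),
              "(2D)²LDALPP": (4.609, 3.656)},
}
BASELINES = ("(2D)²PCA", "(2D)²LDA", "(2D)²PCALDA")


def training_ratios(times):
    """Per database: (LPP / LDALPP training time, LDALPP / fastest baseline training time)."""
    out = {}
    for db, rows in times.items():
        ldalpp = rows["(2D)²LDALPP"][0]
        fastest = min(rows[m][0] for m in BASELINES)
        out[db] = (rows["(2D)²LPP"][0] / ldalpp, ldalpp / fastest)
    return out


for db, (vs_lpp, vs_base) in training_ratios(TIMES).items():
    print(f"{db:5s} LPP/LDALPP = {vs_lpp:.2f}   LDALPP/fastest baseline = {vs_base:.2f}")
```

On these figures, (2D)²LPP takes roughly 1.8 to 8 times as long to train as (2D)²LDALPP, while (2D)²LDALPP takes roughly 1.5 to 2.6 times as long as the fastest of the other three methods, consistent with the ordering discussed in the text.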

These results can be explained by the computational complexity of the methods. Let the size of each training sample be m × n, and let d₁, d₂ and N denote the number of row-projection vectors, the number of column-projection vectors and the number of training samples, respectively. The training complexity of (2D)²PCA, (2D)²LDA and (2D)²PCALDA is O(n²d₁ + m²d₂). The training complexity of (2D)²LPP is O(n²d₁ + mnN² + m²d₂ + nd₁N²), and that of (2D)²LDALPP is O(n²d₁ + m²d₂ + nd₁N²). The mnN² term dominates the complexity of (2D)²LPP, and it is exactly the term by which (2D)²LDALPP is cheaper than (2D)²LPP. Because m × n is the size of the original image, mnN² is expensive to compute, which makes (2D)²LPP difficult to apply directly to a large database. Although (2D)²LDALPP also includes a 2DLPP step, it first applies 2DLDA, which reduces the dimension of the image dramatically, so the matrices it must compute are much smaller than those of (2D)²LPP and its training time is correspondingly shorter. Compared with (2D)²PCA, (2D)²LDA and (2D)²PCALDA, however, the training complexity of (2D)²LDALPP includes the additional nd₁N² term, so its training time remains longer than that of these three methods; reducing it is a topic for future research.
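To see why the mnN² term dominates, the complexity expressions above can be evaluated for a concrete size. The Python sketch below is a rough operation-count proxy that ignores constant factors; the sizes (112 × 92 images, which is the native ORL resolution, 40 classes with 6 training images each, and d₁ = d₂ = 10) are hypothetical and not necessarily the settings used in the experiments, and the function names are illustrative.

```python
def cost_terms(m, n, d1, d2, N):
    """Terms appearing in the training-complexity expressions in the text."""
    return {"n2d1": n * n * d1, "m2d2": m * m * d2,
            "mnN2": m * n * N * N, "nd1N2": n * d1 * N * N}


def training_cost(method, m, n, d1, d2, N):
    """Operation-count proxy per method (constant factors and lower-order work ignored)."""
    t = cost_terms(m, n, d1, d2, N)
    base = t["n2d1"] + t["m2d2"]  # shared by all five methods
    if method in ("(2D)²PCA", "(2D)²LDA", "(2D)²PCALDA"):
        return base
    if method == "(2D)²LDALPP":
        return base + t["nd1N2"]
    if method == "(2D)²LPP":
        return base + t["mnN2"] + t["nd1N2"]
    raise ValueError(f"unknown method: {method}")


# Hypothetical setting: 112 x 92 images, 40 classes x 6 training images, d1 = d2 = 10.
args = dict(m=112, n=92, d1=10, d2=10, N=240)
for method in ("(2D)²PCA", "(2D)²LDALPP", "(2D)²LPP"):
    print(f"{method:12s} {training_cost(method, **args):>13,d}")
```

With these assumed sizes, mnN² ≈ 5.9 × 10⁸ is about m/d₁ ≈ 11 times larger than nd₁N² ≈ 5.3 × 10⁷. The proxy predicts a larger gap between (2D)²LPP and (2D)²LDALPP than Table 5 measures, which is expected because big-O terms omit constants and implementation overheads.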

5. Conclusion

Dimension reduction is a key problem in face recognition systems, and extracting stable and reliable features to improve recognition accuracy is an important task. Two-dimensional subspace methods extract features while keeping the structure of the face image, and two-directional methods extract features from both the rows and the columns of the image; as a result, two-directional two-dimensional methods have become popular in face recognition. In this paper, we propose an efficient two-directional two-dimensional feature extraction method for face recognition called (2D)²LDALPP. Unlike 2DLPP and 2DLDA, which work in only one direction, the proposed method works in both the row and column directions of the image. Compared with (2D)²LDA and (2D)²LPP, the proposed method both preserves the manifold structure of the samples and incorporates the class-label information; it is therefore a supervised method, and it produces a feature matrix of lower dimension. Experimental results show that the proposed method is effective. A further contribution of this paper is a comparison of existing two-directional two-dimensional feature extraction methods and an analysis of their adaptability on the Feret, ORL and Yale databases.
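The two-step procedure (alternative 2DLDA in the column direction, then a transpose and alternative 2DLPP in the row direction) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the scatter matrices follow the usual alternative-2DLDA and 2DLPP definitions, the 2DLPP graph is a fully connected heat-kernel graph whose width t defaults to the mean squared distance (the paper's graph construction is not given in this section), a small ridge keeps the generalized eigenproblems well posed, and every function name is hypothetical. Supervision enters only through the 2DLDA step, and the 2DLPP step works on the reduced matrices rather than on the original m × n images.

```python
import numpy as np


def gen_eigh(a, b, ridge=1e-6):
    """Solve a v = w b v for symmetric a and symmetric PSD b; eigenvalues ascending.
    The ridge, scaled to b's mean diagonal, is an assumption to keep b positive definite."""
    p = b.shape[0]
    b = b + ridge * (np.trace(b) / p) * np.eye(p)
    L = np.linalg.cholesky(b)
    Linv = np.linalg.inv(L)
    c = Linv @ a @ Linv.T
    w, y = np.linalg.eigh((c + c.T) / 2)  # symmetrize against round-off
    return w, Linv.T @ y


def alt_2dlda(A, labels, d):
    """Alternative 2DLDA on images A of shape (s, m, n): left projection V (m x d)
    maximizing column-direction between-class vs within-class scatter."""
    labels = np.asarray(labels)
    m = A.shape[1]
    mean_all = A.mean(axis=0)
    Gb, Gw = np.zeros((m, m)), np.zeros((m, m))
    for c in np.unique(labels):
        Ac = A[labels == c]
        diff = Ac.mean(axis=0) - mean_all
        Gb += len(Ac) * (diff @ diff.T)
        for X in Ac - Ac.mean(axis=0):
            Gw += X @ X.T
    _, V = gen_eigh(Gb, Gw)
    return V[:, ::-1][:, :d]  # eigenvectors of the d largest eigenvalues


def alt_2dlpp(X, k, t=None):
    """Alternative 2DLPP on matrices X of shape (s, p, q): left projection U (p x k)
    built from an assumed fully connected heat-kernel graph."""
    s, p, q = X.shape
    flat = X.reshape(s, -1)
    sq = (flat ** 2).sum(axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2 * flat @ flat.T, 0.0)
    t = t if t is not None else (dist2.mean() or 1.0)  # assumed kernel width
    W = np.exp(-dist2 / t)
    np.fill_diagonal(W, 0.0)
    deg = W.sum(axis=1)
    SD = np.einsum("i,iab,icb->ac", deg, X, X)  # sum_i D_ii X_i X_i^T
    WX = (W @ flat).reshape(s, p, q)             # sum_j W_ij X_j
    SL = SD - np.einsum("iab,icb->ac", X, WX)    # Laplacian scatter
    _, U = gen_eigh(SL, SD)
    return U[:, :k]  # eigenvectors of the k smallest eigenvalues


def fit_ldalpp(A, labels, d, k):
    """Step 1: alternative 2DLDA (columns). Step 2: transpose, then alternative 2DLPP."""
    V = alt_2dlda(A, labels, d)
    Yt = np.einsum("md,smn->snd", V, A)  # (V^T A_i)^T, shape (s, n, d)
    return V, alt_2dlpp(Yt, k)


def transform(A, V, U):
    """Feature matrices U^T (V^T A_i)^T, shape (s, k, d)."""
    Yt = np.einsum("md,smn->snd", V, A)
    return np.einsum("nk,snd->skd", U, Yt)
```

For example, with `A` an array of training images of shape (s, m, n) and `labels` their class labels, `V, U = fit_ldalpp(A, labels, d=4, k=3)` followed by `transform(A, V, U)` yields 3 × 4 feature matrices that a nearest-neighbour classifier could compare.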

From the experiments, we can draw a number of conclusions:

The performance of the different two-directional two-dimensional methods varies with the number of training samples, and no method is superior to the others in all conditions.

(2D)²LDALPP appears to perform best while using a feature matrix of smaller dimension.

Statistically, (2D)²LDALPP performs better than the other methods as the number of training samples varies. This suggests that (2D)²LDALPP is more adaptable to, and more robust against, a variety of face images.

Using hybrid features is an effective way to improve the performance of face recognition systems; however, the compatibility of the combined features is of primary importance. In our experiments, another hybrid method, (2D)²PCALDA, is not as stable as (2D)²LDALPP.

From the experiments and the complexity analysis, the running time of (2D)²LDALPP is shorter than that of (2D)²LPP but longer than those of (2D)²PCA, (2D)²LDA and (2D)²PCALDA.

A disadvantage of the method is that its training time is longer than that of some two-directional two-dimensional methods. In future work, we will study techniques such as block-wise methods to address this problem. Furthermore, the reason why the five two-directional two-dimensional subspace analysis methods perform differently as the training number varies needs a theoretical explanation.


6. Acknowledgments

This research was supported by the National Natural Science Foundation of China (grant no. 51175389) and the Fundamental Research Funds for the Central Universities, China (grant no. 2011-VI-058).

7. Appendix

Table 6.

Notation table

Notation	Explanation
PCA	Principal Component Analysis
LDA	Linear Discriminant Analysis
LPP	Locality Preserving Projection
2DPCA	Two-Dimensional Principal Component Analysis
2DLDA	Two-Dimensional Linear Discriminant Analysis
2DLPP	Two-Dimensional Locality Preserving Projection
(2D)²PCA	Two-Directional Two-Dimensional Principal Component Analysis
(2D)²LDA	Two-Directional Two-Dimensional Linear Discriminant Analysis
(2D)²LPP	Two-Directional Two-Dimensional Locality Preserving Projection
(2D)²PCALDA	Two-Dimensional Principal Component Analysis plus Two-Dimensional Linear Discriminant Analysis
(2D)²PCALPP	Two-Dimensional Principal Component Analysis plus Two-Dimensional Locality Preserving Projection
(2D)²LDALPP	Two-Dimensional Linear Discriminant Analysis plus Two-Dimensional Locality Preserving Projection

References

1. Turk, Pentland, Eigenfaces for Recognition. Journal of Cognitive Neuroscience, 1991, 3, (1), pp. 71–86.

2. Belhumeur P.N., Hespanha J.P., Kriegman D.J., Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, 19, (7), pp. 711–720.

3. Yan, Face Recognition Using Laplacianfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27, (3), pp. 328–340.

4. Yang M.H., Ahuja, Kriegman, Face Recognition Using Kernel Eigenfaces. Proceedings of the 2000 IEEE International Conference on Image Processing, Vancouver, Canada, September 2000, pp. 37–40.

5. Yang M.H., Kernel Eigenfaces vs. Kernel Fisherfaces: Face Recognition Using Kernel Methods. Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition, Washington D.C., USA, May 2002, pp. 215–220.

6. Zhou, Tang, A Modification of Kernel Discriminant Analysis for High Dimensional Data-with Application to Face Recognition. Signal Processing, 2010, 90, (8), pp. 2423–2430.

7. Zeng, Zhang, Cheng, Kernel-based Nonlinear Discriminant Analysis Using Minimum Squared Errors Criterion for Multiclass and Undersampled Problems. Signal Processing, 2010, 90, (8), pp. 2333–2443.

8. Yang, Zhang, Two-Dimensional PCA: A New Approach to Appearance-Based Face Representation and Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 24, (1), pp. 131–135.

9. Yuan, 2D-LDA: A Statistical Linear Discriminant Analysis for Image Matrix. Pattern Recognition Letters, 2005, 26, (5), pp. 527–532.

10. Feng, Zhou, Two-dimensional Locality Preserving Projections (2DLPP) with its Application to Palmprint Recognition. Pattern Recognition, 2007, 40, (1), pp. 339–342.

11. Zhang D.Q., Zhou Z.H., (2D)²PCA: Two-Directional Two-Dimensional PCA for Efficient Face Representation and Recognition. Neurocomputing, 2005, 69, (1–3), pp. 224–231.

12. Noushath, Kumar Hemantha G., Shivakumara, (2D)²LDA: An Efficient Approach for Face Recognition. Pattern Recognition, 2006, 39, (7), pp. 1396–1400.

13. Guo, Yang, Jiao, Face Recognition Method Based on Two-Dimensional Locality Preserving Projections. Computer Engineering, 2011, 37, (7), pp. 4–6.

14. Mutelo R.M., Woo W.L., Dlay S.S., Discriminant analysis of the two-dimensional Gabor features for face recognition. IET Computer Vision, 2008, 2, (2), pp. 37–49.

15. Gao, Xie, Two-dimensional Supervised Local Similarity and Diversity Projection. Pattern Recognition, 2010, 43, (10), pp. 3359–3363.

16. Wang, Chen, A New Framework to Combine Vertical and Horizontal Information for Face Recognition. Neurocomputing, 2009, 72, (4–6), pp. 1084–1091.

17. Tao, Maybank S.J., General Tensor Discriminant Analysis and Gabor Features for Gait Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29, (10), pp. 1700–1715.

18. Zhang, (2D)²PCALDA: An Efficient Approach for Face Recognition. Applied Mathematics and Computation, 2009, 213, (1), pp. 1–7.

19. Wang, Huang, Fang, Liu, 2DPCA vs. 2DLDA: Face Recognition Using Two-dimensional Method. International Conference on Artificial Intelligence and Computational Intelligence, Shanghai, China, November 2009, pp. 357–360.

20. Rao, Noushath, Subspace Methods for Face Recognition. Computer Science Review, 2010, 4, (1), pp. 1–17.

21. Plataniotis K.N., Venetsanopoulos A.N., A Survey of Multilinear Subspace Learning for Tensor Data. Pattern Recognition, 2011, 44, (7), pp. 1540–155.

22. Georghiades A.S., Belhumeur P.N., Kriegman D.J., From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, 23, (6), pp. 643–660.

23. Saradha, Annadurai, A Hybrid Feature Extraction Approach for Face Recognition Systems. ICGST International Journal on Graphics, Vision and Image Processing, 2005, 5, (5), pp. 23–30.

24. Cai, Min, Statistical and Computational Analysis of Locality Preserving Projection. Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany, August 2005, pp. 21–288.

25. Phillips P.J., Moon, Rauss P.J., Rizvi, The FERET evaluation methodology for face recognition algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22, (10), pp. 1090–1100.

26. Samaria F.S., Harter A.C., Parameterisation of a stochastic model for human face identification. Proceedings of the 2nd IEEE Workshop on Applications of Computer Vision, Sarasota, FL, December 1994, pp. 138–142.

(2D)²LDALPP: A Novel Approach to Face Recognition
