Abstract
In this paper, we extensively investigate symmetrical two-dimensional principal component analysis (S2DPCA) and introduce two image measures for S2DPCA-based face recognition: the volume measure (VM) and the subspace distance measure (SM). Although facial symmetry is an obvious, if not absolute, facial characteristic, it has been successfully applied to PCA and 2DPCA. The paper gives detailed evidence that the even and odd subspaces in S2DPCA are mutually orthogonal and, in particular, that S2DPCA can be constructed using a quarter of the conventional S2DPCA even/odd covariance matrix. Based on these theories, we further investigate the time and memory complexities of S2DPCA, and find that S2DPCA can in fact be computed using a quarter of the time and memory required by conventional S2DPCA. Finally, VM and SM are introduced to S2DPCA for final classification. Our experiments compare S2DPCA with 2DPCA on the YALE, AR and FERET face databases, and the results indicate that S2DPCA+VM generally outperforms the other algorithms.
Keywords
1. Introduction
Face recognition has become a very popular research subject. It can be applied in many fields [1], including mobile robots, information security, and entertainment. Over the past two decades, many face recognition approaches have been proposed. Typical approaches include principal component analysis (PCA), linear discriminant analysis (LDA), independent component analysis (ICA), neural networks (NN), support vector machines (SVM), and kernel PCA/LDA [1–3]. On the whole, these approaches are based on appearance images and have statistical significance. Recently, 3D-based face recognition has also become a hot research area [4].
Many approaches, including those cited above, improve recognition performance from a statistical point of view while combating the "dimensionality curse" problem [5]. However, these approaches barely consider the characteristics of the image itself. Local binary patterns (LBP), elastic graph matching (EGM), and related methods [6] consider the topological or local characteristics of images, and have also been applied to face recognition.
Appearance-based statistical face recognition algorithms usually use vector-based face samples obtained from two-dimensional (2D) face images by techniques such as concatenation [7]. Vector-based face samples usually have a large number of dimensions, while the number of samples is small, so the "dimensionality curse" problem always arises. 2DPCA, which is founded on a 2D face data format rather than 1D vector-based face images, shows high performance in dealing with the "dimensionality curse" problem. Why does 2DPCA work? The rows or columns of a 2D face image have low dimensionality, so the covariance matrix in 2DPCA is more precise than that in PCA. This is the key to alleviating the "dimensionality curse" problem in 2DPCA-based face recognition [4,8]. (The covariance matrix in PCA-based face recognition is not so precise, because the samples are of high dimensionality while the number of samples is small.) Other algorithms based on 2D face data and related to 2DPCA have also been proposed, e.g., two-dimensional LDA (2DLDA) [9], two-dimensional Laplacianfaces [10], and diagonal PCA [11]. In [12], Gao et al. proved that 2DPCA is not equivalent to a special case of block-based PCA in which each block is a column (or row) of the image. Moreover, from the viewpoint of tensor analysis, 2DPCA and 2DLDA can be considered variants of second-order tensor decomposition [13,14]. The typical classification measure used in 2DPCA is the sum of the Euclidean distances between the corresponding feature vectors of two feature matrices [7]; it is called the distance measure (DM) or Yang distance. Zuo et al. further proposed a generalized Yang distance [15]. The volume measure (VM) [16], based on the volume of a matrix, has also been proposed for 2DPCA in face recognition. Recently, another rule, the subspace distance measure (SM) [17,18], has been proposed to measure the similarity between two images.
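To make the dimensionality argument concrete, the following sketch (our own illustration, not code from the paper; helper names are ours) computes the small w×w image covariance matrix that 2DPCA builds directly from 2D images:

```python
import numpy as np

def image_covariance(images):
    """2DPCA image covariance G = (1/M) * sum_i (A_i - mean)^T (A_i - mean).

    images: array of shape (M, h, w).  G is only w x w, far smaller than
    the (h*w) x (h*w) covariance that vectorized PCA would require."""
    mean = images.mean(axis=0)
    centered = images - mean
    # einsum accumulates A_i^T A_i over the sample axis m
    return np.einsum('mij,mik->jk', centered, centered) / len(images)

def project(image, G, L):
    """Project an h x w image onto the top-L eigenvectors of G,
    giving an h x L feature matrix, as in 2DPCA."""
    _, vecs = np.linalg.eigh(G)      # eigenvalues in ascending order
    X = vecs[:, ::-1][:, :L]         # top-L eigenvectors as columns
    return image @ X
```

Because G is estimated in a space of only w dimensions from M×h row vectors, it is far better conditioned than the huge covariance matrix of vectorized PCA, which is the point made above.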
In fact, 2DPCA(2DLDA)-based approaches do not acknowledge the characteristics of the face image itself, and address the "dimensionality curse" problem on purely statistical grounds. Symmetrical principal component analysis (SPCA) [19], an approach based on the even-odd decomposition principle, expands the sample set by exploiting facial symmetry [19,20]. In essence, this approach improves recognition performance by exploiting prior knowledge, i.e., the symmetry characteristics of the face image. Good performance has also been obtained by SKPCA [21]. However, these approaches are all based on 1D vectorized face images. Recently, a new facial symmetry approach, symmetrical two-dimensional PCA (S2DPCA), has been proposed [22]. S2DPCA utilizes even and odd decompositions based on the 2D original and mirror face samples. On the one hand, S2DPCA can exploit more prior knowledge than SPCA, producing more new samples; on the other hand, it can obtain a more precise covariance matrix, just like 2DPCA, by using more samples than PCA or SPCA.
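The even-odd decomposition underlying SPCA and S2DPCA can be sketched as follows (a minimal illustration, assuming the mirror sample is a horizontal flip; the function name is ours):

```python
import numpy as np

def even_odd_decompose(A):
    """Even-odd decomposition of an h x w image with respect to its
    horizontal mirror, as used by SPCA/S2DPCA: A = A_even + A_odd."""
    Am = A[:, ::-1]             # mirror sample (horizontal flip)
    A_even = (A + Am) / 2.0     # unchanged by mirroring
    A_odd = (A - Am) / 2.0      # sign-flipped by mirroring
    return A_even, A_odd
```

Each training image yields one even and one odd sample, so the even and odd sets together contain twice as many samples as the original training set, which is how the decomposition injects the facial-symmetry prior.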
Although S2DPCA has been proposed in [22], its underlying theory, e.g., even/odd subspace orthogonality and computational cost, remains open, and only a single classifier was used. Motivated by the conclusions on SPCA [5,19], we give further proofs of the corresponding theories on S2DPCA here. We prove and clarify the point that the even and odd subspaces in S2DPCA are mutually orthogonal [23] and, in particular, give a theoretical foundation to the observation that the even or odd subspace can be founded on a quarter of the conventional S2DPCA even/odd covariance matrix. The time and memory space complexities of 2DPCA, conventional S2DPCA, and our S2DPCA are also compared. S2DPCA based on our theory requires the least time and memory space among these algorithms, at a quarter of the time and memory costs of conventional S2DPCA.
VM, mentioned above, is directly based on matrix data, i.e., a 2D image format [16]; it can therefore be regarded as a kind of image measure, like SM. From the point of view of data format, image measures are better suited to 2DPCA- or S2DPCA-based face recognition than DM. We therefore investigate 2DPCA and S2DPCA with DM, VM and SM separately through experiments on a variety of face databases, and find that S2DPCA with VM generally performs better than the others.
The remainder of this paper is organized in the following way. Ideas on S2DPCA and classification measures are developed in Section 2. Section 3 shows our experimental results. The last section concludes this paper and discusses future work.
2. Symmetrical two-dimensional PCA
Compared with PCA, 2DPCA gives better performance by computing the covariance matrix more accurately. SPCA exploits the original face training samples and the mirror samples, as well as even and odd decompositions, thus making use of more discriminative information [21]. S2DPCA, in turn, can use both a purely statistical technique and the characteristics of the face image to improve recognition performance.
2.1 Principles
S2DPCA combines the idea of SPCA with that of 2DPCA. Suppose the set {
Here,
In equations (2) and (3),
2.2 Theorems on S2DPCA
In fact, if we set each even and odd training sample as
Theorem 1: Set
Proof: Since the mean sample matrices of even, odd and all samples are all 0 mean matrices, from the definition of
Theorem 2: Let
Proofs: 1. Set
Since
Therefore,
2. From conclusion 1, we have
That is to say, [
Theorem 3: The covariance matrices
Proof: When the w is even, and
For
From the proofs of Theorem 2, we know
Note that when
Therefore,
So, we can obtain
In a similar way,
Therefore:
We can thus obtain
2.3 Time and Memory Space Complexities
The time and memory space complexities of 2DPCA, conventional S2DPCA and our S2DPCA are shown in Table 1. Here, w is the number of columns of the face image, and L is the number of projection vectors. Note that we only give the training time and memory space complexities; the testing time (memory space) complexities of these algorithms are the same once the projection vectors have been selected. Table 1 makes clear that, from the point of view of computational complexity theory, the time and memory space complexities of 2DPCA [11], conventional S2DPCA and our S2DPCA are of the same order. The time and memory costs of our S2DPCA, however, can both be reduced to a quarter of those of conventional S2DPCA. Note that the time and memory costs of conventional S2DPCA are double those of 2DPCA, while the costs of our S2DPCA are less than those of 2DPCA. In fact, the computational and memory space costs of our S2DPCA are the least among these algorithms. To make this concrete, the computing (CPU) time and memory costs of conventional S2DPCA and our S2DPCA on YALE are given in Section 3.1.
Time and memory space complexities
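As a quick numeric check of the quarter-memory claim (a sketch of our own, assuming the halved covariance dimension w/2 described above, with the YALE image width w = 56 used later in Section 3.1):

```python
# Entries stored by the even/odd covariance matrix (cf. Tables 1 and 2):
# a conventional S2DPCA covariance is w x w, while the construction in
# this paper needs only a (w/2) x (w/2) matrix.
w = 56                                        # YALE images are 66 x 56
conventional_entries = w * w                  # 56 x 56 = 3136 entries
ours_entries = (w // 2) * (w // 2)            # 28 x 28 = 784 entries
ratio = ours_entries / conventional_entries   # 0.25, i.e., a quarter
```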
2.4 Classification
A feature matrix
The distance between
For the sake of brevity, these procedures are called S2DPCA+DM and 2DPCA+DM. Two other similarity measures are:
In equation (15), we give the VM [16]. SM is defined in equation (16), where m is the rank of
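Since equations (15) and (16) are not reproduced here, the following sketch gives one plausible reading of the two image measures (our own illustration; the exact definitions in [16–18] may differ): VM as the volume (product of nonzero singular values) of the difference of two feature matrices, and SM as a principal-angle distance between their column subspaces.

```python
import numpy as np

def matrix_volume(M, tol=1e-10):
    """Volume of a matrix: the product of its nonzero singular values
    (for a full-rank square matrix this equals |det(M)|)."""
    s = np.linalg.svd(M, compute_uv=False)
    s = s[s > tol]
    return float(np.prod(s)) if s.size else 0.0

def vm_distance(A, B):
    """VM-style dissimilarity: volume of the difference of two feature
    matrices (smaller volume = more similar)."""
    return matrix_volume(A - B)

def sm_distance(A, B):
    """SM-style dissimilarity: sum of squared sines of the principal
    angles between the column subspaces of A and B."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.clip(np.linalg.svd(Qa.T @ Qb, compute_uv=False), 0.0, 1.0)
    return float(np.sum(1.0 - s ** 2))
```

Both functions consume whole feature matrices rather than concatenated vectors, which is what qualifies them as image measures in the sense used above.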
Similarly, for simplicity, we call these algorithms S2DPCA+VM, 2DPCA+VM, S2DPCA+SM, and 2DPCA+SM. In contrast to traditional DM, VM and SM are regarded as image measures. The steps of S2DPCA-based algorithms in our paper are depicted as follows.
Decompose each training face sample into an even symmetrical sample and an odd symmetrical one to get the even/odd symmetrical training sets.
Apply 2DPCA to the even/odd symmetrical training sets separately to get even/odd subspaces. Combine features in even and odd subspaces based on eigenvalues.
Extract low dimension features by projecting the face sample to the even/odd subspace for final classification.
Apply nearest neighbour classifier with DM, VM and SM.
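The steps above can be sketched as follows (a minimal illustration with hypothetical helper names; the eigenvalue-based combination of even and odd features in step 2 is our reading of the procedure):

```python
import numpy as np

def s2dpca_train(images, L):
    """Sketch of S2DPCA training following the steps above.
    images: (M, h, w) training set; returns a w x L projection matrix."""
    mirror = images[:, :, ::-1]
    even = (images + mirror) / 2.0          # even symmetrical samples
    odd = (images - mirror) / 2.0           # odd symmetrical samples

    def eig_pairs(samples):
        c = samples - samples.mean(axis=0)
        G = np.einsum('mij,mik->jk', c, c) / len(samples)
        return np.linalg.eigh(G)            # ascending eigenvalues

    ev, evec = eig_pairs(even)
    ov, ovec = eig_pairs(odd)
    # pool even and odd eigenvectors, keep the L with largest eigenvalues
    vals = np.concatenate([ev, ov])
    vecs = np.concatenate([evec, ovec], axis=1)
    top = np.argsort(vals)[::-1][:L]
    return vecs[:, top]

def dm_distance(F1, F2):
    """DM (Yang distance): sum of Euclidean distances between the
    corresponding column feature vectors of two feature matrices."""
    return float(np.linalg.norm(F1 - F2, axis=0).sum())
```

A test image `T` would then be classified by nearest neighbour over `dm_distance(T @ X, G_i @ X)` for gallery images `G_i`, with `vm_distance` or `sm_distance` substituted for the VM and SM variants.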
3. Experimental results
Many studies (see [7], [13], [16]) have shown that the performance of 2DPCA is better than that of PCA, so we only compare S2DPCA with 2DPCA, using different similarity measures, on three well-known face databases: the YALE [26], AR [27], and FERET [28,29] face databases. On the YALE and AR face databases, we investigate the algorithms' performance when facial expressions, illumination, etc., are varied. The algorithms' performance in terms of the cumulative match score (CMS) [28] is also investigated on the FERET face database. The experiments are performed using Matlab 7.0 on a Pentium IV 3.0GHz processor with 1GB of RAM.
3.1 Results on YALE Database
The YALE database includes 11 different images of each of 15 individuals, in which the face images vary in expression and lighting conditions [2,16]. All grey-scale images were cropped and normalized to a resolution of 66×56 pixels in the experiments.
Our experiments are performed using two and three random face samples per person for training, with the remaining images used for testing. Training samples are randomly selected from each person's samples, but the selected sample indices are the same for every person. The average recognition rates (ARR), calculated over ten runs, are shown in Fig. 1 and Fig. 2.
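This selection protocol can be sketched as follows (our own illustration, not the authors' code; `run_experiment` is a hypothetical evaluation routine):

```python
import numpy as np

def random_split_indices(n_per_person, n_train, seed):
    """One run of the protocol above: the same n_train random sample
    indices are used for every person; the rest are for testing."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_per_person)
    return np.sort(perm[:n_train]), np.sort(perm[n_train:])

# ARR over ten runs, e.g. for YALE (11 images per person, 2 for training):
# arr = np.mean([run_experiment(*random_split_indices(11, 2, s))
#                for s in range(10)])
```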

ARR on YALE database with 2 samples per person.

ARR on YALE database with 3 samples per person.
The corresponding standard deviations (σ) are shown in Fig. 3 and Fig. 4. We illustrate the performance with the first two and three face samples per person for training (the others for testing) in Fig. 5 and Fig. 6. Comparing Fig. 1 and Fig. 2 with Fig. 5 and Fig. 6, the averaged curves are smooth, while the single-run recognition rates show marked fluctuations. It can be seen that S2DPCA+VM generally outperforms the others; in particular, with four to six features selected, the best ARRs are obtained by S2DPCA+VM. The standard deviation of these algorithms is about 2% when the number of features is greater than two. S2DPCA-based algorithms have the potential to outperform 2DPCA-based ones. S2DPCA and 2DPCA with SM could not achieve good performance.

Standard deviation on YALE database with 2 samples per person.

Standard deviation on YALE database with 3 samples per person.

Recognition rate on YALE database with the first 2 samples per person.

Recognition rate on YALE database with the first 3 samples per person.
We compare the computing (CPU) time and the size of the covariance matrices of our S2DPCA and conventional S2DPCA in Table 2. The nearest neighbour classifier with DM is applied (the first two face samples per person for training, the others for testing), and the number of selected features is nine.
CPU time and memory space cost
As Table 2 shows, our S2DPCA takes only about 203.0ms for training, while conventional S2DPCA takes about 531.0ms. The size of the covariance matrix is reduced from 56×56 to 28×28. The time cost of our S2DPCA is thus reduced to about 203/531 of that of conventional S2DPCA. This ratio is not exactly the 1/4 predicted by the analysis above, because the CPU time includes other operations, such as reading images and constructing sample matrices. Testing with conventional S2DPCA and with our S2DPCA takes about the same time for 135 samples, i.e., less than 6.0ms per sample.
3.2 Results on AR Database
The AR database consists of over 4,000 colour face images, which vary in lighting conditions, facial expressions, occlusions, etc. [27]. The images of most persons were taken in two sessions, separated by two weeks, each session consisting of 13 colour face images per person. The face portion of each image is cropped, grey-scaled and normalized to a resolution of 66×56 pixels. The face images of 119 persons are used in the experiments, with histogram equalization used in preprocessing. The images of one person used in our experiments are shown in Fig. 7.

Sample images for one person on AR database.
We select the first five images per person in each of the two sessions as the pool of training and testing samples. The experiments are performed using five random face samples per person for training, and the remaining images for testing. The average recognition rates and standard deviations, calculated over ten runs, are shown in Fig. 8 and Fig. 9. S2DPCA-based algorithms have the potential to outperform 2DPCA-based ones. The standard deviations of these algorithms are about 1.6%. S2DPCA+VM generally outperforms the others, while S2DPCA and 2DPCA with SM still could not achieve good performance.

ARR on AR database with 5 samples per person.

Standard deviation on AR database with 5 samples per person.
3.3 Results on FERET Database
For the purposes of some applications, e.g., verification from biometric signatures stored on smart cards [28], we validate S2DPCA on the FERET face database [29], which has been widely used to evaluate face recognition approaches. Details of this database and of the terms used in the experiments can be found in [28]. Two FERET image sets are used in the experiments. FA, a regular frontal face library including frontal images of 1196 individuals, is used as the gallery set. FB, an alternative frontal face library including frontal images of 1195 individuals taken seconds after the corresponding images in FA [16], is used as the probe set. 998 face images, drawn from FA and FB, are selected as the training set.
Before performing the experiments, all images in the FA and FB sets are rectified using the eye positions provided by FERET, and are cropped and standardized to 60×50 pixels. To further reduce the effect of illumination, the histograms of these images are equalized in preprocessing. The recognition rate with a varying number of features is given in Table 3, and the CMS performance statistics are shown in Fig. 10. For each algorithm, the number of features is the one at which it obtains its best recognition rate: 10 for 2DPCA+DM, 8 for both 2DPCA+VM and S2DPCA+VM, and 12 for S2DPCA+DM. The performances of 2DPCA+SM and S2DPCA+SM were so poor that they are not presented in Fig. 10. As can be seen from Table 3 and Fig. 10, S2DPCA+VM obtains the best recognition rate among these algorithms with eight features. The CMS of S2DPCA+VM generally exceeds those of the other algorithms, though the difference is not marked.
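The CMS statistic can be sketched as follows (our own illustration; it assumes every probe identity is present in the gallery, as is the case for FB probes against the FA gallery):

```python
import numpy as np

def cumulative_match_scores(dist, probe_ids, gallery_ids):
    """CMS curve: cms[k-1] is the fraction of probes whose true identity
    appears among the k nearest gallery entries.
    dist: (n_probe, n_gallery) matrix of probe-to-gallery distances."""
    order = np.argsort(dist, axis=1)                  # gallery sorted per probe
    ranked = np.asarray(gallery_ids)[order]
    correct = ranked == np.asarray(probe_ids)[:, None]
    first_hit = correct.argmax(axis=1)                # 0-based rank of match
    ks = np.arange(1, dist.shape[1] + 1)
    return np.array([(first_hit < k).mean() for k in ks])
```

The value at k = 1 is the ordinary recognition rate; the full curve is what Fig. 10 plots.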
Recognition rates (%) with varying number of features

CMS on FERET database.
4. Conclusions
In this paper, we have investigated the orthogonality of the even and odd subspaces in S2DPCA in detail and, in particular, have proved that S2DPCA can be constructed on a quarter of the conventional S2DPCA even/odd covariance matrix. Moreover, based on these theories, we find that S2DPCA can be computed using a quarter of the time and memory costs of conventional S2DPCA, from the point of view of computational complexity theory. Furthermore, two image measures, the volume measure (VM) and the subspace distance measure (SM), are introduced to S2DPCA-based face recognition.
S2DPCA can exploit more prior knowledge, i.e., facial symmetry, through even and odd decompositions, and can also obtain a more precise covariance matrix in the same way as 2DPCA. Therefore, S2DPCA-based algorithms are able to achieve better performance in general. Theoretically, however, the time and memory space costs of conventional S2DPCA are double those of 2DPCA, which is a troublesome issue. We solve this problem with the S2DPCA theorems proved in this paper. The experimental results on the YALE, AR, and FERET databases show that S2DPCA+VM generally outperforms the other relevant algorithms. Most importantly, the computational cost and memory space needed for our S2DPCA are lower than those of both 2DPCA and conventional S2DPCA.
Finally, we point out two aspects of S2DPCA worthy of further investigation. First, how many even and odd features should be selected in order to achieve better performance? From the experimental results, the number of features at which good performance is obtained depends strongly on the face database. Second, VM does not satisfy the triangle inequality, which means VM is not a metric. Under what constraints could VM become a metric? S2DPCA+VM could be applied more widely once both issues have been resolved.
5. Acknowledgements
This work is supported in part by the National Natural Science Foundation for Distinguished Young Scholars of China under Grant No. 60905011.
