Face Recognition and Gender Classification Using Orthogonal Nearest Neighbour Feature Line Embedding

Abstract

In this paper, a novel manifold learning algorithm for face recognition and gender classification, orthogonal nearest neighbour feature line embedding (ONNFLE), is proposed. It addresses three drawbacks of the nearest feature space embedding (NFSE) method: extrapolation/interpolation error, high computational load and non-orthogonal eigenvectors. An extrapolation error occurs when the distance from a specified point to a feature line is small even though the line passes through two prototypes that are far from that point. The scatter matrix generated from such invalid discriminant vectors does not preserve the local topological structure well, and the incorrect selection reduces the recognition rate. To remedy this, a nearest neighbour (NN) selection strategy is used in the proposed method. The same selection strategy also reduces the computational load. Finally, the non-orthogonal eigenvectors produced by the NFSE algorithm are replaced by orthogonal bases, which possess more discriminating power. Experiments were conducted to demonstrate the effectiveness of the proposed algorithm.

Keywords

Nearest Feature Line, Face Recognition, Gender Classification, Extrapolation, Orthogonal Basis

1. Introduction

Face recognition (FR) and gender classification (GC) are widely used in commercial systems, for example to monitor pedestrians who watch advertisements on TV walls or to monitor the behaviour of buyers at vending machines. The ID and gender of pedestrians are automatically identified for further application. Though FR and GC are two different applications, gender classification can be considered a two-class classification problem, i.e., a special case of FR. Three approaches (classifier design, image enhancement and feature discriminative analysis) have been proposed to achieve better performance.

The first approach is to design a good classifier. Neural network-based classifiers are frequently used in FR. A principal component neural network with a self-configurable systolic architecture, which automatically updates eigenfaces whenever the training faces change, was proposed in [1]. Fuzzy measures and fuzzy logic improved the aggregation operator in neural networks [2]; the outputs of the modular neural networks are integrated and performance is improved. Similarly, a radial basis function enhances the neural network's performance with an incremental learning mechanism [3]. Re-training is unnecessary because new data are locally selected and updated by a regressor. According to the results of [3], a higher average accuracy and a much lower computational load were achieved. A neuro-fuzzy quantification method [4] was proposed to describe the Iyashi of human faces, increasing accuracy by 2–8%. AdaBoost-based classifiers, built by boosting, are also popular. Weak classifiers are integrated to construct a strong classifier, and several strong classifiers are cascaded to enhance performance for gender classification [5–7]. Certain tricks were also used to improve efficiency. For example, a look-up table is used in [6] and a 93% identification rate was achieved by pixel comparisons in [7].

Second, images are enhanced by image processing methods to improve recognition. A set of weighted colour-component features is found from various colour spaces and integrated by a boosting selection algorithm for face recognition [8]. Local colour vector binary patterns (LCVBPs) [9] consisting of colour-norm patterns are extracted from the norm values of a pixel and its four neighbours. In addition, colour angular patterns are extracted from the angles between the vectors of colour features. 3-D face features are extracted by a shape-based method [10] to cope with variations in facial expression, illumination and pose.

Third, eigenspace projection methods are successfully used for feature extraction. Recently, manifold learning algorithms have attracted much attention because samples typically lie on a manifold structure in high-dimensional feature spaces [11]. By preserving non-linear local structures, the discriminative power of manifold learning-based classifiers is greatly increased compared to conventional methods that preserve the global Euclidean structure, such as principal component analysis (PCA) [12] and linear discriminant analysis (LDA) [13]. A third-order tensor colour representation and a tensor discriminant colour space (TDCS) model [14] preserve the spatial structure of colour images. A colour space transformation matrix is obtained to maximize Fisher's criterion. Images are convolved with a complete set of bases (e.g., blur kernels) to construct a new subspace that is robust to blur [15]. A view-manifold-based TensorFace (V-TensorFace) and a kernelized TensorFace (K-TensorFace) [16] generate a continuous view manifold structure of unseen views in the tensor-based analysis. Other remaining problems in FR include occlusion and poor image quality. Outliers caused by occlusion and illumination variations are handled by kernel discriminant analysis [17] and the spectral-graph-based method [18]. In addition, low-resolution images are enhanced and identified in video surveillance. The non-linear mapping from low-resolution images to high-resolution images is constructed with canonical correlation analysis and radial basis functions [19]. For gender classification, another PCA variant, called principal geodesic analysis (PGA), was proposed. Sample data with a manifold structure in Euclidean spaces were transformed into data in Riemannian spaces by using exponential log maps [20]. Furthermore, the linear discriminant techniques for gender classification have been reviewed in [21]. When the training samples are sufficient, the SVM-based classifier outperforms the others. On the other hand, when only a few training samples are available, the linear approaches achieve better results. In addition, the accuracy rates of gender classification are significantly improved by automatic face alignment [22].

In this paper, we focus on feature discriminative analysis. The proposed algorithm, orthogonal nearest neighbour feature line embedding (ONNFLE), is a modification of previous work (e.g., nearest feature line embedding (NFLE) [23]). Two problems of the NFLE method, extrapolation/interpolation error and high computational load, arise during the computation of the class scatters and are remedied here. In NFLE, the discriminant vectors with the smallest distances from a specified point to the feature lines are chosen for scatter computation. An extrapolation error occurs when this distance is small while the two prototypes that generate the line are far away from the specified point. At the same time, lines generated by near neighbours are not chosen because their distances are larger. This selection strategy violates the rule of local structure preservation. To remedy this problem, the ONNFLE algorithm modifies the selection rule. A point-to-line (P2L) adjacency matrix is constructed to preserve the local topological structure. ONNFLE has three characteristics: NFL-based measurement, neighbourhood structure preservation and class separability. In addition, orthogonal projection bases are obtained in ONNFLE.

The rest of this paper is organized as follows: In Section 2, eigenspace projection methods, LPP, modified NFL and NFLE, are briefly reviewed. The algorithm ONNFLE is presented to obtain the orthogonal projection matrix in Section 3. In Section 4, two experiments – face recognition and gender classification – are presented to show the effectiveness of the proposed method. Finally, conclusions are given in Section 5.

2. Eigenspace projection methods

Given N training samples x₁, x₂, …, x_N ∈ R^d belonging to C classes, new samples in a low-dimensional space are obtained by the linear projection y_i = W^T x_i, where W is the linear projection matrix to be found.

2.1. Locally preserving projection algorithm

Locally preserving projection (LPP) [11] is a popular and effective manifold learning algorithm for preserving the manifold structure of samples. The transformation matrix W* is obtained by solving the following minimization problem:

\begin{array}{l} W^{*} = \arg \min_{W} \sum_{i \neq j} {‖ y_{i} - y_{j} ‖}^{2} S_{i, j} \\ = \arg \min_{W} \sum_{i \neq j} {‖ W^{T} x_{i} - W^{T} x_{j} ‖}^{2} S_{i, j} \\ = \arg \min_{W} tr (W^{T} X L X^{T} W) . \end{array}

(1)

The matrix L = D – S is the Laplacian matrix, in which X = [x₁, x₂, …, x_N], D_ii = Σ_j S_i,j and S is the similarity matrix. The transformation matrix W* consists of the eigenvectors with the smallest corresponding eigenvalues of the generalized eigenvalue problem XLX^T w = λXDX^T w. This minimization ensures that two samples that are 'close' in the high-dimensional space remain close in the low-dimensional space. According to the results presented in [24], the minimization form in the LPP algorithm has the same form as that in the Locally Linear Embedding (LLE) algorithm [25]:

\begin{array}{l} W^{*} = \arg \min_{W} tr (W^{T} X L X^{T} W) \\ = \arg \min_{W} tr (Y (D - S) Y^{T}) \\ = \arg \min_{W} tr (Y {(I - M)}^{T} (I - M) Y^{T}) \\ = \arg \min_{W} \sum_{i} {‖ y_{i} - \sum_{j} M_{i, j} y_{j} ‖}^{2}, \end{array}

(2)

in which Y = [y₁, y₂, …, y_N], S_i,j = (M + M^T – M^T M)_i,j and I is an identity matrix. In [24], the optimal weights in M were obtained by solving a least-squares problem with two constraints: (1) Σ_j M_i,j = 1, and (2) M_i,j = 0 if y_j does not belong to the neighbours of y_i.
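The chain of equalities in Eq. (2) can be checked numerically. The following is a minimal sketch, assuming NumPy and using a random row-normalized weight matrix M with zero diagonal and random low-dimensional samples Y (hypothetical test data, not taken from the experiments). When every row of M sums to one, the row sums of S = M + M^T – M^T M also equal one, so D = I and D – S = (I – M)^T (I – M).

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 6, 3                              # hypothetical sizes for illustration

# Random weights: zero diagonal, each row normalized to sum to one (constraint 1).
M = rng.random((N, N))
np.fill_diagonal(M, 0.0)
M /= M.sum(axis=1, keepdims=True)

Y = rng.standard_normal((d, N))          # columns are the projected samples y_i

S = M + M.T - M.T @ M                    # similarity matrix of Eq. (2)
D = np.diag(S.sum(axis=1))               # D_ii = sum_j S_ij
L = D - S                                # Laplacian matrix
I_N = np.eye(N)

trace_form = np.trace(Y @ L @ Y.T)
# Y @ M[i] is the weighted combination sum_j M_ij y_j for sample i.
residual_form = sum(np.sum((Y[:, i] - Y @ M[i]) ** 2) for i in range(N))

print(np.allclose(D, I_N), np.isclose(trace_form, residual_form))
```

Both printed checks should hold up to floating-point tolerance, which illustrates why the LPP trace form and the LLE reconstruction error coincide for this choice of S.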

2.2. Modified nearest feature line method

Li showed the discriminative power of the nearest linear combination (NLC) in face classification [26]. The NLC algorithm generates virtual prototypes for matching by using straight lines (feature lines) passing through prototype pairs. The capacity of prototype representation is increased because there are infinitely many virtual prototypes. However, two problems, extrapolation and interpolation errors, exist for the NLC algorithm, as shown in Fig. 1. Consider two feature lines L_2,3 and L_4,5 generated from the prototype pairs (x₂, x₃) and (x₄, x₅), respectively. Points f_2,3(x₁) and f_4,5(x₁) are the projections of a query point x₁ onto lines L_2,3 and L_4,5. From Fig. 1, it is clear that point x₁ is close to points x₂ and x₃, but far away from points x₄ and x₅. However, the distance ‖x₁ – f_4,5(x₁)‖ to line L_4,5 is smaller than the distance ‖x₁ – f_2,3(x₁)‖ to line L_2,3. The discriminant vector from point x₁ to line L_4,5 is therefore selected rather than the one to line L_2,3. In addition, a great deal of computational time is needed in the classification phase because of the vast number of feature lines, i.e., $C_{2}^{N-1}$ possible lines.

Figure 1.

(a) An extrapolation error and (b) an interpolation error.
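The extrapolation error of Fig. 1(a) can be reproduced with a small numerical example. The sketch below assumes NumPy; the 2-D coordinates are hypothetical and chosen only for illustration, and `project_to_line` is an illustrative helper name. The query point x₁ lies near x₂ and x₃ but far from x₄ and x₅, yet the line through the far pair gives the smaller point-to-line distance.

```python
import numpy as np

def project_to_line(x, xm, xn):
    """Project x onto the line through xm and xn; return (projection, t)."""
    d = xn - xm
    t = float((x - xm) @ d) / float(d @ d)
    return xm + t * d, t

x1 = np.array([0.0, 0.0])                                 # query point
x2, x3 = np.array([-0.5, 1.0]), np.array([0.5, 1.0])      # near prototypes
x4, x5 = np.array([10.0, 0.5]), np.array([20.0, 0.5])     # far prototypes

f23, t23 = project_to_line(x1, x2, x3)
f45, t45 = project_to_line(x1, x4, x5)
dist23 = float(np.linalg.norm(x1 - f23))
dist45 = float(np.linalg.norm(x1 - f45))

print(dist23, t23)   # distance 1.0, t = 0.5 (projection between x2 and x3)
print(dist45, t45)   # distance 0.5, t = -1.0 (projection extrapolated beyond x4)
```

A selection rule based only on the point-to-line distance would prefer L_4,5 here, although its projection point is an extrapolation far outside the segment between x₄ and x₅.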

A local nearest neighbour classifier (LNNC) [27] was designed to solve the extrapolation inaccuracy problem of the NLC algorithm. Only the feature lines passing through the two prototypes nearest to a probe are chosen for matching. Both the extrapolation inaccuracy and the high computational load are mitigated. However, this method decreases the classification ability of the NLC-based classifier. A rectified nearest feature line segment (RNFLS) [28] algorithm, another modified version, was proposed to remedy both extrapolation and interpolation inaccuracy. Line segments trespassing on the territory of other classes are removed, and the subspace for a class is constructed from the remaining line segments. However, RNFLS is computationally expensive because a vast number of segments must be checked. The NLC, LNNC and RNFLS algorithms all check the possible feature lines in the classification phase, which is impractical for many systems. In the NFLE algorithm [23], the NLC strategy is performed in the training phase rather than in the classification phase, and the point-to-line distance metric is embedded into the transformation matrix. As shown in [23], this both reduces the matching time and improves classification performance.

2.3. Nearest feature line embedding

The NFLE algorithm is a new linear transformation method for face recognition. The point-to-line (p-2-l) strategy originated from the NLC approach [26]. The class scatter using the p-2-l strategy is represented as a Laplacian matrix, defined as follows:

\begin{array}{l} W^{*} = \arg \min_{W} \sum_{i} \sum_{m \neq n} {‖ y_{i} - f_{m, n} (y_{i}) ‖}^{2} w_{m, n} (y_{i}) = \\ \arg \min_{W} tr (W^{T} X L X^{T} W) . \end{array}

(3)

Here, point f_m,n(y_i) is the projection of point y_i onto line L_m,n, and weight w_m,n(y_i) represents the connectivity strength between point y_i and line L_m,n. The projection point f_m,n(y_i) is represented as a linear combination of points y_m and y_n: f_m,n(y_i) = y_m + t_m,n(y_n – y_m), in which i ≠ m ≠ n and t_m,n = (y_i – y_m)^T (y_n – y_m)/(y_n – y_m)^T (y_n – y_m). The discriminant vector is represented in matrix notation using simple algebraic operations: $y_{i} - f_{m, n} (y_{i}) = y_{i} - \sum_{j} M_{i, j} y_{j}$. According to the results of [24], the class scatter in Eq. (3) is represented as a Laplacian matrix. Furthermore, the within-class scatter S_W and the between-class scatter S_B are calculated to maximize Fisher's criterion. For more details, see [23]. Since the NLC measure increases the representational capacity of the prototypes, the NFLE algorithm, which uses the NLC measure, also increases the discriminative power in the feature space. NFLE therefore preserves much more information than LPP. In addition, because the class scatter in NFLE is represented as a Laplacian matrix, the topological locality of the original space is preserved in the low-dimensional feature space.
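The matrix form of the discriminant vector can be illustrated as follows. This is a minimal sketch, assuming NumPy and the parametrization f_m,n(y_i) = y_m + t(y_n – y_m) with t chosen so that y_i – f_m,n(y_i) is orthogonal to the line; under that assumption the only non-zero entries in row i of M are 1 – t (column m) and t (column n), so the row sums to one. The helper name `line_weights` and the sample coordinates are hypothetical.

```python
import numpy as np

def line_weights(Y, i, m, n):
    """Row i of M for the feature line through columns m and n of Y."""
    d = Y[:, n] - Y[:, m]
    t = float((Y[:, i] - Y[:, m]) @ d) / float(d @ d)
    row = np.zeros(Y.shape[1])
    row[m], row[n] = 1.0 - t, t          # f = (1 - t) * y_m + t * y_n
    return row, t

# Columns: y_0 = (1, 1) is the query point, y_1 = (0, 0), y_2 = (2, 0).
Y = np.array([[1.0, 0.0, 2.0],
              [1.0, 0.0, 0.0]])

row, t = line_weights(Y, i=0, m=1, n=2)
projection = Y @ row                      # sum_j M_ij y_j
discriminant = Y[:, 0] - projection       # y_i - f_{m,n}(y_i)

print(row, t)            # weights [0, 0.5, 0.5], t = 0.5
print(projection)        # projection point (1, 0)
print(discriminant)      # discriminant vector (0, 1)
```

Because each such row sums to one, the same argument that links LPP to LLE in Eq. (2) applies to the point-to-line scatter.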

Though the p-2-l strategy was successfully adopted in the training phase instead of the classification phase for the NFL-based classifier, some drawbacks remain and limit its performance. The three problems are: (1) extrapolation/interpolation inaccuracy: NFLE may not preserve the locality precisely when prototypes are far away from the probes, so that, as shown in Fig. 1, the discriminant vector from point x₁ to feature line L_4,5 is selected instead of that from point x₁ to line L_2,3; (2) high computational complexity: a large number of feature lines are generated when the number of training samples N is large; and (3) non-orthogonal bases: the NFLE algorithm generates non-orthogonal eigenvectors, which do not reconstruct the original images and the intrinsic structure properly.

3. Orthogonal nearest neighbour feature line embedding

In this section, the Orthogonal Nearest Neighbour Feature Line Embedding (ONNFLE) algorithm is proposed. Three problems (extrapolation/interpolation inaccuracy, high computational complexity and non-orthogonal bases) are all addressed so that the resulting transformation has more discriminant power.

3.1 Nearest neighbour feature line embedding (NNFLE)

In order to overcome the extrapolation and interpolation inaccuracy, the feature lines for a query point were generated from the K nearest neighbourhood prototypes. More specifically, when two points x _m and x _n belonged to the nearest neighbours of a query point x _i , a straight line passed through points x _m and x _n and was called the nearest neighbour feature line (NNFL).

The discriminant vector x_i – f_m,n(x_i) was chosen for the scatter computation. The selection strategy for discriminant vectors in NNFLE was designed as follows:

(1) The within-class scatter S_W: feature lines were generated from the k₁ nearest neighbours within the same class for the computation of the within-class scatter, i.e., a set F_k₁⁺(x_i).

(2) The between-class scatter S_B: similarly, k₂ nearest neighbours in different classes for a specified point x_i were selected to generate the feature lines and calculate the scatter S_B, i.e., a set F_k₂⁻(x_i).

S_{W} = \sum_{p = 1}^{C} \left( \sum_{i = 1}^{N_{p}} \sum_{\begin{matrix} x_{i} \in C_{p} \\ f \in F_{k_{1}}^{+} (x_{i}) \end{matrix}} (x_{i} - f (x_{i})) {(x_{i} - f (x_{i}))}^{T} \right),

(4)

and

S_{B} = \sum_{p = 1}^{C} \left( \sum_{i = 1}^{N_{p}} \sum_{\begin{matrix} x_{i} \in C_{p} \\ f \in F_{k_{2}}^{-} (x_{i}) \end{matrix}} (x_{i} - f (x_{i})) {(x_{i} - f (x_{i}))}^{T} \right) .

(5)
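The scatters in Eqs. (4) and (5) can be sketched in a few lines. The code below is an illustrative simplification, assuming NumPy: it uses every feature line generated by the k₁ (same-class) or k₂ (different-class) nearest neighbours of each sample and does not apply any further selection among those lines. The function names and the random data are hypothetical.

```python
import numpy as np
from itertools import combinations

def project_to_line(x, xm, xn):
    """Projection of x onto the line through xm and xn."""
    d = xn - xm
    t = ((x - xm) @ d) / (d @ d)
    return xm + t * d

def nnfle_scatters(X, labels, k1, k2):
    """Within- and between-class scatters; rows of X are samples."""
    N, dim = X.shape
    S_W = np.zeros((dim, dim))
    S_B = np.zeros((dim, dim))
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    others = np.arange(N)
    for i in range(N):
        same = np.flatnonzero((labels == labels[i]) & (others != i))
        diff = np.flatnonzero(labels != labels[i])
        for idx, k, S in ((same, k1, S_W), (diff, k2, S_B)):
            nearest = idx[np.argsort(dist[i, idx])[:k]]   # k nearest prototypes
            for m, n in combinations(nearest, 2):         # lines through pairs
                v = X[i] - project_to_line(X[i], X[m], X[n])
                S += np.outer(v, v)                       # in-place accumulation
    return S_W, S_B

rng = np.random.default_rng(0)
X = rng.standard_normal((12, 3))           # hypothetical data: 12 samples, 3-D
labels = np.array([0] * 6 + [1] * 6)
S_W, S_B = nnfle_scatters(X, labels, k1=3, k2=3)
print(S_W.shape, S_B.shape)
```

Each scatter is a sum of outer products of discriminant vectors, so both matrices are symmetric and positive semi-definite by construction.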

The proposed algorithm is a simple and effective method for alleviating extrapolation and interpolation errors. In addition, the scatter matrices can also be represented as Laplacian matrices. NNFL is also more efficient than NFL. Consider N training samples: for a specified point, $C_{2}^{N-1}$ possible feature lines are generated and $C_{2}^{N-1}$ distances are calculated. The K₁ nearest feature lines are then chosen from all possible lines in order to calculate the class scatter. The time complexity is O(N²) for line generation and O(2N² log N) for distance sorting. In contrast, when only the k nearest prototypes are chosen for line generation, where k ≪ N, the time complexity for selecting the K₁ nearest feature lines is O(k²) + O(2k² log k), plus an extra overhead of O(N log N) for finding the k nearest prototypes. When N is large, the traditional method therefore needs much more time to calculate the class scatter.
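To give a feel for the difference in the number of candidate lines per query point, the following sketch counts the pairs among the other N – 1 samples and the pairs among k nearest prototypes. The values N = 340 and k = 10 are hypothetical and used only for illustration.

```python
from math import comb

def candidate_lines(N, k):
    """Feature lines per query point: all pairs of the other N - 1 samples
    versus pairs among the k nearest prototypes only."""
    return comb(N - 1, 2), comb(k, 2)

all_lines, nn_lines = candidate_lines(N=340, k=10)
print(all_lines, nn_lines)   # 57291 45
```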

3.2 Orthogonal nearest neighbour feature line embedding (ONNFLE)

According to the results of [29], non-orthogonal basis functions make it difficult to reconstruct the original data. In addition, orthogonal basis functions have more locality preserving power than non-orthogonal ones. The proposed ONNFLE method is modified from the NFLE approach, in which the within-class scatter S_W and the between-class scatter S_B are calculated to maximize Fisher's criterion S_B/S_W. ONNFLE is expected to have more discriminating power than NFLE; in other words, ONNFLE tries to find a transformation matrix W* which is composed of m orthogonal basis vectors, W* = [w₁, …, w_m], and satisfies the following constraints:

\begin{array}{l} W^{*} = \arg \max_{W} \frac{W^{T} S_{B} W}{W^{T} S_{W} W}, \\ \text{subject to} \\ w_{i}^{T} w_{j} = 0, \; i \neq j, \; i, j = 1, 2, \dots, m . \end{array}

(6)

The derivation of the orthogonal bases is very similar to that in [29]. The optimization in Eq. (6) is equivalent to maximizing W^T S_B W with the additional constraint W^T S_W W = 1. Generally, the matrix S_W is positive definite. The ratio of W^T S_B W to W^T S_W W in Eq. (6) remains unchanged when W^T S_W W is normalized to one. The following generalized eigenvalue problem is then solved:

S_{B} W = λ S_{W} W

(7)

Vector w₁ is the eigenvector of the matrix (S_W)⁻¹ S_B corresponding to the largest eigenvalue. When the first m – 1 orthogonal eigenvectors have been generated, the m-th eigenvector is obtained by maximizing the following objective function:

\begin{array}{l} f (w_{m}) = \frac{w_{m}^{T} S_{B} w_{m}}{w_{m}^{T} S_{W} w_{m}} \\ \text{with the constraints} \\ w_{m}^{T} w_{1} = w_{m}^{T} w_{2} = \dots = w_{m}^{T} w_{m-1} = 0, \text{ and } w_{m}^{T} S_{W} w_{m} = 1 . \end{array}

(8)

Eq. (8) was formulated using the Lagrange multipliers to include all constraints:

\begin{array}{l} L_{m} = w_{m}^{T} S_{B} w_{m} - λ (w_{m}^{T} S_{W} w_{m} - 1) \\ - α_{1} w_{m}^{T} w_{1} - \dots - α_{m - 1} w_{m}^{T} w_{m - 1} \end{array}

(9)

The parameters α₁, α₂, …, α_m–1, λ and w_m were derived as follows: first, the partial derivative of L_m with respect to w_m was set to zero, i.e., $\frac{\partial L_{m}}{\partial w_{m}} = 0$:

2 S_{B} w_{m} - 2 λ S_{W} w_{m} - α_{1} w_{1} - \dots - α_{m - 1} w_{m - 1} = 0

(10)

Next, Eq. (11) was obtained by left-multiplying Eq. (10) by w_m^T and applying the orthogonality constraints w_m^T w₁ = w_m^T w₂ = ··· = w_m^T w_m–1 = 0. Parameter λ was calculated from the following equation:

2 w_{m}^{T} S_{B} w_{m} - 2 λ w_{m}^{T} S_{W} w_{m} = 0 \Rightarrow λ = \frac{w_{m}^{T} S_{B} w_{m}}{w_{m}^{T} S_{W} w_{m}}

(11)

Next, m – 1 equations were obtained by successively left-multiplying Eq. (10) by the m – 1 terms w₁^T (S_W)⁻¹, …, w_m–1^T (S_W)⁻¹, as shown in Eq. (12); the terms in λ vanish because of the orthogonality constraints. These m – 1 equations were written in a single matrix form, Eq. (13), to obtain the parameters α₁, α₂, …, α_m–1.

\begin{array}{l} α_{1} w_{1}^{T} {(S_{W})}^{- 1} w_{1} + \dots + α_{m - 1} w_{1}^{T} {(S_{W})}^{- 1} w_{m - 1} = 2 w_{1}^{T} {(S_{W})}^{- 1} S_{B} w_{m} \\ α_{1} w_{2}^{T} {(S_{W})}^{- 1} w_{1} + \dots + α_{m - 1} w_{2}^{T} {(S_{W})}^{- 1} w_{m - 1} = 2 w_{2}^{T} {(S_{W})}^{- 1} S_{B} w_{m} \\ ⋮ \\ α_{1} w_{m - 1}^{T} {(S_{W})}^{- 1} w_{1} + \dots + α_{m - 1} w_{m - 1}^{T} {(S_{W})}^{- 1} w_{m - 1} = 2 w_{m - 1}^{T} {(S_{W})}^{- 1} S_{B} w_{m} \end{array}

(12)

\begin{array}{l} U^{(m - 1)} α^{(m - 1)} = 2 {[W^{(m - 1)}]}^{T} {(S_{W})}^{- 1} S_{B} w_{m} \Rightarrow α^{(m - 1)} = \\ 2 {(U^{(m - 1)})}^{- 1} {[W^{(m - 1)}]}^{T} {(S_{W})}^{- 1} S_{B} w_{m} . \end{array}

(13)

Here,

\begin{array}{l} α^{(m - 1)} = {[α_{1}, \dots, α_{m - 1}]}^{T}, \; u^{(m - 1)} (i, j) = w_{i}^{T} {(S_{W})}^{- 1} w_{j}, \\ W^{(m - 1)} = [w_{1}, \dots, w_{m - 1}], \text{ and } U^{(m - 1)} = [u^{(m - 1)} (i, j)] \\ = {[W^{(m - 1)}]}^{T} {(S_{W})}^{- 1} W^{(m - 1)} . \end{array}

Third, Eq. (10) is left-multiplied by (S_W)⁻¹ to derive the eigenvector w_m. In matrix notation, this gives:

2 {(S_{W})}^{- 1} S_{B} w_{m} - 2 λ w_{m} - {(S_{W})}^{- 1} W^{(m - 1)} α^{(m - 1)} = 0

(14)

Parameter α^(m–1) was substituted into Eq. (14) as follows:

\begin{array}{l} \left\{ I - {(S_{W})}^{- 1} W^{(m - 1)} {(U^{(m-1)})}^{- 1} {(W^{(m - 1)})}^{T} \right\} {(S_{W})}^{- 1} S_{B} w_{m} = λ w_{m} \\ \Rightarrow R_{m} w_{m} = λ w_{m} \end{array}

(15)

From Eq. (15), since the parameter λ in Eq. (11) is the quantity to be maximized, the eigenvector w_m is the eigenvector of matrix R_m with the largest eigenvalue. Finally, the orthogonal projection bases W = [w₁, …, w_m] are generated by repeating the above process.
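The iterative procedure of Eqs. (7) to (15) can be sketched as follows. This is a minimal illustration, assuming NumPy and an invertible S_W; each basis vector is scaled to unit length here (an assumption made for convenience, since the scale does not affect orthogonality or the Fisher ratio), and the scatter matrices are random positive definite matrices rather than scatters computed from face data. The function names are hypothetical.

```python
import numpy as np

def leading_eigvec(A):
    """Unit-length eigenvector of A for the eigenvalue with largest real part."""
    vals, vecs = np.linalg.eig(A)
    w = vecs[:, np.argmax(vals.real)].real
    return w / np.linalg.norm(w)

def orthogonal_bases(S_B, S_W, m):
    """First m orthogonal basis vectors following Eqs. (7)-(15)."""
    d = S_W.shape[0]
    Sw_inv = np.linalg.inv(S_W)
    A = Sw_inv @ S_B
    W = leading_eigvec(A)[:, None]               # w_1 from Eq. (7)
    for _ in range(1, m):
        U = W.T @ Sw_inv @ W                     # U^(m-1)
        R = (np.eye(d) - Sw_inv @ W @ np.linalg.inv(U) @ W.T) @ A   # Eq. (15)
        W = np.column_stack([W, leading_eigvec(R)])
    return W

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
C = rng.standard_normal((6, 6))
S_B = B @ B.T                                    # hypothetical between-class scatter
S_W = C @ C.T + 6.0 * np.eye(6)                  # hypothetical, well-conditioned S_W

W = orthogonal_bases(S_B, S_W, m=3)
print(np.round(W.T @ W, 6))                      # close to the 3-by-3 identity
```

Orthogonality follows from the construction: [W^(m–1)]^T R_m = 0, so any eigenvector of R_m with a non-zero eigenvalue is orthogonal to all previously generated bases.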

4. Experimental results

In this section, the experimental results of face recognition and gender classification are presented to show the effectiveness of the proposed NNFLE and ONNFLE methods. Besides this, they are compared with four state-of-the-art methods. The algorithms were evaluated using three public face databases: the dataset CMU [30] for face recognition and the datasets XM2VTS [31] and Max-Planck [32][33] for gender classification. The performance was also evaluated according to the ROC curves for face recognition and the accuracy rates for gender classification.

4.1. Face recognition

The CMU database [30] is composed of 68 people with 170 images per individual under pose, illumination and expression (PIE) variations. Four state-of-the-art methods (LPP [11], orthogonal LPP (OLPP) [29], NFLE [23] and orthogonal NFLE (ONFLE)) were implemented for comparison. In the experiments, several images were randomly selected from the data set for training while the others were used for testing. The face-only images (sized 32-by-32 pixels) were cropped from the original ones to eliminate the influence of hair and background. The projection matrix W* was constructed from the eigenvectors of (S_W)⁻¹ S_B with the largest corresponding eigenvalues, so that Fisher's criterion was maximized. Since the dimension of the feature space is sometimes larger than the number of training samples, the matrix S_W can be singular and non-invertible. This is the well-known “small-sample-size” (S3) problem. To avoid it, the dimensionality of the feature vectors was first reduced by a PCA transformation that kept more than 99% of the feature information. After the PCA transformation, the optimal projection transformations were obtained for the four state-of-the-art algorithms and the proposed NNFLE and ONNFLE methods. All of the testing samples were matched with the trained prototypes using the NN matching rule. The algorithms were run ten times to obtain the average rates. The highest rates of the implemented algorithms for the various numbers of training samples are tabulated in Table 1. Moreover, the ROC curves of the recognition rates versus the reduced dimensions for the CMU database are shown in Figs. 2 to 5. From Table 1, the proposed NNFLE outperformed the other two non-orthogonal algorithms (LPP and NFLE) for 6, 7 and 8 training samples, and ONNFLE achieved the highest rate for 6, 7 and 8 training samples. This suggests that more local information is preserved in the proposed method.

Table 1.

The recognition performance on the CMU database. (%)

Method	5 Trains	6 Trains	7 Trains	8 Trains
LPP	66.94 (65)	68.72 (55)	72.94 (60)	78.13 (55)
OLPP	71.55 (70)	73.79 (60)	77.87 (70)	79.83 (70)
NFLE	70.80 (70)	72.36 (60)	76.77 (45)	77.17 (50)
ONFLE	78.75 (70)	79.41 (65)	82.51 (70)	81.39 (70)
NNFLE	69.69 (65)	73.17 (50)	77.75 (50)	82.12 (55)
ONNFLE	78.02 (70)	80.41 (70)	84.08 (70)	86.20 (70)

Figure 2.

The recognition rates of various algorithms for various training samples.
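The pre-processing and matching steps of this protocol (PCA reduction followed by NN matching) can be sketched as follows. The code assumes NumPy and interprets "99% of the feature information" as 99% of the cumulative variance, which is an assumption; the learned projection W* is omitted, so matching is done directly in the PCA space. The function names and the synthetic two-class data are hypothetical.

```python
import numpy as np

def pca_fit(X, keep=0.99):
    """Mean and PCA basis keeping at least `keep` of the variance."""
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = int(np.searchsorted(ratio, keep)) + 1    # smallest r reaching `keep`
    return mean, Vt[:r].T

def nn_classify(train_feats, train_labels, test_feats):
    """Assign each test sample the label of its nearest training prototype."""
    d2 = ((test_feats[:, None, :] - train_feats[None, :, :]) ** 2).sum(axis=2)
    return train_labels[np.argmin(d2, axis=1)]

rng = np.random.default_rng(0)
train = np.vstack([rng.normal(0.0, 0.1, (5, 10)),    # class 0 prototypes
                   rng.normal(5.0, 0.1, (5, 10))])   # class 1 prototypes
train_labels = np.array([0] * 5 + [1] * 5)
test = np.vstack([rng.normal(0.0, 0.1, (3, 10)),
                  rng.normal(5.0, 0.1, (3, 10))])

mean, P = pca_fit(train)
pred = nn_classify((train - mean) @ P, train_labels, (test - mean) @ P)
print(P.shape, pred)
```

In the full pipeline, the PCA-reduced features would additionally be multiplied by the learned projection matrix before NN matching.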

4.2 Gender classification

For gender classification, two face datasets were used for evaluation. The XM2VTS dataset [31] is composed of 295 people (153 males and 142 females) with 12 images per individual under various expressions. As in the face recognition experiments, the four state-of-the-art methods were implemented for comparison. The data set was separated into two parts: Tn images for training and Pm images for testing. In the two configurations, 60 images (30 males and 30 females) and 80 images (40 males and 40 females), respectively, were randomly selected from the data set for training. The remaining 3480 images (1806 males and 1674 females) and 3460 images (1796 males and 1664 females), respectively, were used for testing. The NN matching strategy was adopted for matching the testing samples. The average rates were obtained by running the algorithms ten times. The highest recognition rates and their corresponding reduced dimensions for the various algorithms are tabulated in Table 2. The proposed ONNFLE method outperformed the other algorithms for both 60 and 80 training samples. This suggests that the NN strategy preserves more of the local topological structure than the other algorithms and that the orthogonal bases reconstruct the images better than the non-orthogonal bases.

Table 2.

The performance of gender classification on the database XM2VTS. (%)

Methods	Tn =60, Pm=3480	Tn =80, Pm=3460
LPP	72.53 (16)	79.57 (20)
OLPP	79.46 (15)	81.56 (19)
NFLE	72.68 (11)	79.88 (19)
ONFLE	80.01 (12)	81.86 (18)
NNFLE	73.31 (16)	82.73 (15)
ONNFLE	82.53 (10)	83.34 (15)

Secondly, the Max-Planck data set [32][33] is composed of 200 human face images generated from laser-scanned heads without any hair. 100 male and 100 female face images were used for gender classification. The recognition rates were directly compared with the results in [20], which were generated by seven algorithms: Weighted PGA, Supervised Weighted PGA, Supervised PGA, Standard PGA, Real AdaBoost, Gentle AdaBoost and Modest AdaBoost. In [20], the frontal images (sized 142-by-124 pixels) were cropped from the original ones, and 160 samples were used for training and the remaining 40 samples for testing. The classification results of the seven algorithms in [20] are listed in Table 3. In our experiments, smaller images (sized 43-by-37 pixels) were cropped to illustrate the robustness of the proposed method. In addition, the method was evaluated with fewer training samples (100 images) and more testing samples (100 images). The algorithm was run ten times to obtain the average rates. Due to the S3 problem, PCA projection was first executed for dimension reduction in the proposed method. The reduced dimensions are listed in parentheses in Table 3. Table 3 shows that, under these more restricted conditions (fewer training samples, more testing samples and a smaller image size), the proposed ONNFLE method matched the best result reported in [20] (97.00% for Supervised Weighted PGA) and exceeded the other six algorithms.

Table 3.

The gender classification performance on the Max-Planck database.

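Methods	Highest accuracy rates (%)	Configuration
Weighted PGA	91.75 (30)	142-by-124 pixels, 160 training samples, 40 testing samples [20]
Supervised Weighted PGA	97.00 (10)	142-by-124 pixels, 160 training samples, 40 testing samples [20]
Supervised PGA	91.25 (10)	142-by-124 pixels, 160 training samples, 40 testing samples [20]
Standard PGA	91.25 (20)	142-by-124 pixels, 160 training samples, 40 testing samples [20]
Real AdaBoost	96.25	142-by-124 pixels, 160 training samples, 40 testing samples [20]
Gentle AdaBoost	95.75	142-by-124 pixels, 160 training samples, 40 testing samples [20]
Modest AdaBoost	94.50	142-by-124 pixels, 160 training samples, 40 testing samples [20]
PCA+NNFLE	95.75 (10)	43-by-37 pixels, 100 training samples, 100 testing samples
PCA+ONNFLE	97.00 (10)	43-by-37 pixels, 100 training samples, 100 testing samples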

5. Conclusions

In this study, a nearest neighbour selection strategy was adopted in the ONNFLE method to alleviate extrapolation and interpolation inaccuracies. The same NN selection strategy also reduced the high computational complexity. The projection transformation was obtained from orthogonal bases with more discriminating power in the feature spaces. According to the experimental results of both face recognition and gender classification, the ONNFLE algorithm matched or outperformed the other state-of-the-art methods in most of the tested configurations.

References

1. Sudha, Mohan A. R. and Meher P. K. (2011) A self-configurable systolic architecture for face recognition system based principal component neural network. IEEE Transactions on Circuits and Systems for Video Technology. vol. 21, no. 7: 1071–1084.

2. Melin, Mendoza and Castillo (2011) Face recognition with an improved interval type-2 fuzzy logic sugeno integral and modular neural networks. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans. vol. 41, no. 5: 1001–1012.

3. Wong Y. W., Seng K. P. and Ang L. M. (2011) Radial basis function neural network with incremental learning for face recognition. IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics. vol. 41, no. 4: 940–949.

4. Diago, Kitaoka, Hagiwara and Kambayashi (2011) Neuro-fuzzy quantification of personal perceptions of facial images based on a limited data set. IEEE Transactions on Neural Networks. vol. 22, no. 12: 2422–2434.

5. Shakhnarovich, Viola and Moghaddam (2002) A unified learning framework for real time face detection and classification. Proceedings of 2002 IEEE International Conference on Automatic Face and Gesture Recognition: 14–21.

6. and Huang (2003) Lut-based AdaBoost for gender classification. Proceedings of 2003 International Conference on Audio and Video-Based Biometric Person Authentication: 104–110.

7. Baluja and Rowley (2007) Boosting sex identification performance. International Journal of Computer Vision. vol. 71, no. 1: 111–119.

8. Choi, Y. M. and Plataniotis K. N. (2011) Boosting color feature selection for color face recognition. IEEE Transactions on Image Processing. vol. 20, no. 5: 1425–1434.

9. Lee S. H., Choi J. Y. and Plataniotis K. N. (2012) Local color vector binary patterns from multichannel face images for face recognition. IEEE Transactions on Image Processing. vol. 21, no. 4: 2347–2353.

10. H. R., Horadam and Qiu (2011) Robust shape-feature-vector-based face recognition system. IEEE Transactions on Instrumentation and Measurement. vol. 60, no. 12: 3781–3791.

11. Yan, Niyogi and Zhang H. J. (2005) Face recognition using Laplacian faces. IEEE Transactions on Pattern Analysis and Machine Intelligence. vol. 27, no. 3: 328–340.

12. Turk and Pentland A. P. (1991) Face recognition using eigenfaces. IEEE Conf. Computer Vision and Pattern Recognition: 586–591.

13. Belhumeur P. N., Hespanha J. P. and Kriegman D. J. (1997) Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence. vol. 19, no. 7: 711–720.

14. Wang S. J., Yang, Zhang and Zhou C. G. (2011) Tensor discriminant color space for face recognition. IEEE Transactions on Image Processing. vol. 20, no. 9: 2490–2501.

15. Gopalan, Taheri, Turaga and Chellappa (2012) Blur-robust descriptor with applications to face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. vol. 34, no. 6: 1220–1226.

16. Tian, Fan, Gao and Tian (2012) Multiview face recognition: From tensorface to v-tensorface and k-tensorface. IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics. vol. 42, no. 2: 320–333.

17. Zafeiriou, Tzimiropoulos, Petrou and Stathaki (2012) Regularized kernel discriminant analysis with a robust kernel for face recognition and verification. IEEE Transactions on Neural Networks. vol. 23, no. 3: 526–534.

18. Deng, Dai and Zhang (2011) Graph Laplace for occluded face completion and recognition. IEEE Transactions on Image Processing. vol. 20, no. 8: 2329–2338.

19. Huang and He (2011) Super-resolution method for face recognition using nonlinear mappings on coherent features. IEEE Transactions on Neural Networks. vol. 22, no. 1: 121–130.

20. Smith A. P. and Hancock E. R. (2011) Gender discriminating models from facial surface normals. Pattern Recognition. vol. 44: 2871–2886.

21. Bekios-Calfa, Buenaposada J. M. and Baumela (2011) Revisiting Linear Discriminant Techniques in Gender Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. vol. 33, no. 4: 858–864.

22. Makinen and Raisamo (2008) Evaluation of Gender Classification Methods with Automatically Detected and Aligned Faces. IEEE Transactions on Pattern Analysis and Machine Intelligence. vol. 30, no. 3: 541–547.

23. Chen Y. N., Han C. C., Wang C. T. and Fan K. C. (2011) Face recognition using nearest feature space embedding. IEEE Transactions on Pattern Analysis and Machine Intelligence. vol. 33, no. 6: 1073–1086.

24. Yan, Zhang H. J., Yang and Lin (2007) Graph embedding and extensions: General framework for dimensionality reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence. vol. 29, no. 1: 40–51.

25. Roweis S. T. and Saul L. K. (2000) Nonlinear dimensionality reduction by locally linear embedding. Science. vol. 290, no. 22: 2323–2326.

26. Li S. Z. (1998) Face recognition based on nearest linear combinations. Proceedings of 1998 Computer Vision and Pattern Recognition: 839–844.

27. Zheng, Zhao and Zou (2004) Locally nearest neighbor classifiers for pattern classification. Pattern Recognition. vol. 37: 1307–1309.

28. and Chen Y. Q. (2007) Rectified nearest feature line segment for pattern classification. Pattern Recognition. vol. 40: 1486–1497.

29. Cai, Han and Zhang (2006) Orthogonal Laplacian faces for face recognition. IEEE Transactions on Image Processing. vol. 15, no. 11: 3608–3614.

30. Sim, Baker and Bsat (2003) The CMU pose, illumination, and expression database. IEEE Transactions on Pattern Analysis and Machine Intelligence. vol. 25, no. 12: 1615–1618.

31. Luettin and Maitre (1998) Evaluation protocol for the extended M2VTS database (XM2VTS). DMI for Perceptual Artificial Intelligence.

32. Troje and Bulthoff (1996) Face recognition under varying poses: The role of texture and shape. Vision Research. vol. 36: 1761–1771.

33. Blanz and Vetter (1999) A morphable model for the synthesis of 3D faces. Proceedings of the SIGGRAPH'99 Conference: 187–194.