Distributed Face Recognition Using Multiple Kernel Discriminant Analysis in Wireless Sensor Networks

Abstract

This paper proposes a module based distributed wireless face recognition system by integrating multiple kernel discriminant analysis with face recognition in wireless sensor networks. Using the maximum margin criterion (MMC) as the objective, we run an iterative kernel parameter optimization separately for each module. Simulations on the FERET and CMU PIE face databases show that our multiple kernel framework and the optimization procedure achieve higher recognition accuracy than single-kernel-based KDDA.

1. Introduction

Face recognition (FR) is one of the most important biometric techniques and is used in a wide range of security applications such as access control, identification systems, and surveillance [1]. FR is a contactless biometric technique and has the advantage of being natural and passive compared with biometric techniques that require cooperative subjects, such as fingerprint recognition and iris recognition [2]. A typical framework of an FR system is shown in Figure 1, including the enrollment and identification procedures [3].

Figure 1

A typical framework of an FR system.

In recent years, FR systems combined with wireless sensor networks (WSNs) [4] have attracted great interest, as WSNs are very helpful for contactless biometric security applications. For example, Kim et al. implement a wireless face recognition system with low energy consumption, based on the ZigBee protocol and principal component analysis (PCA) [5]. Muraleedharan et al. propose the use of a specific evolutionary algorithm to optimize routing in a distributed time-varying network for face recognition [6]. Chang and Aghajan focus on recovering face orientation for more robust face recognition in wireless image sensor networks [7]. Zaeri et al. apply face recognition to wireless surveillance systems [8].

As there exist many image variations such as pose, illumination, and facial expression, face recognition is a highly complex and nonlinear problem which cannot be sufficiently handled by linear methods, such as principal component analysis (PCA) [9] and linear discriminant analysis (LDA) [10]. Therefore, it is reasonable to assume that a better solution to this inherently nonlinear problem could be achieved using nonlinear methods, such as the so-called kernel machine techniques [11]. Following the success of applying the kernel trick in support vector machines (SVMs) [12], many kernel-based PCA and LDA methods have been developed and applied in pattern recognition tasks, such as kernel PCA (KPCA) [13], kernel Fisher discriminant (KFD) [14], generalized discriminant analysis (GDA) [15], and kernel direct discriminant analysis (KDDA) [16].

It has been shown that the kernel-based LDA method is a feasible approach to solve the nonlinear problems in face recognition. However, the performance of the kernel-based LDA method is sensitive to the selection of a kernel function and its parameters. To date, kernel parameters are mainly selected by cross-validation [17], which is computationally expensive, and the selected kernel parameters are not guaranteed to be optimal. Furthermore, a single fixed kernel can characterize only some aspects of the geometrical structure of the input data and is thus not always suitable for applications that involve data from multiple, heterogeneous sources [18, 19].

Recent applications and developments based on SVMs [20, 21] have shown that using multiple kernels (i.e., a combination of several “base kernels”) instead of a single fixed one can enhance classifier performance, which gave rise to the so-called multiple kernel learning (MKL) method. With m kernels, input data can be mapped into m feature spaces, where each feature space can be taken as one view of the original input data [19]. Each view is expected to exhibit some geometrical structures of the original data from its own perspective, such that all the views can complement one another in the subsequent learning task. It has been shown that MKL offers the needed flexibility and handles well the case that involves multiple, heterogeneous data sources [18, 22, 23]. However, MKL was proposed for SVMs, and there have been few reports on the performance of the kernel-based LDA method with multiple kernels. Liu and Feng propose multiple kernel Fisher discriminant analysis (MKFD) with an iterative scheme for weight optimization [24], in which the constructed kernel is a linear combination of several base kernels with a constraint on their weights.

In this paper, we integrate multiple kernel discriminant analysis with face recognition in wireless sensor networks and propose a module based distributed wireless face recognition system. We assign a separate cluster head to each module, that is, forehead, eyes, nose, and lips. Each local cluster alone is responsible for the internal processing of its module, for both training and recognition.

The rest of this paper is organized as follows. First we describe the module based distributed wireless face recognition system in Section 2. Then in Section 3, the optimization scheme for the multikernels is presented. The simulation results are reported in Section 4, while we draw our conclusion in Section 5.

2. Module Based Distributed Wireless Face Recognition

We present a face recognition system in wireless sensor networks where both training and recognition are performed in a distributed environment. The image is divided into four submodules, that is, forehead, eyes, nose, and lips, as shown in Figure 2. For face recognition tasks, enrollment and identification of each submodule are performed in separate cluster heads, and the computations are carried out in kernel feature space [12]. Each cluster head is responsible for processing its submodule and communicating with the sink cluster, which performs the score level fusion.

Figure 2

Submodules of a face image.

The following describes the score level fusion criterion at the sink node. Suppose there are N images belonging to C subjects. Denote the membership degree of the nth image to the cth subject as $μ_{n}^{c}$ , $n = 1,2, \dots, N$ , $c = 1,2, \dots, C$ . $μ_{n}^{c}$ is obtained as follows:

\begin{matrix} μ_{n}^{c} = α_{1} s_{n}^{c} (1) + α_{2} s_{n}^{c} (2) + α_{3} s_{n}^{c} (3) + α_{4} s_{n}^{c} (4), \end{matrix}

(1)

where $s_{n}^{c} (k)$ denotes the score of the kth submodule from the nth image with regard to the cth subject, and $α_{k}$ is the corresponding weight value, $k = 1,2, 3,4$ . The nth image is assigned to the Ith subject if and only if $μ_{n}^{I} = \max_{1 \leq c \leq C} μ_{n}^{c}$ .
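The fusion rule (1) and the assignment rule can be illustrated with a minimal Python sketch. The function name `fuse_scores` and the toy score table are hypothetical; the weights are the values reported in Section 4, and the module order (forehead, eyes, nose, lips) follows the text.

```python
# Weights alpha_1..alpha_4 as reported in Section 4 (forehead, eyes, nose, lips).
ALPHAS = (0.1, 0.3, 0.4, 0.2)

def fuse_scores(module_scores, alphas=ALPHAS):
    """module_scores[k][c] holds s_n^c(k+1), the score of submodule k+1 for subject c.

    Returns the membership degrees mu_n^c of (1) and the index of the subject
    with the largest membership degree.
    """
    num_subjects = len(module_scores[0])
    mu = [
        sum(a * scores[c] for a, scores in zip(alphas, module_scores))
        for c in range(num_subjects)
    ]
    best = max(range(num_subjects), key=mu.__getitem__)
    return mu, best

# Toy example with C = 3 subjects (illustrative numbers, not experimental data).
scores = [
    [2, 1, 0],  # forehead
    [1, 2, 0],  # eyes
    [1, 2, 0],  # nose
    [2, 0, 1],  # lips
]
mu, subject = fuse_scores(scores)
```

In this toy case the eyes and nose modules, which carry the largest weights, both favor the second subject, so it wins even though the forehead and lips modules rank the first subject highest.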

Now we explain score $s_{n}^{c} (k)$ , taking the forehead module as an example. In the forehead module cluster, there are also N samples belonging to C classes. For the nth sample $x_{n}$ , we can compute the squared kernel distance between $x_{n}$ and the center of the cth class $m_{c}$ as follows:

\begin{array}{l} d_{n}^{c} = {∥ Φ (x_{n}) - Φ (m_{c}) ∥}^{2} \\ = Φ {(x_{n})}^{T} Φ (x_{n}) - 2 Φ {(x_{n})}^{T} Φ (m_{c}) + Φ {(m_{c})}^{T} Φ (m_{c}) \\ = k (x_{n}, x_{n}) - 2 k (x_{n}, m_{c}) + k (m_{c}, m_{c}), \end{array}

(2)

where $Φ : x \in R^{t} \to Φ (x) \in F$ is a nonlinear mapping, which is implicitly defined by a Mercer kernel function $k (x, y) = Φ (x)^{T} Φ (y)$ [12, 25]. Then we sort the C distances $\{d_{n}^{c} | c = 1,2, \dots, C\}$ in ascending order and denote the sorted squared kernel distances as $\{d_{n}^{i (c)} | c = 1,2, \dots, C\}$ ; that is, $i (c)$ is the index of the class with the cth smallest squared kernel distance in $\{d_{n}^{c} | c = 1,2, \dots, C\}$ . Let

\begin{matrix} s_{n}^{i (c)} (1) = C - c, \end{matrix}

(3)

$c = 1,2, \dots, C$ ; then we get $\{s_{n}^{c} (1) | c = 1,2, \dots, C\}$ . Thus, given the nth sample, the smaller the kernel distance between it and the center of the cth class, the greater $s_{n}^{c} (1)$ is, $c = 1,2, \dots, C$ .

Scores $s_{n}^{c} (2)$ to $s_{n}^{c} (4)$ are obtained from the eyes, nose, and lips modules, respectively, in the same way as score $s_{n}^{c} (1)$ .
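The scoring of one module can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the class center is taken to be the mean of the mapped class samples in feature space (as in the class mean defined in Section 3.1), so the cross term of (2) becomes an average of kernel values; the function names and the toy data are hypothetical.

```python
import math

def rbf(x, y, sigma):
    """Gaussian RBF kernel exp(-||x - y||^2 / sigma^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / sigma ** 2)

def sq_kernel_distance(x, class_samples, sigma):
    """Squared feature-space distance between Phi(x) and the mean of the mapped class.

    Assumption: the class center is the feature-space mean, so
    d = k(x, x) - (2/N_c) sum_j k(x, x_j) + (1/N_c^2) sum_j sum_h k(x_j, x_h).
    """
    n = len(class_samples)
    cross = sum(rbf(x, s, sigma) for s in class_samples) / n
    within = sum(rbf(s, t, sigma) for s in class_samples for t in class_samples) / n ** 2
    return rbf(x, x, sigma) - 2.0 * cross + within

def rank_scores(distances):
    """Rule (3): the class with the c-th smallest distance (c = 1..C) gets score C - c."""
    num_classes = len(distances)
    order = sorted(range(num_classes), key=distances.__getitem__)  # ascending distance
    scores = [0] * num_classes
    for position, cls in enumerate(order, start=1):
        scores[cls] = num_classes - position
    return scores

# Toy data: three classes of 2-D points (illustrative only).
classes = [
    [(0.0, 0.0), (0.2, 0.1)],
    [(1.0, 1.0), (1.1, 0.9)],
    [(3.0, 3.0), (2.9, 3.2)],
]
x = (0.9, 1.0)
d = [sq_kernel_distance(x, cls, sigma=1.0) for cls in classes]
s = rank_scores(d)
```

The sample lies closest to the second class, so that class receives the top score C - 1 = 2, while the farthest class receives 0.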

3. Optimization of Multiple Kernels

From Section 2, we can see that the computations in the proposed distributed wireless face recognition framework are based on kernel techniques. The framework has four computing modules, for the forehead, eyes, nose, and lips, respectively. These four kinds of data differ so much that a single kernel is unlikely to classify all of them well. We therefore propose the use of multiple kernels for these four modules. Specifically, we use the Gaussian RBF kernel

\begin{matrix} k (x, y) = \exp (- \frac{{∥ x - y ∥}^{2}}{σ^{2}}), \end{matrix}

(4)

but with different values of parameter σ for the different computing modules.

Every module should have its own optimal kernel parameter. For each module, we separately perform the following kernel parameter optimization procedure.
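As an illustration of (4) applied per module, the sketch below builds one Gaussian kernel matrix for each module with its own width σ. It assumes NumPy; the σ values and the random feature vectors are placeholders chosen for the example, whereas in the proposed scheme each σ comes from the optimization described in the following subsections.

```python
import numpy as np

def rbf_kernel_matrix(X, sigma):
    """Gram matrix of the Gaussian RBF kernel (4) for the rows of X."""
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X * X, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives caused by round-off
    return np.exp(-sq_dists / sigma ** 2)

# Hypothetical per-module kernel widths (placeholders, not optimized values).
module_sigmas = {"forehead": 4.0, "eyes": 2.5, "nose": 3.0, "lips": 3.5}

# Placeholder feature vectors: 6 samples of dimension 10 per module.
rng = np.random.default_rng(0)
module_data = {name: rng.normal(size=(6, 10)) for name in module_sigmas}

kernels = {
    name: rbf_kernel_matrix(module_data[name], sigma)
    for name, sigma in module_sigmas.items()
}
```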

3.1. Some Notations on Kernel Discriminant Analysis

For a certain module X, there are N samples belonging to C classes. Assume the ith class $X_{i}$ contains $N_{i}$ samples; that is, $X_{i} = \{x_{1}^{i}, x_{2}^{i}, \dots, x_{N_{i}}^{i}\}$ , $i = 1,2, \dots, C$ , so $N = \sum_{i = 1}^{C} N_{i}$ . Denote the kernel matrix ( $N \times N$ ) as

\begin{matrix} K = {[k (x_{j}^{i}, x_{h}^{l})]}_{\begin{smallmatrix} i = 1, \dots, C, j = 1, \dots, N_{i} \\ l = 1, \dots, C, h = 1, \dots, N_{l} \end{smallmatrix}}, \end{matrix}

(5)

where $k (x_{j}^{i}, x_{h}^{l}) = Φ (x_{j}^{i})^{T} Φ (x_{h}^{l})$ ; then the kernel matrix corresponds to the nonlinear mapping.

Under the nonlinear mapping $Φ$ , the ith mapped class and the mapped sample set are, respectively, given by

\begin{matrix} Φ (X_{i}) = \{Φ (x_{1}^{i}), Φ (x_{2}^{i}), \dots, Φ (x_{N_{i}}^{i})\}, \\ Φ (X) = \{Φ (X_{1}), Φ (X_{2}), \dots, Φ (X_{C})\} . \end{matrix}

(6)

Also, the mean of the mapped class $Φ (X_{i})$ and that of the mapped sample set $Φ (X)$ are, respectively, given by

\begin{matrix} m_{i} = \frac{1}{N_{i}} \sum_{j = 1}^{N_{i}} Φ (x_{j}^{i}), m = \frac{1}{N} \sum_{i = 1}^{C} \sum_{j = 1}^{N_{i}} Φ (x_{j}^{i}) . \end{matrix}

(7)

In kernel feature space F (let f be dimensionality of F), the within-class scatter matrix $S_{w}^{Φ}$ and between-class scatter matrix $S_{b}^{Φ}$ are, respectively, defined as

\begin{matrix} S_{w}^{Φ} = \frac{1}{N} \sum_{i = 1}^{C} \sum_{x \in X_{i}} (Φ (x) - m_{i}) {(Φ (x) - m_{i})}^{T} = Φ_{w} Φ_{w}^{T}, \\ S_{b}^{Φ} = \frac{1}{N} \sum_{i = 1}^{C} N_{i} (m_{i} - m) {(m_{i} - m)}^{T} = Φ_{b} Φ_{b}^{T}, \end{matrix}

(8)

where

\begin{matrix} Φ_{w} = {[ϕ_{1}^{1}, \dots, ϕ_{N_{1}}^{1}, ϕ_{1}^{2}, \dots, ϕ_{N_{2}}^{2}, \dots, ϕ_{1}^{C}, \dots, ϕ_{N_{C}}^{C}]}_{f \times N}, \\ ϕ_{j}^{i} = \frac{1}{\sqrt{N}} (Φ (x_{j}^{i}) - m_{i}), \\ Φ_{b} = {[φ_{1}, \dots, φ_{C}]}_{f \times C}, φ_{i} = \sqrt{\frac{N_{i}}{N}} (m_{i} - m) . \end{matrix}

(9)

The kernel Fisher criterion is defined as

\begin{matrix} J^{Φ} (W) = \frac{tr (W^{T} S_{b}^{Φ} W)}{tr (W^{T} S_{w}^{Φ} W)}, \end{matrix}

(10)

where $W = [w_{1}, \dots, w_{q}]$ is an $f \times q$ ( $f > q$ ) projection matrix. Kernel discriminant analysis seeks an optimal projection matrix $W^{*} : R^{f} \to R^{q}$ in the mapped feature space F, such that $W^{*} = {argmax}_{W} J^{Φ} (W)$ .

3.2. Diagonalization Strategy

We use the same diagonalization strategy as KDDA [16] to deal with the small sample size (SSS) problem; that is, we first diagonalize $S_{b}^{Φ}$ to I (the identity matrix) and then diagonalize $S_{w}^{Φ}$ to $Λ_{w}$ , as briefly described below.

3.2.1. Eigenanalysis of $S_{b}^{Φ}$ in the Feature Space

$Φ_{b}^{T} Φ_{b}$ can be expressed using the kernel matrix K as follows:

\begin{array}{l} Φ_{b}^{T} Φ_{b} = \frac{1}{N} D \cdot (A_{N C}^{T} \cdot K \cdot A_{N C} - \frac{1}{N} A_{N C}^{T} \cdot K \cdot 1_{N C} \\ - \frac{1}{N} 1_{N C}^{T} \cdot K \cdot A_{N C} + \frac{1}{N^{2}} 1_{N C}^{T} \cdot K \cdot 1_{N C}) \cdot D, \end{array}

(11)

where $D = diag (\sqrt{N_{1}}, \dots, \sqrt{N_{C}})$ is a $C \times C$ diagonal matrix, $1_{N C}$ is an $N \times C$ matrix with all terms equal to one, $A_{N C} = diag (a_{N_{1}}, \dots, a_{N_{C}})$ is an $N \times C$ block diagonal matrix, and $a_{N_{i}}$ is an $N_{i} \times 1$ vector with all terms equal to $1 / N_{i}$ .

Let $λ_{i}$ and $e_{i} (i = 1, \dots, C)$ be the ith largest eigenvalue and corresponding eigenvector of $Φ_{b}^{T} Φ_{b}$ . Let $r (\leq C - 1)$ be the rank of $S_{b}^{Φ} (= Φ_{b} Φ_{b}^{T})$ (also the rank of $Φ_{b}^{T} Φ_{b}$ ). Denote $E_{r} = (e_{1}, \dots, e_{r})$ and $V = (v_{1}, \dots, v_{r}) = Φ_{b} E_{r}$ . It can be derived that $V^{T} S_{b}^{Φ} V = Λ_{b}$ , with $Λ_{b} = diag (λ_{1}^{2}, \dots, λ_{r}^{2})$ , a nonsingular diagonal matrix. Let $U = V Λ_{b}^{- 1 / 2}$ . Then $U^{T} S_{b}^{Φ} U = I$ .
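The construction in (11) and the whitening of $S_{b}^{Φ}$ can be checked numerically with a short NumPy sketch. To make the check possible, the example uses a linear kernel, for which the mapping is explicit ( $Φ (x) = x$ ), so that $Φ_{b}$ and $S_{b}^{Φ}$ can be formed directly; this choice, the function names, the tolerance, and the random data are assumptions made for illustration only.

```python
import numpy as np

def between_class_gram(K, class_sizes):
    """Phi_b^T Phi_b computed from K as in (11); samples must be grouped by class."""
    N, C = K.shape[0], len(class_sizes)
    A = np.zeros((N, C))                       # A_NC: column i holds 1/N_i on class i
    start = 0
    for i, n_i in enumerate(class_sizes):
        A[start:start + n_i, i] = 1.0 / n_i
        start += n_i
    ones = np.ones((N, C))                     # 1_NC
    D = np.diag(np.sqrt(np.asarray(class_sizes, dtype=float)))
    inner = (A.T @ K @ A
             - (A.T @ K @ ones) / N
             - (ones.T @ K @ A) / N
             + (ones.T @ K @ ones) / N ** 2)
    return D @ inner @ D / N

def leading_eigenpairs(P, rel_tol=1e-10):
    """Eigenpairs of P in descending order, truncated to the numerical rank r."""
    vals, vecs = np.linalg.eigh((P + P.T) / 2.0)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    r = int(np.sum(vals > rel_tol * vals[0]))
    return vals[:r], vecs[:, :r]

# Linear-kernel check on random data (3 classes, 6-dimensional samples).
rng = np.random.default_rng(1)
class_sizes = [4, 3, 5]
X = np.vstack([rng.normal(loc=3.0 * i, size=(n, 6)) for i, n in enumerate(class_sizes)])
K = X @ X.T
P = between_class_gram(K, class_sizes)
lam, E_r = leading_eigenpairs(P)

# Explicit Phi_b for the linear kernel, column i = sqrt(N_i / N) (m_i - m).
N = X.shape[0]
bounds = np.cumsum([0] + class_sizes)
means = np.array([X[a:b].mean(axis=0) for a, b in zip(bounds[:-1], bounds[1:])])
Phi_b = (np.sqrt(np.array(class_sizes) / N)[:, None] * (means - X.mean(axis=0))).T
S_b = Phi_b @ Phi_b.T
U = Phi_b @ E_r @ np.diag(1.0 / lam)           # Lambda_b^{-1/2} = diag(1 / lambda_i)
```

With three classes the numerical rank comes out as r = 2, consistent with $r \leq C - 1$ , and `U.T @ S_b @ U` is the 2 × 2 identity up to round-off.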

3.2.2. Eigenanalysis of $S_{w}^{Φ}$ in the Feature Space

Based on the analysis in Section 3.2.1, it can be seen that

\begin{matrix} U^{T} S_{w}^{Φ} U = {(E_{r} Λ_{b}^{- 1 / 2})}^{T} (Φ_{b}^{T} S_{w}^{Φ} Φ_{b}) (E_{r} Λ_{b}^{- 1 / 2}), \end{matrix}

(12)

where $Φ_{b}^{T} S_{w}^{Φ} Φ_{b}$ can be expressed using K, with similar details to that seen in [16].

Let $z_{j}$ be the eigenvector of $U^{T} S_{w}^{Φ} U$ corresponding to the jth smallest eigenvalue $λ_{j}^{'}$ , $j = 1, \dots, r$ . Denote $Z = (z_{1}, \dots, z_{r})$ . Defining $Y = U Z$ , it can be derived that $Y^{T} S_{w}^{Φ} Y = Λ_{w}$ , with $Λ_{w} = diag (λ_{1}^{'}, \dots, λ_{r}^{'})$ .

Based on the derivation presented in Sections 3.2.1 and 3.2.2, an optimal projection matrix for kernel discriminant analysis is obtained as

\begin{matrix} W^{*} = Y Λ_{w}^{- 1 / 2} = Φ_{b} E_{r} Λ_{b}^{- 1 / 2} Z Λ_{w}^{- 1 / 2} . \end{matrix}

(13)

Since the nonlinear mapping $Φ$ is defined only implicitly by the kernel function (or matrix), $Φ_{b}$ (defined by (9)) remains unknown, and $W^{*}$ cannot be evaluated explicitly. What (13) actually provides is the matrix $E_{r} Λ_{b}^{- 1 / 2} Z Λ_{w}^{- 1 / 2}$ , which can be computed from the kernel matrix K. This is the core result of the diagonalization process.
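The two diagonalization steps can be condensed into a small NumPy routine that returns the coefficient matrix $E_{r} Λ_{b}^{- 1 / 2} Z Λ_{w}^{- 1 / 2}$ . The sketch takes $Φ_{b}^{T} Φ_{b}$ and $Φ_{b}^{T} S_{w}^{Φ} Φ_{b}$ as inputs, both of which are $C \times C$ and expressible through K. It assumes that the eigenvalues of $U^{T} S_{w}^{Φ} U$ are strictly positive (no regularization is applied), and it is verified here with explicit features from a linear kernel; the function name and data are hypothetical.

```python
import numpy as np

def kdda_coefficients(P, M, rel_tol=1e-10):
    """Return E_r Lambda_b^{-1/2} Z Lambda_w^{-1/2}.

    P = Phi_b^T Phi_b and M = Phi_b^T S_w Phi_b, both C x C.
    Assumes U^T S_w U is positive definite (no regularization).
    """
    lam, E = np.linalg.eigh((P + P.T) / 2.0)
    order = np.argsort(lam)[::-1]              # descending eigenvalues of P
    lam, E = lam[order], E[:, order]
    r = int(np.sum(lam > rel_tol * lam[0]))
    B = E[:, :r] / lam[:r]                     # E_r Lambda_b^{-1/2}, Lambda_b = diag(lam^2)
    Sw_proj = B.T @ M @ B                      # U^T S_w U, as in (12)
    lam_w, Z = np.linalg.eigh((Sw_proj + Sw_proj.T) / 2.0)  # ascending eigenvalues
    return B @ Z / np.sqrt(lam_w)

# Check with explicit features (linear kernel, Phi(x) = x).
rng = np.random.default_rng(2)
class_sizes = [5, 4, 6]
X = np.vstack([rng.normal(loc=2.0 * i, size=(n, 4)) for i, n in enumerate(class_sizes)])
N = X.shape[0]
labels = np.repeat(np.arange(len(class_sizes)), class_sizes)
means = np.array([X[labels == i].mean(axis=0) for i in range(len(class_sizes))])
Phi_b = (np.sqrt(np.array(class_sizes) / N)[:, None] * (means - X.mean(axis=0))).T
Phi_w = ((X - means[labels]) / np.sqrt(N)).T
S_b, S_w = Phi_b @ Phi_b.T, Phi_w @ Phi_w.T

G = kdda_coefficients(Phi_b.T @ Phi_b, Phi_b.T @ S_w @ Phi_b)
W = Phi_b @ G                                  # explicit W* is available only in this check
Sb_proj = W.T @ S_b @ W
Sw_proj = W.T @ S_w @ W
```

In this check the projected within-class scatter is the identity and the projected between-class scatter is diagonal, which is what the simultaneous diagonalization is meant to deliver.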

3.3. Optimization Criterion and Objective

We adopt the maximum margin criterion (MMC) [26] as the objective function to optimize the kernel parameter σ for each specific module (module of forehead, eyes, nose, or lips):

\begin{matrix} F (W, σ) = tr (W^{T} S_{b}^{Φ} W) - tr (W^{T} S_{w}^{Φ} W), \end{matrix}

(14)

where W is a projection matrix, and σ is the parameter for the Gaussian RBF kernel as in (4).

Based on (13) in Section 3.2, the optimal projection matrix is $W^{*} = Φ_{b} E_{r} Λ_{b}^{- 1 / 2} Z Λ_{w}^{- 1 / 2}$ . Denote $G = E_{r} Λ_{b}^{- 1 / 2} Z Λ_{w}^{- 1 / 2}$ , which can be computed from the kernel matrix K. Then the objective function (14) can be reformulated as

\begin{array}{l} F (σ) = tr (W^{* T} S_{b}^{Φ} W^{*} - W^{* T} S_{w}^{Φ} W^{*}) \\ = tr (G^{T} Φ_{b}^{T} Φ_{b} Φ_{b}^{T} Φ_{b} G - G^{T} Φ_{b}^{T} Φ_{w} Φ_{w}^{T} Φ_{b} G) \\ = tr (G^{T} P P^{T} G - G^{T} Q Q^{T} G), \end{array}

(15)

where $P = Φ_{b}^{T} Φ_{b}$ and $Q = Φ_{b}^{T} Φ_{w}$ can be expressed in terms of the kernel matrix K as follows:

\begin{array}{l} P = \frac{1}{N} D \cdot (A_{N C}^{T} \cdot K \cdot A_{N C} - \frac{1}{N} A_{N C}^{T} \cdot K \cdot 1_{N C} \\ - \frac{1}{N} 1_{N C}^{T} \cdot K \cdot A_{N C} + \frac{1}{N^{2}} 1_{N C}^{T} \cdot K \cdot 1_{N C}) \cdot D, \end{array}

(16)

with D, $A_{N C}$ , and $1_{N C}$ defined the same as in (11);

\begin{array}{l} Q = \frac{1}{N} D \cdot (A_{N C}^{T} \cdot K - A_{N C}^{T} \cdot K \cdot H_{N N} \\ - \frac{1}{N} 1_{N C}^{T} \cdot K + \frac{1}{N} 1_{N C}^{T} \cdot K \cdot H_{N N}), \end{array}

(17)

where $H_{N N} = diag (h_{N_{1}}, \dots, h_{N_{C}})$ is a $N \times N$ block diagonal matrix and $h_{N_{i}}$ is a $N_{i} \times N_{i}$ matrix with all terms equal to $1 / N_{i}$ .
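The expressions (16) and (17) and the reformulated objective (15) can be verified numerically. The NumPy sketch below builds P and Q from K and compares them with $Φ_{b}^{T} Φ_{b}$ and $Φ_{b}^{T} Φ_{w}$ formed explicitly under a linear kernel. The linear kernel, the random data, the arbitrary fixed matrix standing in for G, and all function names are assumptions made for the check.

```python
import numpy as np

def class_helpers(class_sizes):
    """Build A_NC, H_NN, D, 1_NC for samples grouped by class."""
    sizes = np.asarray(class_sizes, dtype=float)
    N, C = int(sizes.sum()), len(class_sizes)
    labels = np.repeat(np.arange(C), class_sizes)
    indicator = np.zeros((N, C))
    indicator[np.arange(N), labels] = 1.0
    A = indicator / sizes                      # column i holds 1/N_i on class i
    H = A @ indicator.T                        # diagonal blocks filled with 1/N_i
    D = np.diag(np.sqrt(sizes))
    ones = np.ones((N, C))
    return A, H, D, ones, N

def P_from_K(K, class_sizes):
    """P = Phi_b^T Phi_b as in (16)."""
    A, _, D, ones, N = class_helpers(class_sizes)
    inner = (A.T @ K @ A - (A.T @ K @ ones) / N
             - (ones.T @ K @ A) / N + (ones.T @ K @ ones) / N ** 2)
    return D @ inner @ D / N

def Q_from_K(K, class_sizes):
    """Q = Phi_b^T Phi_w as in (17)."""
    A, H, D, ones, N = class_helpers(class_sizes)
    inner = A.T @ K - A.T @ K @ H - (ones.T @ K) / N + (ones.T @ K @ H) / N
    return D @ inner / N

def mmc_objective(K, G, class_sizes):
    """F = tr(G^T P P^T G - G^T Q Q^T G) as in (15)."""
    P, Q = P_from_K(K, class_sizes), Q_from_K(K, class_sizes)
    return np.trace(G.T @ P @ P.T @ G) - np.trace(G.T @ Q @ Q.T @ G)

# Linear-kernel check with explicit feature vectors.
rng = np.random.default_rng(3)
class_sizes = [3, 4, 2]
X = rng.normal(size=(sum(class_sizes), 5))
K = X @ X.T
labels = np.repeat(np.arange(3), class_sizes)
N = len(labels)
means = np.array([X[labels == i].mean(axis=0) for i in range(3)])
Phi_b = (np.sqrt(np.array(class_sizes) / N)[:, None] * (means - X.mean(axis=0))).T
Phi_w = ((X - means[labels]) / np.sqrt(N)).T
G = rng.normal(size=(3, 2))                    # arbitrary fixed coefficients, check only
W = Phi_b @ G
F_explicit = np.trace(W.T @ Phi_b @ Phi_b.T @ W) - np.trace(W.T @ Phi_w @ Phi_w.T @ W)
F_kernel = mmc_objective(K, G, class_sizes)
```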

Therefore, regarding G as constant, the objective function $F (σ)$ is an explicit function of the kernel parameter σ. To maximize the objective function with $σ_{0}$ as the initial value of the parameter, an iterative procedure based on Newton's method is developed in our method to update the kernel parameter, as shown in the following section.

3.4. Solving the Optimization Problem

Assume that G is constant. To obtain the extremum of the objective function $F (σ)$ , we need to differentiate $F (σ)$ with respect to σ.

For the sake of clarity, we denote the differentiation of the kernel matrix K with respect to σ as $d K_{σ}$ , which is an $N \times N$ matrix expressed as follows:

\begin{matrix} d K_{σ} = {[\frac{d k (x_{j}^{i}, x_{h}^{l})}{d σ}]}_{\begin{smallmatrix} i = 1, \dots, C, j = 1, \dots, N_{i} \\ l = 1, \dots, C, h = 1, \dots, N_{l} \end{smallmatrix}} . \end{matrix}

(18)

Each element in matrix $d K_{σ}$ is the differentiation of the corresponding element in the kernel matrix K with respect to σ and can be formulated as

\begin{matrix} \frac{d k (x_{j}^{i}, x_{h}^{l})}{d σ} = \frac{d}{d σ} \exp (- \frac{{∥ x_{j}^{i} - x_{h}^{l} ∥}^{2}}{σ^{2}}) \\ = \frac{2 {∥ x_{j}^{i} - x_{h}^{l} ∥}^{2}}{σ^{3}} \exp (- \frac{{∥ x_{j}^{i} - x_{h}^{l} ∥}^{2}}{σ^{2}}) . \end{matrix}

(19)
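A quick way to gain confidence in (19) is to compare it with a central finite difference of the kernel matrix. The NumPy sketch below does this on random data; the function names, the value of σ, and the step size h are illustrative choices.

```python
import numpy as np

def sq_dists(X):
    """Matrix of squared Euclidean distances between the rows of X."""
    sq_norms = np.sum(X * X, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    return np.maximum(d2, 0.0)

def rbf_kernel(d2, sigma):
    """Kernel matrix K for the Gaussian RBF kernel (4)."""
    return np.exp(-d2 / sigma ** 2)

def rbf_kernel_dsigma(d2, sigma):
    """Entrywise derivative (19): (2 d^2 / sigma^3) exp(-d^2 / sigma^2)."""
    return (2.0 * d2 / sigma ** 3) * np.exp(-d2 / sigma ** 2)

rng = np.random.default_rng(4)
d2 = sq_dists(rng.normal(size=(5, 3)))
sigma, h = 1.7, 1e-6
numeric = (rbf_kernel(d2, sigma + h) - rbf_kernel(d2, sigma - h)) / (2.0 * h)
analytic = rbf_kernel_dsigma(d2, sigma)
```

The off-diagonal entries of the derivative are non-negative: increasing σ widens the kernel and can only increase the similarity between two distinct samples.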

From (16) and (17), matrices P and Q can be differentiated with respect to σ as follows:

\begin{array}{l} \frac{d P}{d σ} = \frac{1}{N} D \cdot (A_{N C}^{T} \cdot d K_{σ} \cdot A_{N C} - \frac{1}{N} A_{N C}^{T} \cdot d K_{σ} \cdot 1_{N C} \\ - \frac{1}{N} 1_{N C}^{T} \cdot d K_{σ} \cdot A_{N C} + \frac{1}{N^{2}} 1_{N C}^{T} \cdot d K_{σ} \cdot 1_{N C}) \\ \cdot D, \\ \frac{d Q}{d σ} = \frac{1}{N} D \cdot (A_{N C}^{T} \cdot d K_{σ} - A_{N C}^{T} \cdot d K_{σ} \cdot H_{N N} \\ - \frac{1}{N} 1_{N C}^{T} \cdot d K_{σ} + \frac{1}{N} 1_{N C}^{T} \cdot d K_{σ} \cdot H_{N N}), \end{array}

(20)

where $d P / d σ$ is a $C \times C$ matrix and $d Q / d σ$ is a $C \times N$ matrix.

Then, the derivative of F with respect to σ can be formulated as

\begin{array}{l} F^{'} (σ) = tr ((G^{T} \frac{d P}{d σ} P^{T} G + G^{T} P \frac{d P^{T}}{d σ} G) \\ - (G^{T} \frac{d Q}{d σ} Q^{T} G + G^{T} Q \frac{d Q^{T}}{d σ} G)) . \end{array}

(21)

Thus the derivative of F with respect to σ can be expressed in terms of matrix K and matrix $d K_{σ}$ . To achieve the maximum of $F (σ)$ , we set the derivative to zero:

\begin{matrix} F^{'} (σ) = 0 . \end{matrix}

(22)
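Because P and Q depend linearly on K, the derivatives in (20) follow by applying the same linear maps to $d K_{σ}$ , and (21) is then a matter of matrix products. The NumPy sketch below implements this with G held fixed and checks the result against a finite difference of $F (σ)$ . The matrix used for G is an arbitrary stand-in, and the function names and data are hypothetical.

```python
import numpy as np

def make_maps(class_sizes):
    """Return the linear maps M -> P(M) of (16) and M -> Q(M) of (17)."""
    sizes = np.asarray(class_sizes, dtype=float)
    N, C = int(sizes.sum()), len(class_sizes)
    indicator = np.zeros((N, C))
    indicator[np.arange(N), np.repeat(np.arange(C), class_sizes)] = 1.0
    A, ones, D = indicator / sizes, np.ones((N, C)), np.diag(np.sqrt(sizes))
    H = A @ indicator.T

    def P_map(M):
        inner = (A.T @ M @ A - (A.T @ M @ ones) / N
                 - (ones.T @ M @ A) / N + (ones.T @ M @ ones) / N ** 2)
        return D @ inner @ D / N

    def Q_map(M):
        inner = A.T @ M - A.T @ M @ H - (ones.T @ M) / N + (ones.T @ M @ H) / N
        return D @ inner / N

    return P_map, Q_map

def objective_and_derivative(d2, sigma, G, P_map, Q_map):
    """Return F(sigma) from (15) and F'(sigma) from (21), with G held constant."""
    K = np.exp(-d2 / sigma ** 2)
    dK = (2.0 * d2 / sigma ** 3) * K           # entrywise derivative (19)
    P, Q, dP, dQ = P_map(K), Q_map(K), P_map(dK), Q_map(dK)
    F = np.trace(G.T @ (P @ P.T - Q @ Q.T) @ G)
    dF = np.trace(G.T @ (dP @ P.T + P @ dP.T - dQ @ Q.T - Q @ dQ.T) @ G)
    return F, dF

rng = np.random.default_rng(5)
class_sizes = [4, 3, 3]
X = rng.normal(size=(10, 6))
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
G = rng.normal(size=(3, 2))                    # arbitrary fixed G, for the check only
P_map, Q_map = make_maps(class_sizes)

sigma, h = 2.0, 1e-5
F0, dF = objective_and_derivative(d2, sigma, G, P_map, Q_map)
F_plus, _ = objective_and_derivative(d2, sigma + h, G, P_map, Q_map)
F_minus, _ = objective_and_derivative(d2, sigma - h, G, P_map, Q_map)
dF_numeric = (F_plus - F_minus) / (2.0 * h)
```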

We use Newton's method to solve (22) with the initial value of the kernel parameter

\begin{matrix} σ_{0} = \sqrt{\frac{1}{N^{2}} \sum_{i = 1}^{N} \sum_{j = 1}^{N} {∥ x_{i} - x_{j} ∥}^{2}}; \end{matrix}

(23)

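that is, $σ_{0}^{2}$ is the average squared distance over all pairs of samples in the given module. Since (22) is a root-finding problem for $F^{'}$ , the Newton iteration is

\begin{matrix} σ_{k + 1} = σ_{k} - \frac{F^{'} (σ_{k})}{F^{''} (σ_{k})}, \end{matrix}

(24)

$k = 0,1, 2, \dots$ , where $F^{''}$ denotes the derivative of $F^{'}$ in (21) with respect to σ.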
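The initialization (23) and a Newton-type search for a solution of (22) can be sketched as follows. Solving (22) means finding a root of $F^{'}$ , so the sketch applies the Newton step to $F^{'}$ and divides by an estimate of its derivative. That estimate is a central finite difference, which is an assumption of this example, as are the stopping rules, the positivity guard, and the toy derivative used in the demonstration (it stands in for the $F^{'}$ of (21)).

```python
import numpy as np

def initial_sigma(X):
    """sigma_0 from (23): square root of the mean squared distance over all N^2 pairs."""
    X = np.asarray(X, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return float(np.sqrt(d2.mean()))

def newton_stationary_point(dF, sigma0, step=1e-4, tol=1e-8, max_iter=50):
    """Newton iteration for a root of dF, starting from sigma0.

    The derivative of dF is approximated by a central difference (assumption).
    """
    sigma = float(sigma0)
    for _ in range(max_iter):
        g = dF(sigma)
        if abs(g) < tol:
            break                              # stationarity reached
        curvature = (dF(sigma + step) - dF(sigma - step)) / (2.0 * step)
        if curvature == 0.0:
            break                              # Newton step undefined
        candidate = sigma - g / curvature
        if candidate <= 0.0:
            break                              # keep the kernel width positive
        sigma = candidate
    return sigma

# Toy stand-in: F(s) = -(s - 2)^2 has its maximum at s = 2, so F'(s) = -2 (s - 2).
toy_dF = lambda s: -2.0 * (s - 2.0)
sigma0 = initial_sigma([[0.0, 0.0], [2.0, 0.0]])   # equals sqrt(2)
sigma_star = newton_stationary_point(toy_dF, sigma0)
```

A stationary point found this way is a maximum only if the curvature there is negative, so in practice the sign of the curvature estimate at the returned σ is worth checking.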

3.5. Identification Using Multiple Kernels

For each of the four modules (forehead, eyes, nose, and lips), the Gaussian kernel parameter is optimized separately as described above. Then the four submodules of a testing image are fed into the corresponding kernel discriminant classifiers to compute the membership degree of the image to every subject according to (1). Finally, the image is assigned to the subject with the greatest membership degree.

4. Simulation

To evaluate the performance of our multiple kernel framework for distributed wireless face recognition, we have made experimental comparisons with KDDA based on Gaussian RBF kernel, in terms of recognition accuracy. Images are from two face databases, namely, the FERET and the CMU PIE databases.

In our experiments, for the weight value in the fusion criterion (1), we set $α_{1} = 0.1$ , $α_{2} = 0.3$ , $α_{3} = 0.4$ , and $α_{4} = 0.2$ , meaning that we give larger weights to the nose module and the eyes module.

4.1. Face Image Datasets

From the FERET database [27], we select 72 people, with 6 frontal-view images for each individual. Face image variations in these 432 images include illumination, facial expression, wearing glasses, and aging. All the images are aligned by the centers of the eyes and the mouth and then normalized with a resolution of 92 × 112. The pixel value of each image is normalized between 0 and 1. The original images with resolution 92 × 112 are reduced to wavelet feature faces with resolution 49 × 59 after 1-level Daubechies-4 (Db4) wavelet decomposition. Images from one individual are shown in Figure 3.

Figure 3

Images of one person from the FERET database.
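The reduction from 92 × 112 to 49 × 59 is consistent with one level of wavelet decomposition using an 8-tap filter and border extension, for which each dimension n becomes floor((n + 8 - 1) / 2). The short Python check below makes this arithmetic explicit. The border-extension mode is an assumption (the text does not state it), and the function name is hypothetical.

```python
def dwt_output_length(n, filter_length):
    """Coefficient length after one DWT level with border extension
    (non-periodic padding): floor((n + filter_length - 1) / 2)."""
    return (n + filter_length - 1) // 2

DB4_FILTER_LENGTH = 8      # the Db4 wavelet filter has 8 taps
width, height = 92, 112    # normalized image resolution from the text

approx_size = (
    dwt_output_length(width, DB4_FILTER_LENGTH),
    dwt_output_length(height, DB4_FILTER_LENGTH),
)
```

Under these assumptions `approx_size` evaluates to (49, 59), matching the resolution of the wavelet feature faces, and each feature face has 49 × 59 = 2891 coefficients instead of 92 × 112 = 10304 pixels.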

In the CMU PIE face database [28], there are a total of 68 people, and each person has 13 pose variations ranging from the full right profile image to the full left profile image and 43 different lighting conditions, obtained from 21 flashes with ambient light on or off. In our experiments, for each person, we select 56 images including 13 poses with neutral expression and 43 different lighting conditions in the frontal view. For all frontal-view images, we apply alignment based on the two eye centers and the nose center, and no alignment is applied to the other images with poses. All the segmented images are rescaled to the resolution of 92 × 112 and then reduced to wavelet feature faces with resolution 49 × 59 after 1-level Daubechies-4 (Db4) wavelet decomposition. Some images of one person are shown in Figure 4.

Figure 4

Some images of one person from the CMU PIE face database.

4.2. Recognition Results

This section reports the recognition results of the proposed multiple kernel framework and KDDA with a single Gaussian RBF kernel on the FERET and the CMU PIE datasets. For KDDA, the parameter of the Gaussian RBF kernel is optimized via grid search. For each subject in the FERET dataset, we randomly select n ( $n = 2$ to 5) out of 6 images for training, with the rest for testing. In the CMU PIE dataset, the number of randomly selected training images ranges from 10 to 18 out of 56 for each individual, while the rest are testing images. The average recognition accuracies over 10 runs on the FERET and CMU PIE datasets are shown in Figures 5(a) and 5(b), respectively.

Figure 5

Comparison of accuracies obtained by multiple kernel framework and KDDA with a single kernel.
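The evaluation protocol (random per-subject split, repeated over 10 runs, then averaged) can be sketched in plain Python. The `evaluate` callable is a hypothetical placeholder for training and testing a recognizer on a given split, the use of the population standard deviation is an assumption, and the constant-accuracy evaluator in the usage lines exists only to exercise the code.

```python
import random

def split_per_subject(images_per_subject, n_train, rng):
    """Randomly choose n_train image indices per subject for training; the rest are tests."""
    train, test = {}, {}
    for subject, count in images_per_subject.items():
        indices = list(range(count))
        rng.shuffle(indices)
        train[subject] = sorted(indices[:n_train])
        test[subject] = sorted(indices[n_train:])
    return train, test

def average_accuracy(evaluate, images_per_subject, n_train, runs=10, seed=0):
    """Mean and (population) standard deviation of accuracy over repeated random splits.

    `evaluate(train, test)` is a placeholder that must return an accuracy in [0, 1].
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(runs):
        train, test = split_per_subject(images_per_subject, n_train, rng)
        accuracies.append(evaluate(train, test))
    mean = sum(accuracies) / runs
    std = (sum((a - mean) ** 2 for a in accuracies) / runs) ** 0.5
    return mean, std

# FERET-like setting: 72 subjects with 6 images each, 4 images per subject for training.
feret_like = {subject: 6 for subject in range(72)}
train, test = split_per_subject(feret_like, n_train=4, rng=random.Random(0))
mean, std = average_accuracy(lambda tr, te: 0.5, feret_like, n_train=4)
```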

Table 1 shows the average and standard deviation of the accuracies for FERET (n = 4: 4 images per subject for training with the rest for testing) and CMU-PIE (n = 14: 14 images per subject for training with the rest for testing), respectively.

Table 1

Performance comparison between multiple kernel framework and KDDA with a single kernel.

Type of kernel	Statistic	FERET ( $n = 4$ )	CMU PIE ( $n = 14$ )
RBF kernel (KDDA)	Mean	87.54%	71.99%
RBF kernel (KDDA)	Std	0.042	0.013
Multiple kernels	Mean	91.16%	78.96%
Multiple kernels	Std	0.011	0.006

The multiple kernel framework attains the higher average accuracy on both datasets.

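From the results in Figure 5 and Table 1, it can be seen that the proposed multiple kernel framework achieves higher accuracies than KDDA with a single optimized kernel parameter. For example, the mean accuracy rises from 87.54% to 91.16% on FERET ( $n = 4$ ) and from 71.99% to 78.96% on CMU PIE ( $n = 14$ ), with a smaller standard deviation in both cases.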

5. Conclusion

In this paper, on the assumption that multiple kernels can characterize geometrical structures of the original data from multiple views that complement one another and thereby improve recognition performance, we integrate multiple kernel discriminant analysis with face recognition in wireless sensor networks and propose a module based distributed wireless face recognition system. For each module, we separately perform an iterative scheme based on Newton's method for kernel parameter optimization, using the maximum margin criterion as the objective. The multiple kernel framework and the optimization procedure yield higher recognition accuracy on the FERET and CMU PIE face databases than KDDA with a single kernel.

Conflict of Interests

The authors declare that they have no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is partially supported by the Basic and Frontier Technology Research Project of Henan Province in China under Grant no. 122300410321 and Science and Technology Development Project of Henan Province under Grant no. 132102210186.

References

1. Zhao, Chellappa, Phillips P. J., Rosenfeld. Face recognition: a literature survey. ACM Computing Surveys, vol. 35, no. 4, pp. 399–458, 2003. doi:10.1145/954339.954342

2. Zhang, Gao. Face recognition across pose: a review. Pattern Recognition, vol. 42, no. 11, pp. 2876–2896, 2009. doi:10.1016/j.patcog.2009.04.017

3. Lin Q.-M., Yang J.-W., Wang R.-C., Zhang. Face recognition in mobile wireless sensor networks. International Journal of Distributed Sensor Networks, vol. 2013, Article ID 890737, 7 pages, 2013. doi:10.1155/2013/890737

4. Akyildiz I. F., Sankarasubramaniam, Cayirci. Wireless sensor networks: a survey. Computer Networks, vol. 38, no. 4, pp. 393–422, 2002. doi:10.1016/S1389-1286(01)00302-4

5. Kim, Shim, Schlessman, Wolf. Remote wireless face recognition employing zigbee. Proceedings of the ACM SenSys Workshop on Distributed Smart Cameras (DSC '06), Boulder, Colo, USA, 2006.

6. Muraleedharan, Yan, Osadciw L. A. Increased efficiency of face recognition system using wireless sensor network. Systemics, Cybernetics and Informatics, vol. 4, no. 1, pp. 38–46, 2006.

7. Chang C. C., Aghajan. Collaborative face orientation detection in wireless image sensor networks. Proceedings of the ACM SenSys Workshop on Distributed Smart Cameras (DSC '06), Boulder, Colo, USA, 2006.

8. Zaeri, Mokhtarian, Cherri, Gobbetti. Efficient face recognition for wireless surveillance systems. Proceedings of the 9th IASTED International Conference on Computer Graphics and Imaging (CGIM '07), Innsbruck, Austria, February 2007, ACTA Press, pp. 132–137.

9. Turk, Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71–86, 1991.

10. Belhumeur P. N., Hespanha J. P., Kriegman D. J. Eigenfaces versus fisherfaces: recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711–720, 1997. doi:10.1109/34.598228

11. Ruiz, López-de-Teruel P. E. Nonlinear kernel-based statistical pattern analysis. IEEE Transactions on Neural Networks, vol. 12, no. 1, pp. 16–32, 2001. doi:10.1109/72.896793

12. Vapnik V. N. The Nature of Statistical Learning Theory. Springer, New York, NY, USA, 1995.

13. Schölkopf, Smola, Müller. Nonlinear component analysis as a kernel eigenvalue problem. No. 44, MPI fur Biologische Kybernetik, Tubingen, Germany, 1996.

14. Mika, Rätsch, Weston, Schölkopf, Müller K. R. Fisher discriminant analysis with kernels. Proceedings of the IEEE Signal Processing Society Workshop, Neural Networks for Signal Processing IX, Madison, Wis, USA, 1999, pp. 41–48.

15. Baudat, Anouar. Generalized discriminant analysis using a kernel approach. Neural Computation, vol. 12, no. 10, pp. 2385–2404, 2000.

16. Plataniotis K. N., Venetsanopoulos A. N. Face recognition using kernel direct discriminant analysis algorithms. IEEE Transactions on Neural Networks, vol. 14, no. 1, pp. 117–126, 2003. doi:10.1109/TNN.2002.806629

17. Chapelle, Vapnik, Bousquet, Mukherjee. Choosing multiple parameters for support vector machines. Machine Learning, vol. 46, no. 1–3, pp. 131–159, 2002. doi:10.1023/A:1012450327387

18. Sonnenburg, Rätsch, Schäfer. A general and efficient multiple kernel learning algorithm. Proceedings of the Neural Information Processing Systems, 2005.

19. Wang, Chen, Sun. MultiK-MHKS: a novel multiple kernel learning algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 348–353, 2008. doi:10.1109/TPAMI.2007.70786

20. Zhang, Bennett K. P. Column-generation boosting methods for mixture of kernels. Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '04), New York, NY, USA, August 2004, pp. 521–526.

21. Lanckriet G. R. G., Cristianini, Bartlett, El Ghaoui, Jordan M. I. Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, vol. 5, pp. 27–72, 2004.

22. Bach F. R., Lanckriet G. R. G., Jordan M. I. Multiple kernel learning, conic duality, and the SMO algorithm. Proceedings of the 21st International Conference on Machine Learning (ICML '04), New York, NY, USA, July 2004, pp. 41–48.

23. Bennett K. P., Momma, Embrechts M. J. MARK: a boosting algorithm for heterogeneous kernel models. Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '02), New York, NY, USA, July 2002, pp. 24–31.

24. Liu X.-Z., Feng G.-C. Multiple kernel learning in fisher discriminant analysis for face recognition. International Journal of Advanced Robotic Systems, vol. 10, article 142, 2013. doi:10.5772/52350

25. Mercer. Functions of Positive and Negative Type and Their Connection with the Theory of Integral Equations. Philosophical Transactions of the Royal Society of London Series A, vol. 209, The Royal Society, 1909.

26. Jiang, Zhang, Thrun, Saul, Schölkopf. Efficient and robust feature extraction by maximum margin criterion. Advances in Neural Information Processing Systems 16, The MIT Press, Cambridge, Mass, USA, 2004, 157 1165.

27. Jonathon Phillips, Moon, Rizvi S. A., Rauss P. J. The FERET evaluation methodology for face-recognition algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 10, pp. 1090–1104, 2000. doi:10.1109/34.879790

28. Sim, Baker, Bsat. The CMU pose, illumination, and expression (PIE) database. Proceedings of the 5th IEEE International Conference on Automatic Face and Gesture Recognition, Washington, DC, USA, May 2002.
