Abstract

A manifold adaptive kernel semisupervised discriminant analysis algorithm for gait recognition is proposed in this paper. Motivated by the fact that the nonlinear structure captured by the data-independent kernels (such as Gaussian kernel, polynomial kernel, and Sigmoid kernel) may not be consistent with the discriminative information and the intrinsic manifold structure information of gait images, we construct two graph Laplacians by using the two nearest neighbor graphs (i.e., an intrinsic graph and a penalty graph) to model the discriminative manifold structure. We then incorporate these two graph Laplacians into the kernel deformation procedure, which leads to the discriminative manifold adaptive kernel space. Finally, the discrepancy-based semisupervised discriminant analysis is performed in the manifold adaptive kernel space. Experimental results on the well-known USF HumanID gait database demonstrate the efficacy of our proposed algorithm.

1. Introduction

In the past two decades, gait recognition has become a hot research topic in pattern recognition and computer vision, owing to its wide applications in many areas such as information surveillance, identity authentication, and human-computer interface. While many algorithms have been proposed for gait recognition [1–5], the most successful and popular approaches to date are the average silhouette-based methods with subspace learning. The common goal of these approaches is to find a compact and representative low-dimensional feature subspace for gait representation, so that the intrinsic characteristics of the original gait image are well preserved. Representative algorithms include principal component analysis (PCA) [6], linear discriminant analysis (LDA) [6], locality preserving projection (LPP) [7], and marginal Fisher analysis (MFA) [8].

PCA aims to generate a set of orthonormal basis vectors where the samples have the minimum reconstruction error. Since PCA is an unsupervised method, it is optimal in terms of reconstruction, but not for discriminating one class from others. LDA is a supervised subspace learning approach which seeks the projection directions that maximize interclass scatter and at the same time minimize intraclass scatter. When the label information is available, LDA usually outperforms PCA for pattern classification tasks. However, LDA has a critical drawback: its available feature dimension is limited by the number of classes in the data. To overcome this problem, Tao et al. proposed the general averaged divergence analysis (GADA) [9] framework and the maximization of the geometric mean of all divergences (MGMD) [10] method, respectively. In addition, in order to efficiently and robustly estimate the low-rank and sparse structure of high-dimensional data, Zhou and Tao [11] developed the “Go Decomposition” (GoDec) method and proved its asymptotic property and convergence speed. While these algorithms have attained reasonably good performance in gait recognition, face recognition, and object classification, they are designed for discovering only the global Euclidean structure, whereas the local manifold structure is ignored.

Recently, various studies have shown that images possibly reside on a nonlinear submanifold [12–14]. Therefore, gait image representation is fundamentally related to manifold learning, which aims to derive a compact low-dimensional embedding that preserves local geometric properties of the underlying high-dimensional data. The most representative manifold learning algorithm is locality preserving projection (LPP) [7], which aims to find the optimal linear approximation to eigenfunctions of the Laplace-Beltrami operator on the data manifold. Since LPP is originally unsupervised and does not take the class label information into account, it does not necessarily work well in supervised dimensionality reduction scenarios. By jointly considering the local manifold structure and the class label information, as well as characterizing the separability of different classes with the margin criterion, marginal Fisher analysis (MFA) [8] delivers reasonably good performance in many pattern classification applications. While the motivations of these methods are different, they can all be interpreted within a general graph embedding framework (GEF) [8] or the patch alignment framework (PAF) [15]. In addition, the discriminative information preservation (DIP) [16] algorithm was also proposed by using PAF. Although the above vector-based dimensionality reduction algorithms have achieved great success in image analysis and computer vision, they seriously destroy the intrinsic tensor structure of high-order data. To overcome this issue, Tao et al. [17, 18] generalized the vector-based learning to the tensor-based learning and proposed the supervised tensor learning (STL) framework. More recently, it has been shown that the slow feature analysis (SFA) [19] can extract useful motion patterns and improve the recognition performance.

In general, the supervised dimensionality reduction approaches are suitable for pattern classification tasks when there are sufficient labeled data available. Unfortunately, in many practical applications of pattern classification, one often faces a lack of sufficient labeled data, since the labeling process usually requires much human labor. Meanwhile, in many cases, large amounts of unlabeled data are easier to obtain. To effectively utilize the labeled and unlabeled data simultaneously, semisupervised learning [20] was proposed and introduced into the dimensionality reduction process. The motivation behind semisupervised learning is to employ a large amount of unlabeled data to help build more accurate models from the labeled data. In the last decades, many semisupervised learning methods have been proposed, such as transductive SVM (TSVM) [21], cotraining [22], and graph-based semisupervised learning algorithms [23]. In addition, motivated by the recent progress in Hessian eigenmaps, Tao et al. [24] introduced the Hessian regularization into SVM for semisupervised learning and mobile image annotation on the cloud. All these algorithms only considered the classification problem, either transductive or inductive. Semisupervised dimensionality reduction has been considered recently; the most representative algorithm is semisupervised discriminant analysis (SDA) [25], which aims to extract discriminative features and preserve geometrical information of both labeled and unlabeled data for dimensionality reduction. While SDA has achieved reasonably good performance on face images and in image retrieval, some problems have still not been properly addressed.

The original SDA is still a linear technique in nature. It can only extract the linear features of input patterns, and it fails for nonlinear features. So SDA is inadequate to describe the complexity of real gait images, which arises from variations in viewpoint, surface, and carrying status.

The original SDA suffers from the singular (small sample size) problem, which exists in high-dimensional pattern recognition tasks such as gait recognition, where the dimension of the samples is much larger than the number of available samples.

To address the above issues, we propose a novel manifold adaptive kernel semisupervised discriminant analysis (MAKSDA) algorithm for gait recognition in this paper. First, we reformulate the optimal objective function of SDA using the discrepancy criterion rather than the ratio criterion, so that the singular problem can be avoided. Second, the discrepancy-based SDA is extended to the nonlinear case through the kernel trick [26]. Meanwhile, the discriminative manifold adaptive kernel function is proposed to enhance the learning capability of MAKSDA. Finally, experimental results on gait recognition are presented to demonstrate the effectiveness of the proposed algorithm.

In summary, the contributions of this paper are as follows.

We propose the MAKSDA algorithm. MAKSDA integrates the discriminative information obtained from the labeled gait images and the manifold adaptive kernel function explored by the unlabeled gait images to form the low-dimensional feature space for gait recognition.

In order to avoid the singular problem, we explore the discrepancy criterion rather than the ratio criterion in the kernel space.

We have analyzed different parameter settings of the MAKSDA algorithm, including the kernel function type, the nearest neighbor size in the intrinsic graph, and the nearest neighbor size in the penalty graph.

The remainder of this paper is organized as follows. Section 2 describes how to extract the Gabor-based gait representation. Section 3 briefly reviews SDA. In Section 4, we propose the MAKSDA algorithm for gait recognition. The experimental results are reported in Section 5. Finally, we provide the concluding remarks and suggestions for future work in Section 6.

2. Gabor Feature Representation of Gait Image

The effective representation of gait images is a key issue of gait recognition. In the following, we employ the averaged gait image as the appearance model [1, 2], since it can employ a compact representation to characterize the motion patterns of the human body. In addition, the Gabor wavelets [27], whose kernels are similar to the 2D receptive field profiles of the mammalian cortical simple cells, exhibit desirable characteristics of spatial locality and orientation selectivity. Therefore, it is reasonable to use Gabor functions to model averaged gait images. Particularly, the averaged gait image is first decomposed by using Gabor filters; we then combine the decomposed images to give a new Gabor feature representation, which has been demonstrated to be an effective feature for gait recognition.

The Gabor filters are the product of an elliptical Gaussian envelope and a complex plane wave, which can be defined as follows:

ψ_{μ, ν} (z) = \frac{{∥ k_{μ, ν} ∥}^{2}}{σ^{2}} \exp (- \frac{{∥ k_{μ, ν} ∥}^{2} {∥ z ∥}^{2}}{2 σ^{2}}) \times [\exp (i k_{μ, ν} z) - \exp (- \frac{σ^{2}}{2})],

(1)

where μ and ν define the orientation and scale of the Gabor filters, respectively, z = (x, y) is the pixel position, $∥ \cdot ∥$ denotes the norm operator, and the wave vector k_{μ, ν} is defined as follows:

k_{μ, ν} = k_{ν} e^{i ϕ_{μ}},

(2)

where $k_{ν} = k_{\max} / f^{ν}$ and $ϕ_{μ} = π μ / 8$ . k_max represents the maximum frequency and its value is usually set as $π / 2$ . f denotes the spacing factor between kernels in the frequency domain and its value is usually set as $\sqrt{2}$ .

The Gabor filters defined in (1) are all self-similar since they can be generated from one filter (the mother wavelet) by scaling and rotating via the wave vector k_{μ, ν}. The term $\exp (- σ^{2} / 2)$ is subtracted in order to make the kernel DC-free. Thus, a band of Gabor filters is generated by a set of various scales and rotations.

In this paper, following the conventional settings, we use Gabor filters at five scales $ν ∊ \{0, 1, 2, 3, 4\}$ and eight orientations $μ ∊ \{0, 1, 2, 3, 4, 5, 6, 7\}$ with the parameter σ = 2π. Then, we have 40 Gabor kernel functions from five scales and eight orientations. Figures 1 and 2 show the real part of the Gabor filters used in this paper and their magnitude, respectively. As can be seen, the Gabor filters demonstrate desirable features of spatial localization, orientation selectivity, and frequency selectivity.

Figure 1: The real part of the Gabor filters at five scales and eight orientations.

Figure 2: The magnitude of the Gabor filters at five scales and eight orientations.
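As an illustration only, the 40-filter bank described above could be sampled on a pixel grid as in the following sketch. The grid size of 31 × 31 pixels and the names `gabor_kernel` and `gabor_bank` are assumptions made for this example and do not come from the paper. The Gaussian envelope is written in its conventional form, $\exp (- ∥ k ∥^{2} ∥ z ∥^{2} / (2 σ^{2}))$, which is the form for which subtracting $\exp (- σ^{2} / 2)$ removes the DC component.

```python
import numpy as np

def gabor_kernel(mu, nu, size=31, sigma=2 * np.pi, k_max=np.pi / 2, f=np.sqrt(2)):
    """Sample the Gabor kernel psi_{mu,nu} on an odd size x size grid.

    The grid size is an assumed choice; the text does not state one.
    """
    k_nu = k_max / f**nu                      # k_nu = k_max / f^nu
    phi_mu = np.pi * mu / 8                   # phi_mu = pi * mu / 8
    kx, ky = k_nu * np.cos(phi_mu), k_nu * np.sin(phi_mu)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Gaussian envelope (conventional 2*sigma^2 denominator).
    envelope = (k_nu**2 / sigma**2) * np.exp(-k_nu**2 * (x**2 + y**2) / (2 * sigma**2))
    # Complex plane wave minus the DC-compensation term.
    carrier = np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma**2 / 2)
    return envelope * carrier

# Five scales (nu) and eight orientations (mu) give 40 complex kernels.
gabor_bank = {(mu, nu): gabor_kernel(mu, nu) for nu in range(5) for mu in range(8)}
```

Taking `gabor_bank[(mu, nu)].real` or `np.abs(gabor_bank[(mu, nu)])` gives arrays comparable to the panels of Figures 1 and 2. For the coarsest scales a 31 × 31 window truncates the envelope noticeably, so a larger grid may be preferable in practice.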

The Gabor feature representation of a gait image is obtained by convolving the Gabor filters with the averaged gait image. Let AT(x, y) be the averaged gait image; the convolution of the gait image AT and the Gabor filters ψ_{μ, ν}(z) can be defined as follows:

O_{μ, ν} (z) = AT (x, y) * ψ_{μ, ν} (z),

(3)

where z = (x, y), * represents the convolution operator, and O_{μ, ν}(z) is the convolution result corresponding to the Gabor filter ψ_{μ, ν}(z). As a result, the set

S = \{O_{μ, ν} (z) : μ ∊ \{0, 1, 2, 3, 4, 5, 6, 7\}, ν ∊ \{0, 1, 2, 3, 4\}\}

(4)

forms the Gabor feature representation of the gait image AT(z). As can be seen, for each averaged gait image, we can obtain 40 Gabor-filtered images after convolving the averaged gait image with the Gabor filters. In addition, as suggested in [28, 29], in order to encompass different spatial frequencies (scales), spatial localities, and orientation selectivity, we concatenate all these representation results and derive the final Gabor feature vector. Before the concatenation, each O_{μ, ν}(z) is downsampled by a factor ρ to reduce the space dimension and is then normalized to zero mean and unit variance. We then construct a vector out of O_{μ, ν}(z) by concatenating its rows (or columns). Now, let O_{μ, ν}^ρ(z) represent the normalized vector constructed from O_{μ, ν}(z); the final Gabor feature vector O^ρ can be defined as follows:

O^{ρ} = \{O_{μ, ν}^{ρ} (z) : μ ∊ \{0, 1, 2, 3, 4, 5, 6, 7\}, ν ∊ \{0, 1, 2, 3, 4\}\} .

(5)

Consequently, the vector O^ρ serves as the Gabor feature representation of the averaged gait image for gait recognition.
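The pipeline of (3)–(5) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the magnitude of the complex response is used as the feature (the text does not say whether magnitude, real part, or both are kept), downsampling is done by keeping every ρ-th pixel, and the helper names `fft_convolve_same` and `gabor_feature_vector` are hypothetical.

```python
import numpy as np

def fft_convolve_same(image, kernel):
    """Linear 2D convolution via FFT, cropped back to the image size."""
    H, W = image.shape
    h, w = kernel.shape
    shape = (H + h - 1, W + w - 1)            # size of the full linear convolution
    full = np.fft.ifft2(np.fft.fft2(image, s=shape) * np.fft.fft2(kernel, s=shape))
    top, left = (h - 1) // 2, (w - 1) // 2    # centre the crop on the kernel origin
    return full[top:top + H, left:left + W]

def gabor_feature_vector(avg_image, kernels, rho=4):
    """Concatenate downsampled, normalized Gabor responses into one vector."""
    parts = []
    for psi in kernels:
        # Assumption: keep the magnitude of the complex filter response.
        response = np.abs(fft_convolve_same(avg_image, psi))
        # Assumption: downsample by keeping every rho-th row and column.
        down = response[::rho, ::rho].ravel()
        # Normalize each response to zero mean and unit variance.
        down = (down - down.mean()) / (down.std() + 1e-12)
        parts.append(down)
    return np.concatenate(parts)
```

With 40 kernels and an H × W averaged image, the resulting vector has roughly 40·H·W/ρ² entries, which is why a subsequent dimensionality reduction step is needed.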

3. Brief Review of SDA

Given a set of l labeled samples {x₁, …, x_l}, each with a class label c_i ∊ {1, …, C}, and m unlabeled samples {x_{l + 1}, …, x_{l + m}} with unknown class labels, let l + m = n and x_i ∊ ℝ^D. The optimal objective function of SDA is defined as follows:

U_{opt} = \arg \max_{U} \frac{U^{T} S_{b} U}{U^{T} S_{t} U + β J (U)},

(6)

where S_b and S_t denote the between-class scatter matrix and total scatter matrix, respectively. According to the graph perspective of LDA in [7, 8], S_b and S_t can be defined as follows:

\begin{matrix} S_{b} = X W X^{T}, \\ S_{t} = X X^{T}, \end{matrix}

(7)

where X = [x₁, …, x_l] and the weight matrix W is defined as follows:

W_{i j} = \begin{cases} \frac{1}{| c_{i} |}, & c_{i} = c_{j}, \\ 0, & \text{otherwise}, \end{cases}

(8)

where $| c_{i} |$ denotes the total number of data samples belonging to the class label c_i.

In addition, the regularizer item J(U) is used to model the manifold structure. By using locally invariant idea of manifold learning, J(U) is defined as follows:

J (U) = \frac{1}{2} \sum_{i, j} {(U^{T} x_{i} - U^{T} x_{j})}^{2} S_{i j} = U^{T} X L X^{T} U,

(9)

where L = D – S is the graph Laplacian, D is a diagonal matrix given by $D_{i i} = \sum_{j} S_{i j}$ , and S denotes the following weight matrix:

S_{i j} = \begin{cases} 1, & \text{if } x_{i} \text{ is among the } k \text{-nearest neighbors of } x_{j} \text{ or } x_{j} \text{ is among the } k \text{-nearest neighbors of } x_{i}, \\ 0, & \text{otherwise}. \end{cases}

(10)
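For concreteness, the class weight matrix of (8) and the graph Laplacian L = D – S built from the k-nearest-neighbor weights of (10) might be computed as in the sketch below. Samples are stored as columns of X, squared Euclidean distance is assumed as the neighbor metric (the text does not name one), and the function names are illustrative.

```python
import numpy as np

def class_weight_matrix(labels):
    """W_ij = 1/|c_i| if samples i and j share a class label, else 0."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    counts = np.array([np.sum(labels == c) for c in labels])  # |c_i| per sample
    return same / counts[:, None]

def knn_laplacian(X, k):
    """Graph Laplacian L = D - S of the symmetric k-nearest-neighbor graph.

    X has shape (D, n), one sample per column.
    """
    n = X.shape[1]
    sq = np.sum(X**2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2 * X.T @ X    # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                    # a sample is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    S = np.zeros((n, n))
    S[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    S = np.maximum(S, S.T)                            # the "or" condition in (10)
    return np.diag(S.sum(axis=1)) - S
```

The symmetrization step makes S_ij = 1 whenever either sample is among the other's neighbors, so every row of the resulting Laplacian sums to zero.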

By substituting (7) and (9) into (6), we have

U_{opt} = \arg \max_{U} \frac{U^{T} X W X^{T} U}{U^{T} X X^{T} U + β U^{T} X L X^{T} U} .

(11)

Then, the projection vector U is given by the maximum eigenvalue solution to the generalized eigenvalue problem:

X W X^{T} U = λ (X X^{T} + β X L X^{T}) U .

(12)

Although SDA has exploited both discriminant and geometrical information for dimensionality reduction and achieved reasonably good performance in many fields, there are still some problems that have not been properly addressed.

SDA suffers from the singular problem in gait recognition, since the number of gait images is much smaller than the dimension of each gait image.

SDA is a linear technique in nature, so it is inadequate to describe the complexity of real gait images. Although the nonlinear extension of SDA through kernel trick has been proposed in [25], it still has two shortcomings: (1) it suffers from the singular problem and (2) it adopts the data-independent kernels which may not be consistent with the intrinsic manifold structure revealed by labeled and unlabeled data samples.

To fully address the above issues, we propose a novel manifold adaptive kernel semisupervised discriminant analysis (MAKSDA) algorithm for gait recognition in the following section.

4. Manifold Adaptive Kernel SDA (MAKSDA) Algorithm

Although SDA can produce linear discriminating features, a numerical problem remains for gait recognition; that is, the matrix (XX^T + βXLX^T) in (12) may be singular. In this paper, the discrepancy criterion [30–32] is adopted as an alternative way to avoid the singular problem of SDA, since the ratio-criterion problem can be well solved through the discrepancy criterion. The discrepancy-based SDA can then be defined as follows:

\begin{matrix} U_{opt} = \arg \max_{U} Q (U), \\ Q (U) = U^{T} S_{b} U - (U^{T} S_{t} U + β J (U)) \\ = U^{T} X W X^{T} U - (U^{T} X X^{T} U + β U^{T} X L X^{T} U) \\ = U^{T} (X W X^{T} - X X^{T} - β X L X^{T}) U . \end{matrix}

(13)

Then, maximizing Q(U) is equivalent to maximizing $U^{T} S_{b} U$ and minimizing $U^{T} S_{t} U + β J (U)$ simultaneously, which is consistent with the ratio criterion of the original SDA.

Since we can freely multiply U by some nonzero constant, we assume $∥ U ∥ = 1$ . Then, the maximization problem in (13) can be equivalently transformed into the following Lagrange function:

L (U, λ) = Q (U) - λ (∥ U ∥ - 1) .

(14)

Let $\partial L (U, λ) / \partial U = 0$ ; we can obtain

(X W X^{T} - X X^{T} - β X L X^{T}) U = λ U .

(15)

Then, the SDA problem can be transformed into finding the leading eigenvectors of matrix (XWX^T – XX^T – βXLX^T). Since no matrix inverse operation needs to be calculated, the discrepancy-based SDA successfully avoids the singular problem of the original SDA.

Let the column vectors U₁, U₂, …, U_d be the solution of (15) ordered in terms of their eigenvalues λ₁, λ₂, …, λ_d. Thus, the SDA embedding is given by y_i = U^Tx_i, where y_i denotes the lower-dimensional feature representation of x_i and U = (U₁, U₂, …, U_d) is the optimal projection matrix of SDA.
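A minimal sketch of the eigen-decomposition in (15) is given below. It assumes that X, W, and L all refer to the same n samples (stored as columns of X) and that W and L have already been built; the function name `discrepancy_sda` and the toy data are illustrative only.

```python
import numpy as np

def discrepancy_sda(X, W, L, beta=1.0, d=2):
    """Leading d eigenvectors of X W X^T - X X^T - beta * X L X^T.

    X: (D, n) data matrix; W, L: (n, n) symmetric matrices.
    Returns (U, eigenvalues) with eigenvalues in descending order.
    """
    A = X @ W @ X.T - X @ X.T - beta * (X @ L @ X.T)
    A = (A + A.T) / 2                       # remove round-off asymmetry
    evals, evecs = np.linalg.eigh(A)        # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1][:d]
    return evecs[:, order], evals[order]

# Toy usage: 12 samples in 5 dimensions, 3 classes of 4 samples each.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 12))
labels = np.repeat([0, 1, 2], 4)
W = (labels[:, None] == labels[None, :]) / 4.0
S = np.diag(np.ones(11), 1) + np.diag(np.ones(11), -1)  # toy chain graph in place of kNN
L = np.diag(S.sum(axis=1)) - S
U, lam = discrepancy_sda(X, W, L, beta=1.0, d=3)
Y = U.T @ X                                  # embedding y_i = U^T x_i
```

One property worth noting from this sketch: with W defined as in (8), XWX^T – XX^T = –X(I – W)X^T is negative semidefinite, so all eigenvalues are nonpositive and the leading eigenvectors are those with eigenvalues closest to zero. When D exceeds n, directions orthogonal to every sample attain eigenvalue zero, so some restriction to the span of the data would presumably be needed in that regime; the text does not specify one.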

Although the above discrepancy-based SDA algorithm avoids the singular problem of the original SDA algorithm, it is still a linear algorithm. It may fail to discover the nonlinear geometric structure when gait images are highly nonlinear. Thus, in order to solve the nonlinear problem, the discrepancy-based SDA needs to be generalized to its nonlinear version via the kernel trick. The main idea of the kernel trick is to map the input data into a feature space with a nonlinear mapping function, where the inner products in the feature space can be easily computed through a kernel function without knowing the nonlinear mapping function explicitly. In the following, we discuss how to perform the discrepancy-based SDA in a reproducing kernel Hilbert space (RKHS) and how to produce a manifold adaptive kernel function which is consistent with the intrinsic manifold structure, which gives rise to the manifold adaptive kernel SDA (MAKSDA).

To extend SDA to MAKSDA, let φ: x ∊ ℝ^D → φ(x) ∊ F be a nonlinear mapping function from the input space to a high-dimensional feature space F. The idea of MAKSDA is to perform the discrepancy-based SDA in the feature space F instead of the input space ℝ^D. For a properly chosen φ, an inner product $〈 \cdot, \cdot 〉$ can be defined on F, which makes F a so-called RKHS. More specifically, $〈 φ (x), φ (y) 〉 = K (x, y)$ holds, where K(·, ·) is a positive semidefinite kernel function.

Let S_b^φ, S_t^φ, and J^φ(U) denote the between-class scatter matrix, the total scatter matrix, and the regularizer item in the feature space, respectively. According to (7) and (9), we can obtain

\begin{matrix} S_{b}^{φ} = φ (X) W φ^{T} (X), \\ S_{t}^{φ} = φ (X) φ^{T} (X), \\ J^{φ} (U) = \frac{1}{2} \sum_{i, j} {(U^{T} φ (x_{i}) - U^{T} φ (x_{j}))}^{2} S_{i j} \\ = U^{T} φ (X) L φ^{T} (X) U, \end{matrix}

(16)

where φ(X) = [φ(x₁), φ(x₂), …, φ(x_n)] and the definition of L is similar to the definition in Section 3.

Then, according to (13), the optimal objective function of MAKSDA in the feature space F can be defined as follows:

\begin{matrix} U_{opt} = \arg \max_{U} Q^{φ} (U), \\ Q^{φ} (U) = U^{T} S_{b}^{φ} U - (U^{T} S_{t}^{φ} U + β J^{φ} (U)) \\ = U^{T} φ (X) W φ^{T} (X) U - (U^{T} φ (X) φ^{T} (X) U + β U^{T} φ (X) L φ^{T} (X) U) \\ = U^{T} (φ (X) W φ^{T} (X) - φ (X) φ^{T} (X) - β φ (X) L φ^{T} (X)) U \end{matrix}

(17)

with the constraint

∥ U ∥ = 1 .

(18)

To solve the above optimization problem, we introduce the following Lagrangian function:

L^{φ} (U, λ) = Q^{φ} (U) - λ (∥ U ∥ - 1)

(19)

with the multiplier λ.

Let $\partial L^{φ} (U, λ) / \partial U = 0$ ; we can obtain

(φ (X) W φ^{T} (X) - φ (X) φ^{T} (X) - β φ (X) L φ^{T} (X)) U = λ U .

(20)

Since any solution U ∊ F must be a linear combination of the φ(x_i), there exist coefficients α_i (i = 1, 2, …, n) such that

U = \sum_{i = 1}^{n} α_{i} φ (x_{i}) = φ (X) α = φ α,

(21)

where φ denotes the data matrix in F; that is, φ = φ(X) = [φ(x₁), φ(x₂), …, φ(x_n)], and $α = {[α_{1}, α_{2}, \dots, α_{n}]}^{T}$ .

Substituting (21) into (20) and following some algebraic transformation, we can obtain

(φ (X) W φ^{T} (X) - φ (X) φ^{T} (X) - β φ (X) L φ^{T} (X)) φ (X) α = λ φ (X) α,

(22)

(φ^{T} (X) φ (X) W φ^{T} (X) φ (X) - φ^{T} (X) φ (X) φ^{T} (X) φ (X) - β φ^{T} (X) φ (X) L φ^{T} (X) φ (X)) α = λ φ^{T} (X) φ (X) α,

(23)

(K W K - K K - β K L K) α = λ K α,

(24)

where K denotes the kernel matrix K = φ^T(X)φ(X) and its element is K_ij = K(x_i, x_j).

Thus, the MAKSDA problem can be transformed into finding the leading eigenvectors of the generalized eigenproblem (24), that is, of (KWK – KK – βKLK) with respect to K. Since no matrix inverse operation needs to be computed, MAKSDA successfully avoids the singular problem. Meanwhile, each eigenvector α gives a projective function U in the feature space. For a new testing data sample x, its low-dimensional embedding can be computed according to $〈 U, φ (x) 〉 = α^{T} K (\cdot, x)$ , where the kernel vector $K (\cdot, x) = {[K (x_{1}, x), K (x_{2}, x), \dots, K (x_{n}, x)]}^{T}$ .
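The generalized eigenproblem (24) and the embedding of new samples can be sketched as follows. Two choices here are assumptions of this example rather than part of the derivation: a small ridge `eps` is added to K so that a Cholesky factorization exists, and the generalized problem is reduced to a standard symmetric one through that factorization. The function names are hypothetical.

```python
import numpy as np

def gaussian_kernel_matrix(X, delta):
    """K_ij = exp(-||x_i - x_j||^2 / (2 delta^2)) for the columns of X (assumed form)."""
    sq = np.sum(X**2, axis=0)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X.T @ X, 0.0)
    return np.exp(-d2 / (2 * delta**2))

def kernel_discrepancy_sda(K, W, L, beta=1.0, d=2, eps=1e-8):
    """Leading d solutions of (K W K - K K - beta K L K) a = lambda (K + eps I) a."""
    n = K.shape[0]
    A = K @ W @ K - K @ K - beta * (K @ L @ K)
    A = (A + A.T) / 2
    K_reg = K + eps * np.eye(n)              # assumption: ridge for numerical stability
    C = np.linalg.cholesky(K_reg)            # K_reg = C C^T, C lower triangular
    C_inv = np.linalg.inv(C)
    B = C_inv @ A @ C_inv.T                  # standard symmetric problem B v = lambda v
    evals, V = np.linalg.eigh((B + B.T) / 2)
    order = np.argsort(evals)[::-1][:d]
    alphas = C_inv.T @ V[:, order]           # map back: a = C^{-T} v
    return alphas, evals[order]

def embed(alphas, K_cross):
    """Embedding alpha^T K(., x); K_cross[i, j] = K(x_i, new sample j)."""
    return alphas.T @ K_cross
```

Passing the training kernel matrix itself as `K_cross` returns the embedding of the training samples, one column per sample.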

From the above derivation procedure, we can observe that the kernel function K plays an important role in the MAKSDA algorithm. The traditional kernel-based methods commonly adopt data-independent kernels, such as Gaussian kernel, polynomial kernel, and Sigmoid kernel. However, the nonlinear structure captured by those data-independent kernels may not be consistent with the discriminative information and the intrinsic manifold structure [33]. To address this issue, in the following, we discuss how to design the discriminative manifold adaptive kernel function of MAKSDA, which fully takes account of the discriminative information and the intrinsic manifold structure, thus leading to much better performance.

Let V be a linear space with a positive semidefinite inner product (quadratic form) and let E: H → V be a bounded linear operator. In addition, we define $\tilde{H}$ to be the space of functions from H with the modified inner product:

{〈 f, g 〉}_{\tilde{H}} = {〈 f, g 〉}_{H} + {〈 E f, E g 〉}_{V} .

(25)

Sindhwani et al. have proved that $\tilde{H}$ is still an RKHS [33].

Given the data examples x₁, x₂, …, x_n, let E: H → ℝⁿ be the evaluation map

E (f) = (f (x_{1}), \dots, f (x_{n})) .

(26)

Denote $f = (f (x_{1}), \dots, f (x_{n})) ∊ V$ and $g = (g (x_{1}), \dots, g (x_{n})) ∊ V$ ; thus we can obtain

\begin{matrix} {〈 E f, E g 〉}_{V} = 〈 f, g 〉 = f^{T} M g, \\ {∥ E f ∥}_{V}^{2} = f^{T} M f, \end{matrix}

(27)

where M is a positive semidefinite matrix. Let K_x denote

K_{x} = {(K (x_{1}, x), \dots, K (x_{n}, x))}^{T} .

(28)

Reference [33] has shown that the reproducing kernel $\tilde{K}$ in $\tilde{H}$ is

\tilde{K} (x, z) = K (x, z) - K_{x}^{T} {(I + M K)}^{- 1} M K_{z},

(29)

where K denotes the kernel matrix in H and I is an identity matrix. The key issue now is the choice of M, so that the deformation of the kernel induced by the data-dependent norm is motivated with respect to the discriminative information and the intrinsic manifold structure of gait images.

In order to model the discriminative manifold structure, we construct two nearest neighbor graphs, that is, an intrinsic graph G_c and a penalty graph G_p. For each data sample x_i, the intrinsic graph G_c is constructed by finding its k₁ nearest neighbors from data samples that have the same class label as x_i, and putting an edge between x_i and its neighbors. The weight matrix W^c on the intrinsic graph G_c is defined as follows:

W_{i j}^{c} = \begin{cases} 1, & \text{if } x_{i} ∊ N_{k_{1}} (x_{j}) \text{ or } x_{j} ∊ N_{k_{1}} (x_{i}), \\ 0, & \text{otherwise}, \end{cases}

(30)

where N_{k₁}(x_i) denotes the set of the k₁ nearest neighbors of x_i that are in the same class.

Similarly, for each data sample x_i, the penalty graph G_p is constructed by finding its k₂ nearest neighbors from data samples that have class labels different from that of x_i and putting an edge between x_i and its neighbors from different classes. The weight matrix W^p on the penalty graph G_p is defined as follows:

W_{i j}^{p} = \begin{cases} 1, & \text{if } (x_{i}, x_{j}) ∊ P_{k_{2}} (c_{i}) \text{ or } (x_{i}, x_{j}) ∊ P_{k_{2}} (c_{j}), \\ 0, & \text{otherwise}, \end{cases}

(31)

where P_{k₂}(c) denotes the set of data pairs that are the k₂ nearest pairs among the data pair set $\{(x_{i}, x_{j}) ∣ c_{i} = c, c_{j} \neq c\}$ .

To encode the discriminative information, we maximize margins between different classes. The between-class separability is modeled by the graph Laplacian defined on the penalty graph:

f^{T} L^{p} f = \frac{1}{2} \sum_{i, j} {(f (x_{i}) - f (x_{j}))}^{2} W_{i j}^{p},

(32)

where L^p = D^p – W^p is the Laplacian matrix of the penalty graph and the ith element of the diagonal matrix D^p is $D_{i i}^{p} = \sum_{j} W_{i j}^{p}$ .

To encode the intrinsic manifold structure, the graph Laplacian provides the following smoothness penalty on the intrinsic graph:

f^{T} L^{c} f = \frac{1}{2} \sum_{i, j} {(f (x_{i}) - f (x_{j}))}^{2} W_{i j}^{c},

(33)

where L^c = D^c – W^c is the Laplacian matrix of the intrinsic graph G_c and the ith element of the diagonal matrix D^c is $D_{i i}^{c} = \sum_{j} W_{i j}^{c}$ .
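The two graphs of (30) and (31) and their Laplacians might be constructed as in the sketch below. It assumes that every sample passed in carries a class label (how unlabeled samples would enter these graphs is not stated above), uses squared Euclidean distance, and follows the pair-based definition of P_{k₂}(c) in (31). All function names are illustrative.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between the columns of X."""
    sq = np.sum(X**2, axis=0)
    return np.maximum(sq[:, None] + sq[None, :] - 2 * X.T @ X, 0.0)

def intrinsic_graph(X, labels, k1):
    """W^c: connect each sample to its k1 nearest same-class neighbors."""
    labels = np.asarray(labels)
    dist = pairwise_sq_dists(X)
    n = len(labels)
    Wc = np.zeros((n, n))
    for i in range(n):
        same = np.flatnonzero((labels == labels[i]) & (np.arange(n) != i))
        nearest = same[np.argsort(dist[i, same])[:k1]]
        Wc[i, nearest] = 1.0
    return np.maximum(Wc, Wc.T)               # symmetric "or" rule of (30)

def penalty_graph(X, labels, k2):
    """W^p: for each class c, connect the k2 closest (inside, outside) pairs."""
    labels = np.asarray(labels)
    dist = pairwise_sq_dists(X)
    n = len(labels)
    Wp = np.zeros((n, n))
    for c in np.unique(labels):
        inside = np.flatnonzero(labels == c)
        outside = np.flatnonzero(labels != c)
        sub = dist[np.ix_(inside, outside)]
        flat = np.argsort(sub, axis=None)[:k2]           # k2 smallest pair distances
        rows, cols = np.unravel_index(flat, sub.shape)
        Wp[inside[rows], outside[cols]] = 1.0
    return np.maximum(Wp, Wp.T)               # symmetric "or" rule of (31)

def laplacian(weights):
    """Graph Laplacian D - W of a symmetric weight matrix."""
    return np.diag(weights.sum(axis=1)) - weights
```

With these helpers, `laplacian(intrinsic_graph(X, labels, k1))` and `laplacian(penalty_graph(X, labels, k2))` play the roles of L^c and L^p.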

We minimize (33) to retain the intrinsic manifold structure information and maximize (32) to make the data samples in different classes separable. Thus, by combining discriminative information and the intrinsic manifold structure information together, we can set M in (29) as

M = {({(L^{p})}^{- 1 / 2})}^{T} L^{c} {(L^{p})}^{- 1 / 2} .

(34)

Then, by substituting (34) into (29), we eventually get the following discriminative manifold adaptive kernel function:

\tilde{K} (x, z) = K (x, z) - K_{x}^{T} {(I + {({(L^{p})}^{- 1 / 2})}^{T} L^{c} {(L^{p})}^{- 1 / 2} K)}^{- 1} \times {({(L^{p})}^{- 1 / 2})}^{T} L^{c} {(L^{p})}^{- 1 / 2} K_{z} .

(35)

As can be seen, the main idea of constructing discriminative manifold adaptive kernel is to incorporate the discriminative information and the intrinsic manifold structure information into the kernel deformation procedure simultaneously. Thus, the resulting new kernel can take advantage of information from labeled and unlabeled data. When an input initial kernel is deformed according to (35), the resulting manifold-adaptive kernel function may be able to achieve much better performance than the original input kernel function. In this paper, we simply use the Gaussian kernel as the input initial kernel.
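The kernel deformation of (29) with M chosen as in (34) can be sketched as follows. Because a graph Laplacian always has at least one zero eigenvalue, (L^p)^{-1/2} is not defined in the strict sense; this example substitutes the pseudo-inverse square root, which is an assumption of the sketch and not something the text prescribes. The function names are hypothetical.

```python
import numpy as np

def inv_sqrt_psd(L, tol=1e-10):
    """Pseudo-inverse square root of a symmetric PSD matrix (assumed stand-in)."""
    evals, evecs = np.linalg.eigh((L + L.T) / 2)
    keep = evals > tol                        # drop the (near-)zero eigenvalues
    return (evecs[:, keep] / np.sqrt(evals[keep])) @ evecs[:, keep].T

def deformation_matrix(Lc, Lp):
    """M = ((L^p)^{-1/2})^T L^c (L^p)^{-1/2}, as in (34)."""
    R = inv_sqrt_psd(Lp)
    return R.T @ Lc @ R

def deformed_kernel(K, M, K_cross=None):
    """Evaluate K~(x_i, z_j) = K(x_i, z_j) - K_{x_i}^T (I + M K)^{-1} M K_{z_j}.

    K: (n, n) kernel matrix on the training samples.
    K_cross: (n, m) matrix whose columns are K_z for m points z;
             defaults to K, which yields the deformed training kernel matrix.
    """
    if K_cross is None:
        K_cross = K
    n = K.shape[0]
    correction = K @ np.linalg.solve(np.eye(n) + M @ K, M @ K_cross)
    return K_cross - correction
```

On the training samples the result equals (I + KM)^{-1}K, which is symmetric and positive semidefinite whenever K and M are, so it can be used wherever the original kernel matrix was.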

In summary, by combining the above discussions, we can outline the proposed manifold-adaptive kernel SDA (MAKSDA) algorithm as follows.

Calculate the initial kernel matrix K: K_ij = K(x_i, x_j) in the original data space.

Construct an intrinsic graph G_c with the weight matrix defined in (30) and calculate the graph Laplacian L^c = D^c – W^c.

Construct a penalty graph G_p with the weight matrix defined in (31) and calculate the graph Laplacian L^p = D^p – W^p.

Calculate M and the discriminative manifold-adaptive kernel function $\tilde{K}$ in terms of (34) and (35), respectively.

Replace K in (24) with $\tilde{K}$ defined in (35) and obtain the following generalized eigenproblem:

(\tilde{K} W \tilde{K} - \tilde{K} \tilde{K} - β \tilde{K} L \tilde{K}) α = λ \tilde{K} α .

(36)

Compute the eigenvectors and eigenvalues of the generalized eigenproblem (36). Let the column vectors $α^{1}, α^{2}, \dots, α^{d}$ be the solutions of (36) ordered according to their eigenvalues λ₁ > λ₂ > ⋯ > λ_d. For each solution $α = {[α_{1}, α_{2}, \dots, α_{n}]}^{T}$ , the MAKSDA embedding can be computed as follows:

\begin{matrix} 〈 U, φ (x) 〉 = \sum_{i = 1}^{n} α_{i} 〈 φ (x_{i}), φ (x) 〉 \\ = \sum_{i = 1}^{n} α_{i} \tilde{K} (x_{i}, x) \\ = α^{T} \tilde{K} (\cdot, x), \end{matrix}

(37)

where the discriminative manifold adaptive kernel $\tilde{K} (\cdot, x) = {[\tilde{K} (x_{1}, x), \tilde{K} (x_{2}, x), \dots, \tilde{K} (x_{n}, x)]}^{T}$ .

Now, we obtain the low-dimensional representations of the original gait images with (37). In the reduced feature space, images belonging to the same class are expected to be close to each other, while images belonging to different classes are expected to be far apart. Thus, a traditional classifier can be applied to classify different gait images. In this paper, we apply the nearest neighbor classifier for its simplicity, and the Euclidean metric is used as the distance measure.

The time complexity analysis of MAKSDA is outlined as follows. Computing the input initial kernel matrix needs O(n²). Constructing the intrinsic graph and the penalty graph needs O(k₁n²) and O(k₂n²), respectively. Computing the discriminative manifold adaptive kernel matrix and the generalized eigenproblem needs O(n³). Projecting the original image into the lower-dimensional feature space needs O(dn²). Thus, the total computational complexity of MAKSDA is O(n³), which is the same as the traditional kernel SDA algorithm in the kernel space.

5. Experimental Results

In this section, we report experimental results on the well-known USF HumanID gait database [1] to investigate the performance of our proposed MAKSDA algorithm for gait recognition.

The system performance is compared with kernel PCA (KPCA) [34], kernel LDA (KLDA) [35], kernel LPP (KLPP) [36], kernel MFA (KMFA) [8], and kernel SDA (KSDA) [25], five of the most popular nonlinear methods in gait recognition. We adopt the commonly used Gaussian kernel as the kernel function of these algorithms. In the following experiments, the Gaussian kernel with parameters $δ = 2^{(n - 10) / 2.5} δ_{0}$ , n = 0, 1, …, 20 is used, where δ₀ is the standard deviation of the data set and n here indexes the kernel width rather than the number of samples. We report the best result of each algorithm from among the 21 experiments. There are two important parameters in our proposed MAKSDA algorithm, that is, the number of nearest neighbors k₁ in the intrinsic graph and the number of nearest neighbors k₂ in the penalty graph. We empirically set them to 6 and 15, respectively. In the following section, we will discuss the effect of different values of k₁ and k₂ on the recognition performance. In addition, since the original SDA algorithm is robust to the regularization parameter β, we simply set β = 1 in KSDA and MAKSDA for fair comparison.
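As a small worked example of the kernel-width schedule, the 21 candidate values of δ can be generated as below. The value of δ₀ is taken as an input, since the text does not say exactly how the standard deviation of the data set is computed; the function name is illustrative.

```python
import numpy as np

def kernel_width_grid(delta0, num=21):
    """Candidate widths delta = 2^((n - 10) / 2.5) * delta0 for n = 0, ..., num - 1."""
    n = np.arange(num)
    return 2.0 ** ((n - 10) / 2.5) * delta0

widths = kernel_width_grid(delta0=1.0)
```

The grid is geometric and centred on δ₀: it runs from δ₀/16 at n = 0 through δ₀ at n = 10 to 16·δ₀ at n = 20.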

We carried out all of our experiments upon the USF HumanID gait database, which consists of 1870 sequences from 122 subjects (people). As suggested in [1], the whole sequence is partitioned into several subsequences according to the gait period length N_gait, which is provided by Sarkar et al. [1]. Then, the binary images within one gait cycle are averaged to acquire several gray-level average silhouette images as follows:

AT_{i} = \frac{1}{N_{gait}} \sum_{k = (i - 1) N_{gait} + 1}^{i N_{gait}} T (k), \quad i = 1, \dots, ⌊ \frac{F}{N_{gait}} ⌋,

(38)

where $\{T (1), \dots, T (F)\}$ represent the binary images for one sequence with F frames and $⌊ F / N_{gait} ⌋$ denotes the largest integer less than or equal to $F / N_{gait}$ . Some original binary images and the average silhouettes of two different people are shown in Figure 3, where the first seven images and the last image in each row denote the binary silhouette images and the average silhouette image, respectively. As can be seen, different individuals have different average silhouette images.

Figure 3:

Some original binary images and the average silhouette images of two different people in the USF HumanID gait database.

In this paper, to perform gait recognition, the averaged gait image is decomposed by Gabor filters introduced in Section 2. We combine the decomposed images to give a new Gabor feature representation defined in (5), which is suitable for gait recognition. Our use of the Gabor-based feature representation for the averaged gait-image-based recognition is based on the following considerations: (1) Gabor functions provide a favorable tradeoff between spatial resolution and frequency resolution, which can be implemented by controlling the scale and orientation parameters; (2) it is supposed that Gabor kernels are similar to the 2D-receptive field profiles of the mammalian cortical simple cells; and (3) Gabor-function-based representations have been successfully employed in many machine vision applications, such as face recognition, scene classification, and object recognition.

In short, the gait recognition algorithm has three steps. First, we calculate the Gabor feature representation of the averaged gait image. Then, the Gabor feature representations are projected into lower-dimensional feature space via our proposed MAKSDA algorithm. Finally, the nearest neighbor classifier is adopted to identify different gait images. As suggested in [1 –3], the distance measure between the gallery sequence and the probe sequence adopts the following median operator, since it is more robust to noise than the traditional minimum operator. Consider

$$\mathrm{Dist}(LS_{P}, LS_{G}) = \mathrm{Median}_{i = 1}^{N_{p}} \left( \mathrm{Min}_{j = 1}^{N_{g}} \left\| LS_{P}(i) - LS_{G}(j) \right\|^{2} \right),$$

(39)

where LS_P(i), i = 1, …, N_p, and LS_G(j), j = 1, …, N_g, are the lower-dimensional feature representations from one probe sequence and one gallery sequence, respectively. N_p and N_g denote the numbers of average silhouette images in the probe sequence and in the gallery sequence, respectively.
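
The distance in (39) can be sketched as follows, assuming each sequence is given as a list of low-dimensional feature vectors. The function names and the toy vectors are hypothetical.

```python
from statistics import median

def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def sequence_distance(probe, gallery):
    """Median over probe vectors of the minimum squared distance to any gallery vector."""
    return median(min(squared_distance(p, g) for g in gallery) for p in probe)

# One outlying probe vector barely affects the median.
probe = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
gallery = [[0.0, 0.0], [1.0, 0.0]]
dist = sequence_distance(probe, gallery)
```

In this toy case the per-vector minima are 0, 1, and 181, so the distance is 1. A nearest neighbor classifier would assign the probe sequence to the gallery sequence with the smallest such distance.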

Three metrics (the Rank-1, Rank-5, and Average recognition accuracies) are used to measure the recognition performance. Rank-1 means that the correct subject is ranked as the top candidate, Rank-5 means that the correct subject is ranked among the top five candidates, and Average denotes the recognition accuracy among all the probe sets, that is, the ratio of correctly recognized persons to the total number of persons in all the probe sets. Tables 1 and 2 show the best results obtained by KPCA, KLDA, KLPP, KMFA, KSDA, and MAKSDA. From the experimental results, we can make the following observations.

Our proposed MAKSDA algorithm consistently outperforms KPCA, KLDA, KLPP, KMFA, and KSDA algorithms, which implies that extracting the discriminative feature by using both labeled and unlabeled data and explicitly considering the discriminative manifold adaptive kernel function can achieve the best gait recognition performance.

KPCA obtains the worst performance on the USF HumanID gait database even though it is a kernel-based method. A possible reason is that it is unsupervised and adopts only a data-independent kernel function, which is not necessarily useful for discriminating gait images of different persons.

The average performances of KLDA and KLPP are very similar. For some probe sets, KLPP outperforms KLDA, while KLDA is better than KLPP for other probe sets. This indicates that it is hard to determine whether the manifold structure or the class label information is more important, which is consistent with existing studies [37, 38].

KMFA is superior to KLDA and KLPP, which demonstrates that KMFA can effectively utilize local manifold structure as well as the class label information for gait recognition.

The semisupervised kernel algorithms (i.e., KSDA and MAKSDA) consistently outperform the purely unsupervised kernel algorithms (i.e., KPCA and KLPP) and the purely supervised kernel algorithms (i.e., KLDA and KMFA). This observation demonstrates that semisupervised learning can effectively utilize both labeled and unlabeled data to improve gait recognition performance.

Although KSDA and MAKSDA are both nonlinear extensions of SDA via the kernel trick, MAKSDA performs better than KSDA. This can be attributed to two facts. First, MAKSDA avoids the numerical computation problem because it does not compute a matrix inverse. Second, KSDA adopts the commonly used data-independent kernel functions; thus the nonlinear structure captured by these kernel functions may not be consistent with the intrinsic manifold structure of the gait image. MAKSDA, in contrast, adopts the discriminative manifold-adaptive kernel function; thus the nonlinear structure captured by this data-adaptive function is consistent with the discriminative information revealed by labeled data as well as the intrinsic manifold structure information revealed by unlabeled data.

MAKSDA obtains the best recognition performance on all the experiments, which implies that both discriminant and geometrical information contained in the kernel function are important for gait recognition.

Table 1:

Performance comparison in terms of Rank-1 recognition accuracies (%).

Probe	A	B	C	D	E	F	G	H	I	J	K	L	Average

KPCA	85	87	73	25	28	16	18	58	62	52	7	9	43.3
KLDA	88	90	79	34	31	20	24	65	70	64	15	17	49.8
KLPP	89	91	78	36	34	19	22	66	73	61	14	19	50.2
KMFA	90	93	80	41	45	23	29	81	79	65	23	22	55.9
KSDA	92	96	84	43	47	26	31	82	83	67	24	24	58.3
MAKSDA	96	98	85	47	49	28	35	86	86	70	29	28	61.4

Table 2:

Performance comparison in terms of Rank-5 recognition accuracies (%).

Probe	A	B	C	D	E	F	G	H	I	J	K	L	Average

KPCA	89	94	82	69	57	44	40	87	81	63	19	16	61.8
KLDA	94	96	90	71	62	47	45	91	83	82	21	18	66.7
KLPP	95	97	89	74	66	45	44	93	85	81	19	20	67.3
KMFA	96	97	92	75	70	51	53	94	89	82	32	33	72.0
KSDA	97	98	93	77	71	53	57	96	91	85	34	36	74.0
MAKSDA	99	99	95	83	79	56	61	98	97	87	35	38	77.3
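
The Rank-1 and Rank-5 figures in Tables 1 and 2 follow the usual rank-k definition, which can be sketched as below. This is an illustrative helper, not the evaluation code used for the tables; it assumes a precomputed matrix of probe-to-gallery distances, and all names and numbers in it are hypothetical.

```python
def rank_k_accuracy(dist_rows, probe_ids, gallery_ids, k):
    """Percentage of probes whose true identity is among the k nearest gallery entries.

    dist_rows[p][g] is the distance between probe p and gallery entry g.
    """
    hits = 0
    for row, true_id in zip(dist_rows, probe_ids):
        # Gallery indices sorted from nearest to farthest for this probe.
        order = sorted(range(len(row)), key=row.__getitem__)
        top_ids = [gallery_ids[g] for g in order[:k]]
        hits += true_id in top_ids
    return 100.0 * hits / len(probe_ids)

gallery_ids = ["a", "b", "c"]
probe_ids = ["a", "b", "c"]
dist_rows = [
    [0.1, 0.5, 0.9],  # nearest is "a": correct at rank 1
    [0.7, 0.6, 0.2],  # nearest is "c", then "b": correct only at rank 2
    [0.3, 0.2, 0.1],  # nearest is "c": correct at rank 1
]
rank1 = rank_k_accuracy(dist_rows, probe_ids, gallery_ids, k=1)
rank2 = rank_k_accuracy(dist_rows, probe_ids, gallery_ids, k=2)
```

Here two of three probes are correct at rank 1, and all three are correct once the top two candidates are considered.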

We also conduct an in-depth investigation of the performance of the Gabor-based feature with respect to different parameters, such as the number of scales and orientations of the Gabor features. In this study, we adopt the default settings; that is, we have 40 Gabor kernel functions from five scales $ν ∊ \{0, 1, 2, 3, 4\}$ and eight orientations $μ ∊ \{0, 1, 2, 3, 4, 5, 6, 7\}$. By using the MAKSDA algorithm, we test the performance of the Gabor-based feature under different parameters, in which we still adopt the nearest neighbor classifier with the distance defined in (39) for fair comparison. The average Rank-1 and Rank-5 recognition accuracies of the Gabor-based feature with different parameters are shown in Table 3. As can be seen, the recognition accuracies using the default parameter setting (i.e., five scales and eight orientations) are consistently better than those using the other parameter settings, which is consistent with recent studies from several other research groups [13].
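
To make the default setting concrete, the sketch below builds a bank of 40 complex Gabor kernels over five scales and eight orientations. It assumes a parameterization that is common in the Gabor feature literature (k_max = π/2, spacing factor f = √2, σ = 2π, orientation angle πμ/8); the definition given in Section 2 of this paper may differ, and the 9×9 window size and the function name are hypothetical choices made only for illustration.

```python
import cmath
import math

def gabor_kernel(nu, mu, size=9, sigma=2 * math.pi, k_max=math.pi / 2, f=math.sqrt(2)):
    """Complex Gabor kernel at scale nu and orientation mu on a size x size grid.

    All default parameter values are assumptions, not taken from the paper.
    """
    k = (k_max / f ** nu) * cmath.exp(1j * math.pi * mu / 8)  # wave vector as a complex number
    k_sq = abs(k) ** 2
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            envelope = (k_sq / sigma ** 2) * math.exp(-k_sq * (x * x + y * y) / (2 * sigma ** 2))
            # Oscillatory part minus a constant term that removes the DC component.
            carrier = cmath.exp(1j * (k.real * x + k.imag * y)) - math.exp(-sigma ** 2 / 2)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

# Five scales times eight orientations gives the 40 kernels of the default setting.
bank = {(nu, mu): gabor_kernel(nu, mu) for nu in range(5) for mu in range(8)}
```

The reduced settings in Table 3 would correspond to restricting `nu` to three scales or `mu` to four orientations when building the bank.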

Table 3:

Average Rank-1 and Rank-5 recognition accuracies (%) under different Gabor parameters.

Parameter settings	Average Rank-1	Average Rank-5

Five scales and eight orientations	61.4	77.3
Five scales and four orientations	56.2	71.8
Three scales and eight orientations	57.8	73.4
Three scales and four orientations	56.5	71.9

The construction of the kernel function is one of the key points in our proposed MAKSDA algorithm, which adopts the discriminative manifold adaptive kernel function to capture the discriminative information and the intrinsic manifold structure information. Of course, we can also use other kinds of traditional kernel functions, such as the Gaussian kernel, polynomial kernel, and Sigmoid kernel. To illustrate the superiority of our proposed discriminative manifold adaptive kernel function, we test the average Rank-1 and Rank-5 recognition accuracies under different kernel functions. The experimental results are shown in Table 4. As can be seen, our proposed discriminative manifold adaptive kernel function achieves the best performance, while the remaining kernel functions have comparable performance. The superiority of the discriminative manifold adaptive kernel is due to the fact that it is a data-dependent kernel, so the nonlinear structure it captures can be consistent with the intrinsic manifold structure of the gait image, which many previous studies have shown to be very useful for improving learning performance. In contrast, the Gaussian kernel, polynomial kernel, and Sigmoid kernel are all data-independent kernels, which might not be optimal for discriminating gait images with different semantics. This also demonstrates that simultaneously considering the local manifold structure and discriminative information is essential in designing kernel SDA algorithms for gait recognition.

Table 4:

Average Rank-1 and Rank-5 recognition accuracies (%) under different kernel functions.

Kernel functions	Average Rank-1	Average Rank-5

Discriminative manifold adaptive kernel	61.4	77.3
Gaussian kernel	57.5	70.4
Polynomial kernel	55.3	68.7
Sigmoid kernel	55.8	68.6

In addition, our proposed MAKSDA algorithm has two essential parameters: the number of nearest neighbors k₁ in the intrinsic graph and the number of nearest neighbors k₂ in the penalty graph. We empirically set k₁ to 6 and k₂ to 15 in the previous experiments. Here, we investigate the influence of different choices of k₁ and k₂. We vary k₁ while fixing k₂ and vary k₂ while fixing k₁. Figures 4 and 5 show the performance of MAKSDA as a function of the parameters k₁ and k₂, respectively. As can be seen, the performance of MAKSDA is very stable with respect to the two parameters. It achieves much better performance than the other kernel algorithms when k₁ varies from 4 to 8 and k₂ varies from 10 to 20. Therefore, the selection of these parameters is not a crucial problem in our proposed MAKSDA algorithm.
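
To show where k₁ and k₂ enter, the sketch below builds two nearest neighbor graphs and their Laplacians on toy data. It assumes an MFA-style construction, in which the intrinsic graph links each sample to its k₁ nearest same-class neighbors and the penalty graph links it to its k₂ nearest different-class neighbors, with binary symmetric weights and every sample labeled. The construction in Section 4 may differ in these details, and all names here are hypothetical.

```python
def knn_graph(points, labels, k, same_class):
    """Symmetric 0/1 adjacency matrix linking each point to its k nearest
    neighbors drawn from the same class (same_class=True) or from other classes."""
    n = len(points)
    weights = [[0] * n for _ in range(n)]
    for i in range(n):
        candidates = [j for j in range(n)
                      if j != i and (labels[j] == labels[i]) == same_class]
        candidates.sort(key=lambda j: sum((a - b) ** 2
                                          for a, b in zip(points[i], points[j])))
        for j in candidates[:k]:
            weights[i][j] = weights[j][i] = 1  # symmetrize the neighbor relation
    return weights

def graph_laplacian(weights):
    """Unnormalized graph Laplacian L = D - W, with D the diagonal degree matrix."""
    n = len(weights)
    return [[(sum(weights[i]) if i == j else 0) - weights[i][j] for j in range(n)]
            for i in range(n)]

points = [[0.0], [1.0], [5.0], [6.0]]
labels = [0, 0, 1, 1]
intrinsic = knn_graph(points, labels, k=1, same_class=True)
penalty = knn_graph(points, labels, k=1, same_class=False)
lap_intrinsic = graph_laplacian(intrinsic)
lap_penalty = graph_laplacian(penalty)
```

Larger values of k₁ and k₂ add more edges to the respective graphs, which is the effect swept in Figures 4 and 5.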

Figure 4:

The performance of MAKSDA versus parameter k₁: (a) Average Rank-1 recognition accuracy versus k₁; (b) Average Rank-5 recognition accuracy versus k₁.

Figure 5:

The performance of MAKSDA versus parameter k₂: (a) Average Rank-1 recognition accuracy versus k₂; (b) Average Rank-5 recognition accuracy versus k₂.

6. Conclusions

We have introduced a novel manifold adaptive kernel semisupervised discriminant analysis (MAKSDA) algorithm for gait recognition. It makes use of both labeled and unlabeled gait images to learn a low-dimensional feature space for gait recognition. Unlike the traditional kernel-based SDA algorithm, MAKSDA not only avoids the singularity problem by not computing the matrix inverse but also explores the data-dependent nonlinear structure of the gait image by using the discriminative manifold adaptive kernel function. Experimental results on the widely used USF HumanID gait database show the efficacy of the proposed approach.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported by NSFC (Grant no. 70701013), the National Science Foundation for Post-Doctoral Scientists of China (Grant no. 2011M500035), and the Specialized Research Fund for the Doctoral Program of Higher Education of China (Grant no. 20110023110002).

References

1. Sarkar, Phillips P. J., Liu, Vega I. R., Grother, and Bowyer K. W., “The humanID gait challenge problem: data sets, performance, and analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 2, pp. 162–177, 2005.
2. Han and Bhanu, “Individual recognition using gait energy image,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 2, pp. 316–322, 2006.
3. Tao, and Maybank S. J., “General tensor discriminant analysis and Gabor features for gait recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 10, pp. 1700–1715, 2007.
4. Wang, Tan, Ning, and Hu, “Silhouette analysis-based gait recognition for human identification,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 12, pp. 1505–1518, 2003.
5. Liu and Sarkar, “Improved gait recognition by gait dynamics normalization,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 6, pp. 863–876, 2006.
6. Duda R. O., Hart P. E., and Stork D. G., Pattern Classification, Wiley-Interscience, Hoboken, NJ, USA, 2nd edition, 2000.
7. Yan, Niyogi, and Zhang H.-J., “Face recognition using Laplacianfaces,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 3, pp. 328–340, 2005.
8. Yan, Zhang H.-J., Yang, and Lin, “Graph embedding and extensions: a general framework for dimensionality reduction,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 40–51, 2007.
9. Tao, and Maybank S. J., “General averaged divergence analysis,” in Proceedings of the 7th IEEE International Conference on Data Mining (ICDM '07), pp. 302–311, October 2007.
10. Tao, and Maybank S. J., “Geometric mean for subspace selection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 260–274, 2009.
11. Zhou and Tao, “GoDec: randomized low-rank & sparse matrix decomposition in noisy case,” in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 33–40, USA, July 2011.
12. Cai, Han, and Zhang H.-J., “Orthogonal Laplacianfaces for face recognition,” IEEE Transactions on Image Processing, vol. 15, no. 11, pp. 3608–3614, 2006.
13. Huang, Zeng, and Xu, “Human gait recognition using patch distribution feature and locality-constrained group sparse representation,” IEEE Transactions on Image Processing, vol. 21, no. 1, pp. 316–326, 2012.
14. Lin, Yan, and Xu, “Discriminant locally linear embedding with high-order tensor data,” IEEE Transactions on Systems, Man, and Cybernetics B, vol. 38, no. 2, pp. 342–352, 2008.
15. Zhang, Tao, and Yang, “Patch alignment for dimensionality reduction,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 9, pp. 1299–1313, 2009.
16. Tao and Jin, “Discriminative information preservation for face recognition,” Neurocomputing, vol. 91, pp. 11–20, 2012.
17. Tao, Maybank, and Wu, “Supervised tensor learning,” in Proceedings of the 5th IEEE International Conference on Data Mining (ICDM '05), pp. 450–457, November 2005.
18. Tao, and Maybank S. J., “Supervised tensor learning,” Knowledge and Information Systems, vol. 13, no. 1, pp. 1–42, 2007.
19. Zhang and Tao, “Slow feature analysis for human action recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 3, pp. 436–450, 2012.
20. Zhu, “Semi-supervised learning literature survey,” Tech. Rep. 1530, Computer Science Department, University of Wisconsin, Madison, Wis, USA, 2008.
21. Vapnik V. N., Chapelle, and Weston, “Transductive inference for estimating values of functions,” in Advances in Neural Information Processing, pp. 421–427, 1999.
22. Blum and Mitchell, “Combining labeled and unlabeled data with co-training,” in Proceedings of the 11th Annual Conference on Computational Learning Theory (COLT '98), pp. 92–100, July 1998.
23. Belkin, Niyogi, and Sindhwani, “Manifold regularization: a geometric framework for learning from labeled and unlabeled examples,” Journal of Machine Learning Research, vol. 7, pp. 2399–2434, 2006.
24. Tao, Jin, Liu, and Li, “Hessian regularized support vector machines for mobile image annotation on the cloud,” IEEE Transactions on Multimedia, vol. 15, no. 4, pp. 833–844, 2013.
25. Cai, and Han, “Semi-supervised discriminant analysis,” in Proceedings of the 11th IEEE International Conference on Computer Vision (ICCV '07), pp. 1–7, Rio de Janeiro, Brazil, October 2007.
26. Vapnik V. N., The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.
27. Lee T. S., “Image representation using 2D Gabor wavelets,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 10, pp. 959–971, 1996.
28. Liu and Wechsler, “Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition,” IEEE Transactions on Image Processing, vol. 11, no. 4, pp. 467–476, 2002.
29. Liu, “Gabor-based kernel PCA with fractional power polynomial models for face recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 5, pp. 572–581, 2004.
30. Jiang, and Zhang, “Efficient and robust feature extraction by maximum margin criterion,” IEEE Transactions on Neural Networks, vol. 17, no. 1, pp. 157–165, 2006.
31. Zhang, Tao, and Yang, “Discriminative locality alignment,” in Computer Vision – ECCV 2008, vol. 5302 of Lecture Notes in Computer Science, pp. 725–738, 2008.
32. Zhang, Lin, and Tang, “Learning semi-Riemannian metrics for semisupervised feature extraction,” IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 4, pp. 600–611, 2011.
33. Sindhwani, Niyogi, and Belkin, “Beyond the point cloud: from transductive to semi-supervised learning,” in Proceedings of the 22nd International Conference on Machine Learning, pp. 824–831, August 2005.
34. Schölkopf, Smola, and Müller K.-R., “Nonlinear component analysis as a kernel eigenvalue problem,” Neural Computation, vol. 10, no. 5, pp. 1299–1319, 1998.
35. Baudat and Anouar, “Generalized discriminant analysis using a kernel approach,” Neural Computation, vol. 12, no. 10, pp. 2385–2404, 2000.
36. and Niyogi, “Locality preserving projections,” in Proceedings of the International Conference on Advances in Neural Information Processing Systems (NIPS '03), pp. 585–591, 2003.
37. Yan, Tao, Lin, and Zhang H.-J., “Marginal Fisher analysis and its variants for human gait recognition and content-based image retrieval,” IEEE Transactions on Image Processing, vol. 16, no. 11, pp. 2811–2821, 2007.
38. Sugiyama, “Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis,” Journal of Machine Learning Research, vol. 8, pp. 1027–1061, 2007.
