Abstract
In the field of pattern recognition, representing image sets with symmetric positive-definite (SPD) matrices has been widely studied, and sparse representation-based classification on the SPD matrix manifold has attracted great attention in recent years. However, existing kernel representation-based classification methods usually rely on the kernel trick with an implicit kernel to rewrite the optimization function, which causes several problems. To address them, we propose a neighborhood preserving explicit kernel sparse representation-based classification method on the SPD manifold, which embeds the SPD matrices into a Reproducing Kernel Hilbert Space through an explicit kernel mapping constructed by the Nyström method. Thus, we can take full advantage of the characteristics of the kernel space. Experimental results demonstrate the better performance of our method on the task of image set classification.
Introduction
In the field of pattern recognition, classification methods based on image sets have received a lot of attention.1–5 Each image set contains a number of images and may offer more discriminative and robust information. Compared to single-shot image-based classification, image set classification can better handle tasks with multi-angle cameras or large intra-class divergence.6,7 Commonly used image-set representations model an image set as a covariance matrix,8–10 a linear subspace,11,12 or a Gaussian mixture model.13 Among these, the covariance matrix, computed from the second-order statistics of image features, is a point on the Riemannian manifold of symmetric positive definite (SPD) matrices and has been widely applied in action recognition,14 pedestrian detection,15 face recognition,16 texture classification,2 etc.
Sparse representation (SR) has a wide range of applications in digital image processing and pattern recognition because of its robustness.17,18 It reconstructs data as a linear combination of atoms in a dictionary while keeping the reconstruction error as small as possible. To make the dictionary more discriminative, Yang et al.18 proposed the FDDL method, which learns a dictionary that maintains the local discriminant information of the training data. Combining SR with Locality Preserving Projection (LPP), Qiao et al.19 proposed the sparsity preserving projections method. In recent years, kernel methods have been widely applied: they implicitly map training data into a high-dimensional Reproducing Kernel Hilbert Space (RKHS) through the nonlinear mapping associated with a kernel function and solve the SR problem in that high-dimensional space.20,21
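As a point of reference, the core SR objective, min over α of ||x − Dα||² + λ||α||₁, can be solved by simple iterative soft-thresholding. The following is a minimal NumPy sketch; the function names and the toy dictionary are illustrative, not part of any of the cited methods:

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code_ista(x, D, lam=0.1, n_iter=200):
    """Solve min_a 0.5*||x - D a||_2^2 + lam*||a||_1 by ISTA.

    x : (d,) signal; D : (d, k) dictionary with unit-norm atoms.
    """
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)           # gradient of the quadratic term
        a = soft_threshold(a - grad / L, lam / L)
    return a

# toy usage: a signal built from 2 of 10 random atoms is coded sparsely
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 10))
D /= np.linalg.norm(D, axis=0)
x = 1.5 * D[:, 2] - 0.8 * D[:, 7]
a = sparse_code_ista(x, D, lam=0.05)
```

The two dominant coefficients of `a` land on atoms 2 and 7, illustrating how SR concentrates the representation on few dictionary atoms.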
Most of the work mentioned above is based on vector-valued data. However, in many visual applications, data points actually lie on a Riemannian manifold, such as the space of symmetric positive-definite (SPD) matrices. Because the manifold of SPD matrices is nonlinear, the SR of Euclidean space cannot be applied to SPD matrices directly. One approach is to use a Riemannian metric to calculate the reconstruction error: Sivalingam et al.16 proposed a tensor SR method that measures the reconstruction error with the LogDet divergence, and Sra and Cherian22 proposed using the Frobenius norm as the error metric. An alternative approach is to embed the manifold data into the tangent space with the logarithmic mapping and then make use of existing SR methods. To exploit the Log-Euclidean metric of the SPD manifold, Zhang et al.23 obtained a vectorized form of the original covariance matrices for SR.
To make use of the manifold structure of SPD matrices, many works attempt to map the data into a high-dimensional RKHS with an implicit kernel function. Harandi et al.24 first used the Stein kernel to solve the Riemannian SR problem in kernel space. Subsequently, Li et al.25 and Harandi et al.26 presented kernel SR based on Log-Euclidean kernels and the Jeffrey kernel. To strengthen the discriminability of the kernel space dictionary, similarly to the Euclidean case, Li et al.27 proposed semantic and neighborhood preserving dictionary learning. All of the above methods update the Riemannian dictionaries in Riemannian space.
The above kernel SR methods use implicit kernel functions and implement SR and dictionary updates through kernel tricks. The data in kernel space then have no explicit representation, which brings some inconvenience to the dictionary update and may lead to singular dictionary atoms. Inspired by the Nyström method,28,29 we propose an SR method with an explicit kernel mapping based on the Nyström method. An approximate vector representation of each sample in the kernel space is obtained by the Nyström method, and the sparse coefficients and dictionaries are then updated directly in kernel space. To maintain the local discriminant information of the original data, we also add a neighborhood preserving constraint on the sparse coefficients. Furthermore, to reduce information redundancy and improve efficiency, we use the (2D)2 principal component analysis (PCA) method to reduce the dimensionality of the SPD matrices. Different from Li et al.,27 our method uses an explicit kernel and a Riemannian neighborhood graph.
This paper proposes a neighborhood preserving explicit kernel SRC method based on the Nyström method on the SPD manifold (NYSKSR), with application to image set classification. The contributions of our proposed method are threefold:
1. We use an explicit kernel mapping framework for kernel SR.
2. We update the sparse coefficients and dictionaries in kernel space simultaneously.
3. We apply our proposed method to several image set classification tasks where the data are represented as covariance matrices.
The rest of this paper is organized as follows. “Related work” section is the review of the related works. In “Sparse representation based on Nyström method” section, we introduce our proposed method. We present experimental results in “Experiments and analysis” section and draw conclusions and future work in the final section.
Related work
Before we introduce our method, in this section, we briefly review the characteristics of the SPD manifold, (2D)2 PCA, the Nyström method, and SR on SPD matrices.
SPD manifold
Given an image set S = [s1, s2, …, sn], where si is the feature vector of the ith image, the set can be represented by the sample covariance matrix C = (1/(n − 1)) Σi (si − s̄)(si − s̄)^T, where s̄ is the mean of the feature vectors. C is an SPD matrix lying on the SPD manifold.
The similarity between two SPD matrices on the manifold can be described by the length of the geodesic curve.8
For points Ci and Cj on the SPD manifold, the affine invariant Riemannian metric9 can be expressed as

δ(Ci, Cj) = ||log(Ci^(−1/2) Cj Ci^(−1/2))||F
The Riemannian kernel function8 can be computed as the inner product of points in the tangent space based on the logarithm mapping

k(Ci, Cj) = tr(log(Ci) log(Cj))
Equation (4) represents the logarithm mapping of an SPD matrix: for the eigenvalue decomposition C = UΣU^T, log(C) = U log(Σ) U^T, where log(Σ) is the diagonal matrix of the logarithms of the eigenvalues.
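These log-Euclidean quantities are straightforward to compute from an eigen-decomposition. A minimal NumPy sketch (the function names are ours, for illustration only):

```python
import numpy as np

def logm_spd(C):
    """Matrix logarithm of an SPD matrix via eigen-decomposition:
    C = U diag(s) U^T  ->  log(C) = U diag(log s) U^T."""
    s, U = np.linalg.eigh(C)
    return (U * np.log(s)) @ U.T

def log_euclidean_kernel(Ci, Cj):
    """k(Ci, Cj) = tr(log(Ci) log(Cj)), the Log-Euclidean inner product."""
    return np.trace(logm_spd(Ci) @ logm_spd(Cj))

# toy SPD matrix built as A A^T + eps*I
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
C1 = A @ A.T + 1e-3 * np.eye(5)
k11 = log_euclidean_kernel(C1, C1)  # equals ||log(C1)||_F^2, hence >= 0
```

Since log(C) is symmetric, k(C, C) equals the squared Frobenius norm of log(C), which is why the kernel acts as an inner product in the tangent space at the identity.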
(2D)2 Principal component analysis
Compared with the traditional PCA dimensionality reduction method, which requires vectorizing each image, (2D)2 PCA works directly on image matrices and compresses them in both the row and column directions, which greatly reduces the dimensionality of the resulting covariance descriptors.
Suppose there are m training image sets
The
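As an illustration of the idea, here is a minimal sketch of a (2D)2 PCA-style projection of image matrices; the function and variable names are hypothetical, and details such as centering conventions may differ from the exact formulation used in this paper:

```python
import numpy as np

def two_d_squared_pca(images, k):
    """(2D)^2 PCA sketch: project each m x n image A to Z^T A X (k x k).

    X holds the top-k eigenvectors of the column-direction scatter matrix,
    Z those of the row-direction scatter matrix.
    """
    images = np.asarray(images, dtype=float)
    centered = images - images.mean(axis=0)
    # column-direction (right) scatter: n x n
    G = sum(A.T @ A for A in centered)
    # row-direction (left) scatter: m x m
    H = sum(A @ A.T for A in centered)
    X = np.linalg.eigh(G)[1][:, -k:]    # n x k, top-k eigenvectors
    Z = np.linalg.eigh(H)[1][:, -k:]    # m x k, top-k eigenvectors
    return np.stack([Z.T @ A @ X for A in images]), X, Z

rng = np.random.default_rng(2)
imgs = rng.standard_normal((10, 20, 20))    # ten toy 20 x 20 "images"
B, X, Z = two_d_squared_pca(imgs, k=5)      # each image compressed to 5 x 5
```

Compressing a 20 × 20 image to k × k in this way shrinks the subsequent covariance descriptor from 400 × 400 to k² × k², which is the efficiency gain exploited here.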
Nyström method
Existing kernel mappings are usually implicit, which may cause some inconvenience, so the Nyström method was proposed to approximate the original kernel matrix in the RKHS.28
Given n training samples, the Nyström method selects M of them as landmark points, computes the M × M kernel matrix W between the landmarks, and takes its rank-r eigenvalue decomposition W ≈ Ur Λr Ur^T. Each sample x then obtains the explicit approximate feature vector φ(x) = Λr^(−1/2) Ur^T [k(x, z1), …, k(x, zM)]^T, whose inner products approximate the kernel values.
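This construction can be sketched as follows, here with a Gaussian RBF kernel standing in for the Riemannian kernel (names and parameter values are illustrative):

```python
import numpy as np

def rbf(Xa, Xb, gamma=0.5):
    """Gaussian RBF kernel matrix (a stand-in for the Riemannian kernel)."""
    d2 = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_features(X, landmarks, r, gamma=0.5):
    """Explicit rank-r Nystrom feature map.

    W = k(landmarks, landmarks) ~ U_r diag(l_r) U_r^T, then
    phi(x) = diag(l_r)^(-1/2) U_r^T k(landmarks, x), so that
    phi(x)^T phi(y) approximates k(x, y).
    """
    W = rbf(landmarks, landmarks, gamma)
    lam, U = np.linalg.eigh(W)
    lam, U = lam[-r:], U[:, -r:]               # keep the top-r eigenpairs
    Kxz = rbf(X, landmarks, gamma)             # n x M cross-kernel
    return Kxz @ U / np.sqrt(lam)              # n x r explicit features

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 4))
Z = X[:20]                                     # M = 20 landmark samples
Phi = nystrom_features(X, Z, r=20)             # with r = M, the landmark
K_approx = Phi @ Phi.T                         # block is reproduced exactly
K_true = rbf(X, X)
```

Once `Phi` is available, sparse coding and dictionary learning can operate on these explicit vectors directly, which is the point of the explicit-kernel formulation.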
SR on SPD matrices
Based on implicit mapping function
SR based on Nyström method
In this section, we first introduce the SR based on the Nyström method in kernel space. Then, we present the dictionary update in kernel space. Finally, we introduce classification via our SR. Figure 1 shows the flow chart of our method, which is summarized in Algorithm 1.

The flow chart of our method. (a) Input: image set, each image set is represented as an SPD matrix. (b) Dimensionality reduced samples by
Neighborhood preserving SR based on Nyström method
Let
Here wij is the weight coefficient, which encodes the neighborhood relation between yi and yj.
Adding the neighborhood preserving constraint to equation (10), our SR model can be rewritten as
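A neighborhood weight matrix of the kind used in such constraints can be built from nearest-neighbor heat-kernel weights. The sketch below is hypothetical; the exact weight definitions in equations (12) and (13) may differ:

```python
import numpy as np

def knn_weights(Y, n_w, sigma=1.0):
    """Neighborhood-preserving weight matrix (illustrative sketch).

    w_ij = exp(-||y_i - y_j||^2 / sigma^2) if y_j is among the n_w nearest
    neighbors of y_i, and 0 otherwise; symmetrized at the end.
    """
    D2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    W = np.zeros_like(D2)
    for i in range(len(Y)):
        idx = np.argsort(D2[i])[1:n_w + 1]       # skip the point itself
        W[i, idx] = np.exp(-D2[i, idx] / sigma**2)
    return np.maximum(W, W.T)                     # symmetric neighborhood graph

rng = np.random.default_rng(4)
Y = rng.standard_normal((12, 3))                  # toy kernel-space features
W = knn_weights(Y, n_w=3)
```

Penalizing Σij wij ||αi − αj||² with such weights pulls the sparse codes of neighboring samples together, which is what the neighborhood preserving term enforces.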
Dictionary updating
Based on the alternating direction method of multipliers, the dictionary can be updated such that the reconstruction error for each αi is minimized. The problem of learning a dictionary
Since the samples are embedded into the kernel space by the Nyström method, we only need to update the dictionary in the kernel space. Equation (17) is a convex problem. Based on the dictionary update method proposed in Rosasco et al.,33 equation (17) can be solved iteratively, and the ith result is
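One simple way to realize such an iterative update is a gradient step on the reconstruction term followed by renormalizing each atom; this is a hypothetical sketch in the spirit of, but not identical to, the update in Rosasco et al.:

```python
import numpy as np

def dictionary_step(Phi, D, A, eta=0.01):
    """One gradient update of the kernel-space dictionary.

    Decreases ||Phi - D A||_F^2 over D for fixed codes A, then renormalizes
    each atom to unit norm to avoid degenerate (singular) atoms.
    Illustrative sketch only; the paper's exact update may differ.
    """
    grad = (D @ A - Phi) @ A.T            # gradient of the reconstruction term
    D = D - eta * grad
    norms = np.linalg.norm(D, axis=0)
    return D / np.maximum(norms, 1e-12)   # project atoms to the unit sphere

rng = np.random.default_rng(5)
Phi = rng.standard_normal((8, 30))        # kernel-space sample features
D = rng.standard_normal((8, 6))           # 6 dictionary atoms
A = rng.standard_normal((6, 30)) * 0.1    # fixed sparse codes
for _ in range(20):
    D = dictionary_step(Phi, D, A)
```

Alternating this step with the sparse-coding step for α gives the usual two-block optimization loop summarized in Algorithm 1.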
Classification via sparse codes
Because the dictionary we have obtained lies in the kernel space, we must also map the test samples into the kernel space by the Nyström method.

Algorithm 1. The proposed method.

Training
Input: training sample sets; dimensionality reduction parameter k; the number of sampled training samples M and the rank r in the Nyström method; the numbers of nearest neighbors nw and nb in equations (12) and (13); the regularization parameters λ1 and λ2; the number of iterations iter.
Output: dictionary D.
1. Get the dimensionality reduced sample sets.
2. Compute the kernel space approximate vector representation.
3. Initialize the dictionary.
4. Repeat for iter iterations:
(a) Optimize α with fixed D: solve equation (15) by the IPM method.
(b) Optimize D with fixed α: solve equation (17) by the online dictionary learning method in Rosasco et al.33

Testing
Input: testing sample sets.
Output: class label.
1. Get the dimensionality reduced sample sets.
2. Compute the kernel space approximate vector representation.
3. Compute the reconstruction error of the test sample for each class j by equation (19).
4. Get the class label by equation (20).
Before classification, we first need to get the sparse coefficients of the test samples on the dictionary D. For a test sample belonging to the ith class, the residual error between it and its SR over the ith class will be relatively small. The reconstruction error of the test sample with respect to class j is
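The resulting decision rule, residual minimization over class-wise sub-dictionaries, can be sketched as follows (names are illustrative):

```python
import numpy as np

def classify_by_residual(phi_test, D, alpha, labels):
    """Assign the class whose atoms best reconstruct the test feature.

    For each class j, keep only the coefficients of atoms labeled j and
    measure the residual ||phi - D_j alpha_j||_2; predict the argmin class.
    """
    residuals = {}
    for j in set(labels):
        mask = (labels == j)
        residuals[j] = np.linalg.norm(phi_test - D[:, mask] @ alpha[mask])
    return min(residuals, key=residuals.get)

# toy example: 2 classes with 2 atoms each; the test feature is mostly
# built from the atoms of class 0
labels = np.array([0, 0, 1, 1])
D = np.eye(4)                                  # orthonormal atoms for clarity
alpha = np.array([1.0, 0.0, 0.1, 0.0])
phi_test = D @ np.array([1.0, 0.0, 0.1, 0.0])
pred = classify_by_residual(phi_test, D, alpha, labels)
```

Here the class-0 residual is 0.1 while the class-1 residual is 1.0, so the rule predicts class 0, mirroring the argmin in equation (20).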
Experiments and analysis
This section presents comparative experimental results of our NYSKSR method against conventional methods on the SPD manifold for the tasks of face recognition, object categorization, virus cell classification, and dynamic scene classification.
Datasets and settings
To evaluate the proposed method, we performed experiments on four datasets: ETH-80,34 Virus,35 MDSD,36 and YTC.8 We compare our proposed method with four conventional methods and two SR methods on the SPD manifold: Covariance Discriminant Learning (CDL),8 Projection Metric Learning (PML),12 Grassmann Discriminant Analysis (GDA),11 Log-Euclidean Metric Learning (LEML),1 generalized dictionary learning and sparse coding using the Frobenius norm (Frob_SR),22 and Log-Euclidean Kernels for SR (LogEK_SR).25
The ETH-80 dataset consists of eight categories: pears, tomatoes, dogs, cows, apples, cars, horses, and cups. Each category has 10 image sets, and each image set consists of 41 images. The size of each image is 256 × 256; to reduce the computational complexity, we resize each image to 20 × 20, so each image set covariance matrix is 400 × 400. We randomly select five image sets from each category as training samples and use the rest for testing. Table 1 shows the average classification accuracy of each method.
Average recognition rates and standard deviations of different methods on ETH-80. 34
The Virus dataset contains 15 categories, each consisting of five image sets, and each set has 20 images. Each cell image is resized to 20 × 20. For each class, three image sets are chosen randomly for training and the remaining two are used as test samples. Table 2 summarizes the average identification accuracy of each method.
Average recognition rates and standard deviations of different methods on virus.35
The MDSD dataset is composed of 13 different categories of dynamic scenes, with each class consisting of 10 videos. We resize each frame to 20 × 20. We randomly select seven videos of each class for training and the rest for testing; the classification accuracies are given in Table 3.
Average recognition rates and standard deviations of different methods on MDSD.36
The YouTube Celebrities (YTC) dataset contains 1910 video clips of 47 subjects. Each clip consists of hundreds of frames. We resize each frame to 20 × 20. For each subject, we randomly choose three image sets for training and seven for testing; the mean recognition accuracies are given in Table 4.
Average recognition rates and standard deviations of different methods on YTC. 8
According to the experimental results, we resize each image to 20 × 20 and choose an appropriate dimensionality reduction parameter k on each dataset.
Parameter setting in the experiment.
Result and analysis
The experimental results in the above tables are obtained from 10 repeated experiments on each dataset, and we report the averages over the 10 runs. Among the six methods used for comparison, CDL,8 PML,12 GDA,11 and LEML1 are conventional methods on the SPD manifold, Frob_SR22 is an SR method on the SPD manifold, and LogEK_SR25 is a kernel SR method on the SPD manifold. From the results in the four tables, the proposed method achieves the best classification accuracy on all four datasets. On ETH-80, the best conventional and SR baselines are CDL and LogEK_SR; our method exceeds both with an accuracy of 96.00% and has the smallest standard deviation of 2.69, which shows that on this dataset our method not only achieves a good recognition rate but is also robust. On the Virus dataset, our method improves the recognition rate significantly, achieving the highest rate of 62.67%, far above LEML and LogEK_SR; its standard deviation of 8.58 is not the smallest but is relatively small among all methods, indicating that the robustness of our method is still guaranteed on this dataset. On the MDSD dataset, the recognition rates of all methods are generally low, but our method still achieves the best rate of 31.79% with a small standard deviation of 4.87. On the YTC dataset, apart from the LogEK_SR method, our advantage is very obvious: the recognition rate reaches 74.58%, again with a relatively small standard deviation of 3.19.
The effect of dimensionality reduction parameter k
The original covariance matrix of each image set has dimensionality 400 × 400. Using it directly for training causes high dimensionality and a large computational load, while the recognition rate is not substantially improved. To obtain both a good recognition rate and running efficiency, we reduce the dimensionality. The goal of dimensionality reduction is to obtain lower dimensional data while retaining the main information of the dataset, so that the performance and recognition rate of the algorithm are not significantly reduced. Figure 2 shows the recognition rates for different dimensionality reduction parameters. For the four datasets, the recognition rate fluctuates noticeably when the parameter is small, mainly because when the dimensionality is too low, effective information in the image set is lost. Conversely, when the dimensionality is too large, information redundancy occurs and the recognition rate drops, so we choose the most appropriate dimensionality reduction parameter for each dataset by cross validation.

Effects of different dimensionality reduction parameter on recognition rates.
Conclusion
In this paper, based on the Nyström method, we map samples into kernel space through an explicit kernel representation and propose a neighborhood preserving SR method. Our method maintains the neighborhood information of the original data on the SPD manifold. Compared with other conventional methods for the SPD manifold, our method achieves a higher recognition rate. In future work, we will consider how to extend the proposed method to other types of manifolds and applications.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
