
Abstract

Sparse representation has been widely studied for image-based classification. However, sparse representation–based classification directly treats the training samples as a dictionary, so it needs a large training set and becomes time consuming as the training set grows. To derive a smaller dictionary, many dictionary learning algorithms have been developed. The object recognition problem is thereby transformed into minimizing the sparse representation error on a compact dictionary. The sparse representation optimization is constrained by the $l_{0}$-norm, which makes it an NP-hard problem. Although $l_{1}$-norm minimization can be used instead and works effectively, the optimization is still time consuming. To make the algorithm discriminative and simultaneously decrease the computational burden, we propose a fast discriminative collaborative representation–based classification algorithm. The new algorithm incorporates the within-class scatter and the linear classification error terms into the objective function to derive a more discriminative dictionary and simultaneously adds a collaborative representation mechanism to cut down the computation time. At the end of this article, we design two experiments to validate our method using near-infrared and AR visible databases for multimodal face recognition. The results show that our algorithm outperforms the sparse representation–based, collaborative representation–based, and discriminative-KSVD classification algorithms.

Keywords

Object recognition and classification, collaborative representation, sparse representation, sparse coding, multimodal image

Introduction

In the past few years, sparse representation, also called sparse coding, has been widely studied by the research community. Sparse representation has been used for image denoising,^{1,2} image analysis,³ image super-resolution,⁴ and especially image-based classification such as face recognition,^{5–11} automatic target recognition,¹² and traffic sign recognition.¹³ Wright et al.⁶ applied sparse representation to classification and proposed sparse representation–based classification (SRC) for face recognition. In the SRC framework, the training samples of all classes are arranged as the columns of a matrix, usually called the dictionary, and the query image is assumed to be linearly represented by the atoms of the dictionary. Most entries of the resulting coefficient vector are zero or close to zero, that is, the coefficient vector is sparse. The class with the minimal reconstruction error gives the classification result. However, SRC becomes time consuming as the training set grows, because all training samples are used as the dictionary.

In order to cut down the computation time, many researchers have proposed dictionary learning (DL) algorithms^{3,5,11,14–18} to derive a smaller dictionary. The K-SVD algorithm¹⁴ is the most representative DL algorithm. K-SVD learns an overcomplete dictionary from a training data set; it works well for signal representation but not for classification tasks. To address image classification, Mairal et al.³ added a discriminative reconstruction constraint and optimized both the class discrimination and the sparse reconstruction components based on K-SVD. Pham and Venkatesh¹⁶ proposed a joint learning and dictionary construction algorithm for classification tasks. Later, Zhang and Li⁵ proposed discriminative K-SVD (D-KSVD) for face recognition, which extends K-SVD by incorporating the classification error into the objective function and uses a linear classifier to obtain the label of the query image. Jiang et al.¹⁷ introduced a new discriminative sparse coding error constraint to jointly learn a single dictionary and a linear classifier; the algorithm is called label consistent K-SVD (LC-KSVD). What these methods have in common is that a shared dictionary is learned during the DL scheme. In contrast, Yang et al.¹⁸ proposed to learn a structured dictionary. Their Fisher discrimination dictionary learning method, which makes the coding coefficients have a large between-class scatter but a small within-class scatter, was used in the classification scheme. Xu et al.¹¹ combined the advantages of the methods mentioned above and proposed a supervised within-class-similar discriminative DL method, which incorporates a linear classification error term and a within-class constraint on the representation coefficients into the objective function for face recognition.

The sparsity of the representation coefficients is measured by the $l_{0}$-norm, and minimizing it is NP-hard. Replacing it with $l_{1}$-norm minimization is popular, but although $l_{1}$-norm minimization is more efficient, it is still time consuming. Zhang et al.¹⁹ indicated that it is the collaborative representation (CR) mechanism that improves the face recognition accuracy. Motivated by this idea, we propose discriminative collaborative representation–based classification (DCRC), which incorporates the within-class scatter and the linear classification error terms into the objective function to derive a more discriminative dictionary and simultaneously adds the CR mechanism to decrease the time cost. In addition, researchers have recently proposed algorithms for multisource image classification.^{20,21} Liu et al.²⁰ proposed a joint sparse coding model to solve room-level localization using multiple sets of sonar measurements. Li et al.²¹ proposed a multimodal fusion method for object recognition. Our algorithm is also effective for multimodal image sets.

This article is organized as follows. The second section reviews related work on classification based on sparse representation and on CR. The third section describes the methodology of the DCRC algorithm. The fourth section reports experiments on two well-known databases that validate the proposed method. The fifth section gives the conclusions.

Related works

Brief introduction of sparse representation–based classification

SRC was first proposed for face recognition by Wright et al.⁶ The SRC framework contains two procedures, sparse coding and classification. Suppose the training samples from K different classes are denoted as $D = [D_{1}, ..., D_{i}, ..., D_{K}] \in ℝ^{d \times n}$ , $i = 1, ..., K$ , where $D_{i} \in ℝ^{d \times n_{i}}$ is the subset of $n_{i}$ training samples from class i, and $n = \sum_{i = 1}^{K} n_{i}$ . Given a query image $z \in ℝ^{d}$ , SRC considers the query image as a sparse linear combination of the training samples, and the sparse coding can be described as follows

$$x = \arg\min_{x} \|z - Dx\|_{2}^{2}, \quad \text{s.t.} \ \|x\|_{0} \leq \tau \tag{1}$$

where x is the coding coefficient vector, τ is a sparsity constant, and the $l_{0}$-norm counts the number of nonzero elements of x . This optimization is an NP-hard problem, so it is generally replaced by $l_{1}$-norm minimization

$$x = \arg\min_{x} \|z - Dx\|_{2}^{2}, \quad \text{s.t.} \ \|x\|_{1} \leq \tau \tag{2}$$

$$x = \arg\min_{x} \left\{ \|z - Dx\|_{2}^{2} + \lambda \|x\|_{1} \right\} \tag{3}$$

where λ is a scalar constant that balances the sparsity and reconstruction error terms. Equation (3) is the Lagrangian form of equation (2), and the two are equivalent for a suitable correspondence between λ and τ.

Once the coding vector $x = [x_{1}, ..., x_{i}, ..., x_{K}]$ is obtained, classification is performed by minimizing the class-wise reconstruction error:

$$\text{Label}(z) = \arg\min_{i} r_{i}(z) = \arg\min_{i} \|z - D_{i} x_{i}\|_{2}, \quad i = 1, ..., K \tag{4}$$

where $x_{i}$ is the representation coefficient vector corresponding to class i.
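The two SRC procedures can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the solver below is a plain iterative soft-thresholding (ISTA) loop for equation (3), not the solver used by Wright et al.⁶ or the SPAMS toolbox, and the names `ista_l1`, `src_classify`, and `atom_labels` as well as the toy dictionary are hypothetical.

```python
import numpy as np

def ista_l1(D, z, lam=0.01, n_iter=500):
    """Approximately solve min_x ||z - D x||_2^2 + lam * ||x||_1 (equation (3))."""
    # Lipschitz constant of the gradient of the quadratic term.
    L = 2.0 * np.linalg.norm(D, 2) ** 2
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        u = x - 2.0 * D.T @ (D @ x - z) / L                      # gradient step
        x = np.sign(u) * np.maximum(np.abs(u) - lam / L, 0.0)    # soft threshold
    return x

def src_classify(D, atom_labels, z, lam=0.01):
    """Label z by the class with the smallest reconstruction error (equation (4))."""
    x = ista_l1(D, z, lam)
    classes = np.unique(atom_labels)
    residuals = [np.linalg.norm(z - D[:, atom_labels == c] @ x[atom_labels == c])
                 for c in classes]
    return int(classes[int(np.argmin(residuals))])

# Toy dictionary (assumed data): two atoms per class, columns scaled to unit length.
D = np.array([[1.0, 0.9, 0.0, 0.0],
              [0.0, 0.1, 1.0, 0.9],
              [0.0, 0.0, 0.0, 0.1]])
D = D / np.linalg.norm(D, axis=0)
atom_labels = np.array([0, 0, 1, 1])
z = np.array([0.05, 1.0, 0.05])      # query lying close to the class-1 atoms
predicted = src_classify(D, atom_labels, z)
```

Because every query requires an iterative solve over all training atoms, the cost grows with the size of the training set, which is the motivation for the compact dictionaries discussed next.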

Dictionary learning model

The SRC framework directly uses the training data set as the dictionary. To learn a smaller dictionary, the DL model was proposed. The DL model uses the training data set to learn a compact dictionary $D = [d_{1}, d_{2}, ..., d_{m}] \in ℝ^{d \times m}$ with m atoms ( $m < n$ ). The DL framework can be written as follows¹¹

$$\langle D, X \rangle = \arg\min_{D, X} \left\{ \|Y - DX\|_{F}^{2} + \lambda \|X\|_{1} \right\} \quad \text{s.t.} \ \|d_{j}\|_{2}^{2} = 1, \ j = 1, 2, ..., m \tag{5}$$

where $Y = [Y_{1}, ..., Y_{i}, ..., Y_{K}] \in ℝ^{d \times n}, i = 1, ..., K$ is the training data set, $X = [x_{1}, x_{2}, ..., x_{n}] \in ℝ^{m \times n}$ is the sparse representation coefficient matrix of the training set Y on the dictionary D , ${‖ \cdot ‖}_{F}$ is the Frobenius norm, ${‖ X ‖}_{1} = \sum_{i = 1}^{n} {‖ x_{i} ‖}_{1}$ , and λ is a scalar constant that balances the sparsity and reconstruction error terms as in equation (3).

The first term in equation (5) is the reconstruction error, so the model is suitable for signal representation tasks but not for classification tasks. The model in equation (5) is an unsupervised DL framework, because the labels of the training set are not taken into account. A supervised DL framework is designed for classification tasks by adding discriminative terms to the objective function. Xu et al.¹¹ combined the methods in the literature^{5,17,18} and proposed a supervised DL framework called SCDDL.

Classification based on collaborative representation

The DL methods mentioned above are $l_{1}$-norm minimization problems. Although many speed-up methods have been proposed, as reviewed by Zhang et al.,²² they are complicated and still time consuming. Zhang et al.¹⁹ analyzed the mechanism of SRC and indicated that CR, rather than $l_{1}$-norm sparsity, plays the essential role for classification in SRC. To collaboratively represent the query image z , they proposed to use the regularized least squares method. The CR-based classification with regularized least squares (CRC-RLS) model is as follows

$$x = \arg\min_{x} \left\{ \|z - Dx\|_{2}^{2} + \lambda \|x\|_{2}^{2} \right\} \tag{6}$$

where λ is the regularization parameter. The analytical solution of equation (6) is easily derived as

$$x = Pz = (D^{T} D + \lambda I)^{-1} D^{T} z \tag{7}$$

where $P = {(D^{T} D + λ I)}^{- 1} D^{T}$ is a projection matrix independent of z . P can be computed offline, which makes CR more efficient.
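The closed form in equation (7) translates almost directly into NumPy. The following is a minimal sketch, assuming a dense dictionary that fits in memory; the function names `crc_projection` and `crc_code` are hypothetical, and a linear solve is used in place of an explicit matrix inverse.

```python
import numpy as np

def crc_projection(D, lam):
    """Return P = (D^T D + lam * I)^{-1} D^T from equation (7); computed once, offline."""
    m = D.shape[1]
    # Solving the linear system avoids forming the explicit inverse.
    return np.linalg.solve(D.T @ D + lam * np.eye(m), D.T)

def crc_code(P, z):
    """Code a query z with the precomputed projection: x = P z."""
    return P @ z
```

Once `P` is stored, coding a new query costs a single matrix-vector product, in contrast to the iterative optimization needed for the $l_{1}$-norm problem.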

These observations indicate that the discriminative terms of a supervised DL framework make the classification discriminative, while the closed-form CR solution in equation (7) makes the classification fast. Can we combine these models to make the classification procedure both more discriminative and faster?

Discriminative collaborative representation for image-based classification

Drawing inspiration from the literature,^{5,11,17–19} we propose the DCRC algorithm, which is described as follows.

Discriminative collaborative representation–based dictionary learning

Suppose $Y = [Y_{1}, ..., Y_{i}, ..., Y_{K}] \in ℝ^{d \times n}, i = 1, ..., K$ is the set of training samples from K classes, where every training sample is a d-dimensional column vector of Y . Discriminative collaborative representation–based DL aims to derive a discriminative dictionary $D \in ℝ^{d \times m}$ and a classifier $W \in ℝ^{K \times m}$ for classification. The dictionary D consists of m atoms, and $X = [X_{1}, ..., X_{i}, ..., X_{K}] \in ℝ^{m \times n}$ is the coding coefficient matrix of the training samples on the dictionary D . The discriminative collaborative representation–based DL model can be written as follows

$$\begin{aligned} \langle D, W, X \rangle = \arg\min_{D, W, X} \Big\{ & \|Y - DX\|_{F}^{2} + \lambda_{1} \|X\|_{F}^{2} + \alpha \|H - WX\|_{F}^{2} + \beta \|W\|_{F}^{2} \\ & + \lambda_{2} \sum_{i=1}^{K} \left( \|X_{i} - M_{i}\|_{F}^{2} + \eta \|X_{i}\|_{F}^{2} \right) \Big\} \\ & \text{s.t.} \ \|d_{j}\|_{2}^{2} = 1, \ j = 1, ..., m \end{aligned} \tag{8}$$

where ${‖ X ‖}_{F}^{2} = \sum_{i = 1}^{K} {‖ X_{i} ‖}_{F}^{2}$ and ${‖ \cdot ‖}_{F}$ is the Frobenius norm. Then we can rewrite equation (8) as

$$\begin{aligned} \langle D, W, X \rangle &= \arg\min_{D, W, X} \Big\{ \|Y - DX\|_{F}^{2} + \lambda_{1} \sum_{i=1}^{K} \|X_{i}\|_{F}^{2} + \alpha \|H - WX\|_{F}^{2} + \beta \|W\|_{F}^{2} + \lambda_{2} \sum_{i=1}^{K} \left( \|X_{i} - M_{i}\|_{F}^{2} + \eta \|X_{i}\|_{F}^{2} \right) \Big\} \\ &= \arg\min_{D, W, X} \Big\{ \|Y - DX\|_{F}^{2} + (\lambda_{1} + \lambda_{2} \eta) \sum_{i=1}^{K} \|X_{i}\|_{F}^{2} + \alpha \|H - WX\|_{F}^{2} + \beta \|W\|_{F}^{2} + \lambda_{2} \sum_{i=1}^{K} \|X_{i} - M_{i}\|_{F}^{2} \Big\} \\ & \quad \text{s.t.} \ \|d_{j}\|_{2}^{2} = 1, \ j = 1, ..., m \end{aligned} \tag{9}$$

Here, we set $η = 1$ as in Xu et al.¹¹ for simplicity. Then, equation (9) is rewritten as

$$\begin{aligned} \langle D, W, X \rangle = \arg\min_{D, W, X} \Big\{ & \|Y - DX\|_{F}^{2} + (\lambda_{1} + \lambda_{2}) \|X\|_{F}^{2} + \alpha \|H - WX\|_{F}^{2} + \beta \|W\|_{F}^{2} \\ & + \lambda_{2} \sum_{i=1}^{K} \|X_{i} - M_{i}\|_{F}^{2} \Big\} \\ & \text{s.t.} \ \|d_{j}\|_{2}^{2} = 1, \ j = 1, ..., m \end{aligned} \tag{10}$$

where ${‖ Y - DX ‖}_{F}^{2}$ is the reconstruction error term of the training data set Y on the dictionary D , $(λ_{1} + λ_{2}) {‖ X ‖}_{F}^{2}$ is the coding coefficient regularization term, $α {‖ H - WX ‖}_{F}^{2} + β {‖ W ‖}_{F}^{2}$ is the classification error term, and $λ_{2} \sum_{i = 1}^{K} ({‖ X_{i} - M_{i} ‖}_{F}^{2})$ is the within-class similarity term as in the study by Xu et al.¹¹ $W \in ℝ^{K \times m}$ is the linear classifier, and $H \in ℝ^{K \times n}$ is the label matrix of the training samples. Each column of H is a vector of the form ${[0, ..., 0, 1, 0, ..., 0]}^{T} \in ℝ^{K}$ , where the position of the 1 indicates the class of the corresponding training sample. $X_{i}$ contains the coding coefficients of class i. $M_{i}$ has the same size as $X_{i}$ , and each column of $M_{i}$ is the mean vector of the columns of $X_{i}$ . The parameters $λ_{1}$ , $λ_{2}$ , α, and β are all nonnegative constants. This framework derives the dictionary D and the classifier W simultaneously. Once the classifier is obtained, the query image can be classified directly by it.
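To make the roles of H , $M_{i}$ , and the five terms concrete, the value of the objective in equation (10) can be evaluated as below. This is an illustrative sketch only: the helper names are hypothetical, class labels are assumed to be integers 0, ..., K-1 stored in a NumPy array, and the matrix M simply places each $M_{i}$ in the columns belonging to class i.

```python
import numpy as np

def label_matrix(labels, K):
    """Build H (K x n): column j has a single 1 in the row of sample j's class."""
    H = np.zeros((K, labels.size))
    H[labels, np.arange(labels.size)] = 1.0
    return H

def class_mean_matrix(X, labels):
    """Build M (same size as X): each column is the mean code of that column's class."""
    M = np.zeros(X.shape)
    for c in np.unique(labels):
        idx = labels == c
        M[:, idx] = X[:, idx].mean(axis=1, keepdims=True)
    return M

def dcrc_objective(Y, D, W, X, labels, alpha, beta, lam1, lam2):
    """Value of the objective in equation (10) for given D, W, X."""
    H = label_matrix(labels, W.shape[0])
    M = class_mean_matrix(X, labels)
    fro2 = lambda A: float(np.sum(A * A))        # squared Frobenius norm
    return (fro2(Y - D @ X)                      # reconstruction error
            + (lam1 + lam2) * fro2(X)            # coefficient regularization
            + alpha * fro2(H - W @ X)            # classification error
            + beta * fro2(W)                     # classifier regularization
            + lam2 * fro2(X - M))                # within-class similarity
```

Tracking this value across iterations is one simple way to monitor whether an alternating optimization of D , W , and X is making progress.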

Optimization scheme

The dictionary D and the classifier W can be optimized simultaneously. We can stack D and W into an extended matrix as in the literature,^{5,11,17} and equation (10) can be transformed to

$$\langle D, W, X \rangle = \arg\min_{D, W, X} \Big\{ \left\| \begin{pmatrix} Y \\ \sqrt{\alpha} H \end{pmatrix} - \begin{pmatrix} D \\ \sqrt{\alpha} W \end{pmatrix} X \right\|_{F}^{2} + (\lambda_{1} + \lambda_{2}) \|X\|_{F}^{2} + \beta \|W\|_{F}^{2} + \lambda_{2} \sum_{i=1}^{K} \|X_{i} - M_{i}\|_{F}^{2} \Big\} \tag{11}$$

If $\begin{pmatrix} D \\ \sqrt{α} W \end{pmatrix}$ is normalized column-wise, the regularization penalty $β {‖ W ‖}_{F}^{2}$ can be dropped.⁵ Denote $\begin{pmatrix} Y \\ \sqrt{α} H \end{pmatrix}$ as $Y^{*}$ and $\begin{pmatrix} D \\ \sqrt{α} W \end{pmatrix}$ as $D^{*}$ . Then, the objective function can be written simply as

$$\langle D^{*}, X \rangle = \arg\min_{D^{*}, X} \Big\{ \|Y^{*} - D^{*} X\|_{F}^{2} + (\lambda_{1} + \lambda_{2}) \|X\|_{F}^{2} + \lambda_{2} \sum_{i=1}^{K} \|X_{i} - M_{i}\|_{F}^{2} \Big\} \quad \text{s.t.} \ \|d_{j}^{*}\|_{2}^{2} = 1, \ j = 1, ..., m \tag{12}$$
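The stacking step can be checked directly. Because the squared Frobenius norm of a vertically stacked matrix is the sum of the squared norms of its blocks, the first term of the stacked objective recovers the reconstruction and classification error terms of equation (10):

```latex
\begin{aligned}
\|Y^{*} - D^{*}X\|_{F}^{2}
&= \left\| \begin{pmatrix} Y - DX \\ \sqrt{\alpha}\,(H - WX) \end{pmatrix} \right\|_{F}^{2} \\
&= \|Y - DX\|_{F}^{2} + \alpha \|H - WX\|_{F}^{2}
\end{aligned}
```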

We can alternately update X and $D^{*}$ , that is, fix one of them and update the other.

First, we fix the value of $D^{*}$ and update X . The function in equation (12) then reduces to a coding problem

$$\langle X \rangle = \arg\min_{X} \Big\{ \|Y^{*} - D^{*} X\|_{F}^{2} + (\lambda_{1} + \lambda_{2}) \|X\|_{F}^{2} + \lambda_{2} \sum_{i=1}^{K} \|X_{i} - M_{i}\|_{F}^{2} \Big\} \tag{13}$$

X can be derived class by class. To obtain $X_{i}$ for class i, the function in equation (13) is written for that class as

$$\langle X_{i} \rangle = \arg\min_{X_{i}} \left\{ \|Y_{i}^{*} - D^{*} X_{i}\|_{F}^{2} + (\lambda_{1} + \lambda_{2}) \|X_{i}\|_{F}^{2} + \lambda_{2} \|X_{i} - M_{i}\|_{F}^{2} \right\} \tag{14}$$

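The objective function in equation (14) is differentiable, so we can obtain $X_{i}$ by setting its derivative to zero. Denote the objective function by f . With $M_{i}$ held fixed, and with the term $2 X_{i}$ standing for $2 D^{* T} D^{*} X_{i}$ when $D^{* T} D^{*}$ is taken as the identity, the derivative is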

$$\frac{\partial f}{\partial X_{i}} = -2 D^{*T} Y_{i}^{*} + 2 X_{i} + 2 (\lambda_{1} + \lambda_{2}) X_{i} + 2 \lambda_{2} (X_{i} - M_{i}) \tag{15}$$

Set $\frac{\partial f}{\partial X_{i}}$ to zero and denote the variables obtained at the k-th and (k+1)-th iterations by the subscripts $(k)$ and $(k + 1)$ . Then $X_{i}$ can be calculated as follows

$$X_{i(k+1)} = \frac{\lambda_{2} M_{i(k)} + D_{(k)}^{*T} Y_{i(k)}^{*}}{1 + \lambda_{1} + 2 \lambda_{2}} \tag{16}$$
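Collecting the $X_{i}$ terms in equation (15) gives $(1 + λ_{1} + 2 λ_{2}) X_{i} = D^{* T} Y_{i}^{*} + λ_{2} M_{i}$ , which is equation (16). A minimal NumPy sketch of this update is given below. It applies equation (16) as written to all classes at once; the names `class_means` and `update_codes` are hypothetical, and labels are assumed to be integers in a NumPy array.

```python
import numpy as np

def class_means(X, labels):
    """Build M: each column holds the mean code of that column's class."""
    M = np.zeros(X.shape)
    for c in np.unique(labels):
        idx = labels == c
        M[:, idx] = X[:, idx].mean(axis=1, keepdims=True)
    return M

def update_codes(Y_star, D_star, X_prev, labels, lam1, lam2):
    """One application of equation (16); M_i is taken from the previous iterate."""
    M_prev = class_means(X_prev, labels)
    return (lam2 * M_prev + D_star.T @ Y_star) / (1.0 + lam1 + 2.0 * lam2)
```

The update needs no matrix inversion and no iterative solver, only one matrix product and a per-class mean, which is where much of the speed of the coding step comes from.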

After obtaining the coding coefficient matrix X , we fix it and update $D^{*}$ . The function in equation (12) then reduces to a DL problem

$$\langle D^{*} \rangle = \arg\min_{D^{*}} \|Y^{*} - D^{*} X\|_{F}^{2} \quad \text{s.t.} \ \|d_{j}^{*}\|_{2}^{2} = 1, \ j = 1, ..., m \tag{17}$$

The objective function in equation (17) is differentiable, so we can derive $D^{*}$ as follows

$$D_{(k+1)}^{*} = Y_{(k)}^{*} X_{(k+1)}^{T} \left( X_{(k+1)} X_{(k+1)}^{T} \right)^{-1} \tag{18}$$
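Equation (18) is the least squares solution of $Y^{*} \approx D^{*} X$ for fixed X . It can be sketched as follows, assuming $X X^{T}$ is invertible (the function name `update_dictionary` is hypothetical; if $X X^{T}$ were singular, a pseudo-inverse or a small ridge term would be needed, which is not part of the formula above).

```python
import numpy as np

def update_dictionary(Y_star, X):
    """Least squares update of equation (18): D* = Y* X^T (X X^T)^{-1}."""
    G = X @ X.T                               # m x m Gram matrix of the codes
    # G is symmetric, so D*^T solves G D*^T = X Y*^T.
    return np.linalg.solve(G, X @ Y_star.T).T
```

Note that this step does not itself enforce the unit-norm constraint; the columns are rescaled afterwards by the normalization in equations (19) and (20).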

$D^{*} = \begin{pmatrix} D \\ \sqrt{α} W \end{pmatrix}$ contains both D and W , so we can derive the dictionary D and the classifier W simultaneously via the lemma in the study by Zhang and Li.⁵ Suppose $D^{*} = \begin{pmatrix} d_{1}^{*}, d_{2}^{*}, ..., d_{m}^{*} \\ w_{1}^{*}, w_{2}^{*}, ..., w_{m}^{*} \end{pmatrix}$ ; then D and W are obtained as follows

$$D = (d_{1}, d_{2}, ..., d_{m}) = \left( \frac{d_{1}^{*}}{\|d_{1}^{*}\|_{2}}, \frac{d_{2}^{*}}{\|d_{2}^{*}\|_{2}}, ..., \frac{d_{m}^{*}}{\|d_{m}^{*}\|_{2}} \right) \tag{19}$$

$$W = (w_{1}, w_{2}, ..., w_{m}) = \left( \frac{w_{1}^{*}}{\sqrt{\alpha} \|d_{1}^{*}\|_{2}}, \frac{w_{2}^{*}}{\sqrt{\alpha} \|d_{2}^{*}\|_{2}}, ..., \frac{w_{m}^{*}}{\sqrt{\alpha} \|d_{m}^{*}\|_{2}} \right) \tag{20}$$
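Equations (19) and (20) amount to splitting $D^{*}$ into its upper d rows and lower K rows and rescaling each column by the norm of its upper part. A minimal sketch, in which the function name `split_and_normalize` and the argument `d` (the number of dictionary rows) are assumptions for illustration:

```python
import numpy as np

def split_and_normalize(D_star, d, alpha):
    """Recover D and W from the stacked matrix D* via equations (19) and (20)."""
    upper = D_star[:d]                         # columns d_j^*
    lower = D_star[d:]                         # columns w_j^*
    norms = np.linalg.norm(upper, axis=0)      # ||d_j^*||_2 for each atom
    D = upper / norms
    W = lower / (np.sqrt(alpha) * norms)
    return D, W

# Small worked example with d = 2 dictionary rows, K = 1 classifier row, alpha = 4.
D_star = np.array([[3.0, 0.0],
                   [4.0, 2.0],
                   [10.0, 4.0]])
D, W = split_and_normalize(D_star, d=2, alpha=4.0)
```

In this example the two upper columns have norms 5 and 2, so the atoms become (0.6, 0.8) and (0, 1), and both classifier entries become 1.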

Classification scheme

To collaboratively represent a given test sample z on the normalized dictionary D with low computational burden, we use the regularized least squares method of equation (6), whose solution is given analytically by equation (7).

The classification of z is based on its CR coefficient vector x , which contains the most discriminative information for classification. The linear classifier W is applied to x to derive the label of the test sample z

$$\text{Label}(z) = \arg\max_{i} (Wx)_{i}, \quad i = 1, 2, ..., K \tag{21}$$

The index i of the largest element of Wx gives the class label of z .
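The classification scheme therefore consists of one regularized least squares coding step followed by one matrix-vector product. A short sketch under the assumption that D and W have already been learned (the function name `dcrc_classify` is hypothetical, and class indices are 0-based here):

```python
import numpy as np

def dcrc_classify(D, W, z, lam):
    """Code z by equation (7), then pick the largest entry of W x as in equation (21)."""
    m = D.shape[1]
    x = np.linalg.solve(D.T @ D + lam * np.eye(m), D.T @ z)   # CR coding
    scores = W @ x                                            # one score per class
    return int(np.argmax(scores)), scores
```

When many test samples are classified with the same dictionary, the projection matrix of equation (7) can be computed once and reused instead of solving the linear system per sample.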

Flow of our algorithm

DL was proposed for one-dimensional signal processing, whereas images are two-dimensional signals. Therefore, image preprocessing is needed. First, each image is rearranged into a vector. This transformation makes the data high-dimensional, and high-dimensional data are inefficient to process, so dimensionality reduction (DR) is needed. In our method, principal component analysis (PCA),²³ a basic DR method, is used because it is simple and widely used. The procedures of DCRC are summarized in Table 1.
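The preprocessing can be sketched as follows: images are flattened into the columns of a matrix and projected onto the leading principal directions. This is a generic PCA via the singular value decomposition, not necessarily the exact variant used in the experiments; the function names are hypothetical, and samples are assumed to be stored as columns of shape (h, n).

```python
import numpy as np

def pca_fit(T, d):
    """Return the sample mean and the top-d principal directions of the columns of T."""
    mean = T.mean(axis=1, keepdims=True)
    # Left singular vectors of the centered data are the principal directions.
    U, _, _ = np.linalg.svd(T - mean, full_matrices=False)
    return mean, U[:, :d]

def pca_project(samples, mean, basis):
    """Project samples of shape (h, n) to shape (d, n); a single sample must be (h, 1)."""
    return basis.T @ (samples - mean)

# Flattening: a stack of n images of size rows x cols becomes an (rows*cols, n) matrix.
images = np.random.default_rng(0).random((12, 10, 8))   # assumed toy data: 12 images
T = images.reshape(images.shape[0], -1).T                # shape (80, 12)
mean, basis = pca_fit(T, d=5)
Y = pca_project(T, mean, basis)                          # shape (5, 12)
```

The same `mean` and `basis` fitted on the training set should be reused to project every test sample, so that training and test codes live in the same space.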

Table 1.

The flow of discriminative collaborative representation classification algorithm.

Algorithm 1. Discriminative collaborative representation classification (DCRC)
Task: Derive the label of the test sample z
Input: A set of n h-dimensional training samples from K classes $T = [T_{1}, ..., T_{i}, ..., T_{K}] \in ℝ^{h \times n}$ , and the test sample z
Initialize:
    Use PCA to reduce T to the low d-dimensional training set $Y = [Y_{1}, ..., Y_{i}, ..., Y_{K}] \in ℝ^{d \times n}$
    Initialize the constant parameters: $α, λ, λ_{1}, λ_{2}$
    Initialize $D_{(0)}$ using a random matrix, then $X_{(0)} = \frac{1}{1 + λ_{1}} D_{(0)}^{T} Y$
    Initialize the labels of the training samples $H_{(0)}$
    Initialize $W_{(0)} = H_{(0)} X_{(0)}^{T} {(X_{(0)} X_{(0)}^{T} + I)}^{- 1}$
    Initialize $D_{(0)}^{*} = \begin{pmatrix} D_{(0)} \\ \sqrt{α} W_{(0)} \end{pmatrix}$
Dictionary learning procedure:
    While not converged
        Update X via Eq. (16)
        Update $D^{*}$ via Eq. (18)
        Normalize D and W via Eq. (19) and Eq. (20)
    end while
    Output: D , W , X
Classification procedure:
    CR coding via Eq. (7)
    Classification via Eq. (21)
    Output: $Label(z)$
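The initialization stage of Algorithm 1 can be sketched as below. Several details are assumptions made for illustration: the random dictionary is drawn from a standard normal distribution and scaled to unit-norm columns, labels are integers 0, ..., K-1 in a NumPy array, and the function name `initialize_dcrc` and its `seed` argument are hypothetical.

```python
import numpy as np

def initialize_dcrc(Y, labels, K, m, alpha, lam1, seed=0):
    """Initial D, X, H, W and stacked D* following the Initialize step of Algorithm 1."""
    rng = np.random.default_rng(seed)
    d, n = Y.shape
    D0 = rng.standard_normal((d, m))
    D0 /= np.linalg.norm(D0, axis=0)               # unit-norm atoms (assumption)
    X0 = D0.T @ Y / (1.0 + lam1)                   # X_(0) = D_(0)^T Y / (1 + lam1)
    H0 = np.zeros((K, n))
    H0[labels, np.arange(n)] = 1.0                 # one-hot label matrix
    A = X0 @ X0.T + np.eye(m)
    W0 = np.linalg.solve(A, X0 @ H0.T).T           # W_(0) = H X^T (X X^T + I)^{-1}
    D0_star = np.vstack([D0, np.sqrt(alpha) * W0]) # stacked matrix D_(0)^*
    return D0, X0, H0, W0, D0_star
```

From this starting point, the dictionary learning loop alternates the code update, the dictionary update, and the normalization until the reconstruction error stops decreasing.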

Experiments for multimodal databases

To verify the validity of the DCRC algorithm, we designed two experiments with two databases: the near-infrared database²⁴ and the AR visible database.²⁵ We compared our algorithm with the SRC,⁶ CRC,¹⁹ and D-KSVD⁵ algorithms in terms of recognition rate and the time needed to classify one test sample.

All the experiments were run in Matlab R2011a. The PC is a Lenovo E40-80 notebook computer with an Intel Core i5 2.30 GHz CPU and 8 GB RAM. The GPUs are an AMD Radeon(TM) R5 M330 and Intel(R) HD Graphics 5500 with 2 GB RAM. We also employed the sparse solver SPAMS²⁶ to optimize the standard sparse representation.

Experiment with near-infrared database

The near-infrared database contains 50 distinct subjects and 10 different infrared images for each one. Each image is $100 \times 80$ pixels. Figure 1 shows some images of the near-infrared database.

Figure 1.

Samples from the near-infrared database.

With this database, we tested SRC, CRC, D-KSVD, and our method DCRC. The samples were evenly divided into two groups: one group was used for training and the other for testing. To reduce the computational cost, we used PCA²³ to reduce the dimension of the sample vectors from $ℝ^{100 \times 80}$ to $ℝ^{100}$ . For the SRC and CRC methods, we used all the training samples as dictionary atoms (250 atoms in total). For the D-KSVD method, we learned a dictionary with 150 atoms. For the DCRC method, we used a dictionary with 70 atoms and the parameters $α = 1$ , $λ = 0.001$ , $λ_{1} = 0.001$ , $λ_{2} = 0.1$ .

As Table 1 shows, the proposed DCRC algorithm contains two procedures: DL and classification. In the DL procedure, the representation reconstruction error converges quickly, as shown in Figure 2.

Figure 2.

The representation reconstruction error.

The recognition rate versus dictionary size is shown in Figure 3. The rate rises quickly as the first atoms are added and then levels off at about 99%. Dictionary size is the main factor affecting the recognition rate, whereas the parameters $λ, λ_{1}, λ_{2}$ affect it only weakly. The parameter $λ_{1}$ affects the sparsity, while the parameter $λ_{2}$ affects the within-class errors. The coding coefficients of one test sample for the four methods are shown in Figure 4, from top to bottom for SRC, CRC, D-KSVD, and DCRC. The results show that the SRC and D-KSVD codes are strongly sparse, while the CRC and DCRC codes are only weakly sparse.

Figure 3.

The accurate recognition rate versus dictionary size.

Figure 4.

The coding coefficients of one test sample for SRC, CRC, D-KSVD, and DCRC. SRC: sparse representation–based classification; CRC: collaborative representation–based classification; D-KSVD: discriminative-KSVD; DCRC: discriminative collaborative representation–based classification.

Compared with SRC, CRC, and D-KSVD, the proposed DCRC method has a very competitive recognition rate at a significantly lower computational cost. We recorded the time for classifying one test sample for each of the four methods; the results are shown in Table 2.

Table 2.

Face recognition rate and time for classifying one test sample of different methods using near-infrared database.

Method	Recognition rate (%)	Time (ms)
SRC	98.8	0.9139
CRC	99.0	0.3928
D-KSVD	98.0	0.04858
DCRC	99.2	0.01918

SRC: sparse representation–based classification; CRC: collaborative representation–based classification; D-KSVD: discriminative-KSVD; DCRC: discriminative collaborative representation–based classification.

The results indicate that DCRC achieves a recognition rate close to those of SRC, CRC, and D-KSVD on this database. The rates are nearly the same, possibly because the near-infrared database is free of noise. SRC and CRC directly use the training samples as the dictionary, while D-KSVD and DCRC learn a compact dictionary from the training set. Therefore, the latter two methods are faster, and DCRC is the fastest of all. To test whether our method is also competitive in recognition rate on noisier data, we designed a further experiment, described in the subsection “Experiment with AR database.”

Experiment with AR database

The AR database contains images of 126 persons, with 26 images for each person. Each image is 165 × 120 pixels. This database is widely used for face recognition. Figure 5 shows some images of the AR database. These images were captured under different viewpoints, illuminations, and facial expressions and with different disguises (sunglasses and scarf). These interfering factors make face recognition more difficult. To validate that our method is still effective for noisy samples, we designed an experiment with the AR database.

Figure 5.

Samples from the AR database.

In our experiment, we chose 2600 images from 50 males and 50 females. Each person has 26 samples, of which we chose 20 for training and the other 6 for testing. First, we used the PCA algorithm to reduce the feature vector dimension from $ℝ^{165 \times 120}$ to $ℝ^{1000}$ . For the SRC and CRC methods, we again used all training samples as the dictionary (2000 atoms). For the D-KSVD method, we learned a dictionary with 1600 atoms. For the DCRC method, we used a dictionary with 100 atoms and the parameters $α = 1$ , $λ = 0.001$ , $λ_{1} = 0.01$ , $λ_{2} = 0.0001$ . The results are given in Table 3.

Table 3.

Face recognition rate and time for classifying one test sample of different methods using AR visible database.

Method	Recognition rate (%)	Time (ms)
SRC	77.17	205.9
CRC	83.66	115.1
D-KSVD	98.0	1.300
DCRC	99.1	0.4965

The results show that, compared with the near-infrared results, the recognition rates of SRC and CRC dropped substantially (to 77.17% and 83.66%), while those of D-KSVD and DCRC changed little. DCRC achieved the highest recognition rate of the four methods. A likely reason is the interference of noise: the D-KSVD and DCRC methods have discriminative ability, while the SRC and CRC methods focus on representing the raw signals. Our method is therefore robust for face recognition, and it is also the fastest of the four methods.
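The speed differences are easier to compare as ratios. The short script below computes how many times faster DCRC is than each baseline, using only the per-sample times reported in Tables 2 and 3; the dictionary layout and the function name `speedup_over` are illustrative choices.

```python
# Per-sample classification times in milliseconds, copied from Tables 2 and 3.
times_ms = {
    "near-infrared": {"SRC": 0.9139, "CRC": 0.3928, "D-KSVD": 0.04858, "DCRC": 0.01918},
    "AR": {"SRC": 205.9, "CRC": 115.1, "D-KSVD": 1.300, "DCRC": 0.4965},
}

def speedup_over(method, database):
    """Ratio of a baseline's classification time to that of DCRC on one database."""
    return times_ms[database][method] / times_ms[database]["DCRC"]

for database in times_ms:
    for method in ("SRC", "CRC", "D-KSVD"):
        print(f"{database}: DCRC is {speedup_over(method, database):.1f}x faster than {method}")
```

On both databases the gap to SRC and CRC, which code against the full training set, is far larger than the gap to D-KSVD, which also uses a learned compact dictionary.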

Conclusions

A fast discriminative CR image classification method, DCRC, was proposed in this article. It incorporates the within-class scatter and the linear classification error terms into the objective function, which gives the method its discriminative ability. To decrease the computing time, we added the CR mechanism to the objective function. The experimental results showed that DCRC achieved higher recognition rates than the other three methods, that it is suitable for multimodal (infrared or visible) image classification, and that it required the least time to classify a test sample in both experiments.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by the National Natural Science Foundation of China (grant no 61403398).

References

1. Elad, Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans Image Process 2006; 15(12): 3736–3745.
2. Protter, Elad. Image sequence denoising via sparse and redundant representation. IEEE Trans Image Process 2009; 18(1): 27–35.
3. Mairal, Bach, Ponce. Discriminative learned dictionaries for local image analysis. In: Proceedings of 2008 IEEE conference on computer vision and pattern recognition (CVPR), Anchorage, AK, USA, 6 2008, pp. 1–8. IEEE.
4. Yang, Wright, Huang. Image super-resolution via sparse representation. IEEE Trans Image Process 2010; 19(11): 2861–2873.
5. Zhang. Discriminative K-SVD for dictionary learning in face recognition. In: Proceedings of 2010 IEEE conference on computer vision and pattern recognition (CVPR), San Francisco, CA, USA, 6 2010, pp. 2691–2698. IEEE.
6. Wright, Yang, Ganesh. Robust face recognition via sparse representation. IEEE Trans Pattern Anal Mach Intell 2009; 31(2): 210–227.
7. Kang, Liao, Xiang. Kernel sparse representation with local patterns for face recognition. In: 18th IEEE international conference on image processing, Brussels, Belgium, 9 2011, pp. 3009–3012. IEEE.
8. Yang, Zhang, Feng. Sparse representation based fisher discrimination dictionary learning for image classification. Int J Comput Vis 2014; 109(3): 209–232.
9. Zhao, Sun. Robust face recognition based l₂₁-norm sparse representation. In: 5th international conference on digital home, Guangzhou, China, 11 2014, pp. 25–29. IEEE.
10. Lai, Jiang. Class-wise sparse and collaborative patch representation for face recognition. IEEE Trans Image Process 2016; 25(7): 3261–3272.
11. Chen. Supervised within-class-similar discriminative dictionary learning for face recognition. J Vis Commun Image Represent 2016; 38: 561–572.
12. Liu, Guo, Sun. Object recognition using tactile measurements: kernel sparse coding methods. IEEE Trans Instrum Meas 2016; 65(3): 656–665.
13. Liu, Sun. Traffic sign recognition using group sparse coding. Inform Sci 2014; 266: 75–89.
14. Aharon, Elad, Bruckstein. K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans Signal Process 2006; 54(11): 4311–4322.
15. Liu, Sun. Robust exemplar extraction using structured sparse coding. IEEE Trans Neural Networks Learn Syst 2015; 26(8): 1816–1821.
16. Pham, Venkatesh. Joint learning and dictionary construction for pattern recognition. In: IEEE conference on computer vision and pattern recognition (CVPR), Anchorage, AK, USA, 6 2008, pp. 1–8. IEEE.
17. Jiang, Lin, Davis. Label consistent K-SVD: learning a discriminative dictionary for recognition. IEEE Trans Pattern Anal Mach Intell 2013; 35(11): 2651–2664.
18. Yang, Zhang, Feng. Fisher discrimination dictionary learning for sparse representation. In: IEEE international conference on computer vision (ICCV), Barcelona, Spain, 11 2011, pp. 543–550. IEEE.
19. Zhang, Yang, Feng. Sparse representation or collaborative representation: which helps face recognition? In: IEEE international conference on computer vision (ICCV), Barcelona, Spain, 11 2011, pp. 471–478. IEEE.
20. Liu, Sun, Fang. Robotic room-level localization using multiple sets of sonar measurements. IEEE Trans Instrum Meas 2017; 66(1): 2–13.
21. Liu. Multi-modal local receptive field extreme learning machine for object recognition. In: International joint conference on neural networks (IJCNN), Vancouver, BC, Canada, 7 2016, pp. 1696–1701. IEEE.
22. Zhang, Yang. A survey of sparse representation: algorithms and applications. IEEE Access 2015; 3: 490–530.
23. Turk, Pentland. Eigenfaces for recognition. J Cognitive Neurosci 1991; 3(1): 71–86.
24. Yong, David. Bimodal biometrics based on a representation and recognition approach. Opt Eng 2011; 50(3): 037202–037202.
25. Martinez, Benavente. The AR face database. CVC Tech. Report no. 24, 1998.
26. Mairal, Bach, Ponce. Online learning for matrix factorization and sparse coding. J Mach Learn Res 2010; 11(1): 19–60.

Discriminative collaborative representation for multimodal image classification
