Face Recognition Based on Optimized Projections for Distributed Intelligent Monitoring Systems

Abstract

Compressive sensing (CS), as a new theory of signal processing, has found many applications. This paper deals with a CS-based face recognition system design. A novel framework, called projection matrix optimization- (PMO-) based compressive classification, is proposed for distributed intelligent monitoring systems. Unlike the sparse preserving projection (SPP) approach, the projection matrix is designed such that the coherence between different classes of faces is reduced and hence a higher recognition rate is expected. The optimal projection matrix problem is formulated as identifying a matrix that minimizes the Frobenius norm of the difference between a given target Gram and that of the equivalent dictionary. A class of analytical solutions is derived. With the PMO-based CS system, two frameworks are proposed for compressive face recognition. Experiments are carried out with five popularly utilized face databases (i.e., ORL, Yale, Yale Extend, CMU PIE, and AR) and simulation results show that the proposed approaches outperform those existing compressive ones in terms of the recognition rate and reconstruction error.

1. Introduction

Face recognition (FR) has played a very important role in multimedia based applications. In spite of many years' research, it remains an interesting and challenging research area [1]. Figure 1 depicts the conventional face recognition process. As it involves storing and transmitting high dimensional images, image compression techniques such as JPEG and JPEG2000 are used to alleviate the problem [2, 3]. At the receiver end, users have to decompress (reconstruct) images and extract the image features for classification. Such a procedure usually requires a lot of computations and hence makes the systems expensive. It can be much simplified if the images can be acquired using compressive sensing, which outputs features of images extracted directly, and the classification and reconstruction are done with the extracted features. See Figure 2.

Figure 1

Block-diagram of the conventional FR Systems.

Figure 2

Block-diagram for compressive sensing-based FR Systems.

With the development of Internet of Things and information technology, the demand for the distributed intelligent image monitor systems, as shown in Figure 3, increases greatly, which need an intelligent Internet end with the capacity of sensing and classifying. The cameras are connected by wireless Internet. The main problems with such a system are the transmission and storage of the images and the cost, including the infrastructure and the communications. To implement this architecture, technically speaking, there are several factors to consider: data storage space, RAM space, the computation time, and the transmission bandwidth. The key to solve these issues is to develop a data compression algorithm which can integrate the signal reconstruction and the classification with an accepted performance.

Figure 3

Block-diagram of a distributed intelligent monitoring system.

Dimension reduction plays an extensively important role in high dimensional data analysis and studies. In recent years, many dimensionality reduction methods are successfully applied in pattern recognition [4, 5]. The principal component analysis (PCA) intends to represent faces by projecting the facial images to the directions of maximal covariance in the facial image data. One of the advantages of PCA is to reduce dimensionality of data but such an approach ignores the relationship between data in high dimensions. Linear discrimination analysis (LDA) explicitly attempts to model the difference between the classes of data. The Fisherface method combines PCA and the Fisher criterion to extract the information that discriminates the differences between the classes of a sample set. The projection matrix is chosen to maximize the ratio between the determinant of the between-class scatter matrix of the projected samples and that of the within-class scatter matrix. Nevertheless, Martinez et al. demonstrated that when the training data set is small, the eigenface method outperforms the Fisherface method. Some novel algorithms attempt to reduce the data dimensionality, while keeping the intrinsic characteristics. The local preserve project (LPP) aims to find embedding which can preserve local information and obtain a face subspace that best detects the essential face manifold structure. Because LPP has an excellent ability to find a better projection direction when the distances between classes are large, it can keep the local structure of the data very well. But when the distance between two classes is close or even partially overlapping, it can not process classification effectively due to the characteristics of keeping local information.

The recently developed compressed sensing (CS) is a signal processing technique that can acquire a signal efficiently and reconstruct it by finding the solution to an undetermined linear system [6, 7]. Its essence is to achieve analog signal discretization with sampling-compression integration, namely, the analog-to-discrete CS. The basic principle of such a CS framework is similar to that of discrete-to-discrete CS [8, 9], which can be explained below. In the standard CS framework, it is assumed that the high dimension signals $x \in R^{N \times 1}$ can be represented as a linear combination of L vectors ${ψ_{k}}$ :

\begin{matrix} x = \sum_{k = 1}^{L} s_{k} ψ_{k} ≜ Ψ s, \end{matrix}

(1)

where

Ψ \in R^{N \times L}

is known as the dictionary (matrix), while s is

{‖s‖}_{0}

-sparse vector with

{‖s‖}_{0}

denoting the number of nonzero elements of s and corresponding x is said to be

{‖s‖}_{0}

-sparse in the dictionary

Ψ

. The basic mathematical problem of CS is to study how to reconstruct the original high dimensional signal x from its low dimensional projection y which is mathematically of the form

\begin{matrix} y = Φ x, \end{matrix}

(2)

where

Φ \in R^{M \times N}

is called a projection matrix (it is also called measurement or sensing matrix, which will be used alternatively in this paper) with

M < N

. Signal reconstruction means to find x from (2) with y and the pair

(Φ, Ψ)

given. There are two conditions under which recovery is possible. The first one is sparsity which requires the signals x to be sparse in some

Ψ

. The signal reconstruction problem is given as

\begin{matrix} s ≜ a r g \underset{\tilde{s}}{m i n} {‖\tilde{s}‖}_{0} \\ s . t . y = A \tilde{s}, \end{matrix}

(3)

where

A ≜ Φ Ψ

is called equivalent dictionary. The solution to (3) is unique for sparse signals if A satisfies the restricted isometric property (RIP). See [6, 7].

The second condition is related to the mutual coherence [10, 11] of the equivalent dictionary A, which is defined below:

\begin{matrix} μ (A) ≜ \underset{1 \leq i \neq j \leq L}{m a x} \frac{|A {(:, i)}^{T} A (:, j)|}{{‖A (:, i)‖}_{2} {‖A (:, j)‖}_{2}}, \end{matrix}

(4)

where T denotes the transpose operator.

μ (A)

represents the worst-case coherence between any two atoms of A. As shown in [10], the K-sparse signal can be exactly recovered from the measurement as long as

\begin{matrix} μ (A) < \frac{1}{2 K - 1} . \end{matrix}

(5)

As seen, the smaller the value of

μ (A)

is, the bigger the value of K is allowed. The latter implies a wider range of signals that can be recovered exactly using such a CS system. So, minimizing

μ (A)

is of importance. Simulations showed that the signal reconstruction accuracy is more related to the number of columns that are strongly correlated compared with

μ (A)

. Noting this fact, Elad in [11] proposed to minimize averaged mutual coherence with respect to

Φ

for a given dictionary

Ψ

. Since then, many different approaches have appeared. Roughly speaking, these approaches are all based on the same framework: to design the sensing matrix

Φ

such that the Gram matrix of A, defined as

A^{T} A

, is as close to a target Gram matrix

G_{t}

as possible in the sense of

\begin{matrix} \underset{Φ, G_{t} \in S_{t}}{m i n} {‖G_{t} - A^{T} A‖}_{F}^{2}, \end{matrix}

(6)

where

{‖\cdot‖}_{F}

denotes the Frobenius norm,

S_{t}

is a class of Gram matrices possessing certain properties, and the dictionary is assumed to be given. See [12–15]. As the diagonal elements of

G_{t}

are all assumed to be one, the Frobenius norm-based error above actually represents the averaged coherence except for a constant factor

1 / L (L - 1)

if all the diagonal entries of

A^{T} A

are equal to one. The properties of the CS systems designed using (6) are strongly related to the choice of the target Gram

G_{t}

. A column normalized matrix

A \in R^{M \times L}

is said to be equiangular tight frame (ETF) if

|A (:, i)^{T} A (:, j)| = \sqrt{(L - M) / M (L - 1)}

\forall i \neq j

, where

\sqrt{(L - M) / M (L - 1)}

is the smallest mutual coherence that an

M \times L

matrix possibly has. Such a problem has been investigated for

G_{t}

being the identity matrix and some analytical solutions are available [13]. In the ETF-based approach, the target Gram is taken as the one of a relaxed ETF matrix and the obtained sensing matrix shows an improved performance [13, 15]. Developing algorithms to solve (6) for arbitrary

G_{t}

is still an interesting topic.

Sparse representation (SR), an important prerequisite for CS theory, has been applied in face classification [16]. Such an approach, usually referred to as SRC standing for sparse representation classification, can yield a higher recognition rate with varying illumination and expression. With SRC, the recognition is converted to the problem of classification among multiple linear regression models. The desired representation is sparse since the test sample should only be represented in terms of training samples belonging to the same class. The sparse representation can be computed with $l_{1}$ minimization [6]. However, this algorithm is very time-consuming for large scale databases with images of high resolution. The recognition rate degrades with the feature dimension reduction in general [16]. Sparse preserving projection (SPP), proposed in [17], builds the graphs based on the sparse reconstruction of data. Such a technique tries to preserve the sparsity without considering the coherence issue, which would affect the face recognition rate.

Sparse representation works well in applications where the original signal x needs to be reconstructed as accurately as possible, such as denoising, image inpainting, and coding. However, sometimes we just need to discriminate the signal from its representation rather than reconstructing it. The difference between reconstruction and discrimination has been widely investigated in the literature. It is known that typical reconstructive methods, such as PCA and independent component analysis (ICA), aim at obtaining a representation that enables sufficient reconstruction, and thus they are able to deal with signal corruption, that is, noise, missing data, and outliers. On the other hand, discriminative methods, such as LDA, generate a signal representation that maximizes the separation of distributions of signals from different classes. While both classes of methods have broad applications in classification, the discriminative methods, as expected, have often outperformed the reconstructive methods for the classification task.

The main objective of this paper is to develop an algorithm for face classification and reconstruction, which is intended to be used in distributed monitoring systems and to be implemented in a low-cost microprocessor (e.g., ARM Cortex-M3 and ARM Cortex-M4) platform. The algorithm to be proposed in this paper is projection matrix optimization- (PMO-) based compressive classification, which tries to design the projection matrix $Φ$ in such a way that the within-class coherence is enhanced while between-class one is reduced. The receiving end uses the sparse representation coefficients in the equivalent dictionary for image reconstruction and classification. Precisely speaking, the main contributions in this paper are given as follows: (i)

A new distributed monitoring system oriented face recognition framework is proposed based on compressed sensing. Instead of the high dimension original images, their lower dimension counterparts obtained using projection are transmitted and used for reconstruction and recognition. A new target Gram, denoted by $G_{t}$ , is proposed for designing the projection matrix in order to improve the discrimination between classes.

(ii)

The optimal projection matrix design problem is formulated in terms of identifying those sensing matrices that minimize the difference between the Gram of the equivalent dictionary and the proposed target Gram $G_{t}$ in Frobenius norm sense. A class of analytical solutions is derived for the proposed problem which is a generalization of that in [14].

(iii)

With the PMO-based CS system, two frameworks are proposed for face recognition. Experiments are carried out and the results confirm that the proposed approaches can effectively improve the system performance in terms of face classification and reconstruction.

The paper is outlined as follows. Section 2 is devoted to providing some existing works on compressive classification and recognition, which are closely related to ours. Our main contribution is given in Section 3, in which the PMO for compressive classification problem is formulated and an algorithm is derived to solve this problem. With the obtained PMO-based CS systems, two FR frameworks are proposed for distributed intelligent monitoring systems. Experiments are carried out in Section 4 to examine the performance of the proposed approaches and to compare them with some of the prevailing ones. To end this paper, some concluding remarks are given in Section 5.

2. Related Works and Problem Formulation

This section intends to review the sparse representation classification and some projection-based SRC methods, which have been considered as successful applications of signal sparse representation and compressive sensing theory to face recognition and are closely related to our present work to be developed in this paper.

2.1. Sparse Representation Classification

In [18], the sparse representation is shown to achieve state-of-the-art performance in image denoising. Such a technique was also used as an inpainting method in [19] for recovering missing pixels in images. It was extended to face recognition in [16].

Suppose we have P class samples ${C_{p}}$ ; each class $C_{p}$ has a set of samples with the same size, from which we randomly select Q samples for training purpose. Each sample forms a vector of dimension $N \times 1$ and scaled to unit in $l_{2}$ -norm, yielding an atom of the dictionary $Ψ_{p}$ :

\begin{matrix} Ψ ≜ [\begin{bmatrix} Ψ_{1} & Ψ_{2} & \dots & Ψ_{p} & \dots & Ψ_{P} \end{bmatrix}] = \{ψ_{k}\}, \end{matrix}

(7)

where

Ψ_{p} \in R^{N \times Q}

is the dictionary for the pth class.

In the standard SR framework, it is assumed that the original face image signals $x \in R^{N \times 1}$ can be represented as a linear combination of $P Q$ vectors ${ψ_{k}}$ :

\begin{matrix} x = \sum_{k = 1}^{L} s_{k} ψ_{k} ≜ Ψ s, \end{matrix}

(8)

where

L = Q P

and s is sparse.

The procedure for classifying given $l_{2}$ -normalized x contains the following steps [16].

${A l g}_{S R C}$

Step 1.

Normalize x in $l_{2}$ :

\begin{matrix} \bar{x} ≜ \frac{x}{{‖x‖}_{2}} . \end{matrix}

(9)

Step 2.

Find the sparse vector s with

\begin{matrix} s ≜ a r g \underset{\tilde{s}}{m i n} {‖\tilde{s}‖}_{1} \\ s . t . \bar{x} = Ψ \tilde{s} . \end{matrix}

(10)

Such a problem can be solved efficiently using a linear programming technique as the constraint is also linear.

Step 3 (compute the residual energy).

Denote by $s_{p} \in R^{Q \times 1}$ the subvector of $s \in R^{L \times 1}$ , corresponding to the pth dictionary $Ψ_{p}$ . Calculate

\begin{matrix} σ_{p}^{2} ≜ {‖\bar{x} - Ψ_{p} s_{p}‖}_{2}^{2}, \forall p = 1,2, \dots, P . \end{matrix}

(11)

Step 4 (classification).

x belongs to the $\hat{p}$ th class, where $\hat{p}$ is determined with

\begin{matrix} \hat{p} ≜ a r g \underset{p \in \{1, \dots, P\}}{m i n} \{σ_{p}^{2}\} . \end{matrix}

(12)

The input-output relationship of this algorithm is simply denoted by

\begin{matrix} \hat{p} = {A l g}^{S R C} [Ψ, x], \end{matrix}

(13)

where the dictionary

Ψ

l_{2}

-normalized.

It should be pointed out that the SRC described above works on high dimensional signals x. See (10). In the remainder of this section, we will present some methods that carry out the SRC in lower dimensionality domain by working with the projection y of original x via (2); that is, $y = Φ x$ . Such a class of classification techniques is referred to as compressive SRC. Therefore, a method in this class differs from another by the ways in which the projection matrix $Φ$ is designed.

2.2. PCA-Based SRC

The PCA provides a dimensionality reduction technique to represent the high dimensional signal x with a lower dimensional one y obtained using (2). The projection matrix $Φ$ is determined as follows.

Let $\bar{x} ≜ (1 / L) \sum_{k = 1}^{L} x_{k}$ be the mean of ${x_{k}}$ . Denote $e_{k} ≜ x_{k} - \bar{x}$ and

\begin{matrix} R ≜ \sum_{k = 1}^{L} e_{k} e_{k}^{T} . \end{matrix}

(14)

By singular value decomposition (SVD), R can be rewritten as

\begin{matrix} R = {U Σ U}^{T}, \end{matrix}

(15)

where

Σ = d i a g (σ_{1}, \dots, σ_{k}, \dots, σ_{L})

with the principal components satisfying

σ_{k} \geq σ_{k + 1}

\forall k

Under the PCA, the optimal projection matrix $Φ \in R^{M \times N}$ with $M \leq N$ is given by the first M columns of the orthonormal matrix U:

\begin{matrix} Φ = U (:, 1 : M) . \end{matrix}

(16)

The original signal x is projected into

y = Φ x

The PCA-based SRC approach, denoted by ${A l g}_{P C A}^{S R C}$ , is a classification based on y rather than x.

${A l g}_{P C A}^{S R C}$

Step 1.

With obtained $Φ$ above, compute $y = Φ x$ and $Υ ≜ Φ Ψ$ .

Step 2.

Normalize the equivalent dictionary:

\begin{matrix} A ≜ Υ d i a g (d_{1}, \dots, d_{k}, \dots, d_{L}), \end{matrix}

(17)

where

d_{k} ≜ 1 / {‖Υ (:, k)‖}_{2}

with the denominator being the

l_{2}

-norm of the kth column of

Υ

for all

k \in {1,2, \dots, L}

Step 3.

Determine the class that x belongs to using

\begin{matrix} \hat{p} = {A l g}^{S R C} [A, y] . \end{matrix}

(18)

Such a procedure was proposed in [16].

2.3. SPP-Based SRC

The SPP proposed in [17] aims at carrying out the classification/recognition in a lower dimensional space. Precisely speaking, it works with signals y which are obtained using (2), where the projection matrix $Φ$ belongs to $M \times N$ with $M < N$ .

In such a framework, the kth sample $x_{k}$ in the dictionary is featured by its sparse vector ${\bar{s}}_{k}$ obtained with

\begin{matrix} {\bar{s}}_{k} ≜ a r g \underset{\tilde{s} \in S_{k}}{m i n} {‖\tilde{s}‖}_{1} \\ s . t . x_{k} = Ψ \tilde{s}, 1 = 1^{T} \tilde{s}, \end{matrix}

(19)

where

1 \in R^{L \times 1}

is the column vector whose entries are all equal to

1

and

S_{k}

is the vector space of dimension L, in which each of the vectors has its kth element equal to zero. Denote

\begin{matrix} E ≜ Ψ - Ψ \bar{S}, \end{matrix}

(20)

where

\bar{S} ≜ [\begin{bmatrix} {\bar{s}}_{1} & \dots & {\bar{s}}_{k} & \dots & {\bar{s}}_{L} \end{bmatrix}]

as the sparse representation error matrix in the sense specified by (19). The projection matrix is chosen such that each row vector

ϕ

Φ

will minimize the sparse representation errors [17]:

\begin{matrix} ϕ ≜ a r g \underset{\tilde{ϕ}}{m i n} {‖\tilde{ϕ} E‖}_{2}^{2} \\ s . t . \tilde{ϕ} Ψ Ψ^{T} {\tilde{ϕ}}^{T} = 1, \end{matrix}

(21)

where the constraint is for avoiding degenerate solutions. Using Lagrange multiplier approach, one can find the solution to

ϕ

(21) from the following generalized eigenvalue problem:

\begin{matrix} E E^{T} v = λ Ψ Ψ^{T} v, \end{matrix}

(22)

and the M row vectors of optimal

Φ

are given the transposed eigenvectors, that is,

{v_{k}^{T}}

, corresponding to the top M eigenvalues of the above equation. Equation (22) can be solved using MATLAB command eig.m:

\begin{matrix} [V, Λ] = e i g (E E^{T}, Ψ Ψ^{T}), \end{matrix}

(23)

where

Λ = d i a g (λ_{1}, \dots, λ_{k}, \dots, λ_{L})

with

λ_{k} \geq λ_{k + 1}

assumed. Then, the optimal projection matrix

Φ \in R^{M \times L}

for SPP is given by first M row vectors of

V^{T}

\begin{matrix} Φ = (V^{T}) (1 : M, :) . \end{matrix}

(24)

As noted in [17], to avoid singularity of this problem, PCA-based preprocessing is usually used (let

X X^{T} = [\begin{bmatrix} W_{1} & W_{2} \end{bmatrix}] [\begin{smallmatrix} Σ & 0 \\ 0 & 0 \end{smallmatrix}] {[\begin{bmatrix} W_{1} & W_{2} \end{bmatrix}]}^{T}

be the SVD of

X X^{T}

; then, replace X with

\hat{X} = W_{1}^{T} X

As understood, the SPP-based SRC method, denoted by ${A l g}_{S P P}^{S R C}$ , is exactly the same as ${A l g}_{P C A}^{S R C}$ except that the projection matrix $Φ$ is given by (24).

2.4. Sparse Related Face Recognition

In [20], a class of structured sparsity-inducing norms was included into SRC framework which mainly concerns the misalignment, shadow, and occlusion scenarios. But the authors did not take dimensionality reduction into account which is crucial for the implementation of distributed systems. SRC assumes that the sparse representation residual follows Gaussian or Laplacian distribution, while robust sparse coding (RSC) considered in [21] models the sparse coding as a sparsity constrained robust regression problem and seeks the maximum likelihood estimation solution of the sparse coding. All these make it much more robust than SRC. However, the computational complexity of RSC is also much higher than SRC's which is not practical for the investigated distributed system.

2.5. Problem Formulation

In a distributed intelligent monitoring system, $N_{1} \times N_{2}$ images ${I_{k}}$ acquired by the agents are usually of high dimension and have to be transmitted to the back-end server for recognition. In order to have an efficient transmission, it is desired to compress these high dimension signals ${x_{k}}$ with $x_{k} \in R^{N \times 1}$ , where $N ≜ N_{1} N_{2}$ . One way to do so is through projection $y_{k} = Φ x_{k}$ , where the projection/sensing matrix $Φ \in R^{M \times N}$ with $M ≪ N$ compresses $x_{k}$ into a much lower dimension signal $y_{k}$ . Instead of $x_{k}$ , $y_{k}$ is transmitted.

Once $y_{k}$ is received by the back-end server, the recognition of $y_{k}$ (corresponding to image $I_{k}$ ) is done with the equivalent dictionary A:

\begin{matrix} A = Φ Ψ = [\begin{bmatrix} Φ Ψ_{1} & \dots & Φ Ψ_{p} & \dots & Φ Ψ_{P} \end{bmatrix}] ≜ [\begin{bmatrix} A_{1} & \dots & A_{p} & \dots & A_{P} \end{bmatrix}], \end{matrix}

(25)

where the dictionary

Ψ

, as formed in (7), is assumed to be given. The main problem to be investigated in the remainder of this paper is how to design the sensing matrix

Φ

such that the image

I_{k}

can be recognized with the measurement

y_{k}

and the equivalent dictionary A.

3. The Proposed CS System for FR Distributed Monitoring Systems

As mentioned before, the key to solve the problems encountered in distributed intelligent monitoring systems is to compress the original (high dimensional) image signals and to carry out face recognition effectively with the low dimensional measurements.

This can be done using CS systems. In fact, both ${A l g}_{S P P}^{S R C}$ and ${A l g}_{P C A}^{S R C}$ , described in Section 2, can be viewed as two special CS systems in which the sensing matrix is designed to preserve principal components of the signal covariance matrix and the sparse representation of the samples in the dictionary $Ψ$ , respectively.

As seen in the previous section, the optimal sensing matrix design problem is formulated by (6). Such a problem has been studied intensively for signal compression application, in which achieving high reconstruction accuracy is the ultimate goal of the design.

In the context of face recognition, what we are interested in the most is the recognition rate. Therefore, it is desired to design the projection matrix $Φ$ to enhance the recognition rate. This is strongly related to how to choose the target Gram $G_{t}$ . Another important issue is how to solve (6) with arbitrary $G_{t}$ .

3.1. How to Choose the Target Gram?

Optimizing projection matrix for CS systems has been an important research topic in CS theory, which was initiated by Elad's work reported in 2007 [11]. Since then, many theoretical achievements have been obtained [12–15]. Roughly speaking, all these results are based on the following formulation:

\begin{matrix} \underset{Φ}{m i n} {‖G - G_{t}‖}_{F}^{2} \\ s . t . G = A^{T} A, \end{matrix}

(26)

where

{‖\cdot‖}_{F}

is defined as the Frobenius norm, G is the Gram matrix of the equivalent dictionary

A = Φ Ψ

, and

G_{t}

is a target Gram matrix.

For a given dictionary, the performance of the projection matrix depends on the choice of $G_{t}$ . When a CS system is designed for signal compression, the reconstruction accuracy is the ultimate goal to achieve. In such a case, $G_{t}$ is chosen such that the resultant projection matrix makes the equivalent dictionary A have small coherence between its columns and the system robust against noises, say sparse representation errors of signals. Recently, Cleju showed in [14] that the images can be well reconstructed from the low dimensional measurements projected with the sensing matrix $Φ$ designed based on $G_{t} = Ψ^{T} Ψ$ , the Gram of the dictionary.

Comment 1.

It should be pointed out that, in $N_{1} \times N_{2}$ -image compression oriented CS systems, the images are divided into $n_{1} \times n_{2}$ subimages, each of which forms a signal vector, say $x_{k} \in R^{l \times 1}$ with $l ≜ n_{1} n_{2}$ (this is very different from what has been used in the SRC-based face recognition approaches, in which the signals are of dimension $N_{1} N_{2} \times 1$ ), while the constructive dictionary $Ψ$ is of dimension $l \times L$ with $L ≫ l$ . The K-SVD [18] has been considered as one of the most successful methods for designing such a dictionary. It was observed that the dictionaries obtained with different classes of images/samples are very similar. And our experiments showed that a direct application of such a CS system to face recognition based on reconstruction error does not yield a satisfactory performance in terms of recognition rate. Note that, for face recognition application, our ultimate objective is accurate recognition and to enhance recognition rate it would be better to design the sensing matrix $Φ$ by improving the discrimination between the classes at the price of sacrificing reconstruction accuracy. This is actually one of the main motivations for our proposed approach in this paper, which will be developed further.

Let $Ψ = [\begin{bmatrix} Ψ_{1} & \dots & Ψ_{p} & \dots & Ψ_{P} \end{bmatrix}]$ with $Ψ_{p} \in R^{N \times Q}$ be the overall dictionary defined before for face recognition and its Gram is

\begin{matrix} G_{Ψ} ≜ Ψ^{T} Ψ = [\begin{bmatrix} G_{11} & G_{12} & \dots & G_{1 P} \\ G_{21} & G_{22} & \dots & G_{2 P} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ G_{P 1} & G_{P 2} & \dots & G_{P P} \end{bmatrix}] \end{matrix}

(27)

with

G_{i j} ≜ Ψ_{i}^{T} Ψ_{j}

When choosing $Φ Ψ$ such that ${‖G_{Ψ} - G‖}_{F}^{2}$ is minimized, one emphasizes the reconstruction accuracy as the fundamental assumption of the SRC-based face recognition approach is that a sample (image), belonging to a class, is close to a linear combination of a few atoms of the dictionary of the same class. This, however, can not ensure that such a sample can not be well represented sparsely in a linear combination of other (sub)dictionaries. In other words, the reconstruction error itself may not be a proper measure for recognition. As mentioned before, recognition rate is our primary goal and hence in the CS system design, it is desired to choose the projection matrix $Φ$ such that the coherence between the columns of $A_{i}$ and those of $A_{j}$ for all $i \neq j$ is reduced. By doing so, the discrimination between classes is improved. This can be achieved by choosing the sensing matrix such that the Gram of the corresponding equivalent dictionary is close to the following target Gram as much as possible:

\begin{matrix} G_{t} = G_{Ψ} + Δ, \end{matrix}

(28)

where

\begin{matrix} Δ ≜ [\begin{bmatrix} Δ_{11} & Δ_{12} & \dots & Δ_{1 P} \\ Δ_{21} & Δ_{22} & \dots & Δ_{2 P} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ Δ_{P 1} & Δ_{P 2} & \dots & Δ_{P P} \end{bmatrix}] \end{matrix}

(29)

with

Δ_{i j}

defined below:

\begin{matrix} Δ_{i j} (m, n) = \{\begin{cases} - w * η, & i \neq j \\ η, & i = j, m \neq n \\ 0, & i = j, m = n, \end{cases} \end{matrix}

(30)

where both

η > 0

and

w > 0

are used to adjust the coherence between different classes and that between the columns of the same class, called correction parameter. These parameters should be chosen such that the off-diagonal elements of

G_{t}

are all within

[0,1]

. Such a target Gram

G_{t}

is called discriminative Gram.

Comment 2.

(i) In [12], sensing matrix optimization for block-sparse decoding was investigated. It is observed that our problem is the simplest block-sparse case, where the blocks are prefixed according to the classes of face images. $G_{t}$ is taken as the identity matrix in [12] and in our problem is assumed to be symmetric. Both encounter how to choose the weighting factors, which is application dependent and purely empirical. We will present some experimental results in the next section.

(ii) It should be pointed out that though, as to be seen, the recognition/classification is done based on the errors ${‖y - A_{p} s‖}_{F}^{2}$ , which has a physical meaning close to the reconstruction error, these errors have taken the discrimination between classes into account via all ${A_{p}}$ or the sensing matrix which is designed based on the discriminative Gram $G_{t}$ defined in (28). So, such a measure can be considered as a mixture of reconstruction error and discriminative error. It is worth noting that, in [22], a dictionary learning-based classification scheme was investigated where the dictionary was trained by certain class of signals as the classifier, while, in this paper, we consider the classification task in compressive domain. The sensing matrix is designed according to the given dictionary and hence the equivalent dictionary (corresponding to the compressive domain) possesses good property so that the sparse coding can be implemented accurately.

In the next subsection, we will discuss how to solve the proposed optimal sensing matrix design problem.

3.2. An Algorithm for Optimizing Projection Matrix

With the target Gram $G_{t}$ defined in (28), one can now consider the optimal projection matrix design, which is formulated as follows:

\begin{matrix} \hat{Φ} = a r g \underset{Φ}{m i n} {‖G - G_{t}‖}_{F}^{2} \\ s . t . G = Ψ^{T} Φ^{T} Φ Ψ . \end{matrix}

(31)

Now, let us consider how to solve the above problem. First of all, assume that $Ψ$ has the following SVD:

\begin{matrix} Ψ = U_{Ψ} [\begin{bmatrix} Σ_{Ψ} & 0 \\ 0 & 0 \end{bmatrix}] V_{Ψ}^{T}, \end{matrix}

(32)

where

Σ_{Ψ} \in R^{\tilde{N} \times \tilde{N}} > 0

. Then, we get

\begin{matrix} G = V_{Ψ} [\begin{bmatrix} Σ_{Ψ} & 0 \\ 0 & 0 \end{bmatrix}] W [\begin{bmatrix} Σ_{Ψ} & 0 \\ 0 & 0 \end{bmatrix}] V_{Ψ}^{T}, \end{matrix}

(33)

where

W ≜ U_{Ψ}^{T} Φ^{T} Φ U_{Ψ}

. Let

\begin{matrix} W = [\begin{bmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{bmatrix}] \\ s . t . W_{11} \in R^{\tilde{N} \times \tilde{N}} . \end{matrix}

(34)

Then,

\begin{matrix} {‖G - G_{t}‖}_{F}^{2} = {‖V_{Ψ} [\begin{bmatrix} Σ_{Ψ} & 0 \\ 0 & 0 \end{bmatrix}] W [\begin{bmatrix} Σ_{Ψ} & 0 \\ 0 & 0 \end{bmatrix}] V_{Ψ}^{T} - G_{t}‖}_{F}^{2} ≜ {‖[\begin{bmatrix} Σ_{Ψ} W_{11} Σ_{Ψ} & 0 \\ 0 & 0 \end{bmatrix}] - {\tilde{G}}_{t}‖}_{F}^{2}, \end{matrix}

(35)

where

\begin{matrix} {\tilde{G}}_{t} ≜ V_{Ψ}^{T} G_{t} V_{Ψ} = [\begin{bmatrix} {\tilde{G}}_{11} & {\tilde{G}}_{12} \\ {\tilde{G}}_{21} & {\tilde{G}}_{22} \end{bmatrix}] \\ s . t . {\tilde{G}}_{11} \in R^{\tilde{N} \times \tilde{N}} . \end{matrix}

(36)

Finally, the cost function can be rewritten as

\begin{matrix} {‖G - G_{t}‖}_{F}^{2} = {‖Σ_{Ψ} W_{11} Σ_{Ψ} - {\tilde{G}}_{11}‖}_{F}^{2} + {‖{\tilde{G}}_{t}‖}_{F}^{2} - {‖{\tilde{G}}_{11}‖}_{F}^{2} . \end{matrix}

(37)

We can see that the two right-hand sides have nothing to do with the projection matrix. Define

\begin{matrix} {\tilde{W}}_{11} ≜ Σ_{Ψ} W_{11} Σ_{Ψ} . \end{matrix}

(38)

Equation (31) is equivalent to

\begin{matrix} \hat{Φ} = a r g \underset{Φ}{m i n} {‖{\tilde{W}}_{11} - {\tilde{G}}_{11}‖}_{F}^{2} . \end{matrix}

(39)

Let ${\tilde{W}}_{11} = V_{W} Λ_{W} V_{W}^{T}$ and ${\tilde{G}}_{t}^{11} = V_{t} Λ_{t} V_{t}^{T}$ be an SVD and eigen decomposition of ${\tilde{W}}_{11}$ and ${\tilde{G}}_{11}$ , respectively. Furthermore, assume that both ${Λ_{W} (k, k)}$ and ${Λ_{t} (k, k)}$ are in descending order. It then follows from [23] (see Corollary $7.4 . 9.3$ in page 468) that

\begin{matrix} {‖{\tilde{W}}_{11} - {\tilde{G}}_{11}‖}_{F}^{2} \geq {‖Λ_{W} - Λ_{t}‖}_{F}^{2} ≜ ϱ \end{matrix}

(40)

and the lower bound (equality) is achieved if and only if

\begin{matrix} V_{W} = V_{t} . \end{matrix}

(41)

Noting that

r a n k ({\tilde{W}}_{11}) \leq M

, one can see that the lower bound ϱ can be minimized with

\begin{matrix} Λ_{W} (k, k) = \{\begin{cases} Λ_{t} (k, k), & \forall k \leq M t h a t Λ_{t} (k, k) \geq 0 \\ 0, & \forall k \leq M t h a t Λ_{t} (k, k) < 0 . \end{cases} \end{matrix}

(42)

With (41) and (42), one has optimal

{\tilde{W}}_{11}

and hence

W_{11}

\begin{matrix} {\tilde{W}}_{11} = V_{t} Λ_{W} V_{t}^{T} ⟹ W_{11} = Σ_{Ψ}^{- 1} {\tilde{W}}_{11} Σ_{Ψ}^{- 1} . \end{matrix}

(43)

Let the following be an SVD of $W_{11}$ :

\begin{matrix} W_{11} = V_{11} [\begin{bmatrix} Σ_{11}^{2} & 0 \\ 0 & 0 \end{bmatrix}] V_{11}^{T} . \end{matrix}

(44)

And set

W_{12} = 0

W_{21} = 0

, and

W_{22} = 0

; then

\begin{matrix} W = [\begin{bmatrix} V_{11} & 0 \\ 0 & V_{22} \end{bmatrix}] [\begin{bmatrix} Σ_{11}^{2} & 0 \\ 0 & 0 \end{bmatrix}] {[\begin{bmatrix} V_{11} & 0 \\ 0 & V_{22} \end{bmatrix}]}^{T}, \end{matrix}

(45)

where

V_{22}

is any orthonormal matrix with dimensions

(N - \tilde{N})

Since $Φ^{T} Φ = U_{Ψ} W U_{Ψ}^{T}$ , a class of solutions to (31) is obtained as

\begin{matrix} \hat{Φ} = U [\begin{bmatrix} Σ_{11} & 0 \end{bmatrix}] {[\begin{bmatrix} V_{11} & 0 \\ 0 & V_{22} \end{bmatrix}]}^{T} U_{Ψ}^{T}, \end{matrix}

(46)

where

Σ_{11} \in R^{M \times M}

and U is any orthogonal matrix with dimension of M.

Seen from the above discussion, the optimization of projection matrix $Φ$ is only related to the dictionary $Ψ$ and the correction constants w and η, and, therefore, for given $Ψ$ , w, and η, we can get $Φ$ , which is better than traditional feature selecting process. And (46) is an analytical solution. In addition, there are two degrees of freedom U and $V_{22}$ in the results, which provides the possibility to further improve system performance.

Comment 3.

Our proposed optimal sensing design problem shares the same form as that in [14]. It should be pointed out that, in [14], $G_{t} = G_{Ψ}$ and the solutions are relatively easy to obtain, while our $G_{t}$ is a generalization of $G_{Ψ}$ and the solutions are actually applicable to any symmetric $G_{t}$ . In terms of applications, ours is for face recognition, in which the degrees of freedom, brought by the parameters η and w, make our approach outperform that in [14].

3.3. The Proposed CS-Based Frameworks for FR

With the obtained optimal sensing matrix $\hat{Φ}$ , any given high dimension face image x can be projected to a vector of much lower dimension, which will be transmitted to the server of the distributed monitoring system for classification/recognition. Two PMO-based classification methods are proposed here. The first one, denoted by ${A l g}_{P M O}^{S R C}$ , has exactly the same structure as ${A l g}_{P C A}^{S R C}$ and ${A l g}_{S P P}^{S R C}$ , except that the projection matrix $Φ$ is given by (46).

The second one, denoted by ${A l g}_{P M O}^{l_{2}}$ , is outlined as follows.

${A l g}_{P M O}^{l_{2}}$

Step 1.

At a subsystem that captures a face image x, compress it using

\begin{matrix} y = \hat{Φ} x \end{matrix}

(47)

and then encode and transmit y to the server of the distributed monitoring system.

Step 2.

At the server that receives y, compute

\begin{matrix} σ_{p}^{2} ≜ \underset{\tilde{s}}{m i n} {‖y - A_{p} \tilde{s}‖}_{2}^{2}, p = 1, \dots, P, \end{matrix}

(48)

where

A_{p} \in R^{M \times Q}

is the equivalent dictionary for the pth class:

A = \hat{Φ} Ψ = [\begin{bmatrix} A_{1} & \dots & A_{p} & \dots & A_{P} \end{bmatrix}]

Step 3 (classification).

x belongs to the $\hat{p}$ th class, where $\hat{p}$ is determined with

\begin{matrix} \hat{p} ≜ a r g \underset{p \in \{1, \dots, P\}}{m i n} \{σ_{p}^{2}\} . \end{matrix}

(49)

In the next section, we will examine the performance of the proposed algorithms and compare it with some existing ones.

4. Experiment Results

In this section, we examine the performance of various classification systems on five face databases: ORL [24], Yale [25], Yale Extend [26] (referred to as Yale-E), CMU PIE [27] (referred to as PIE), and AR [28]: (i)

ORL database contains 400 images of 40 individuals (each provides 10 images); that is, $P = 40$ . Some images are captured at different time, varying the lighting, facial expressions, and facial details. In our test, we randomly select $Q = 5$ images from each individual to form the dictionary set, while the remaining ones are for testing.

(ii)

Yale database contains 165 images of 15 individuals (i.e., $P = 15$ ; each provides 11 images) under various illumination conditions and facial expressions. A random subset with $Q = 5$ images per individual is taken to form the dictionary set, and the rest of the database is used for testing.

(iii)

Yale-E database includes the Yale face database B and the extended Yale face database. A subset called Yale Extend face database is collected from these two databases, which contains 2414 face images of 38 subjects; that is, $P = 38$ . In our test, we randomly select $Q = 20$ images from each individual to form the dictionary set, while keeping the remaining for testing.

(iv)

PIE database is composed of 68 subjects (i.e., $P = 68$ ) with 41368 face images which are captured by 13 synchronized cameras and 21 flashes, under varying pose, illumination, and expression. We generate a data set from it by selecting 184 images for each individual and totally 12512 samples are used in our experiments. $Q = 20$ images are sampled to form the dictionary set, while the remaining images are considered for testing.

(v)

AR database contains over 4000 color face images of 126 people (70 men and 56 women; i.e., $P = 126$ ) taken during two distinct photo sessions (separated by two weeks), with different facial expression, lighting conditions, and occlusions. We choose 50 men and 50 women to generate a data set of 100 persons (each provides 26 images). Random selection of $Q = 20$ images per individual is taken to form the dictionary set, and the rest of the database is for testing.

For each of the five databases, we take all P classes (different persons) with each class containing Q different samples (images), leading to

L = P Q

. All images are resized to resolution of

32 \times 32

and normalized in scale; hence, each face sample is represented as a column vector

x \in R^{N \times 1}

with

N = 1024

. By doing so, the corresponding dictionary

Ψ \in R^{N \times L}

is then generated.

The systems consisted of different compression methods (PCA, SPP, RDM for random sampling matrix, and PMO) and various classifiers (SRC, RSC, Wlearning for [22], and $l_{2}$ ) are compared. For instance, ${A l g}_{P C A}^{S R C}$ denotes the system combining PCA and SRC with the subscript referring to compression method and the superscript the classifier.

4.1. The Effect of Q

Firstly, we briefly discuss the effect of the number of training samples with ORL database. Figure 4 shows the results of ${A l g}_{P M O}^{l_{2}}$ versus Q with different η values and $w = 2$ , $M = 80$ .

Figure 4

Recognition rate versus number of training samples Q with $w = 2$ and η given in the legend.

As can be seen, an increasing trend is obtained when Q gets bigger in general with slight fluctuation. It is reasonable that more samples are used for representation, and higher accuracy is achieved. In the following, proper Q values will be chosen for different databases as introduced in the beginning of this section.

4.2. The Choice of the Correction Parameter η

One highlight of the proposed PMO method is the correction parameter η. Now, we set up experiment for testing the effect of η. Fixing $M = 80$ and $w = 2$ , the recognition rate versus the correction parameter η is depicted in Figure 5.

Figure 5

Recognition rate versus η with $M = 80$ and $w = 2$ .

It is clearly seen from the figure that the correction parameter η does affect the performance of classification systems though the value should be chosen as database adapted.

4.3. Comparison of Different Classifiers

In this part, we turn to examine the performance of classification methods (i.e., SRC, RSC, $l_{2}$ , and Wlearning) with PMO applied for compression. Figure 6 shows the recognition results with ORL database used.

Figure 6

Recognition rate versus η with various classifiers for $M = 80$ and $w = 2$ .

The computation times of different methods are given in Table 1.

Table 1

Computation time of different classifiers.

	${A l g}_{P M O}^{S R C}$	${A l g}_{P M O}^{R S C}$	${A l g}_{P M O}^{W l e a r n i n g}$	${A l g}_{P M O}^{l_{2}}$
Time (s)	4.39	29.80	1.96	1.87

For this case, the proposed $l_{2}$ classifier achieves the best results almost in every η. The recognition rates of RSC are comparable somewhere, but its computation complexity is much higher than the others as can be concluded from Table 1, and this is also why RSC is not suitable for large distributed system.

4.4. Comparison of Different Compression Methods

The scenarios where various compression methods are combined with the SRC are investigated and the statistical results are summarized in Table 2.

Table 2

Comparison of different compression methods with $M = 80$ , $η = 0.02$ , and $w = 2$ .

	Database
	ORL	Yale	Yale-E	PIE	AR
${A l g}_{R D M}^{S R C}$	$92.20 %$	$85.56 %$	$91.63 %$	$86.58 %$	$93.60 %$
${A l g}_{P C A}^{S R C}$	$93.53 %$	$92.22 %$	$92.21 %$	$89.71 %$	$95.80 %$
${A l g}_{S P P}^{S R C}$	$46.75 %$	$94.44 %$	$44.36 %$	$21.18 %$	$74.46 %$
${A l g}_{P M O}^{S R C}$	$95.00 %$	$93.26 %$	$91.89 %$	$90.68 %$	$98.24 %$

The computation time with ORL database considered is given in Table 3.

Table 3

Computation time of different compression methods.

	${A l g}_{P C A}^{S R C}$	${A l g}_{S P P}^{S R C}$	${A l g}_{R D M}^{S R C}$	${A l g}_{P M O}^{S R C}$
Time (s)	3.87	5.07	3.82	4.39

The data in Table 2 demonstrates the superiority of the proposed PMO methods. The computation times of various systems differ with each other in the projection matrix design methods, that is, dimensionality reduction procedure. PMO costs more time when compared with PCA. But the design process is an offline job, and it does not put any additional pressure on the front agents.

4.5. Recognition Rate versus Compression Dimension

In this part, we examine the performance of different systems with various compression rates on ORL database for $η = 0$ , $w = 2$ , and $Q = 5$ . Four compression methods are conducted with SRC. In addition, PMO combined with the proposed $l_{2}$ classifier is also carried out for comparison. Figure 7 displays the recognition accuracies versus measurement dimension M.

Figure 7

Recognition rates versus M.

As can be seen, higher recognition rates are obtained when bigger M values are set. The results in Figure 7 are coincident with the statistics of Table 2 that PMO achieves the best recognition rate, which is better than the results of PCA and RDM, and SPP once again performs the worst.

4.6. Effect of Occlusion

One of the most challenging problems in FR techniques is the robustness to face occlusion. In this subsection, we test the performance of our proposed method with different occlusion scenarios. The AR database consists of occlusion face images. Taking these samples for test, Figure 8 shows the recognition rates versus measurement dimension.

Figure 8

Recognition rate versus measurement dimension with AR database.

This figure indicates that even for occlusion scenarios, as the measurement dimension increases, better recognition rate results can be achieved and SPP is the worst of all.

Figure 9 depicts the recognition rates versus the occlusion percentage with ORL database used, and $M = 80$ , $η = 0.02$ , and $w = 2$ for the proposed PMO.

Figure 9

Recognition rate versus occlusion percentage with ORL database tested.

The results of Figure 9 demonstrate that SPP and RDM are more sensitive to the occlusion, and PMO and PCA always perform better than the other two.

4.7. Image Reconstruction Experiment

The PMO algorithm is applied to distributed intelligent monitoring system, and image reconstruction can be performed on the server terminal. Denote the test sample $x \in R^{N \times 1}$ and the reconstructed signal $\hat{x} \in R^{N \times 1}$ . The mean square error (MSE) is defined as

\begin{matrix} σ_{M S E} ≜ \frac{1}{N} {‖\hat{x} - x‖}_{2}^{2} . \end{matrix}

(50)

A popular used indicator to evaluate the image reconstruction accuracy is peak signal-to-noise ratio (PSNR) defined as

\begin{matrix} σ_{P S N R} ≜ 10 \times \log 10 [\frac{{(2^{r} - 1)}^{2}}{σ_{M S E}}] \end{matrix}

(51)

with

r = 8

bits/pixel. Figure 10 shows the evaluation of

σ_{P S N R}

for varying measurement dimension M with

η = 0.02

and

w = 2

for the proposed PMO.

Figure 10

$σ_{P S N R}$ versus measurement dimension M.

Again, the results of PMO and PCA are similar and better than the others. Please note that as ${A l g}_{S P P}^{S R C}$ performs much worse than the others, we omit it in this figure. One example is given in Figure 11 to demonstrate the visual effect of the proposed system.

Figure 11

Visual effect of the reconstructed image. (a) Test image. (b) Reconstructed image.

5. Conclusion

This work presents a novel CS-based face recognition scheme for distributed intelligent monitoring systems. A new compression strategy has been proposed based on the projection matrix optimization. The analytical solution set of the corresponding optimization problem has also been derived. With this new compression method, two frameworks have been presented for compressive face recognition. Experimental results on face recognition tasks demonstrate the superiority of the proposed approaches.

The monitor system always encounters the problem that we need to register new face and update the whole system. The FR oriented online CS system learning is an interesting direction of research for distributed intelligent monitor system design. In our approach, the dictionary is formed directly with the samples. It is expected that more efficient FR oriented CS systems can be achieved if both the dictionary and the sensing matrix are optimized alternatively or simultaneously.

Footnotes

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This work was supported by NSFC Grants 61273195, 61304124, 61473262, and 61503339 and ZJNSF Grants LY13F010009 and LQ14F030008.

References

Zhao

W. Y.

Chellappa

Phillips

P. J.

Rosenfeld

Face recognition: a literature survey

ACM Computing Surveys 2003 35 4 399 458

10.1145/954339.954342

2-s2.0-1842499650

Wallace

G. K.

The JPEG still picture compression standard

IEEE Transactions on Consumer Electronics 1992 38 1

2-s2.0-0026818192

Taubman

D. S.

Marcellin

M. W.

JPEG2000 Image Compression Fundamentals, Standards and Practice 2002

New York, NY, USA

Kluwer Academic

Chen

M. J.

PCA and LDA in DCT domain

Pattern Recognition Letters 2005 26 15 2474 2482

10.1016/j.patrec.2005.05.004

2-s2.0-25844493811

Turk

Pentland

Eigenfaces for recognition

Journal of Cognitive Neuroscience 1991 3 1 72 86

2-s2.0-0026065565

Donoho

D. L.

Compressed sensing

Transactions on Information Theory 2006 52 4 1289 1306

10.1109/tit.2006.871582

MR2241189

2-s2.0-33645712892

Candes

E. J.

Wakin

M. B.

An introduction to compressive sampling

IEEE Signal Processing Magazine 2008 25 2 21 30

10.1109/msp.2007.914731

2-s2.0-41949092318

Liu

Zhu

Zhang

Cho

S. H.

Distributed compressed video sensing in camera sensor networks

International Journal of Distributed Sensor Networks 2012 2012 10

352167

10.1155/2012/352167

2-s2.0-84872819007

Jiang

Wang

Compressed sensing based on the characteristic correlation of ECG in hybrid wireless sensor network

International Journal of Distributed Sensor Networks 2015 2015 8

325103

10.1155/2015/325103

10.

Donoho

D. L.

Elad

Optimally sparse representation in general (nonorthonormal) dictionaries via

l^{1}

minimization

Proceedings of the National Academy of Sciences 2003 100 5 2197 2202

10.1073/pnas.0437847100

11.

Elad

Optimized projections for compressed sensing

IEEE Transactions on Signal Processing 2007 55 12 5695 5702

10.1109/TSP.2007.900760

MR2440203

2-s2.0-36749022762

12.

Zelnik-Manor

Rosenblum

Eldar

Y. C.

Sensing matrix optimization for block-sparse decoding

IEEE Transactions on Signal Processing 2011 59 9 4300 4312

10.1109/tsp.2011.2159211

MR2865985

2-s2.0-80051751662

13.

Zhu

Z. H.

Yang

D. H.

Chang

L. P.

Bai

On projection matrix optimization for compressive sensing systems

IEEE Transactions on Signal Processing 2013 61 11 2887 2898

10.1109/tsp.2013.2253776

MR3064101

2-s2.0-84877903708

14.

Cleju

Optimized projections for compressed sensing via rank-constrained nearest correlation matrix

Applied and Computational Harmonic Analysis 2014 36 3 495 507

10.1016/j.acha.2013.08.005

MR3175090

2-s2.0-84895927017

15.

Bai

Jiang

Chang

Alternating optimization of sensing matrix and sparsifying dictionary for compressed sensing

IEEE Transactions on Signal Processing 2015 63 6 1581 1594

10.1109/TSP.2015.2399864

MR3316911

16.

Wright

Yang

A. Y.

Ganesh

Sastry

S. S.

Robust face recognition via sparse representation

IEEE Transactions on Pattern Analysis and Machine Intelligence 2009 31 2 210 227

10.1109/TPAMI.2008.79

2-s2.0-61549128441

17.

Qiao

Chen

Tan

Sparsity preserving projections with applications to face recognition

Pattern Recognition 2010 43 1 331 341

10.1016/j.patcog.2009.05.005

2-s2.0-69049112203

18.

Aharon

Elad

Bruckstein

K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation

IEEE Transactions on Signal Processing 2006 54 11 4311 4322

10.1109/tsp.2006.881199

2-s2.0-33750383209

19.

Chin

T.-J.

Suter

Incremental kernel principal component analysis

IEEE Transactions on Image Processing 2007 16 6 1662 1674

10.1109/TIP.2007.896668

MR2466287

2-s2.0-34249337437

20.

Jia

Chan

T.-H.

Robust and practical face recognition via structured sparsity

Computer Vision—ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7–13, 2012, Proceedings, Part IV 2012 7575

Berlin, Germany

Springer

331 344 Lecture Notes in Computer Science

10.1007/978-3-642-33765-9_24

21.

Yang

Zhang

Yang

Zhang

Robust sparse coding for face recognition

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11)

June 2011

Providence, RI, USA

625 632

10.1109/cvpr.2011.5995393

2-s2.0-80052913149

22.

Mairal

Bach

Ponce

Task-driven dictionary learning

IEEE Transactions on Pattern Analysis and Machine Intelligence 2012 34 4 791 804

10.1109/TPAMI.2011.156

2-s2.0-84857419890

23.

Horn

R. A.

Johnson

C. R.

Matrix Analysis 2012 2nd

Cambridge, UK

Cambridge University Press

24.

Samaria

F. S.

Harter

A. C.

Parameterisation of a stochastic model for human face identification

Proceedings of the 2nd IEEE Workshop on Applications of Computer Vision

December 1994

138 142

2-s2.0-0028734063

25.

Georghiades

F. S. A.

Belhumeur

P. N.

Kriegman

D. J.

From few to many: illumination cone models for face recognition under variable lighting and pose

IEEE Transactions on Pattern Analysis and Machine Intelligence 2001 23 6 643 660

10.1109/34.927464

2-s2.0-0035363672

26.

Lee

K.-C.

Kriegman

D. J.

Acquiring linear subspaces for face recognition under variable lighting

IEEE Transactions on Pattern Analysis and Machine Intelligence 2005 27 5 684 698

10.1109/TPAMI.2005.92

2-s2.0-18144420071

27.

Sim

Baker

Bsat

The CMU pose, illumination, and expression (PIE) database

Proceedings of the 5th IEEE International Conference on Automatic Face Gesture Recognition (FGR '02)

May 2002

Washington, DC, USA

53 58

10.1109/afgr.2002.1004130

2-s2.0-4544292940

28.

Martinez

A. M.

Benavente

The AR face database

CVC Technical Report 1998 24