Abstract
Low-rank representation (LRR) has attracted wide attention from researchers in recent years due to its excellent performance in exploring high-dimensional subspace structures. However, in existing LRR-based semi-supervised learning methods, graph construction and semi-supervised learning are two separate steps, so the label information available in the data set is not well used to guide the construction of the affinity graph, and these methods cannot guarantee that the final result is a globally optimal solution. This paper proposes a graph regularized low-rank representation for semi-supervised learning, called GLR2S2. This method combines the construction of the affinity graph with semi-supervised learning and unifies them into one optimization framework. By solving the joint optimization problem, the globally optimal solution can be obtained. Experimental results on several standard data sets show that the proposed GLR2S2 method is effective.
Introduction
In most problems of pattern recognition, artificial intelligence and computer vision, we often face insufficiently labeled data, because acquiring label information is difficult and expensive. In practice, the data we can obtain in large quantities is often unlabeled, which brings great difficulties to machine learning. How to make full use of limited data labels to improve the performance of learning algorithms is a key concern of many researchers. In this case, semi-supervised learning (SSL), which can make full use of limited labeled data and a large amount of unlabeled data,1 has been studied in depth. In recent years, semi-supervised learning has attracted widespread attention in the fields of machine learning, computer vision, and pattern recognition. Among current semi-supervised learning methods, graph-based SSL (G-SSL) is particularly compelling, mainly because it has achieved great success in practical applications.
The graph-based semi-supervised learning approach relies heavily on the construction of a graph that encodes pairwise relationships between samples. Although we can find millions of such relationships in the data, recent research on sparse representations and low-rank representations suggests the importance of choosing pairwise relationships carefully. Yan et al.6,7 proposed constructing such a graph with sparse representation.
Although methods based on sparse representation and low-rank representation have achieved great success, they still have some obvious shortcomings. In most graph-based semi-supervised learning methods, the structure of the graph is usually predefined. Therefore, graph construction and semi-supervised learning are often two independent steps, so these algorithms cannot obtain the globally optimal solution. Since semi-supervised learning algorithms rely heavily on the construction of graphs, it is necessary to combine semi-supervised learning with graph construction and solve them jointly. Fang et al.15 proposed a robust semi-supervised subspace clustering algorithm via non-negative low-rank representation (NNLRS). Yu et al.16 proposed a nonlinear learning method using local coordinate coding for subspace learning. By using weak label regularization and local coordinate coding (LCC), Wang et al.17 proposed a face annotation method that simultaneously uses weak label regularization and graph-based sparse features to improve the labels of similar face images. Peng et al.18 proposed an enhanced LRR via sparse manifold adaption. By using the label information for semi-supervised learning, Yang et al.19 proposed a label constrained sparse low-rank representation algorithm.
Based on the above analysis, we propose a new graph regularized low-rank representation method for semi-supervised learning (GLR2S2). In this algorithm, the construction of the graph and semi-supervised learning are combined in a unified optimization framework, which ensures that the final result is globally optimal. Through joint optimization of graph construction and semi-supervised learning, the label information of data samples can be accurately propagated in the algorithm learning process. The main contributions of this paper are summarized as follows:
- Previously, most G-SSL algorithms treated graph construction and semi-supervised learning as two independent steps. The GLR2S2 algorithm proposed in this paper integrates these two tasks into a unified optimization framework, which guarantees that the final result is globally optimal.
- By introducing graph regularization and sparse constraints into the LRR framework, the proposed GLR2S2 method considers the intrinsic geometric features of the recovered data. In this way, the GLR2S2 algorithm can simultaneously capture the intrinsic local structure information and global structure information of high-dimensional data.
- For the optimization problem of the GLR2S2 algorithm, this paper proposes an effective optimization strategy.
The rest of this paper is arranged as follows: Section 2 briefly reviews related work. Section 3 introduces the proposed GLR2S2 method and its optimization. Section 4 presents extensive experiments that illustrate the effectiveness of GLR2S2. Finally, Section 5 gives the conclusion.
Related works
In this section, we briefly introduce the LRR and GLRR20 algorithms, as well as the semi-supervised classification framework used in this paper.
LRR and GLRR
Assume that the data matrix $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$ contains $n$ samples drawn from a union of subspaces. The LRR method9 seeks the lowest-rank representation of $X$ with respect to itself by solving

$$\min_{Z,E}\ \|Z\|_{*} + \lambda\|E\|_{2,1} \quad \text{s.t.}\quad X = XZ + E, \tag{1}$$

where $\|Z\|_{*}$ denotes the nuclear norm of the representation matrix $Z$, $\|E\|_{2,1}$ models sample-specific corruptions, and $\lambda > 0$ is a trade-off parameter. Among them, the low-rank constraint has good performance in capturing the global structure of data. The optimal solution $Z^{*}$ of problem (1) is the lowest-rank representation of the data matrix $X$.
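In LRR-based graph methods, the learned representation is typically symmetrized into a nonnegative affinity matrix before it is used for graph-based learning. The following is a minimal sketch of this common construction (the specific weighting used in this paper may differ):

```python
import numpy as np

def lrr_affinity(Z):
    """Build a symmetric, nonnegative affinity matrix from a low-rank
    representation Z (a common post-processing step in LRR-based graph
    methods; the exact weighting in the paper may differ)."""
    W = (np.abs(Z) + np.abs(Z.T)) / 2.0  # symmetrize the absolute coefficients
    np.fill_diagonal(W, 0.0)             # remove self-affinities
    return W
```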
If the data in the high-dimensional space lie in a union of linear subspaces, then the LRR algorithm can extract the low-dimensional structures embedded in the high-dimensional space very well. But in practice, this assumption is often not satisfied. For example, face images are sampled from nonlinear low-dimensional manifolds embedded in a high-dimensional space. In this case, the LRR method cannot capture the intrinsic geometric structure and discriminative information of the data, which is very unfavorable for practical applications.
In order to maintain the intrinsic manifold structure of the data sets, GLRR20 introduces a graph regularization term into the objective function of LRR; the resulting GLRR objective augments problem (1) with a term that penalizes large differences between the representations of neighboring samples.
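A commonly used form of such a graph-regularized objective, assuming the standard trace-form Laplacian regularizer (the exact formulation in GLRR20 may differ slightly), is

$$\min_{Z,E}\ \|Z\|_{*} + \lambda\|E\|_{2,1} + \frac{\beta}{2}\,\mathrm{tr}\!\left(Z L Z^{\top}\right) \quad \text{s.t.}\quad X = XZ + E,$$

where $L = D - W$ is the graph Laplacian of an affinity matrix $W$ (for example, a $k$-nearest-neighbor graph), $D$ is the corresponding degree matrix, and $\beta > 0$ controls the strength of the manifold regularization.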
Semi-supervised classification
In this section, we briefly introduce a very popular semi-supervised learning framework, Gaussian Fields and Harmonic Functions (GFHF).21
Suppose the data set contains a small number of labeled samples together with many unlabeled samples; GFHF propagates the known labels to the unlabeled samples over a graph built on all samples.
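As a sketch of the standard GFHF formulation (the notation here is illustrative and may differ from that used in the paper), let $F$ denote the predicted label matrix, $Y_l$ the given labels of the labeled samples, and $L$ the graph Laplacian of the affinity matrix. GFHF seeks the harmonic solution of

$$\min_{F}\ \mathrm{tr}\!\left(F^{\top} L F\right) \quad \text{s.t.}\quad F_l = Y_l,$$

where $F_l$ is the sub-matrix of $F$ corresponding to the labeled samples. Partitioning $L$ into labeled and unlabeled blocks gives the closed-form solution $F_u = -L_{uu}^{-1} L_{ul} Y_l$ for the unlabeled part.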
Graph regularized low-rank representation for semi-supervised learning
In this section, the graph regularized low-rank representation for semi-supervised learning (GLR2S2) is introduced. The goal of the proposed GLR2S2 method is to perform graph construction and semi-supervised learning simultaneously within a unified optimization framework, so that an overall optimal solution can be obtained.
Objective function of GLR2S2
In GLR2S2, graph learning and semi-supervised learning are completed simultaneously within a single step. Based on low-rank representation theory and GFHF, we propose the objective function of GLR2S2 in problem (7), which couples the low-rank and sparse constraints on the representation matrix, the graph regularization term, and the GFHF label-fitting term under the reconstruction constraint, where the regularization parameters balance the contributions of the individual terms.
LADMAP for solving GLR2S2
There are several variables in the optimization objective function. In order to solve problem (7) effectively, we use the linearized alternating direction method with adaptive penalty (LADMAP).22
In order to make the objective function separable, we introduce two auxiliary variables and rewrite (7) as the equivalent constrained problem (8).
In order to eliminate the three linear constraints in (8), we introduce three Lagrange multipliers and form the augmented Lagrangian function (9).
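As an illustrative sketch (the auxiliary variables and multipliers below are hypothetical names, not necessarily those of the paper), an augmented Lagrangian for a problem with three linear constraints of the form $X = XZ + E$, $Z = J$ and $Z = S$ has the structure

$$\mathcal{L} = f(Z, J, S, E, F) + \langle Y_1, X - XZ - E\rangle + \langle Y_2, Z - J\rangle + \langle Y_3, Z - S\rangle + \frac{\mu}{2}\left(\|X - XZ - E\|_F^2 + \|Z - J\|_F^2 + \|Z - S\|_F^2\right),$$

where $f(\cdot)$ collects the objective terms, $Y_1, Y_2, Y_3$ are the Lagrange multipliers, and $\mu > 0$ is the adaptive penalty parameter updated by LADMAP.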
Computation of
Solving (9) w.r.t.
Computation of
Similarly, solving (9) w.r.t.
The sub-problem (12) has the following objective function:
Computation of
The sub-problem for updating
The solution is defined by
Computation of
Solving (9) w.r.t.
s.t.
where
Computation of
The sub-problem for updating
Then, we have
In Algorithm 1, a detailed procedure for solving the proposed method is described.

Algorithm 1. LADMAP for solving GLR2S2
Input: data matrix and regularization parameters.
Initialization: initialize the block variables, the Lagrange multipliers and the penalty parameter.
while not converged do
1. Fix the others and update the first block variable.
2. Fix the others and update the second block variable.
3. Fix the others and update the third block variable.
4. Fix the others and update the fourth block variable.
5. Fix the others and update the fifth block variable.
6. Update the multipliers.
7. Update the penalty parameter.
8. Check the convergence conditions.
9. Update the iteration counter.
end while
Output: the optimal solution of problem (7).
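To make the alternating structure of Algorithm 1 concrete, the skeleton below is a hedged sketch of a generic LADMAP-style loop; the callable names, update rules and stopping threshold are illustrative assumptions rather than the paper's exact formulas.

```python
import numpy as np

def ladmap_skeleton(X, init, update_blocks, max_iter=500, tol=1e-6,
                    mu=1e-2, mu_max=1e10, rho=1.1):
    """Generic LADMAP-style loop (illustrative; not the paper's exact updates).

    init(X) -> (variables, multipliers)
    update_blocks(X, variables, multipliers, mu) -> (variables, residuals)
    """
    variables, multipliers = init(X)
    for _ in range(max_iter):
        # Steps 1-5: update each block variable with the others fixed.
        variables, residuals = update_blocks(X, variables, multipliers, mu)
        # Step 6: update the Lagrange multipliers using the constraint residuals.
        multipliers = [Y + mu * R for Y, R in zip(multipliers, residuals)]
        # Step 7: increase the adaptive penalty parameter.
        mu = min(rho * mu, mu_max)
        # Step 8: stop when the largest constraint violation is small enough.
        if max(np.abs(R).max() for R in residuals) < tol:
            break
    return variables
```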
Convergence and complexity analysis
The algorithm described above converges to a globally optimal solution of (7) as it is a direct application of LADMAP.
When updating Z by singular value thresholding, see (11), we may predict the rank of Z and compute only a partial SVD in each iteration, which significantly reduces the computational cost.
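For reference, the singular value thresholding (SVT) operator used in nuclear-norm subproblems such as (11) applies soft-thresholding to the singular values; a minimal NumPy sketch (without the partial-SVD rank-prediction acceleration) is shown below.

```python
import numpy as np

def singular_value_thresholding(A, tau):
    """Proximal operator of the nuclear norm: soft-threshold the singular
    values of A by tau (full SVD; a partial SVD with rank prediction can be
    used instead to speed up large problems)."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    sigma_thr = np.maximum(sigma - tau, 0.0)   # shrink the singular values
    return (U * sigma_thr) @ Vt                # reconstruct the thresholded matrix
```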
Experiments
Experiment setup
Figure 1 shows some sample images in these four image datasets.

Sample images from ORL, extended Yale B, CMU PIE and USPS datasets. (a) Extended Yale B, (b) CMU PIE and (c) USPS.
Experimental studies
The purpose of semi-supervised learning task is to reveal more unlabeled information with limited labeled data. Therefore, in this paper, the percentage of labeled samples ranges from 10% to 60%, and the rest are unlabeled samples. The parameters of the GLR2S2 method are set to
Classification accuracy rates (%) on extended Yale B dataset.
Classification accuracy rates (%) on CMU PIE dataset.
Classification accuracy rates (%) on USPS dataset.
From the experimental results, we can observe that:
- In most cases, compared with other graph-based semi-supervised learning algorithms, the proposed GLR2S2 method consistently achieves the highest classification accuracy, even when the labeled-sample rate is low.
- Compared with the NNLRS method, which also uses sparse and low-rank constraints to construct the affinity graph, the proposed GLR2S2 method is able to use the label information to construct the affinity matrix effectively; in most cases, the classification accuracy is significantly improved.
- Among the comparison methods,
There are three parameters that affect the performance of our proposed GLR2S2 method.

Classification accuracy with varied parameters (a), (b) and (c).
As can be seen from the figure above, the performance of GLR2S2 remains stable over a relatively wide range of parameter values.
Conclusion
This paper proposes a new semi-supervised subspace clustering method, GLR2S2, which uses label information to guide the construction of the affinity graph. In addition, GLR2S2 integrates affinity graph construction and semi-supervised subspace clustering into a unified framework to ensure overall optimality. To solve the optimization problem with fewer auxiliary variables and less matrix inversion, an efficient iterative linearized ADM algorithm with adaptive penalty (LADMAP) is adopted. Experimental results on three data sets, evaluated on classification and recognition tasks, show that our new method outperforms state-of-the-art methods. However, a theoretical analysis of the benefits of using graph regularization will be investigated in future work, and the application of the proposed model to a broader range of problems remains to be explored.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the National Natural Science Foundation of China (Grant No. 61902160, 61806088), the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (Grant No.19KJB520006) and the foundation of Changzhou Science and Technology Plan (Applied Basic Research) (Grant No. CJ20190076).
