Sage Journals: Discover world-class research

Abstract

Low-Rank Representation (LRR) and Sparse Subspace Clustering (SSC) are two of the most prominent subspace clustering algorithms. SSC induces sparsity by minimizing the $l_{1}$-norm of the representation (coefficient) matrix, while LRR promotes a low-rank structure by minimizing its nuclear norm. In this paper, we consider the problem of fitting a union of subspaces to a collection of data points drawn from one or more subspaces and corrupted by noise. We pose this problem as a non-convex optimization problem whose goal is to decompose the corrupted data matrix into the sum of a clean, self-expressive dictionary and a matrix of noise. Because the representation is often both sparse and low-rank, we combine SSC and LRR into a new algorithm named Low-Rank and Sparse Subspace Clustering with a Clean dictionary (LRS2C2). The effectiveness of the proposed algorithm is demonstrated through experiments on motion segmentation and image clustering.

Keywords

Subspace clustering, low-rank representation, sparse representation

Introduction

The past few decades have witnessed a data explosion: we have entered the era of big data, and an overwhelming amount of data is generated and collected every day. Processing such large datasets is a great challenge, especially because they are usually very high-dimensional, even though computers keep getting faster. High dimensionality not only increases the computing time but also degrades performance because of noise and insufficient samples in the ambient space.^1,2 However, the intrinsic dimension of such data is often much smaller than the dimension of the ambient space. This has motivated a variety of techniques for finding low-dimensional representations of high-dimensional data.

Subspace clustering is the problem of clustering data according to the low-dimensional subspaces they lie in. It has been widely used in many fields, such as motion segmentation and face clustering in computer vision, hybrid system identification in control, and community clustering in social networks. Many algorithms have been proposed, such as GPCA,³ Spectral Curvature Clustering (SCC),⁴ Low-Rank Representation (LRR)^5,6 and its noisy variant LRSC,⁷ and Sparse Subspace Clustering (SSC).^8,9 Among them, SSC and LRR are considered state-of-the-art techniques; they are based on minimizing the $l_{1}$-norm and the nuclear norm of the representation matrix, respectively. Moreover, they are among the few subspace clustering algorithms supported by theoretical guarantees: both are proved to succeed when the subspaces are independent.^6,10

Sparse subspace clustering (SSC) uses $l_{1}$-norm minimization to find the sparsest representation of each sample and forms a sparse coefficient matrix, which amounts to expressing each sample as a linear combination of the other samples. The affinity matrix is then constructed from this sparse coefficient matrix. SSC is guaranteed to produce a correct clustering when the subspaces are independent and, under certain conditions, when they are disjoint. However, one major shortcoming of SSC is that it may not capture the global structure of the data, which results in poorer performance when grossly corrupted or outlying observations are present. In addition, SSC finds the sparsest representation of each sample individually, which leads to a high computational cost.

Liu et al.⁵ presented an LRR algorithm for subspace clustering that relies on the same principle of representing a sample as a linear combination of the others. LRR finds the lowest-rank representation of high-dimensional data. In addition to capturing the global structure of the data, LRR solves a convex nuclear norm minimization problem, where the nuclear norm serves as a convex surrogate for the rank. Several efforts have been made to improve the robustness of LRR. Chen and Yi¹¹ extended the original LRR algorithm by integrating a symmetric constraint into the low-rank representation, thus avoiding the post-processing symmetrization step when constructing the affinity matrix. Yin et al.¹² proposed a Laplacian regularized LRR that takes into account the intrinsic geometrical structure of the data.

Although these methods are efficient and effective for specific clustering problems, they share an important shortcoming. Both LRR and SSC build the affinity matrix from a low-rank or sparse representation of the data with respect to the data themselves: they assume that a point in a union of multiple subspaces admits a low-rank or sparse representation with respect to the dictionary formed by all the other data points. This is known as the Self-Expressiveness Property (SEP), i.e., $D = DC$, where $C$ is the coefficient matrix. The coefficients are then converted into symmetric, non-negative affinities, from which the clustering result is obtained by a spectral clustering algorithm. Although these algorithms are claimed to succeed also when the data are corrupted by noise, it is not clear that they remain effective when the dictionary itself is corrupted.

Motivated by the above works, in this paper we propose a low-rank sparse subspace clustering method with a clean dictionary, named LRS2C2. Experiments on different types of vision problems demonstrate the effectiveness of the proposed method. The contributions of this paper can be summarized as follows:

(1) We integrate the low-rankness and sparsity constraints into a unified framework to obtain better subspace clustering performance.

(2) In this optimization framework, we decompose the data matrix into a self-expressive, noise-free data matrix and a noise matrix, and use the noise-free data matrix as a clean dictionary. Introducing a clean dictionary into low-rank and sparse subspace clustering better captures the relationships between high-dimensional samples corrupted by gross errors and yields more robust subspace clustering performance.

The rest of this paper is organized as follows. Section 2 briefly reviews the SSC and LRR algorithms. Section 3 presents the proposed LRS2C2 method and its implementation using the alternating direction method of multipliers (ADMM).¹³ Section 4 reports the experimental results. Finally, Section 5 concludes the paper.

Related works

In this section, we briefly introduce the sparse representation and low-rank representation subspace clustering algorithms. Consider a data matrix $X = [x_{1}, x_{2}, \dots, x_{N}] \in R^{D \times N}$ whose columns are $D$-dimensional data vectors drawn from a union of $m$ linear subspaces $\{S_{i}\}_{i=1}^{m}$ of unknown dimensions. The task of subspace clustering is to cluster these data vectors according to the $m$ subspaces they belong to.

Sparse subspace clustering

SSC was recently proposed as a solution to the subspace clustering problem.⁹ SSC finds the sparsest representation of the data by solving the following optimization problem

$$\min_{C} \sum_{i} \|C_{i}\|_{0} \quad \text{s.t.} \quad X = XC, \ \operatorname{diag}(C) = 0 \tag{1}$$

where $C_{i}$ is the $i$-th column of $C$, $\|\cdot\|_{0}$ denotes the $l_{0}$-norm of a vector, i.e., the number of its nonzero elements, and $\operatorname{diag}(C) \in R^{N}$ is the vector of diagonal elements of $C$. The optimal solution $C$ of the above problem is called the sparsest representation of $X$. Because this optimization problem is non-convex and NP-hard, the following convex $l_{1}$-norm problem is used as a surrogate for problem (1)

$$\min_{C} \|C\|_{1} \quad \text{s.t.} \quad X = XC, \ \operatorname{diag}(C) = 0 \tag{2}$$

where $\|\cdot\|_{1}$ denotes the $l_{1}$-norm of a matrix, i.e., the sum of the absolute values of all its elements.

When the data are corrupted by noise and sparse outliers, SSC assumes that each corrupted point can be represented as a linear combination of the other corrupted points plus a vector of outliers and a noise vector. Specifically, SSC solves the following optimization problem

$$\min_{C, E, G} \|C\|_{1} + α\|E\|_{1} + \frac{β}{2}\|G\|_{F}^{2} \quad \text{s.t.} \quad X = XC + E + G, \ \operatorname{diag}(C) = 0 \tag{3}$$

where $E$ is the matrix of sparse outlying entries and $G$ is the noise matrix. The parameters $α > 0$ and $β > 0$ determine the trade-off among the three terms of the objective.

Low-Rank representation

Unlike SSC, LRR seeks the lowest-rank representation. LRR is formulated as the following rank minimization problem

$$\min_{C} \operatorname{rank}(C) \quad \text{s.t.} \quad X = AC \tag{4}$$

where the columns of $A$ are a set of known bases or dictionary items. Since problem (4) is difficult to solve because rank minimization is NP-hard, the following convex relaxation is used instead

$$\min_{C} \|C\|_{*} \quad \text{s.t.} \quad X = AC \tag{5}$$

where $\|C\|_{*}$ is the nuclear norm, defined as the sum of all singular values of $C$; it is the convex envelope of the rank function on the unit ball of the spectral norm. Since samples are usually noisy or even grossly corrupted, a more reasonable objective for LRR is

$$\min_{C, E} \|C\|_{*} + λ\|E\|_{2,1} \quad \text{s.t.} \quad X = AC + E \tag{6}$$

where the $l_{2,1}$-norm is defined as $\|E\|_{2,1} = \sum_{j=1}^{N} \sqrt{\sum_{i=1}^{D} e_{ij}^{2}}$, i.e., the sum of the $l_{2}$-norms of the columns of $E$, and the parameter $λ$ balances the low-rank term and the error term.
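The norms appearing in (1), (2), (5) and (6) are straightforward to evaluate. The following minimal NumPy sketch computes them; the helper names are our own, and the $l_{2,1}$-norm is taken column-wise as in the definition above.

```python
import numpy as np


def l0_norm(v):
    """Number of nonzero entries of a vector (Eq. 1)."""
    return int(np.count_nonzero(v))


def l1_norm(C):
    """Sum of the absolute values of all entries (Eq. 2)."""
    return float(np.abs(C).sum())


def nuclear_norm(C):
    """Sum of the singular values (Eq. 5)."""
    return float(np.linalg.svd(C, compute_uv=False).sum())


def l21_norm(E):
    """Sum of the l2-norms of the columns of E (Eq. 6)."""
    return float(np.sqrt((E ** 2).sum(axis=0)).sum())


E = np.array([[3.0, 0.0],
              [4.0, 1.0]])
print(l21_norm(E), l1_norm(E))  # column norms 5 and 1; absolute sum 8
```

Because $\|\cdot\|_{2,1}$ sums whole-column norms, it favors errors concentrated in a few columns (sample-specific corruptions), whereas $\|\cdot\|_{1}$ favors entrywise sparsity.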

In a post-processing step, the coefficient matrix $C$ obtained by SSC or LRR is used to define the affinity matrix $(|C| + |C^{T}|) / 2$. The segmentation of the data is then obtained by applying spectral clustering to this affinity matrix.
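As an illustration of this post-processing step, the sketch below symmetrizes $C$ and applies a basic normalized spectral clustering (the Ng–Jordan–Weiss variant, followed by k-means with a deterministic farthest-point initialization). The helper names and the choice of spectral clustering variant are our assumptions; the cited methods may use a different implementation, and scikit-learn's `SpectralClustering(affinity="precomputed")` is a ready-made alternative.

```python
import numpy as np


def symmetric_affinity(C):
    """Affinity (|C| + |C^T|) / 2 built from a coefficient matrix."""
    return (np.abs(C) + np.abs(C.T)) / 2.0


def spectral_clustering(W, k, n_iter=100):
    """Normalized spectral clustering of a symmetric non-negative affinity W."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    L_sym = np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)          # eigenvalues in ascending order
    V = vecs[:, :k]                          # eigenvectors of the k smallest eigenvalues
    V = V / np.maximum(np.linalg.norm(V, axis=1, keepdims=True), 1e-12)

    # k-means on the rows of V, initialized by farthest-point selection.
    centers = [V[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((V - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(V[np.argmax(dist)])
    centers = np.array(centers)
    labels = np.zeros(V.shape[0], dtype=int)
    for _ in range(n_iter):
        sq_dist = ((V[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq_dist, axis=1)
        new_centers = np.array([
            V[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

For a block-diagonal $C$, the affinity has no edges between blocks, so the zero eigenvalue of the Laplacian has one eigenvector per block and the rows of $V$ separate cleanly.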

Low-Rank sparse subspace clustering with a clean dictionary

Motivation

Corruptions and noise are ubiquitous in real-world data, which means that some observations may be arbitrarily corrupted; we cannot expect high-dimensional data to always be captured in well-controlled environments. Therefore, when the original noisy data are used as the dictionary, the clustering performance may be seriously limited by a lack of robustness to corrupted observations. Many studies have shown the importance of learning the dictionary from noisy data for classification and clustering.^14–16

Because the noise encountered in real observations varies in many complex ways, it is difficult to choose a single strategy that removes all types of noise from real-world data, and different low-rank matrix recovery and completion techniques suit different types of noise. PCA finds the best low-rank approximation when the original data are only mildly corrupted by Gaussian noise with small variance. Robust PCA (RPCA) can recover a discriminative low-rank dictionary from data corrupted by sparse noise. The objective function of RPCA is

$$\min_{A, E} \|A\|_{*} + λ\|E\|_{1} \quad \text{s.t.} \quad X = A + E$$
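A common way to solve this RPCA problem is the inexact augmented Lagrange multiplier (ALM) method, which alternates singular value thresholding for $A$ with entrywise shrinkage for $E$. The sketch below is our own minimal version, not necessarily the RPCA solver used in the experiments; the default $λ = 1/\sqrt{\max(m, n)}$ follows Candès et al.,¹⁹ while the initial penalty, growth factor and stopping tolerance are illustrative assumptions.

```python
import numpy as np


def svt(X, thresh):
    """Singular value thresholding: proximal operator of thresh * ||.||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - thresh, 0.0)) @ Vt


def soft_threshold(X, thresh):
    """Entrywise shrinkage: proximal operator of thresh * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - thresh, 0.0)


def rpca_inexact_alm(X, lam=None, rho=1.5, tol=1e-7, max_iter=500):
    """Approximately solve min ||A||_* + lam ||E||_1 s.t. X = A + E."""
    m, n = X.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    norm_x = np.linalg.norm(X, "fro")
    mu = 1.25 / np.linalg.norm(X, 2)        # initial penalty (assumed heuristic)
    A = np.zeros_like(X)
    E = np.zeros_like(X)
    Y = np.zeros_like(X)                    # Lagrange multiplier
    for _ in range(max_iter):
        A = svt(X - E + Y / mu, 1.0 / mu)
        E = soft_threshold(X - A + Y / mu, lam / mu)
        residual = X - A - E
        Y = Y + mu * residual
        mu = min(rho * mu, 1e10)
        if np.linalg.norm(residual, "fro") / norm_x < tol:
            break
    return A, E
```

Because the penalty grows geometrically, the constraint residual shrinks quickly; whether $A$ actually recovers the true low-rank component depends on the rank and sparsity conditions analyzed by Candès et al.¹⁹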

In this paper, given a corrupted data matrix $D \in R^{M \times N}$, we wish to decompose it into the sum of a self-expressive, noise-free data matrix $A$, a noise matrix $G$ and a matrix of sparse errors $E$. The columns of $A = [a_{1}, a_{2}, \dots, a_{N}]$ are points in $R^{M}$. We assume that $A$ is self-expressive, i.e., the clean data points can be expressed as linear combinations of themselves:

$$a_{j} = \sum_{i=1}^{N} a_{i} c_{ij}, \quad \text{or} \quad A = AC \tag{7}$$

where $C = [c_{ij}]$ is the matrix of coefficients. This constraint captures the fact that a point in a linear subspace can be expressed as a linear combination of other points in the same subspace. Therefore, we expect $c_{ij}$ to be zero if points $i$ and $j$ lie in different subspaces.
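A toy example makes the self-expressiveness constraint (7) concrete. In the hypothetical matrix below, two points lie on the $x$-axis and two on the $y$-axis of $R^{2}$, and a block-diagonal $C$ with zero diagonal reproduces $A$; columns are 0-indexed in the code, so column 0 is $a_{1}$.

```python
import numpy as np

# Columns a_1, a_2 lie on the x-axis; columns a_3, a_4 lie on the y-axis.
A = np.array([[1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 3.0]])

C = np.zeros((4, 4))   # C[i, j] is the weight of column i in reconstructing column j
C[1, 0] = 0.5          # a_1 = 0.5 * a_2
C[0, 1] = 2.0          # a_2 = 2.0 * a_1
C[3, 2] = 1.0 / 3.0    # a_3 = (1/3) * a_4
C[2, 3] = 3.0          # a_4 = 3.0 * a_3

print(np.allclose(A @ C, A))                          # A = AC holds
print(not C[:2, 2:].any() and not C[2:, :2].any())    # no cross-subspace weights
```

Note that $C = I$ also satisfies $A = AC$; this trivial solution is the reason SSC adds the constraint $\operatorname{diag}(C) = 0$ in (1) to (3).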

The proposed approach, which we call Low-Rank and Sparse Subspace Clustering with a Clean dictionary (LRS2C2), is based on solving the following optimization problem

$$\min_{A, C, G, E} \|C\|_{*} + λ\|C\|_{1} + \frac{τ}{2}\|A - AC\|_{F}^{2} + \frac{α}{2}\|G\|_{F}^{2} + γ\|E\|_{1} \quad \text{s.t.} \quad D = A + G + E \tag{8}$$

where $\|X\|_{*} = \sum_{i} σ_{i}(X)$, $\|X\|_{1} = \sum_{ij} |X_{ij}|$ and $\|X\|_{F}^{2} = \sum_{ij} X_{ij}^{2}$ are the nuclear norm, $l_{1}$-norm and Frobenius norm of $X$, respectively.

The proposed method combines LRR and SSC. The first term of Eq. (8) encourages $C$ to be low-rank by minimizing $\|C\|_{*}$, the second term encourages $C$ to be sparse by minimizing $\|C\|_{1}$, the third term encourages $A$ to be self-expressive by minimizing $\|A - AC\|_{F}^{2}$, the fourth term keeps the noise $G$ as small as possible by minimizing $\|G\|_{F}^{2}$, and the last term encourages the error $E$ to be sparse by minimizing $\|E\|_{1}$.
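For monitoring the iterations, it is convenient to evaluate objective (8) directly. The sketch below is a minimal NumPy version under the assumption that $G$ is eliminated through the constraint, $G = D - A - E$; the function name is hypothetical.

```python
import numpy as np


def lrs2c2_objective(D, A, C, E, lam, tau, alpha, gamma):
    """Value of objective (8), with G = D - A - E taken from the constraint."""
    G = D - A - E
    nuclear = np.linalg.norm(C, "nuc")                    # ||C||_*
    l1 = np.abs(C).sum()                                  # ||C||_1
    self_expr = np.linalg.norm(A - A @ C, "fro") ** 2     # ||A - AC||_F^2
    noise = np.linalg.norm(G, "fro") ** 2                 # ||G||_F^2
    sparse_err = np.abs(E).sum()                          # ||E||_1
    return (nuclear + lam * l1 + 0.5 * tau * self_expr
            + 0.5 * alpha * noise + gamma * sparse_err)
```

Eliminating $G$ this way means every evaluated point satisfies the constraint of (8) by construction, so only $A$, $C$ and $E$ need to be tracked.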

Optimization for LRS2C2

In this subsection, we derive a computationally efficient algorithm to minimize the proposed objective function (8).

1. When $E$ is fixed, substitute $G = D - A - E$ from the constraint and let $D - E = UΣV^{T}$ be the SVD of $D - E$. Then $A$ can be expressed as

$$A = UΛV^{T} \tag{9}$$

where each diagonal entry $λ_{i}$ of $Λ = \operatorname{diag}(λ_{1}, \dots, λ_{n})$ is obtained from the corresponding entry $σ_{i}$ of $Σ = \operatorname{diag}(σ_{1}, \dots, σ_{n})$ as the solution of

$$σ = ψ(λ) = \begin{cases} λ + \frac{1}{ατ} λ^{-3}, & λ > \frac{1}{\sqrt{τ}} \\ λ + \frac{τ}{α} λ, & λ \le \frac{1}{\sqrt{τ}} \end{cases} \tag{10}$$

that minimizes

$$ϕ(λ, σ) = \frac{α}{2}(σ - λ)^{2} + \begin{cases} 1 - \frac{1}{2τ} λ^{-2}, & λ > \frac{1}{\sqrt{τ}} \\ \frac{τ}{2} λ^{2}, & λ \le \frac{1}{\sqrt{τ}} \end{cases} \tag{11}$$
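The scalar problem behind Eq. (9) can be solved numerically. In this sketch we take $ψ(λ) = λ + \frac{1}{ατ}λ^{-3}$ for $λ > 1/\sqrt{τ}$ and $ψ(λ) = λ + \frac{τ}{α}λ$ otherwise, with the penalty in $ϕ$ equal to $1 - \frac{1}{2τ}λ^{-2}$ and $\frac{τ}{2}λ^{2}$ on the two branches; this is the form obtained by minimizing $\|C\|_{*} + \frac{τ}{2}\|A - AC\|_{F}^{2}$ over $C$ for fixed $A$. On the upper branch, multiplying $σ = ψ(λ)$ by $λ^{3}$ gives the quartic $λ^{4} - σλ^{3} + \frac{1}{ατ} = 0$; on the lower branch the equation is linear, $λ = ασ/(α + τ)$. The sketch collects the admissible roots and keeps the one with the smallest $ϕ$; the function names are hypothetical.

```python
import numpy as np


def map_singular_value(sigma, alpha, tau):
    """Return the lambda >= 0 with sigma = psi(lambda) that minimizes phi(lambda, sigma)."""
    thr = 1.0 / np.sqrt(tau)

    def phi(lam):
        if lam > thr:
            penalty = 1.0 - 1.0 / (2.0 * tau * lam ** 2)
        else:
            penalty = 0.5 * tau * lam ** 2
        return 0.5 * alpha * (sigma - lam) ** 2 + penalty

    candidates = [thr]                       # branch boundary, kept as a safe fallback
    lam_low = alpha * sigma / (alpha + tau)  # root of the linear branch
    if lam_low <= thr:
        candidates.append(lam_low)
    # Roots of lam^4 - sigma*lam^3 + 1/(alpha*tau) = 0 on the branch lam > thr.
    for r in np.roots([1.0, -sigma, 0.0, 0.0, 1.0 / (alpha * tau)]):
        if abs(r.imag) < 1e-8 * max(1.0, abs(r)) and r.real > thr:
            candidates.append(float(r.real))
    return min(candidates, key=phi)


def clean_dictionary(D_minus_E, alpha, tau):
    """A = U Lambda V^T from the SVD of D - E (Eq. 9)."""
    U, s, Vt = np.linalg.svd(D_minus_E, full_matrices=False)
    lam = np.array([map_singular_value(si, alpha, tau) for si in s])
    return (U * lam) @ Vt
```

Since $ϕ$ is continuously differentiable across the branch boundary and grows without bound, its minimizer is one of the stationary points collected above; the boundary candidate is only a numerical safeguard.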

2. When $A$ and $C$ are fixed, the optimal solution for $E$ satisfies

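$$-α(D - A - E) + γ \operatorname{sign}(E) = 0 \tag{12}$$

where $\operatorname{sign}(E)$ is understood entrywise as a subgradient of $\|E\|_{1}$. This equation has the closed-form solution

$$E = S_{γ/α}(D - A) \tag{13}$$

where $S_{β}(\cdot)$ is the shrinkage thresholding operator, applied entrywise and defined as

$$S_{β}(x) = \begin{cases} x - β, & x > β \\ 0, & -β \le x \le β \\ x + β, & x < -β \end{cases} \tag{14}$$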
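The $E$-step (13) amounts to a single entrywise operation. A minimal NumPy sketch, with hypothetical function names:

```python
import numpy as np


def shrink(X, beta):
    """Shrinkage thresholding operator S_beta of Eq. (14), applied entrywise."""
    return np.sign(X) * np.maximum(np.abs(X) - beta, 0.0)


def update_E(D, A, alpha, gamma):
    """E-step of Eq. (13): E = S_{gamma/alpha}(D - A)."""
    return shrink(D - A, gamma / alpha)


print(shrink(np.array([-3.0, -0.5, 0.0, 2.0]), 1.0))  # entries move toward zero by 1
```

With $α = 1$ and $γ = 0.5$, for instance, entries of $D - A$ with magnitude at most $0.5$ are absorbed by the dense noise $G = D - A - E$ rather than by the sparse error $E$.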

3. When $A$ and $E$ are fixed, we optimize $C$ under the ADMM framework. To separate the two norms acting on $C$, we introduce two auxiliary variables $C_{1} = C_{2} = C$:

$$\min_{C, C_{1}, C_{2}} \|C_{1}\|_{*} + λ\|C_{2}\|_{1} + \frac{τ}{2}\|A - AC\|_{F}^{2} \quad \text{s.t.} \quad C_{1} = C, \ C_{2} = C \tag{15}$$

The augmented Lagrangian is

$$\mathcal{L} = \|C_{1}\|_{*} + λ\|C_{2}\|_{1} + \frac{τ}{2}\|A - AC\|_{F}^{2} + \langle Y_{1}, C_{1} - C \rangle + \langle Y_{2}, C_{2} - C \rangle + \frac{μ}{2}\left(\|C_{1} - C\|_{F}^{2} + \|C_{2} - C\|_{F}^{2}\right) \tag{16}$$

where $Y_{1}$ and $Y_{2}$ are Lagrange multipliers and $μ > 0$ is a penalty parameter.

By fixing two of the variables, we alternately update $C$, $C_{1}$ and $C_{2}$, and then update the multipliers $Y_{1}$ and $Y_{2}$. Minimizing (16) with respect to each variable gives the following sub-problems

$$C = (τA^{T}A + 2μI)^{-1}(τA^{T}A + μC_{1} + μC_{2} + Y_{1} + Y_{2})$$

$$C_{1} = \arg\min_{C_{1}} \|C_{1}\|_{*} + \frac{μ}{2}\left\|C_{1} - \left(C - \frac{Y_{1}}{μ}\right)\right\|_{F}^{2}$$

$$C_{2} = \arg\min_{C_{2}} λ\|C_{2}\|_{1} + \frac{μ}{2}\left\|C_{2} - \left(C - \frac{Y_{2}}{μ}\right)\right\|_{F}^{2}$$

The detailed optimization procedure is summarized in Algorithm 1. The three sub-problems are convex and have closed-form solutions: the $C$ sub-problem is a linear system, the $C_{1}$ sub-problem is solved by the singular value thresholding operator,¹⁷ and the $C_{2}$ sub-problem is solved by the shrinkage thresholding operator (14).
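The $C$-step can be sketched in NumPy as follows. This is a minimal illustration for a fixed $A$, not the authors' code. It uses the Lagrangian (16) with the multiplier updates $Y_{i} \leftarrow Y_{i} + μ(C_{i} - C)$ of Algorithm 1, for which the $C$-update is $(τA^{T}A + 2μI)^{-1}(τA^{T}A + μC_{1} + μC_{2} + Y_{1} + Y_{2})$, the $C_{1}$-update is singular value thresholding of $C - Y_{1}/μ$ at level $1/μ$, and the $C_{2}$-update is shrinkage of $C - Y_{2}/μ$ at level $λ/μ$. Default parameter values follow the initialization in Algorithm 1; the function names are hypothetical.

```python
import numpy as np


def svt(X, thresh):
    """Singular value thresholding: proximal operator of thresh * ||.||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - thresh, 0.0)) @ Vt


def shrink(X, thresh):
    """Entrywise shrinkage: proximal operator of thresh * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - thresh, 0.0)


def admm_c_step(A, lam, tau, mu=1e-2, mu_max=1e10, rho=1.1, eps=1e-6, max_iter=1000):
    """Minimize ||C||_* + lam ||C||_1 + tau/2 ||A - AC||_F^2 via the splitting (15)."""
    n = A.shape[1]
    AtA = A.T @ A
    I = np.eye(n)
    C = np.zeros((n, n))
    C1, C2 = C.copy(), C.copy()
    Y1, Y2 = np.zeros((n, n)), np.zeros((n, n))
    for it in range(1, max_iter + 1):
        # C-update: set the gradient of (16) with respect to C to zero.
        C = np.linalg.solve(tau * AtA + 2.0 * mu * I,
                            tau * AtA + mu * C1 + mu * C2 + Y1 + Y2)
        C1 = svt(C - Y1 / mu, 1.0 / mu)           # nuclear-norm proximal step
        C2 = shrink(C - Y2 / mu, lam / mu)        # l1-norm proximal step
        Y1 = Y1 + mu * (C1 - C)                   # multiplier updates
        Y2 = Y2 + mu * (C2 - C)
        mu = min(rho * mu, mu_max)
        if np.max(np.abs(C1 - C)) < eps and np.max(np.abs(C2 - C)) < eps:
            return C, it
    return C, max_iter
```

The function returns the final $C$ together with the number of iterations used, which makes it easy to check that the stopping rule $\|C_{1} - C\|_{\infty} < ε$, $\|C_{2} - C\|_{\infty} < ε$ was met.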

After obtaining the coefficient matrix $C$, we define the affinity matrix of an undirected graph based on $C$. Instead of computing the affinity as $(|C_{ij}| + |C_{ji}|) / 2$, we follow He et al.¹⁶ and use the coefficient shrinkage and normalization operators of Ma et al.¹⁴ to construct the graph affinity matrix $W$. This method takes into account the angular information of the principal directions of the representation. First, we compute the skinny SVD of $C$, i.e., $C = U^{*} Σ^{*} (V^{*})^{T}$, and then define $M = U^{*} (Σ^{*})^{1/2}$, in which each column of $U^{*}$ is weighted by the corresponding entry of $(Σ^{*})^{1/2}$. The affinity matrix is defined as follows

$$W_{ij} = \left(\frac{m_{i}^{T} m_{j}}{\|m_{i}\|_{2} \|m_{j}\|_{2}}\right)^{2α}$$

where $m_{i}$ and $m_{j}$ denote the $i$-th and $j$-th rows of $M$. Finally, we apply spectral clustering to $W$ to obtain the final result.
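The affinity construction above can be sketched as follows. Two details are our assumptions rather than the paper's: singular values below a relative tolerance are dropped to form the skinny SVD, and the absolute value of the cosine is raised to the power $2α$, which coincides with the formula for integer $α$ and keeps $W$ real and non-negative otherwise. The default $α = 2$ is illustrative only; this exponent is unrelated to the noise weight $α$ in (8). The function name is hypothetical.

```python
import numpy as np


def affinity_from_coefficients(C, alpha=2, rel_tol=1e-10):
    """W_ij = |cos(angle between rows i, j of M)|^(2*alpha), with M = U* (Sigma*)^(1/2)."""
    U, s, _ = np.linalg.svd(C, full_matrices=False)
    keep = s > rel_tol * s[0]                    # skinny SVD: drop (near-)zero singular values
    M = U[:, keep] * np.sqrt(s[keep])            # weight column r of U* by sqrt(sigma_r)
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    M_unit = M / np.maximum(norms, 1e-12)        # unit-norm rows m_i / ||m_i||_2
    cosine = M_unit @ M_unit.T
    return np.abs(cosine) ** (2 * alpha)
```

For a block-diagonal $C$ with distinct singular values, the singular vectors are supported on single blocks, so $W$ has exactly zero weight between blocks and unit weight on its diagonal.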

Algorithm 1. LRS2C2

Input: Data matrix $D$, parameters $λ$, $τ$, $α$, $μ$

Output: Cluster indices ${ID}_{k}$

Initialize: $C = C_{1} = C_{2} = 0$, $Y_{1} = Y_{2} = 0$, $μ = 10^{-2}$, $μ_{\max} = 10^{10}$, $ρ = 1.1$, $ε = 10^{-6}$

1. Low-rank and sparse recovery

While not converged do

Compute $A$ by Eq. (9)

Fix $C_{1}$ and $C_{2}$, and update $C$ by $C = (τA^{T}A + 2μI)^{-1}(τA^{T}A + μC_{1} + μC_{2} + Y_{1} + Y_{2})$

Fix $C$ and $C_{2}$, and update $C_{1}$ by $C_{1} = \arg\min_{C_{1}} \|C_{1}\|_{*} + \frac{μ}{2}\|C_{1} - (C - Y_{1}/μ)\|_{F}^{2}$

Fix $C$ and $C_{1}$, and update $C_{2}$ by $C_{2} = \arg\min_{C_{2}} λ\|C_{2}\|_{1} + \frac{μ}{2}\|C_{2} - (C - Y_{2}/μ)\|_{F}^{2}$

Update the multipliers: $Y_{1} = Y_{1} + μ(C_{1} - C)$, $Y_{2} = Y_{2} + μ(C_{2} - C)$

Update the parameter $μ$ by $μ = \min(ρμ, μ_{\max})$

Check the convergence conditions $\|C_{1} - C\|_{\infty} < ε$ and $\|C_{2} - C\|_{\infty} < ε$

End while

2. Spectral clustering

Compute ${ID}_{k}$ by applying spectral clustering to the affinity matrix $W$

Experiments

In this section, we evaluate the performance of LRS2C2 on two computer vision tasks: motion segmentation and image clustering. For motion segmentation, we use the Hopkins 155 dataset; for image clustering, we use the Extended Yale B, AR and MNIST datasets.

We use the subspace clustering error, i.e., the percentage of misclassified data points, as the performance measure and compare LRS2C2 with state-of-the-art subspace clustering algorithms based on spectral clustering, namely LRR,⁵ LRSC,⁷ SSC⁹ and LSA.¹⁸ We choose these methods as baselines because they have been shown to perform very well on the above tasks.
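The clustering error reported in the tables is the percentage of misclassified points after the predicted cluster labels are optimally matched to the ground-truth labels. The pure-Python sketch below, with a hypothetical function name, finds that matching by brute force over permutations, which is only practical for a handful of clusters such as the two- and three-motion sequences of Hopkins 155; for ten classes, an assignment solver such as `scipy.optimize.linear_sum_assignment` (Hungarian method) is the usual choice.

```python
from itertools import permutations


def clustering_error(true_labels, pred_labels):
    """Percentage of points misclassified under the best one-to-one label matching."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(pred_labels))
    # Pad with None so that every predicted cluster can be matched to something.
    k = max(len(true_ids), len(pred_ids))
    candidates = true_ids + [None] * (k - len(true_ids))
    best_correct = 0
    for perm in permutations(candidates):
        mapping = dict(zip(pred_ids, perm))
        correct = sum(1 for t, p in zip(true_labels, pred_labels) if mapping[p] == t)
        best_correct = max(best_correct, correct)
    return 100.0 * (1.0 - best_correct / len(true_labels))


print(clustering_error([0, 0, 1, 1], [1, 1, 0, 0]))  # label names differ, grouping identical
```

Matching before counting matters because spectral clustering assigns arbitrary cluster indices: the example above has an error of 0% even though no predicted label equals its ground-truth label.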

Experiments on motion segmentation

Motion segmentation refers to the problem of clustering a set of 2D point trajectories extracted from a video sequence into groups corresponding to different rigid-body motions. Here, the data matrix $D$ has dimension $2F \times N$, where $N$ is the number of 2D trajectories and $F$ is the number of frames in the video. Under the affine projection model, the 2D trajectories associated with a single rigid-body motion lie in an affine subspace of $R^{2F}$ of dimension $d = 1$, 2 or 3. Therefore, the trajectories associated with $n$ different moving objects lie in a union of $n$ affine subspaces of $R^{2F}$, and the motion segmentation problem reduces to clustering a collection of point trajectories according to multiple affine subspaces.
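The dimension count above can be checked numerically on synthetic data. The sketch below assumes random 3D points and random per-frame affine cameras, without noise and unrelated to the actual Hopkins 155 data, and builds the $2F \times N$ trajectory matrix of one rigid object: its columns span a linear subspace of dimension 4 and, after subtracting the mean trajectory, an affine subspace of dimension 3. The function name is hypothetical.

```python
import numpy as np


def affine_trajectories(num_frames, num_points, rng):
    """2F x N trajectory matrix of one rigid object seen by random affine cameras."""
    X = rng.standard_normal((3, num_points))        # 3-D points on the object
    blocks = []
    for _ in range(num_frames):
        M = rng.standard_normal((2, 3))             # linear part of the affine camera
        t = rng.standard_normal((2, 1))             # translation of the affine camera
        blocks.append(M @ X + t)                    # 2 x N image coordinates in this frame
    return np.vstack(blocks)


rng = np.random.default_rng(0)
W1 = affine_trajectories(30, 50, rng)
W2 = affine_trajectories(30, 50, rng)
print(np.linalg.matrix_rank(W1))                                   # expected: 4
print(np.linalg.matrix_rank(W1 - W1.mean(axis=1, keepdims=True)))  # expected: 3
print(np.linalg.matrix_rank(np.hstack([W1, W2])))                  # expected: 8 (two motions)
```

The rank-4 bound follows from writing the stacked trajectories as a $2F \times 4$ matrix of camera parameters times the $4 \times N$ matrix of homogeneous point coordinates; two independent motions give a union of two such subspaces spanning $4n = 8$ dimensions.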

We use the Hopkins 155 motion segmentation database to compare LRS2C2 with the other algorithms. The database consists of 155 sequences with two or three motions: 120 sequences have two motions and 35 have three motions. On average, each two-motion sequence has N = 256 feature trajectories and F = 30 frames, while each three-motion sequence has N = 398 feature trajectories and F = 29 frames. For each sequence, the 2D trajectories are extracted automatically with a tracker, and outliers are removed manually. Figure 1 shows some sample images with the feature points superimposed.

Figure 1.

Motion segmentation: given feature points on multiple rigidly moving objects tracked in multiple frames of a video (top), the goal is to separate the feature trajectories according to the moving objects (bottom).

Figure 2.

Sample images from the datasets: (a) Extended Yale B, (b) AR, (c) MNIST.

Tables 1 and 2 show the results of applying the subspace clustering algorithms to the dataset when using the original 2F-dimensional feature trajectories and when projecting the data onto a 4n-dimensional subspace ($n$ is the number of subspaces) using PCA, respectively.

Table 1.

Clustering error (%) of different algorithms on the Hopkins 155 dataset with the 2F-dimensional data points.

Algorithm		LSA	SSC	LRR	LRSC	LRS2C2
Two motions	Mean	4.23	2.07	4.10	2.57	1.80
Two motions	Median	0.56	0.00	0.22	0.00	0.00
Three motions	Mean	7.02	5.27	9.89	6.64	4.70
Three motions	Median	1.45	0.40	6.22	1.76	0.56
All	Mean	4.86	2.79	5.41	3.47	2.45
All	Median	0.89	0.00	0.53	0.09	0.00

Table 2.

Clustering error (%) of different algorithms on the Hopkins 155 dataset with the 4n-dimensional data points obtained by applying PCA.

Algorithm		LSA	SSC	LRR	LRSC	LRS2C2
Two motions	Mean	3.61	2.14	4.83	2.57	1.84
	Median	0.51	0.00	0.26	0.00	0.00
Three motions	Mean	7.65	5.29	9.89	6.62	4.59
	Median	1.27	0.40	6.22	1.76	0.48
All	Mean	4.52	2.85	5.98	3.47	2.46
	Median	0.57	0.00	0.59	0.00	0.00

Tables 1 and 2 show that, in both cases, LRS2C2 obtains the lowest mean clustering error among all the algorithms. In addition, the clustering performance of each algorithm is similar whether the 2F-dimensional feature trajectories or the 4n-dimensional PCA projections are used. This is because the feature trajectories of $n$ motions in a video lie almost perfectly in a 4n-dimensional linear subspace of the 2F-dimensional ambient space, so projecting onto this subspace with PCA preserves the structure of the subspaces and the data; hence, for each algorithm, the clustering error in Table 1 is close to that in Table 2. However, because the Hopkins 155 dataset has a relatively low noise level, the improvement of LRS2C2 is relatively minor.
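Why the PCA projection hardly changes the results can be illustrated with a short sketch: when the data matrix has rank at most $4n$, projecting its columns onto the top $4n$ left singular vectors preserves all inner products between trajectories, so any method that depends on the data only through this Gram matrix is unaffected. The sketch assumes projection without mean-centering and uses synthetic rank-8 data ($n = 2$, $2F = 60$) rather than Hopkins 155; the function name is hypothetical.

```python
import numpy as np


def pca_project(D, dim):
    """Project the columns of D onto its top `dim` left singular vectors (no centering)."""
    U, _, _ = np.linalg.svd(D, full_matrices=False)
    return U[:, :dim].T @ D


rng = np.random.default_rng(1)
n_motions = 2
D = rng.standard_normal((60, 4 * n_motions)) @ rng.standard_normal((4 * n_motions, 100))
P = pca_project(D, 4 * n_motions)
print(P.shape)                          # (8, 100)
print(np.allclose(P.T @ P, D.T @ D))    # Gram matrix preserved
```

With real, slightly noisy trajectories the data are only approximately rank $4n$, so the Gram matrix is preserved only approximately, which is consistent with the small differences between the two tables.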

Experiments on image clustering

In this subsection, we conduct experiments on image clustering. We use three image datasets, the Extended Yale B, AR and MNIST, to test the performance of LRS2C2. Figure 2 shows some sample images from the three datasets, which are briefly described as follows.

The Extended Yale B database contains 38 human subjects with around 64 near-frontal images per subject taken under different illuminations; the face images of each subject correspond to a low-dimensional subspace. In this paper, we select the first 10 classes of the Extended Yale B database to form the test dataset, which contains 640 images in total, and each image is resized to $32 \times 32$ pixels.

The AR database contains over 4000 face images of 126 individuals. For each individual, 26 pictures were taken in two sessions separated by two weeks, with 13 images per session. These images include frontal views of faces with different expressions, illuminations and occlusions. In this paper, we use a subset of the AR dataset containing five male and five female subjects.

The MNIST database has 10 classes, corresponding to the 10 handwritten digits 0–9. We select a subset consisting of the first 100 samples of each class in the training set.

We first applied the proposed algorithm to the original images without any artificial corruption or missing entries; some example images from the Extended Yale B database are given in Figure 3(a). Next, we considered artificial occlusion in the form of random pixel corruption: a certain percentage of the pixels of each image was replaced with random values drawn uniformly from [0, 1]. To ensure that the noise was sparse, the corrupted locations were selected at random and were unknown to the algorithms. The percentage of corrupted pixels was varied from 5% to 20% in steps of 5%. Figure 3(b) shows some examples of images with random pixel corruption. For a fair comparison, we preprocessed the images for the other algorithms by removing noise with the RPCA¹⁹ algorithm. Figure 3(c) shows some representative examples of applying RPCA to face images with random pixel corruption.
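The random pixel corruption protocol can be reproduced with a few lines of NumPy. This is our sketch, assuming intensities scaled to $[0, 1]$ and the number of corrupted pixels rounded to the nearest integer; the function name is hypothetical, and it also returns the corruption mask, which the clustering algorithms themselves never see.

```python
import numpy as np


def corrupt_pixels(image, ratio, rng):
    """Replace a fraction `ratio` of pixels with Uniform[0, 1) values at random locations."""
    out = np.array(image, dtype=float)              # copy, so the input is untouched
    n_corrupt = int(round(ratio * out.size))
    idx = rng.choice(out.size, size=n_corrupt, replace=False)
    out.flat[idx] = rng.uniform(0.0, 1.0, size=n_corrupt)
    mask = np.zeros(out.size, dtype=bool)
    mask[idx] = True
    return out, mask.reshape(out.shape)


rng = np.random.default_rng(0)
face = np.zeros((32, 32))                           # stand-in for a 32 x 32 face image
noisy, mask = corrupt_pixels(face, 0.20, rng)
print(int(mask.sum()))                              # 205 of 1024 pixels (20%, rounded)
```

Sampling locations without replacement guarantees that exactly the requested fraction of pixels is corrupted, which keeps the corruption levels in Tables 3 to 5 comparable across images.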

Figure 3.

Representative examples from the Extended Yale B database: (a) sample images under different illumination conditions, (b) sample images with random pixel corruption, (c) sample images after applying RPCA.

We executed each algorithm 10 times and report the mean clustering error and standard deviation in Tables 3 to 5.

Table 3.

Clustering error (%) of different algorithms on the Extended Yale B dataset with random pixel corruption (the parameters for LRS2C2 are $λ = 0.2$, $τ = 0.2$, $α = 0.03$, $γ = 0.01$).

Corruption ratio (%)	Error	LSA	SSC	LRR	LRSC	LRS2C2
0	Mean	36.00	32.80	47.50	34.65	24.06
	Std.	2.58	3.17	2.73	1.47	0.83
5	Mean	42.45	34.97	48.84	35.24	28.97
	Std.	3.00	3.68	1.60	1.72	0.94
10	Mean	44.27	35.08	50.13	35.58	30.19
	Std.	1.91	3.30	2.81	0.97	0.26
15	Mean	47.23	36.20	51.13	36.34	31.67
	Std.	1.29	2.52	3.10	2.62	2.28
20	Mean	51.27	38.32	51.90	38.97	32.58
	Std.	3.32	3.42	3.03	1.43	1.71

Table 4.

Clustering error (%) of different algorithms on the AR dataset with random pixel corruption (the parameters for LRS2C2 are $λ = 0.2$, $τ = 0.04$, $α = 0.03$, $γ = 0.01$).

Corruption ratio (%)	Error	LSA	SSC	LRR	LRSC	LRS2C2
0	Mean	37.86	23.57	2.86	12.83	2.10
	Std.	1.27	1.09	0.00	0.87	0.91
5	Mean	44.05	26.93	2.86	13.69	2.25
	Std.	0.65	1.58	0.30	0.61	0.69
10	Mean	44.71	27	2.86	15.34	2.33
	Std.	2.02	1.05	0.00	0.94	0.71
15	Mean	44.75	27.45	2.86	19.81	2.49
	Std.	2.60	1.27	0.00	1.08	0.55
20	Mean	43.64	29.36	3.07	2.98	2.56
	Std.	0.75	3.54	0.35	0.26	0.33

Table 5.

Clustering error (%) of different algorithms on the MNIST dataset with random pixel corruption (the parameters for LRS2C2 are $λ = 0.2$, $τ = 10^{-3}$, $α = 0.03$, $γ = 0.01$).

Corruption ratio (%)	Error	LSA	SSC	LRR	LRSC	LRS2C2
0	Mean	36.00	32.80	47.50	33.52	28.42
	Std.	2.58	3.17	2.73	1.74	2.01
5	Mean	42.45	34.97	48.84	36.52	30.56
	Std.	3.00	3.68	1.60	1.68	2.30
10	Mean	44.27	35.08	50.13	37.29	31.04
	Std.	1.91	3.30	2.81	1.75	1.98
15	Mean	47.23	36.20	51.13	38.68	33.75
	Std.	1.29	2.52	3.10	1.46	1.60
20	Mean	51.27	38.32	51.90	39.12	34.90
	Std.	3.32	3.42	3.03	1.59	2.02

The experimental results show that the proposed LRS2C2 method achieves the lowest clustering error on all three datasets, which confirms that LRS2C2 can significantly improve the accuracy of image clustering. Several of the compared methods are low-rank based; on the AR dataset in particular, the low-rank based methods outperform LSA and SSC. For the original datasets without artificial corruption, LRS2C2 achieves the lowest clustering error because it recovers a clean dictionary from the input data matrix, which gives it an advantage over the other methods. Among the compared methods, LRSC can also use a clean dictionary; however, it only exploits the low-rankness of the data. LRS2C2 combines the low-rankness and sparsity of the representation, so its solution benefits both from the better separation induced by the $l_{1}$-norm term and from the relatively denser connections promoted by the nuclear norm term.

When the datasets were corrupted, LRS2C2 consistently outperformed all the other methods at every corruption level, and it retained its advantage as the percentage of corrupted pixels increased. These results indicate that LRS2C2 is more robust to random pixel corruption than the other algorithms.

Convergence analysis

In this section, we examine the convergence of our algorithm. The penalty parameter $μ$ is upper bounded by $μ_{\max}$, which is guaranteed by the update $μ = \min(ρμ, μ_{\max})$ in Algorithm 1. The cost function in (8) is bounded below and non-increasing at each alternating minimization step [36]. Hence, the sequence of objective values generated by Algorithm 1 converges. In addition, Hong et al.²⁰ provide a convergence analysis of ADMM for a family of non-convex problems. To verify the convergence of the proposed LRS2C2 algorithm empirically, we run it on the Extended Yale B dataset and plot the objective function value at each iteration in Figure 4. As Figure 4 shows, the objective function decreases rapidly and converges to a stable value.

Figure 4.

Value of objective function versus iteration.

Conclusion

To effectively solve the subspace clustering problem for data corrupted by noise, we propose the LRS2C2 method, which decomposes the given data matrix into a clean data dictionary and a noise matrix. We also combine low-rank representation and sparse representation in a unified framework to obtain the clustering results efficiently. Experiments on motion segmentation and image clustering problems demonstrate the effectiveness of our algorithm and its superiority over state-of-the-art methods.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the National Natural Science Foundation of China (Grant No. 61902160, 61806088), the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (Grant No.19KJB520006) and the foundation of Changzhou Science and Technology Plan (Applied Basic Research) (Grant No. CJ20190076).

ORCID iDs

Cong-Zhe You

Zhen-Qiu Shu

References

1. Basri, Jacobs DW. Lambertian reflectance and linear subspaces. IEEE Trans Pattern Anal Mach Intell 2003; 25: 218–233.

2. Lee, Kriegman DJ. Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans Pattern Anal Mach Intell 2005; 27: 684–698.

3. Vidal, Sastry. Generalized principal component analysis (GPCA). IEEE Trans Pattern Anal Mach Intell 2005; 27.

4. Chen, Lerman. Spectral curvature clustering (SCC). Int J Comput Vis 2009; 81: 317–330.

5. Liu, Lin. Robust subspace segmentation by low-rank representation. In: Proceedings of the 27th international conference on machine learning (ICML-10), June 21–24, 2010, pp.663–670. Haifa, Israel: Omnipress.

6. Liu, Lin, Yan, et al. Robust recovery of subspace structures by low-rank representation. IEEE Trans Pattern Anal Mach Intell 2013; 35: 171–184.

7. Favaro, Vidal, Ravichandran. A closed form solution to robust subspace estimation and clustering. In: Computer vision and pattern recognition (CVPR) 2011, June 2011, pp.1801–1807. New York, NY: IEEE.

8. Elhamifar, Vidal. Sparse subspace clustering. In: 2009 IEEE conference on computer vision and pattern recognition, June 2009, pp.2790–2797. New York, NY: IEEE.

9. Elhamifar, Vidal. Sparse subspace clustering: algorithm, theory, and applications. IEEE Trans Pattern Anal Mach Intell 2013; 35: 2765–2781.

10. Vidal, Tron, Hartley. Multiframe motion segmentation with missing data using PowerFactorization and GPCA. Int J Comput Vis 2008; 79: 85–105.

11. Chen. Subspace clustering by exploiting a low-rank representation with a symmetric constraint. arXiv:1403.2330, 2014.

12. Yin, Gao, Lin. Laplacian regularized low-rank representation and its applications. IEEE Trans Pattern Anal Mach Intell 2013; 38: 504–517.

13. Wen, Goldfarb, Yin. Alternating direction augmented Lagrangian methods for semidefinite programming. Math Prog Comp 2010; 2: 203–230.

14. Wang, Xiao, et al. Sparse representation for face recognition based on discriminative low-rank dictionary learning. In: 2012 IEEE conference on computer vision and pattern recognition (CVPR), June 2012, pp.2586–2593. New York, NY: IEEE.

15. Yang, Zhang, Feng, et al. Fisher discrimination dictionary learning for sparse representation. In: 2011 IEEE international conference on computer vision (ICCV), November 2011, pp.543–550. New York, NY: IEEE.

16. Chen, Zhang. Low-rank representation with graph regularization for subspace clustering. Soft Computing 2015; 21: 1569–1581.

17. Cai, Candès, Shen. A singular value thresholding algorithm for matrix completion. SIAM J Optim 2010; 20: 1956–1982.

18. Yan, Pollefeys. A general framework for motion segmentation: independent, articulated, rigid, non-rigid, degenerate and non-degenerate. In: European conference on computer vision (ECCV 2006), Graz, Austria, 2006, pp.94–106. Berlin, Heidelberg: Springer.

19. Candès, et al. Robust principal component analysis? J ACM 2011; 58: 1–37.

20. Hong, Luo, Razaviyayn. Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J Optim 2016; 26: 337–364.

Low-rank sparse subspace clustering with a clean dictionary
