Sage Journals: Discover world-class research

Abstract

Subspace clustering of multi-view data has recently become a research hotspot in artificial intelligence and machine learning. Its goal is to partition data samples collected from different sources into groups. In this paper, we propose a new subspace clustering method for multi-view data, termed Non-negative Sparse Laplacian regularized Latent Multi-view Subspace Clustering (NSL2MSC). The proposed method learns a latent representation of multi-view data samples and performs data reconstruction in the latent space, so that the data can be clustered in the latent representation space while exploiting the relationships among different views. Traditional representation-based methods, however, do not consider the nonlinear geometry inside the data and may lose local similarity information between samples during learning. By using graph regularization, the proposed method captures not only the global low-dimensional structure of the data but also its nonlinear geometric structure. Experimental results show that the proposed method is effective and outperforms most existing alternatives.

Keywords

Multi-view subspace clustering, latent representation, graph regularization, Laplacian matrix

Introduction

Clustering analysis is a commonly used processing and analysis tool in artificial intelligence and machine learning. Researchers have proposed a large number of clustering methods, and these have been widely applied. However, with the continuous progress of data science and the rapid development of data sources and acquisition methods, the data obtained are becoming increasingly complex, and multi-view data in particular are now widespread. Most traditional clustering methods were proposed and developed for single-view data, and their performance on multi-view data is often unsatisfactory. Designing clustering algorithms for multi-view datasets is therefore a major challenge, and researchers have approached multi-view clustering from many directions.

Multi-view clustering algorithms can be roughly divided into two categories: multi-view K-means clustering^1–3 and multi-view spectral clustering.^4–6 These are multi-view extensions of K-means clustering and spectral clustering, respectively. Multi-view K-means clustering has achieved good performance in many fields, but it inherits a deficiency of K-means: the choice of the initial cluster centers has a large impact on the final clustering result, which limits its applicability. Multi-view clustering based on spectral clustering yields relatively stable clustering performance, and many researchers have therefore recently focused on multi-view spectral clustering algorithms.^7–9

Numerous studies have shown that high-dimensional data can usually be embedded in a specific set of low-dimensional subspaces, so multi-view subspace learning algorithms based on spectral clustering have been widely applied with good performance. Representative algorithms include multi-view subspace clustering (MSC),⁷ diversity-induced multi-view subspace clustering (DiMSC),⁸ and latent multi-view subspace clustering (LMSC).⁹ MSC⁷ learns the self-representation coefficient matrix of each view and unifies them through a common indicator matrix to obtain the final clustering result. However, MSC ignores the differences between views. To address this problem, DiMSC⁸ uses the Hilbert-Schmidt Independence Criterion (HSIC) to encourage diversity among views and thereby improve clustering performance. However, both MSC and DiMSC learn the self-representation coefficient matrix in the original data space, where noise may considerably degrade clustering performance. LMSC⁹ therefore integrates latent representation learning and subspace clustering in a unified optimization framework: the self-representation coefficient matrix is learned in the learned latent representation space, which reduces the influence of noise on the clustering result and improves clustering performance.

LMSC⁹ adopts low-rank representation (LRR) for subspace learning. LRR recovers the subspace structure well when data in a high-dimensional space actually lie on a union of linear subspaces. In practical applications, however, this assumption cannot be guaranteed, and the data are often embedded in a low-dimensional manifold. For example, widely used face images are sampled from nonlinear low-dimensional manifolds embedded in a high-dimensional space.¹⁰ In such cases, LRR may fail to discover the intrinsic geometric and discriminative structure of the data,^11,12 which is indispensable for practical applications.

To preserve the local geometry embedded in high-dimensional space, researchers have proposed a number of manifold learning algorithms; representative ones include locally linear embedding (LLE),¹³ ISOMAP,¹⁴ locality preserving projection (LPP),¹⁰ neighborhood preserving embedding (NPE)¹⁵ and Laplacian Eigenmap (LE).¹⁶ All of these algorithms are built on the so-called local invariance assumption,¹⁷ and aim to estimate geometric and topological information about an unknown manifold from randomly sampled (scattered) data points.

In practice, a reasonable and commonly adopted assumption is that if two data points are close on the intrinsic manifold of the data distribution, they should also be close in the new representation space.¹⁸ To make full use of the intrinsic geometric structure of the data distribution, Lu et al.¹² proposed a graph-regularized LRR algorithm.

Based on the above analysis, we propose a Laplacian-regularized multi-view clustering algorithm based on latent representation. We further introduce sparsity and non-negativity constraints into the model to improve the performance of the algorithm and make the learned representation more reasonable.

Related works

Subspace clustering

Let $X = [x_{1}, x_{2}, \dots, x_{N}] \in R^{d \times N}$ denote a given dataset, where each column is a data sample and each row is a feature. Assume that the dataset can be represented by a union of linear subspaces. The purpose of subspace learning is to capture the subspace structure ${\{S_{i}\}}_{i = 1}^{k}$ of the dataset, where $S_{i}$ denotes the $i$-th subspace. Subspace clustering algorithms exploit the self-expressiveness property of data points lying in a union of linear subspaces: each point can be represented as a linear combination of other points from the same subspace. Researchers have proposed a large number of subspace clustering algorithms that share a common model framework, which can be expressed as follows:

\min_{Z, E} f (Z) + g (E) s.t. X = X Z + E

(1)

where $Z$ is the self-representation coefficient matrix, $E$ is the noise matrix, $f (Z)$ is a regularization term on the coefficient matrix $Z$, and $g (E)$ is a loss term on the noise $E$. Different subspace clustering algorithms use different regularization terms on $Z$, each with a distinct effect on $Z$, but all of them aim to find the subspace structure of the dataset. For example, the representative Sparse Subspace Clustering algorithm (SSC)¹⁹ uses the $L_{1}$-norm to constrain the coefficient matrix $Z$, which makes the obtained coefficient matrix sparse, whereas the Low Rank Representation algorithm (LRR)²⁰ uses a nuclear norm constraint on $Z$, so that the obtained coefficient matrix $Z$ is low rank. These models also apply different norms to $E$ to handle different types of noise and reduce the influence of noisy data on the final result. After obtaining the coefficient matrix $Z$, subspace clustering algorithms construct the affinity matrix $(|Z| + |Z^{T}|) / 2$ and apply spectral clustering to it to partition the data.
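The post-processing step just described can be sketched in Python with NumPy as follows. The sketch builds the affinity $(|Z| + |Z^{T}|) / 2$ and partitions it with one normalized spectral clustering variant (symmetric normalization, row-normalized eigenvectors, and a simple deterministic k-means). These spectral clustering details and the function names are illustrative assumptions, not the exact procedure used by SSC or LRR.

```python
import numpy as np


def affinity_from_coefficients(Z):
    """Symmetric affinity W = (|Z| + |Z^T|) / 2 from a self-representation matrix Z."""
    A = np.abs(Z)
    return (A + A.T) / 2.0


def spectral_clustering(W, n_clusters, n_iter=100):
    """Normalized spectral clustering on a precomputed affinity W (illustrative variant)."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # Symmetrically normalized affinity D^{-1/2} W D^{-1/2}
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order: keep the eigenvectors of the largest ones
    _, vecs = np.linalg.eigh(S)
    U = vecs[:, -n_clusters:]
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # Deterministic farthest-point initialization followed by Lloyd's k-means
    centers = [U[0]]
    for _ in range(1, n_clusters):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    labels = np.zeros(U.shape[0], dtype=int)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new_centers = np.array([U[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                                for c in range(n_clusters)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

On a block-diagonal $Z$, the affinity has one connected component per block, and the sketch assigns each block to its own cluster.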

Multi-view subspace clustering

Multi-view datasets contain rich information from multiple sources, which is very useful for clustering analysis, so multi-view clustering is an important research topic. Extending single-view subspace clustering, low-rank tensor constrained multi-view subspace clustering (LT-MSC)²¹ was proposed; its model can be expressed as follows:

\min_{Z^{v}, E^{v}} {‖ Z ‖}_{*} + λ {‖ E ‖}_{2, 1} s.t. X^{v} = X^{v} Z^{v} + E^{v}, v = 1, \dots, m

(2)

where $Z = \emptyset (Z^{1}, \dots, Z^{m})$, $E = [E^{1}, \dots, E^{m}]$, and $\emptyset (\cdot)$ constructs a tensor $Z$ of dimension $n \times n \times m$ by stacking the representation coefficient matrices $Z^{v}$ ($v = 1, \dots, m$) of the different views into a third-order tensor. Finally, the subspace representations of all views are merged through

Z = \frac{1}{m} \sum_{v = 1}^{m} (|Z^{v}| + {|Z^{v}|}^{T})

(3)

However, this fusion method is too simple to fully and accurately exploit the complementary information among different views. Many other studies fuse multi-view representations with weighting rules, but these still cannot avoid the errors introduced in the fusion step. Among single-view subspace clustering algorithms, Latent Space Sparse Subspace Clustering (LS3C)²² performs dimensionality reduction and sparse representation of the dataset simultaneously. Inspired by this idea, the work in⁹ proposed latent multi-view subspace clustering (LMSC), which treats different views as projections of the same latent space and performs subspace clustering in this latent space. The LMSC model can be expressed as follows:

\min_{Z, P, H, E_{h}, E_{r}} {‖ E ‖}_{2, 1} + λ {‖ Z ‖}_{*} s.t. X = P H + E_{h}, H = H Z + E_{r}, E = [E_{h}; E_{r}], P P^{T} = I

(4)

In this problem, $P = {[{(P^{1})}^{T}, \dots, {(P^{m})}^{T}]}^{T}$ is the projection matrix of the multi-view data, the dataset is represented as $X = {[{(X^{1})}^{T}, \dots, {(X^{m})}^{T}]}^{T}$, and $H$ is the shared latent representation.

However, this method ignores the complementary information among different views, and it imposes no structural constraint on the representation coefficient matrix $Z$. In addition, such methods apply spectral clustering to the representation coefficient matrix $Z$ to obtain the final partition, so the accuracy of $Z$ directly determines the accuracy of the final result.

Non-negative sparse Laplacian regularized latent multi-view subspace clustering (NSL2MSC)

In this paper, we study the subspace clustering problem for multi-view latent representations. Consider a multi-view dataset ${\{[x_{i}^{(1)}; \dots; x_{i}^{(V)}]\}}_{i = 1}^{N}$ with $V$ different views and $N$ samples. Many studies have shown that different data views can be represented in the same latent representation space.^2,4,7,9 The goal of this method is therefore to find such a common latent representation $h_{i}$ for each data sample.

The objective function of the proposed Non-negative Sparse Laplacian regularized Latent Multi-view Subspace Clustering (NSL2MSC) is formulated as

\min_{P, H, Z, E_{h}, E_{r}} {‖ Z ‖}_{*} + λ_{1} {‖ Z ‖}_{1} + λ_{2} \sum_{i j} {‖ z_{i} - z_{j} ‖}^{2} W_{i j} + λ_{3} {‖ E_{h} ‖}_{2, 1} + λ_{4} {‖ E_{r} ‖}_{2, 1} s.t. X = P H + E_{h}, H = H Z + E_{r}, P P^{T} = I, Z \geq 0

(5)

where $P$ and $X$ denote the aligned reconstruction model and the multi-view observations, respectively; $E_{r}$ and $E_{h}$ denote the errors of the subspace representation and of the latent representation, respectively; and $H = {\{h_{i}\}}_{i = 1}^{N}$ is the shared latent representation.

In the objective function, the third term is a regularization term derived from the manifold assumption. Under this assumption, samples that are close to each other should have similar representation coefficients, which is encouraged by minimizing

\min \sum_{i j} {‖ z_{i} - z_{j} ‖}^{2} W_{i j}

(6)

where $z_{i}$ and $z_{j}$ are the representation coefficient vectors (the $i$-th and $j$-th columns of $Z$) of samples $x_{i}$ and $x_{j}$, and $W_{i j}$ is the similarity between $x_{i}$ and $x_{j}$.

Manifold constraints are important in many models, such as dimensionality reduction, clustering and semi-supervised learning algorithms. Let $D$ be the diagonal degree matrix whose $i$-th diagonal element is the sum of all similarities associated with sample $i$, i.e. $D_{i i} = \sum_{j} W_{i j}$. The graph Laplacian is then defined as⁹

L = D - W

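For a symmetric similarity matrix $W$, the manifold regularization term in (6) (the third term of (5)) satisfies $\sum_{i j} {‖ z_{i} - z_{j} ‖}^{2} W_{i j} = 2 \, t r (Z L Z^{T})$, so minimizing it is equivalent to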

\min t r (Z L Z^{T})

Therefore, absorbing the constant factor into $λ_{2}$ and penalizing the two error terms jointly through $E = [E_{h}; E_{r}]$ (as in LMSC⁹), the model of the proposed method can be written in the following matrix form

\min_{P, H, Z, E_{h}, E_{r}} {‖ Z ‖}_{*} + λ_{1} {‖ Z ‖}_{1} + λ_{2} t r (Z L Z^{T}) + λ_{3} {‖ E ‖}_{2, 1} s.t. X = P H + E_{h}, H = H Z + E_{r}, E = [E_{h}; E_{r}], P P^{T} = I, Z \geq 0

(7)
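The rewriting of the manifold term and the objective (7) can be checked numerically with the following NumPy sketch. The paper does not state how $W$ is built, so the symmetric $k$-nearest-neighbour graph with heat-kernel weights (with the hypothetical parameters `n_neighbors` and `sigma`) is an assumption; the $\ell_{2,1}$ norm is taken as the sum of the Euclidean norms of the columns, one column per sample.

```python
import numpy as np


def knn_heat_kernel_graph(X, n_neighbors=5, sigma=1.0):
    """Symmetric kNN similarity graph with heat-kernel weights (assumed construction; columns of X are samples)."""
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)
    W = np.zeros((n, n))
    for i in range(n):
        neighbors = [j for j in np.argsort(dist2[i]) if j != i][:n_neighbors]
        W[i, neighbors] = np.exp(-dist2[i, neighbors] / (2.0 * sigma ** 2))
    return np.maximum(W, W.T)  # keep an edge if either endpoint selects the other


def graph_laplacian(W):
    """L = D - W with D_ii = sum_j W_ij."""
    return np.diag(W.sum(axis=1)) - W


def manifold_term(Z, W):
    """sum_ij ||z_i - z_j||^2 W_ij, where z_i is the i-th column of Z."""
    diff = Z[:, :, None] - Z[:, None, :]  # diff[:, i, j] = z_i - z_j
    return float(np.sum((diff ** 2).sum(axis=0) * W))


def nsl2msc_objective(Z, E_h, E_r, L, lam1, lam2, lam3):
    """Value of objective (7); the l2,1 norm sums the column norms of E = [E_h; E_r]."""
    nuclear = float(np.sum(np.linalg.svd(Z, compute_uv=False)))
    l1 = float(np.sum(np.abs(Z)))
    smooth = float(np.trace(Z @ L @ Z.T))
    l21 = float(np.sum(np.linalg.norm(np.vstack([E_h, E_r]), axis=0)))
    return nuclear + lam1 * l1 + lam2 * smooth + lam3 * l21
```

For a symmetric $W$, `manifold_term(Z, W)` matches `2 * np.trace(Z @ L @ Z.T)` up to rounding; this factor of 2 is the constant absorbed into $λ_{2}$ in (7).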

Model optimization

The objective function (7) yields a latent representation of the data and a meaningful similarity matrix learned from this latent representation, which integrates multiple views. Problem (7) is not jointly convex in the variables $P$, $H$, $Z$, $E_{h}$ and $E_{r}$, but it can be optimized iteratively by updating each variable in turn with the others fixed. The linearized alternating direction method with adaptive penalty (LADMAP)²³ is an effective optimization method for this problem.

To make the objective separable, we first introduce an auxiliary variable $J$ that carries the $\ell_{1}$ and non-negativity terms, and rewrite the optimization problem as

\min_{P, H, Z, E_{h}, E_{r}, J} {‖ Z ‖}_{*} + λ_{1} {‖ J ‖}_{1} + λ_{2} t r (Z L Z^{T}) + λ_{3} {‖ E ‖}_{2, 1} s.t. X = P H + E_{h}, H = H Z + E_{r}, E = [E_{h}; E_{r}], P P^{T} = I, Z = J, J \geq 0

(8)

The augmented Lagrangian function of the above problem is

\mathcal{L} (P, H, Z, E_{h}, E_{r}, J) = {‖ Z ‖}_{*} + λ_{1} {‖ J ‖}_{1} + λ_{2} t r (Z L Z^{T}) + λ_{3} {‖ E ‖}_{2, 1} + Φ (M_{1}, H - H Z - E_{r}) + Φ (M_{2}, Z - J) + Φ (M_{3}, X - P H - E_{h})

To simplify the subsequent derivation, we define $Φ (C, D) = \frac{μ}{2} {‖ D ‖}_{F}^{2} + 〈C, D〉$, where $μ$ is a positive penalty scalar and $〈\cdot, \cdot〉$ denotes the matrix inner product. Following LADMAP, the variables $P$, $H$, $Z$, $E$ and $J$ are updated iteratively, each by solving one of the following subproblems with the other variables fixed.

$P$-subproblem

With the other variables fixed, we obtain the expression of $P$ by solving the following optimization problem

P^{*} = \arg \min_{P} Φ (M_{3}, X - P H - E_{h}) \quad s.t. \quad P P^{T} = I

(9)

Theorem 1.²⁴

Given the objective function $\min_{R} {‖ Q - G R ‖}_{F}^{2}$ s.t. $R^{T} R = R R^{T} = I$, the optimal solution is $R = U V^{T}$, where $U$ and $V$ are the left and right singular vectors of the SVD of $G^{T} Q$.

Since

P^{*} = \arg \min Φ (M_{3}, X - P H - E_{h}) = \arg \min \frac{μ}{2} {‖ X - P H - E_{h} + M_{3} / μ ‖}_{F}^{2} = \arg \min {‖ (X + M_{3} / μ - E_{h}) - P H ‖}_{F}^{2} = \arg \min {‖ {(X + M_{3} / μ - E_{h})}^{T} - H^{T} P^{T} ‖}_{F}^{2},

the $P$-subproblem has the form of Theorem 1 with $Q = {(X + M_{3} / μ - E_{h})}^{T}$, $G = H^{T}$ and $R = P^{T}$. Its optimal solution is therefore $P^{T} = U V^{T}$, where $U$ and $V$ are the left and right singular vectors of the SVD of $H {(\frac{M_{3}}{μ} + X - E_{h})}^{T}$. Theorem 1 assumes that $P$ is an orthogonal matrix (i.e. $P^{T} P = P P^{T} = I$); in practice, to improve computational efficiency, we relax $P$ to be row orthogonal (i.e. $P P^{T} = I$, where $P \in R^{k \times d}$, $k ≪ d$), which still yields good performance and convergence.
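A minimal NumPy sketch of this $P$-update follows. It assumes that $P$ is $d \times K$, as implied by $X = P H$ with $H \in R^{K \times N}$, and that $K \le d$; under this reading of the shapes (an assumption), the Theorem 1 solution $P^{T} = U V^{T}$ gives a $P$ with orthonormal columns ($P^{T} P = I_{K}$, i.e. $P^{T}$ is row orthogonal). The function name `update_P` is hypothetical.

```python
import numpy as np


def update_P(X, H, E_h, M3, mu):
    """P-subproblem (9) via Theorem 1 (orthogonal Procrustes).

    Assumed shapes: X, E_h, M3 are d x N; H is K x N with K <= d; the returned P is d x K.
    """
    T = X + M3 / mu - E_h                     # target that P H should approximate
    G = H @ T.T                               # K x d matrix whose SVD gives U and V
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return (U @ Vt).T                         # P^T = U V^T, so P has orthonormal columns
```

Because $‖P H‖_{F}$ is constant over all $P$ with orthonormal columns, the returned $P$ should give a residual no larger than any other such matrix.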

$H$-subproblem

With the other variables fixed, we obtain the expression of H by solving the following optimization problem

H^{*} = \arg \min Φ (M_{3}, X - P H - E_{h}) + Φ (M_{1}, H - H Z - E_{r})

(10)

Taking the derivative with respect to H and setting it to zero, we get

A H + H B = C

(11)

with $A = μ P^{T} P$, $B = μ (Z Z^{T} - Z - Z^{T} + I)$, and $C = (P^{T} M_{3} + M_{1} (Z^{T} - I)) + μ (P^{T} X + E_{r} - P^{T} E_{h} - E_{r} Z^{T})$.

Equation (11) is a Sylvester equation. To avoid numerical instability, we make $A$ strictly positive definite by replacing it with $\hat{A} = A + ϵ I$, where $I$ is the identity matrix and $0 < ϵ ≪ 1$.

Proposition 1.

The Sylvester equation (11) has a unique solution.

Proof.

The Sylvester equation $A H + H B = C$ has a unique solution for $H$ if and only if $A$ and $- B$ have no common eigenvalues. Since $\hat{A}$ is positive definite, all of its eigenvalues are positive, $α_{i} > 0$; and since $B = μ (I - Z) {(I - Z)}^{T}$ is positive semi-definite, all of its eigenvalues are nonnegative, $β_{j} \geq 0$. Hence $α_{i} + β_{j} > 0$ for any eigenvalue $α_{i}$ of $\hat{A}$ and any eigenvalue $β_{j}$ of $B$, so no eigenvalue of $\hat{A}$ equals an eigenvalue of $- B$.

Accordingly, the Sylvester equation (11) has a unique solution.
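Because $\hat{A}$ and $B$ are symmetric, (11) can also be solved by diagonalizing both matrices instead of with the Bartels-Stewart algorithm (available, for instance, as `scipy.linalg.solve_sylvester`). In the eigenbases the equation decouples entrywise, and the denominators are exactly the sums $α_{i} + β_{j} > 0$ used in the proof above. The NumPy sketch below illustrates this substitute technique; `solve_sylvester_spd` and `update_H` are hypothetical names, and `eps` plays the role of $ϵ$.

```python
import numpy as np


def solve_sylvester_spd(A, B, C, eps=1e-8):
    """Solve (A + eps I) H + H B = C for symmetric PSD A (K x K) and B (N x N).

    With A + eps I = U diag(alpha) U^T and B = V diag(beta) V^T, the equation becomes
    (alpha_i + beta_j) * Ht_ij = Ct_ij, where Ht = U^T H V and Ct = U^T C V.
    """
    alpha, U = np.linalg.eigh(A + eps * np.eye(A.shape[0]))
    beta, V = np.linalg.eigh(B)
    # Denominators are positive whenever A + eps I is positive definite and B is PSD
    Ht = (U.T @ C @ V) / (alpha[:, None] + beta[None, :])
    return U @ Ht @ V.T


def update_H(P, Z, X, E_h, E_r, M1, M3, mu, eps=1e-8):
    """H-subproblem (11) with A, B and C as defined in the text."""
    I_N = np.eye(Z.shape[0])
    A = mu * P.T @ P
    B = mu * (Z @ Z.T - Z - Z.T + I_N)
    C = (P.T @ M3 + M1 @ (Z.T - I_N)) + mu * (P.T @ X + E_r - P.T @ E_h - E_r @ Z.T)
    return solve_sylvester_spd(A, B, C, eps)
```

A quick consistency check is that the returned $H$ makes the gradient of (10) vanish when `eps` is zero and $P$ has full column rank.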

$Z$-subproblem

With the other variables fixed, we update $Z$ by solving the following problem

L_{1} (Z) = {‖ Z ‖}_{*} + λ_{2} t r (Z L Z^{T}) + Φ (M_{1}, H - H Z - E_{r}) + Φ (M_{2}, Z - J), \quad Z^{*} = \arg \min_{Z} L_{1} (Z) = \arg \min_{Z} {‖ Z ‖}_{*} + λ_{2} t r (Z L Z^{T}) + \frac{μ}{2} {‖ Z - J_{k} + \frac{1}{μ} M_{2}^{k} ‖}_{F}^{2} + \frac{μ}{2} {‖ H - H Z - E_{r}^{k} + \frac{1}{μ} M_{1}^{k} ‖}_{F}^{2}

which does not have a closed-form solution. Following LADMAP,²⁵ we denote the smooth component of $L_{1}$ by

q (Z, E_{k}, J_{k}, M_{1}^{k}, M_{2}^{k}) = λ_{2} t r (Z L Z^{T}) + \frac{μ}{2} {‖ Z - J_{k} + \frac{1}{μ} M_{2}^{k} ‖}_{F}^{2} + \frac{μ}{2} {‖ H - H Z - E_{r}^{k} + \frac{1}{μ} M_{1}^{k} ‖}_{F}^{2}

Then, according to LADMAP, minimizing $L_{1}$ can be replaced by solving the following problem:

Z^{*} = \arg \min_{Z} {‖ Z ‖}_{*} + 〈\nabla_{Z} q (Z_{k}), Z - Z_{k}〉 + \frac{η}{2} {‖ Z - Z_{k} ‖}_{F}^{2}

(12)

where $q (Z, E_{k}, J_{k}, M_{1}^{k}, M_{2}^{k})$ is approximated by its linearization $〈\nabla_{Z} q (Z_{k}), Z - Z_{k}〉$ at $Z_{k}$ plus a proximal term $\frac{η}{2} {‖ Z - Z_{k} ‖}_{F}^{2}$, and $\nabla_{Z} q (Z_{k})$ is the gradient of $q$ with respect to $Z$. As long as $η > 2 λ_{2} {‖ L ‖}_{2} + μ (1 + {‖ H ‖}_{2}^{2})$, where ${‖ \cdot ‖}_{2}$ is the spectral norm of a matrix, i.e. its largest singular value, the above replacement is valid. Problem (12) then has the closed-form solution

Z_{k + 1}^{*} = Θ_{{(η)}^{- 1}} (Z_{k} - \nabla_{Z} q (Z_{k}) / η)

where $Θ_{ε} (A) = U S_{ε} (Σ) V^{T}$ is the singular value thresholding (SVT) operator,²⁵ in which $U Σ V^{T}$ is the singular value decomposition (SVD) of $A$, and $S_{ε} (x) = sgn (x) \max (|x| - ε, 0)$ is the soft thresholding operator.
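The linearized $Z$-update can be sketched in NumPy as follows. The gradient assumes a symmetric Laplacian $L$, in which case $\nabla_{Z} q = 2 λ_{2} Z L + μ (Z - J_{k} + M_{2}^{k} / μ) - μ H^{T} (H - H Z - E_{r}^{k} + M_{1}^{k} / μ)$. The factor `safety = 1.01`, used to make $η$ strictly larger than the bound above, and all function names are assumptions.

```python
import numpy as np


def soft_threshold(x, eps):
    """S_eps(x) = sgn(x) * max(|x| - eps, 0), applied elementwise."""
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)


def svt(A, eps):
    """Singular value thresholding Theta_eps(A) = U S_eps(Sigma) V^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * soft_threshold(s, eps)) @ Vt


def grad_q(Z, H, E_r, J, M1, M2, L, lam2, mu):
    """Gradient of the smooth part q of the Z-subproblem (L assumed symmetric)."""
    R = H - H @ Z - E_r + M1 / mu
    return 2.0 * lam2 * Z @ L + mu * (Z - J + M2 / mu) - mu * H.T @ R


def update_Z(Z, H, E_r, J, M1, M2, L, lam2, mu, safety=1.01):
    """One linearized proximal step Z_{k+1} = Theta_{1/eta}(Z_k - grad q(Z_k) / eta)."""
    eta = safety * (2.0 * lam2 * np.linalg.norm(L, 2) + mu * (1.0 + np.linalg.norm(H, 2) ** 2))
    return svt(Z - grad_q(Z, H, E_r, J, M1, M2, L, lam2, mu) / eta, 1.0 / eta)
```

For example, `svt(np.diag([3.0, 1.0, 0.2]), 0.5)` shrinks the singular values to 2.5, 0.5 and 0.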

$E$-subproblem

The reconstruction error $E$ is updated by solving the following problem

E^{*} = \arg \min_{E} λ_{3} {‖ E ‖}_{2, 1} + Φ (M_{3}, X - P H - E_{h}) + Φ (M_{1}, H - H Z - E_{r}) = \arg \min_{E} \frac{λ_{3}}{μ} {‖ E ‖}_{2, 1} + \frac{1}{2} {‖ E - G ‖}_{F}^{2}

(13)

where $G$ is formed by vertically concatenating the matrices $(X - P H + M_{3} / μ)$ and $(H - H Z + M_{1} / μ)$, i.e. $G = [X - P H + M_{3} / μ; H - H Z + M_{1} / μ]$. This subproblem can be solved efficiently by Lemma 3.2 of LRR.²⁰
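A NumPy sketch of the $E$-update is given below, assuming (as is standard for this proximal problem) that the $\ell_{2,1}$ norm sums the Euclidean norms of the columns of $E$; each column of $G$ is then shrunk toward zero by $λ_{3} / μ$ in norm, and columns with smaller norm are set to zero. The function names are hypothetical.

```python
import numpy as np


def prox_l21(G, tau):
    """argmin_E tau * ||E||_{2,1} + 0.5 * ||E - G||_F^2, solved column by column.

    Each column g_i becomes max(||g_i|| - tau, 0) / ||g_i|| * g_i.
    """
    norms = np.linalg.norm(G, axis=0)
    scale = np.where(norms > tau, (norms - tau) / np.maximum(norms, 1e-12), 0.0)
    return G * scale[None, :]


def update_E(X, P, H, Z, M1, M3, mu, lam3):
    """E-subproblem (13): stack the two residual blocks into G and apply the l2,1 prox."""
    G = np.vstack([X - P @ H + M3 / mu, H - H @ Z + M1 / mu])
    E = prox_l21(G, lam3 / mu)
    d = X.shape[0]
    return E[:d], E[d:]  # E_h (d x N) and E_r (K x N)
```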

$J$-subproblem

With the other variables fixed, the $J$-subproblem can be written as

J^{*} = \arg \min_{J \geq 0} λ_{1} {‖ J ‖}_{1} + Φ (M_{2}, Z - J) = \arg \min_{J \geq 0} λ_{1} {‖ J ‖}_{1} + \frac{μ}{2} {‖ J - (Z + M_{2} / μ) ‖}_{F}^{2}

(14)

which has the following closed-form solution:

J_{k + 1} = \max \{S_{\frac{λ_{1}}{μ}} (Z_{k + 1} + \frac{1}{μ} M_{2}^{k}), 0\}
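The closed form above is an elementwise operation; a minimal NumPy transcription follows (the name `update_J` is hypothetical). For non-negative $J$, clipping the soft-thresholded value at zero is the same as taking $\max (v - λ_{1} / μ, 0)$ elementwise.

```python
import numpy as np


def update_J(Z, M2, mu, lam1):
    """J-subproblem (14): J = max(S_{lam1/mu}(Z + M2/mu), 0), applied elementwise."""
    V = Z + M2 / mu
    tau = lam1 / mu
    soft = np.sign(V) * np.maximum(np.abs(V) - tau, 0.0)  # soft thresholding S_tau
    return np.maximum(soft, 0.0)                          # projection onto J >= 0
```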

Updating multipliers

We update the multipliers by

\left\{ \begin{aligned} M_{1} &= M_{1} + μ (H - H Z - E_{r}) \\ M_{2} &= M_{2} + μ (Z - J) \\ M_{3} &= M_{3} + μ (X - P H - E_{h}) \end{aligned} \right.

The entire algorithm is summarized in Algorithm 1.

Algorithm 1: Optimization Algorithm for the proposed NSL2MSC method
Input: Multi-view matrices $\{X^{(1)}, \dots, X^{(V)}\}$; parameters $λ_{1}$, $λ_{2}$ and $λ_{3}$; the dimension $K$ of the latent representation $H$.
Initialize: $P = 0$, $E_{r} = 0$, $E_{h} = 0$, $J = Z = 0$, $M_{1} = 0$, $M_{2} = 0$, $M_{3} = 0$, $μ = 10^{- 6}$, $ρ = 1.1$, $ϵ = 10^{- 4}$, $μ_{\max} = 10^{6}$; initialize $H$ with random values.
While not converged do
Update the variables $P, H, Z, E, J$ by solving the $P$-, $H$-, $Z$-, $E$- and $J$-subproblems in turn;
Update the multipliers $M_{1}, M_{2}, M_{3}$ according to the multiplier update rules;
Update the parameter $μ$ by $μ = \min (ρ μ, μ_{\max})$;
Check the convergence conditions: ${‖ X - P H - E_{h} ‖}_{\infty} < ϵ$, ${‖ H - H Z - E_{r} ‖}_{\infty} < ϵ$, ${‖ J - Z ‖}_{\infty} < ϵ$.
End
Output: $Z, H, P, E$
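Putting the subproblems together, a compact NumPy sketch of Algorithm 1 is given below. It is an illustration, not the authors' implementation, and it rests on several assumptions: the graph Laplacian $L$ is precomputed and symmetric (the construction of $W$ is not specified in the text); $K \le d$, so that the $P$-update yields orthonormal columns; the Sylvester equation is solved by eigendecomposition rather than Bartels-Stewart; one linearized step is taken for $Z$ per iteration with a 1.01 safety factor on $η$; and the function name `nsl2msc` and its keyword arguments are hypothetical.

```python
import numpy as np


def _svt(A, tau):
    """Singular value thresholding (singular values are nonnegative, so S_tau(s) = max(s - tau, 0))."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def _prox_l21(G, tau):
    """Column-wise shrinkage solving min_E tau * ||E||_{2,1} + 0.5 * ||E - G||_F^2."""
    norms = np.linalg.norm(G, axis=0)
    return G * np.where(norms > tau, (norms - tau) / np.maximum(norms, 1e-12), 0.0)


def nsl2msc(views, L, lam1, lam2, lam3, K, max_iter=300, tol=1e-4, seed=0):
    """Illustrative sketch of Algorithm 1. views: list of (d_v x N) arrays; L: symmetric N x N Laplacian."""
    X = np.vstack(views)                      # stack the views: X is d x N
    d, N = X.shape
    rng = np.random.default_rng(seed)
    H = rng.random((K, N))                    # H is initialized with random values
    P = np.zeros((d, K))
    Z, J, M2 = np.zeros((N, N)), np.zeros((N, N)), np.zeros((N, N))
    E_h, M3 = np.zeros((d, N)), np.zeros((d, N))
    E_r, M1 = np.zeros((K, N)), np.zeros((K, N))
    mu, rho, mu_max, eps_a = 1e-6, 1.1, 1e6, 1e-8
    I_N, I_K = np.eye(N), np.eye(K)
    for _iter in range(max_iter):
        # P-subproblem: orthogonal Procrustes solution P^T = U V^T (Theorem 1)
        U, _, Vt = np.linalg.svd(H @ (M3 / mu + X - E_h).T, full_matrices=False)
        P = (U @ Vt).T
        # H-subproblem: (A + eps I) H + H B = C, solved by eigendecomposition
        A = mu * P.T @ P + eps_a * I_K
        B = mu * (I_N - Z) @ (I_N - Z).T
        C = P.T @ M3 + M1 @ (Z.T - I_N) + mu * (P.T @ X + E_r - P.T @ E_h - E_r @ Z.T)
        a, Ua = np.linalg.eigh(A)
        b, Vb = np.linalg.eigh(B)
        H = Ua @ ((Ua.T @ C @ Vb) / (a[:, None] + b[None, :])) @ Vb.T
        # Z-subproblem: one linearized proximal step with singular value thresholding
        R = H - H @ Z - E_r + M1 / mu
        grad = 2.0 * lam2 * Z @ L + mu * (Z - J + M2 / mu) - mu * H.T @ R
        eta = 1.01 * (2.0 * lam2 * np.linalg.norm(L, 2) + mu * (1.0 + np.linalg.norm(H, 2) ** 2))
        Z = _svt(Z - grad / eta, 1.0 / eta)
        # E-subproblem: l2,1 shrinkage of the stacked residuals, then split into E_h and E_r
        E = _prox_l21(np.vstack([X - P @ H + M3 / mu, H - H @ Z + M1 / mu]), lam3 / mu)
        E_h, E_r = E[:d], E[d:]
        # J-subproblem: nonnegative soft thresholding
        J = np.maximum(Z + M2 / mu - lam1 / mu, 0.0)
        # Multiplier and penalty updates
        r1, r2, r3 = H - H @ Z - E_r, Z - J, X - P @ H - E_h
        M1, M2, M3 = M1 + mu * r1, M2 + mu * r2, M3 + mu * r3
        mu = min(rho * mu, mu_max)
        # Convergence check on the three constraint residuals (infinity norms)
        if max(np.abs(r).max() for r in (r1, r2, r3)) < tol:
            break
    return Z, H, P, E_h, E_r, J
```

The learned $Z$ (or the non-negative $J$) would then be turned into an affinity and passed to spectral clustering, as for the single-view methods.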

Complexity and convergence

The proposed method is optimized by solving six subproblems per iteration (the five variable updates and the multiplier update), as summarized in Algorithm 1. Let $k$, $d$ and $n$ denote the latent space dimension, the total feature dimension of the multi-view data, and the number of data samples, respectively. The time complexity of the $P$-subproblem is $O (k^{2} d + d^{3})$. For the $H$-subproblem, the Sylvester equation is solved with the Bartels-Stewart algorithm, whose cost is $O (k^{3} + n^{3})$. For the $Z$-subproblem, the main cost is the SVD required by singular value thresholding, with time complexity $O (n^{3})$. The $J$-subproblem consists of elementwise operations on an $n \times n$ matrix and costs $O (n^{2})$. For the $E$-subproblem and the Lagrange multiplier updates, the main cost is matrix multiplication, with time complexity $O (dkn + k n^{2})$. Thus, the total time complexity of each iteration is $O (k^{2} d + d^{3} + k^{3} + n^{3} + dkn + k n^{2})$, which reduces to $O (d^{3} + n^{3})$ because $k ≪ d$. Proving the convergence of the algorithm is difficult, but our experiments on real datasets show that it converges stably even when $H$ is randomly initialized.

Experiments

Experimental setting

Dataset description

We chose four real datasets (MSRCV1,³ Extended YaleB, Still DB,²⁶ BBCSport⁵) as test data to verify the effectiveness of the proposed method. The basic information for the four datasets is as follows:

– The MSRCV1 dataset contains 240 images in 8 classes. In this experiment, we selected 7 of the classes, namely tree, building, airplane, cow, face, car and bicycle. We extracted 6 features from each image, namely CENT (view1), CMT (view2), GIST (view3), HOG (view4), LBP (view5) and SIFT (view6), and used these features as 6 different views.

– The Extended YaleB dataset contains a total of 2414 face images of 38 people, each with 64 near-frontal images under different illuminations. In the experiment, we selected the first 10 classes, 640 frontal face images in total, and extracted three features from each image, intensity (view1), LBP (view2) and Gabor (view3), to form 3 different views of the dataset.

– The Still DB dataset contains 467 images in 6 classes. We extracted three features from each image, SIFT BoW (view1), color SIFT BoW (view2) and shape context BoW (view3), to form three different views of the dataset.

– The BBCSport dataset consists of sports news articles from the BBC Sport website in five topical areas, described by two views.

Compared methods

We compare the proposed algorithm with the following recently proposed state-of-the-art algorithms:

– SPC: The standard spectral clustering algorithm. In the experiment, we apply spectral clustering to the most informative view.

– LRR²⁰: Low-rank representation. In the experiment, we run LRR on each single view and report the best clustering result.

– Co-Reg SPC⁴: This method co-regularizes the clustering hypotheses of different views so that corresponding data points in different views are assigned to the same cluster.

– RMSC⁵: This method constructs a transition probability matrix for each view and then recovers a shared low-rank transition probability matrix from them, which is used to obtain the final clustering result.

– FCMSC²⁷: This algorithm addresses the problem that multiple views have different statistical properties, so that simply concatenating them does not yield satisfactory clustering performance.

– LMSC⁹: This method searches for latent representations and reconstructs the data according to the learned latent representations.

Experimental results

Performance comparison

To evaluate the performance of each algorithm, we adopt four evaluation metrics: normalized mutual information (NMI), accuracy (ACC), F-measure and Rand index (RI). For the compared algorithms, we tune the parameters as suggested in the corresponding papers to obtain their best performance. For all experimental datasets, the dimension of the latent space of the proposed algorithm is set to $K = 500$, and each of the three parameters $λ_{1}, λ_{2}, λ_{3}$ is selected from $\{10^{- 3}, 10^{- 2}, 10^{- 1}, 1, 10^{1}, 10^{2}, 10^{3}\}$.

For each algorithm, we run it 30 times independently and give the mean value and standard deviation.
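For reference, the four metrics can be computed as in the following NumPy sketch. The paper does not specify which variants it uses, so the choices here are assumptions: ACC uses the best one-to-one matching between clusters and classes, found by brute force over permutations (practical only for a small number of clusters; the Hungarian algorithm, e.g. `scipy.optimize.linear_sum_assignment`, is the usual choice); NMI is normalized by $\sqrt{H (U) H (V)}$; and RI and F-measure are computed over pairs of samples. All function names are hypothetical.

```python
import itertools

import numpy as np


def clustering_accuracy(y_true, y_pred):
    """ACC with the best one-to-one cluster-to-class matching (brute force, small k only)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes, clusters = np.unique(y_true), np.unique(y_pred)
    k = max(len(classes), len(clusters))
    cont = np.zeros((k, k), dtype=int)  # cont[i, j]: samples in cluster i and class j (zero-padded)
    for i, c in enumerate(clusters):
        for j, t in enumerate(classes):
            cont[i, j] = np.sum((y_pred == c) & (y_true == t))
    best = max(sum(cont[i, perm[i]] for i in range(k)) for perm in itertools.permutations(range(k)))
    return best / len(y_true)


def _entropy(counts):
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)))


def nmi(y_true, y_pred):
    """Mutual information normalized by sqrt(H(true) * H(pred)) (one common convention)."""
    _, t = np.unique(np.asarray(y_true), return_inverse=True)
    _, p = np.unique(np.asarray(y_pred), return_inverse=True)
    joint = np.zeros((t.max() + 1, p.max() + 1))
    np.add.at(joint, (t, p), 1.0)
    h_t, h_p = _entropy(joint.sum(axis=1)), _entropy(joint.sum(axis=0))
    if h_t == 0.0 or h_p == 0.0:
        return 1.0 if h_t == h_p else 0.0
    pij = joint / joint.sum()
    outer = pij.sum(axis=1, keepdims=True) @ pij.sum(axis=0, keepdims=True)
    nz = pij > 0
    mi = float(np.sum(pij[nz] * np.log(pij[nz] / outer[nz])))
    return mi / np.sqrt(h_t * h_p)


def _pair_counts(y_true, y_pred):
    """Pairs (i < j): same class and cluster (tp), same cluster only (fp), same class only (fn), total."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    iu = np.triu_indices(len(y_true), k=1)
    same_t = (y_true[:, None] == y_true[None, :])[iu]
    same_p = (y_pred[:, None] == y_pred[None, :])[iu]
    tp = int(np.sum(same_t & same_p))
    fp = int(np.sum(~same_t & same_p))
    fn = int(np.sum(same_t & ~same_p))
    return tp, fp, fn, len(same_t)


def rand_index(y_true, y_pred):
    tp, fp, fn, total = _pair_counts(y_true, y_pred)
    return (total - fp - fn) / total  # fraction of pairs on which the two partitions agree


def pairwise_f_measure(y_true, y_pred):
    tp, fp, fn, _ = _pair_counts(y_true, y_pred)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)
```

All four functions are invariant to relabeling: a prediction that only permutes the class labels scores 1.0 on every metric.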

The detailed clustering results of the proposed algorithm and the compared algorithms on the four real datasets are shown in Tables 1–4.

Table 1.

Clustering results on the MSRCV1 dataset.

Method	NMI		ACC		F-Measure		RI
Method	Mean	Std	Mean	Std	Mean	Std	Mean	Std
SPC	0.5961	0.0267	0.6831	0.0441	0.5566	0.0414	0.8698	0.0089
LRR	0.5235	0.0153	0.6063	0.0025	0.4743	0.0056	0.8612	0.0007
Co-Reg SPC	0.6056	0.0134	0.6993	0.0123	0.5581	0.0203	0.9025	0.0028
RMSC	0.5987	0.0059	0.7161	0.0072	0.6092	0.0151	0.8886	0.0023
FCMSC	0.6852	0.0031	0.8213	0.0117	0.6906	0.0139	0.9071	0.0032
LMSC	0.6724	0.0117	0.8121	0.0115	0.6713	0.0159	0.9027	0.0017
NSL2MSC	0.7206	0.0121	0.8469	0.0101	0.7037	0.0126	0.9258	0.0026

Table 2.

Clustering results on the Extended Yale B dataset.

Method	NMI		ACC		F-Measure		RI
Method	Mean	Std	Mean	Std	Mean	Std	Mean	Std
SPC	0.3807	0.0115	0.3815	0.0349	0.3049	0.0117	0.2347	0.0139
LRR	0.6345	0.0048	0.6286	0.0145	0.5263	0.0069	0.4511	0.0019
Co-Reg SPC	0.1819	0.0054	0.2073	0.0076	0.1703	0.0008	0.0882	0.0006
RMSC	0.1627	0.0119	0.1868	0.0136	0.1672	0.0107	0.0718	0.0123
FCMSC	0.7175	0.0104	0.6269	0.0140	0.5853	0.0097	0.5764	0.0167
LMSC	0.7231	0.0111	0.7328	0.0152	0.6259	0.0082	0.5825	0.0121
NSL2MSC	0.8026	0.0117	0.7818	0.0120	0.7406	0.0038	0.7121	0.0058

Table 3.

Clustering results on the still DB dataset.

Method	NMI		ACC		F-Measure		RI
Method	Mean	Std	Mean	Std	Mean	Std	Mean	Std
SPC	0.1056	0.0075	0.2928	0.0065	0.2222	0.0073	0.7319	0.0063
LRR	0.1125	0.0026	0.3146	0.0041	0.2473	0.0051	0.7353	0.0008
Co-Reg SPC	0.1055	0.0018	0.2701	0.0028	0.2317	0.0039	0.7414	0.0004
RMSC	0.1188	0.0063	0.2917	0.0197	0.2383	0.0204	0.7359	0.0051
FCMSC	0.1406	0.0072	0.3187	0.0021	0.2728	0.0046	0.7428	0.0056
LMSC	0.1418	0.0036	0.3289	0.0031	0.2709	0.0058	0.7426	0.0002
NSL2MSC	0.1521	0.0047	0.3442	0.0058	0.2838	0.0062	0.7569	0.0006

Table 4.

Clustering results on the BBCSport dataset.

Method	NMI		ACC		F-Measure		RI
Method	Mean	Std	Mean	Std	Mean	Std	Mean	Std
SPC	0.7084	0.0058	0.8005	0.0332	0.7633	0.0035	0.8919	0.0008
LRR	0.6983	0.0022	0.7913	0.0035	0.7715	0.0028	0.8816	0.0015
Co-Reg SPC	0.7242	0.0008	0.7556	0.0057	0.7722	0.0013	0.8930	0.0006
RMSC	0.8133	0.0102	0.8428	0.0139	0.8772	0.0090	0.9265	0.0027
FCMSC	0.8352	0.0104	0.8785	0.0041	0.8849	0.0053	0.9377	0.0019
LMSC	0.8263	0.0071	0.9031	0.0052	0.8871	0.0071	0.9465	0.0006
NSL2MSC	0.8536	0.0063	0.9436	0.0067	0.9048	0.0066	0.9617	0.0011

Table 1 shows the clustering results on the MSRCV1 dataset, on which the proposed method clearly outperforms the compared methods. We attribute this to the learned latent representation space, which makes better use of the multiple views of the data.

Table 2 shows the clustering performance on the Extended YaleB dataset. Most algorithms perform poorly on this dataset, mainly because of the large illumination variations it contains. In contrast, NSL2MSC achieves better results: it exceeds LMSC by 7.95, 4.90, 11.47 and 12.96 percentage points in NMI, ACC, F-measure and RI, respectively.

Table 3 shows the clustering results on the Still DB dataset. The clustering performance of all algorithms is relatively low on this dataset; nevertheless, NSL2MSC achieves the best results on all four metrics.

Table 4 shows the clustering results on the BBCSport dataset. NSL2MSC outperforms RMSC by at least 2 percentage points on every metric, and it also achieves better results than LMSC on all four metrics.

Parameters effect

The NSL2MSC model contains several parameters. Taking the Extended YaleB dataset as an example, we test the effect of the four parameters $λ_{1}, λ_{2}, λ_{3}$ and $K$ on ACC and NMI, varying one parameter at a time while fixing the others. Figure 1 shows the ACC and NMI results for different parameter settings on the Extended YaleB dataset. The results show that NSL2MSC outperforms the other algorithms over a wide range of parameter values.

Figure 1.

The parameter effect of $λ_{1}, λ_{2}, λ_{3}$ and $K$ on Extended Yale-B dataset with ACC and NMI metrics, respectively.

Conclusion

In this paper, we proposed a new multi-view subspace clustering method called NSL2MSC. The main innovation of the algorithm is the use of graph regularization to make full use of the complementary information among views while learning the multi-view latent subspace representation. In this way, the method can represent not only the global information of the data but also its local geometric structure. Clustering experiments on four real-world datasets demonstrate the effectiveness of the method.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the National Natural Science Foundation of China (Grant No. 61902160, 61806088), the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (Grant No.19KJB520006) and the foundation of Changzhou Science and Technology Plan (Applied Basic Research) (Grant No. CJ20190076).

ORCID iDs

Cong-Zhe You

Zhen-Qiu Shu

References

Liu

Wang

Gao

, et al. Multi-view clustering via joint nonnegative matrix factorization. In: Proceedings of the 2013 SIAM international conference on data mining, 2013, pp. 252–260. Society for Industrial and Applied Mathematics.

2. Cai, Nie, Huang. Multi-view K-means clustering on big data. In: Twenty-third international joint conference on artificial intelligence, 2013, pp. 2598–2604.

3. Han, Nie. Discriminatively embedded K-means for multi-view clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, Las Vegas, NV, 27–30 June 2016, pp. 5356–5364. IEEE.

4. Kumar, Rai, Daume. Co-regularized multi-view spectral clustering. In: Advances in neural information processing systems, 2011, pp. 1413–1421.

5. Xia, Pan, et al. Robust multi-view spectral clustering via low-rank and sparse decomposition. In: Twenty-eighth AAAI conference on artificial intelligence, 2014.

6. Zhou, Burges CJC. Spectral clustering and transductive learning with multiple views. In: Proceedings of the 24th international conference on machine learning, June 2007, pp. 1159–1166.

7. Gao, Nie, et al. Multi-view subspace clustering. In: Proceedings of the IEEE international conference on computer vision, Santiago, Chile, 7–13 December 2015, pp. 4238–4246. IEEE.

8. Cao, Zhang, et al. Diversity-induced multi-view subspace clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, Boston, MA, 7–12 June 2015, pp. 586–594. IEEE.

9. Zhang, et al. Latent multi-view subspace clustering. In: 2017 Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), Honolulu, HI, 21–26 July 2017, pp. 4279–4287. IEEE.

10. Yan, et al. Face recognition using Laplacianfaces. IEEE Trans Pattern Anal Mach Intell 2005; 27: 328–340.

11. Cai, Han, et al. Graph regularized nonnegative matrix factorization for data representation. IEEE Trans Pattern Anal Mach Intell 2011; 33: 1548–1560.

12. Wang, Yuan. Graph-regularized low-rank representation for destriping of hyperspectral images. IEEE Trans Geosci Remote Sens 2013; 51: 4009–4018.

13. Roweis, Saul LK. Nonlinear dimensionality reduction by locally linear embedding. Science 2000; 290: 2323–2326.

14. Tenenbaum, De Silva, Langford JC. A global geometric framework for nonlinear dimensionality reduction. Science 2000; 290: 2319–2323.

15. Cai, Yan, et al. Neighborhood preserving embedding. In: Tenth IEEE international conference on computer vision (ICCV'05) volume 1, Beijing, China, 17–21 October 2005, pp. 1208–1213. IEEE.

16. Belkin, Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 2003; 15: 1373–1396.

17. Hadsell, Chopra, LeCun. Dimensionality reduction by learning an invariant mapping. In: IEEE computer society conference on computer vision and pattern recognition (CVPR'06), New York, NY, 17–22 June 2006, pp. 1735–1742. IEEE.

18. Cai, Shao, et al. Laplacian regularized Gaussian mixture model for data clustering. IEEE Trans Knowl Data Eng 2011; 23: 1406–1418.

19. Elhamifar, Vidal. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Trans Pattern Anal Mach Intell 2013; 35: 2765–2781.

20. Liu, Lin, Yan, et al. Robust recovery of subspace structures by low-rank representation. IEEE Trans Pattern Anal Mach Intell 2013; 35: 171–184.

21. Zhang, Liu, et al. Low-rank tensor constrained multiview subspace clustering. In: Proceedings of the IEEE international conference on computer vision, Santiago, Chile, 7–13 December 2015, pp. 1582–1590. IEEE.

22. Patel, Van Nguyen, Vidal. Latent space sparse subspace clustering. In: Proceedings of the IEEE international conference on computer vision, Sydney, NSW, Australia, 1–8 December 2013, pp. 225–232. IEEE.

23. Lin, Liu. Linearized alternating direction method with adaptive penalty for low-rank representation. In: Advances in neural information processing systems, 2011, pp. 612–620.

24. Huang, Nie, Huang. Spectral rotation versus K-means in spectral clustering. In: The 27th AAAI conference on artificial intelligence, 2013.

25. Cai, Candès, Shen. A singular value thresholding algorithm for matrix completion. SIAM J Optim 2010; 20: 1956–1982.

26. Ikizler, Cinbis, Pehlivan, et al. Recognizing actions from still images. In: 2008 19th international conference on pattern recognition, Tampa, FL, 8–11 December 2008, pp. 1–4. IEEE.

27. Zheng, Zhu, et al. Feature concatenation multi-view subspace clustering. Neurocomputing 2020; 379: 89–102.

