Multi-view clustering via simultaneously learning shared subspace and affinity matrix

Abstract

Due to the existence of multiple views in many real-world data sets, multi-view clustering is increasingly popular. Many approaches have been investigated, among which the subspace clustering methods finding the underlying subspaces of data have been developed recently. Although the subspace-based multi-view methods can achieve promising performance, the shared subspace information has not been fully utilized. To address this problem, a novel multi-view clustering model by simultaneously learning shared subspace and affinity matrix is proposed. In our method, a shared subspace is learned to preserve the effective consensus information of all views. Then, a subspace-based affinity matrix with adaptive neighbors is learned to assign the most suitable cluster to each data point. An iterative strategy is developed for solving this problem. Moreover, experiments on four benchmark data sets demonstrate that our algorithm outperforms other state-of-the-art algorithms.

Keywords

Multi-view clustering, shared subspace, affinity matrix

Introduction

In recent years, multi-view learning has attracted much attention in various real-world applications because data are becoming more diverse and can be collected from different domains or feature sets. For instance, web pages can be represented by different characteristics, for example, text and hyperlinks. An image can also have various representations with respect to many kinds of different features, such as texture, shape, and color features. Although each single view could be adequate or complete for a learning task, multi-view learning fully exploiting the complementary and consensus information is often effective for improving performance. Therefore, how to integrate these multi-view features is a key problem. A naive method is that we can concatenate the multi-view features into one long vector, but this is not physically meaningful. In this way, the complementary information of different views cannot be explored efficiently because the specific statistical characteristic in each view is different.¹ In the past decades, many approaches that focus on multi-view learning have been proposed, among which supervised or semi-supervised methods account for a large proportion.^2,3 In this article, we concentrate on unsupervised scenarios, for example, multi-view clustering, which is a more challenging problem because of the unknown label information for multi-view data.

To fuse different views into a unified framework, many graph-oriented multi-view clustering models have been investigated. The method in Selee et al.⁴ proposes a new tensor decomposition in which the authors store object-feature matrices as the slices of a tensor. In another study,⁵ canonical correlation analysis is utilized to map original data with high dimension to a low-dimensional subspace, which is a typical dimension reduction method. To explore complementary information, some methods adopt co-training or co-regularization strategies, such as the studies by Kumar et al.^6,7 The method in Cai et al.⁸ explores a shared Laplacian matrix by integrating multi-view heterogeneous features and uses a nonnegative constraint to improve robustness. The method in Nie et al.⁹ jointly performs multi-view clustering and local structure learning. In this method, the weight for each view is automatically determined and does not need additional parameters. Generally, graph-based methods have excellent performance. However, in the real world, graphs are often unreliable since these original data always have noise and outliers.

Recently, various subspace learning algorithms have been proposed^10–15 since data often lie on underlying low-dimensional subspaces. Among these methods, self-representation-oriented subspace clustering has been extensively extended to multi-view domains. The goal of these subspace clustering methods is to find underlying subspaces embedded in original data so that data points can be segmented correctly. The multi-view clustering method in Yin et al.¹⁶ introduces a pairwise sparse subspace representation and maximizes the correlation of different views. The method in Feng et al.¹⁷ utilizes the local reconstruction relationships of data. Besides, a low-rank tensor constraint in Zhang et al.¹⁸ is employed to exploit the multi-view complementary information. In this method, representation matrices of all views are regarded as a tensor, which takes the high-order correlations of multi-view data into account. Additionally, the method in Cao et al.,¹⁹ named diversity-induced multi-view subspace clustering (DiMSC), uses the Hilbert-Schmidt independence criterion (HSIC) as a diversity term to learn the complementarity of original multi-view data. The above subspace-based multi-view clustering methods execute subspace clustering on each view. Although these methods are able to explore the multi-view complementary information, the structure of the affinity matrix might be destroyed. Additionally, these methods do not take the consensus subspace information into account. The method in Qi et al.²⁰ introduces a common global affinity matrix for all views. A regularization term in this method minimizes the differences between this common affinity matrix and the individual affinity matrix of each view, which introduces errors to some extent and makes the final clustering results suboptimal.

To address the above problems, a novel multi-view clustering approach is proposed to explore consensus subspace information of all views. In our method, we directly learn a shared subspace from original data. Further, we learn an affinity matrix with adaptive neighbors by utilizing the shared subspace rather than raw data, which avoids the influence of noise and outliers existing in original data and improves the robustness of our algorithm. The main contributions are as follows:

Based on self-representation, the basic subspace clustering model is extended to multi-view clustering. A shared subspace is exploited to obtain the multi-view consensus information that is important for multi-view clustering tasks.

An affinity matrix is learned by arranging the adaptive neighbors based on the learned shared subspace, which uncovers the subspace structure and guarantees the clustering consistency.

We develop an iterative algorithm with fast convergence. In addition, we compare with other clustering methods on four data sets. Experiments verify that our method outperforms other approaches.

Multi-view subspace clustering

Data are often assumed to lie on a union of underlying low-dimensional subspaces. Subspace clustering methods aim to find this underlying subspace structure, then obtain an affinity matrix based on the subspaces and perform clustering on this affinity matrix.

Let $X = [x_{1}, x_{2},..., x_{n}] \in ℜ^{d \times n}$ be n original data points, each column of which represents the d dimensional feature of the data. The self-representation-based subspace clustering^21,22 represents the data as

X = X Z + E \tag{1}

where $Z = [z_{1}, z_{2},..., z_{n}] \in ℜ^{n \times n}$ is the subspace-based representation matrix, whose i^th column z_i represents the data point x_i, and $E \in ℜ^{d \times n}$ is the error matrix.

Generally, we can write the basic subspace clustering model as

\begin{matrix} \min_{Z, E} & Ψ (Z) + α F (X, X Z) \\ \text{s.t.} & X = X Z + E \end{matrix} \tag{2}

where Ψ( Z ) and F( X , XZ ) denote the regularized term and loss term, respectively. After solving the optimization problem (2), the subspace representation Z of the data points will be obtained. The nonzero elements in z _i denote the corresponding data points which are from the same subspace. Therefore, the subspace structure of the data points will be learned. Then, an affinity matrix $A = (| Z | + | Z^{T} |) / 2$ can be calculated. Finally, a spectral clustering algorithm²³ is performed on the affinity matrix A to get the final clustering result.

Recently, many subspace clustering methods have been extended to multi-view domains. Generally, these algorithms can be formulated as

\begin{matrix} \min_{Z_{v}, E_{v}} & \sum_{v = 1}^{V} (Ψ (Z_{v}) + α_{v} F (X_{v}, X_{v} Z_{v})) \\ \text{s.t.} & X_{v} = X_{v} Z_{v} + E_{v} \end{matrix} \tag{3}

where $X_{v} \in ℜ^{d^{v} \times n}$, $Z_{v} \in ℜ^{n \times n}$, and $E_{v} \in ℜ^{d^{v} \times n}$ denote the data set, the subspace representation, and the reconstruction error for the v^th view (v = 1, 2, …, V, where V is the number of views), respectively. The feature dimension of the v^th view is d^v, the number of data points is n, and the number of classes is c. In this model, the parameter α_v controls the penalty weight of the loss for the v^th view. However, most existing multi-view subspace clustering algorithms only take the complementarity of different views into consideration but ignore the consistency. To deal with this problem, we introduce a shared subspace of original data to preserve the consistency information among different views.

The proposed method

In this section, we introduce a novel multi-view subspace clustering algorithm. According to the formula (3), previous multi-view subspace clustering algorithms generate an individual subspace representation Z _v for each view, and then execute operation on each Z _v to explore complementary information, without considering the consistency information among different views.

Instead of computing an individual Z _v for each view, we propose to learn a shared subspace for all views by solving

\min_{Z} \sum_{v} {‖ X_{v} - X_{v} Z ‖}_{F}^{2} + λ_{1} {‖ Z ‖}_{F}^{2} \tag{4}

where λ₁ > 0 is the trade-off parameter and Z is the shared subspace representation for all views. In our objective, we adopt the Frobenius norm for Z to improve the robustness according to Lu et al.²¹ Because a single Z must reconstruct every view, it preserves the subspace structure of the original high-dimensional data and yields clustering results that are consistent across views.
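Taken on its own, problem (4) is a ridge-type least squares problem, and setting its gradient to zero gives the linear system $(\sum_{v} X_{v}^{T} X_{v} + λ_{1} I) Z = \sum_{v} X_{v}^{T} X_{v}$. The following is a minimal NumPy sketch of that closed form on random toy data; the function name `shared_representation` and the toy dimensions are illustrative assumptions, not part of the original method description.

```python
import numpy as np

def shared_representation(views, lam1):
    """Minimize sum_v ||X_v - X_v Z||_F^2 + lam1 * ||Z||_F^2 over a single Z.

    views: list of arrays X_v with shape (d_v, n); all views share the same n.
    """
    n = views[0].shape[1]
    G = sum(X.T @ X for X in views)          # summed n x n Gram matrices
    # Zero gradient gives (G + lam1 * I) Z = G, a symmetric positive definite system.
    return np.linalg.solve(G + lam1 * np.eye(n), G)

rng = np.random.default_rng(0)
views = [rng.standard_normal((d, 10)) for d in (4, 6, 3)]   # three toy views, n = 10
Z = shared_representation(views, lam1=1.0)

# The gradient 2 * ((G + lam1 * I) Z - G) should vanish at the minimizer.
G = sum(X.T @ X for X in views)
grad = 2.0 * ((G + 1.0 * np.eye(10)) @ Z - G)
```

Solving the linear system directly avoids forming an explicit matrix inverse, which is usually the more stable choice.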

For clustering tasks, it is an effective strategy to explore the local connectivity of original data. Recently, a new graph-based method with adaptive neighbors is used to cluster data. Given the data set $X = [x_{1}, x_{2},..., x_{n}] \in ℜ^{d \times n}$ , for the i^th data point x _i , the basic form of this model is as follows

\begin{matrix} \min_{a_{i}} & \sum_{i = 1}^{n} \sum_{j = 1}^{n} ({‖ x_{i} - x_{j} ‖}_{2}^{2} a_{i j} + γ a_{i j}^{2}) \\ \text{s.t.} & \forall i, a_{i}^{T} 1 = 1, 0 \leq a_{i} \leq 1 \end{matrix} \tag{5}

where a_i is a vector whose j^th element is $a_{i j}$, and $γ > 0$ is the trade-off parameter. For more details of adaptive neighbors, please refer to the studies by Nie et al.²⁴ and Guo.²⁵

In this article, we utilize the shared subspace representation to learn the affinity matrix rather than the raw data, which avoids the influence of noise and outliers existing in original data. Thus, we can write the model as

\begin{matrix} \min_{a_{i}} & \sum_{i = 1}^{n} \sum_{j = 1}^{n} ({‖ z_{i} - z_{j} ‖}_{2}^{2} a_{i j} + λ_{2} a_{i j}^{2}) \\ \text{s.t.} & \forall i, a_{i}^{T} 1 = 1, 0 \leq a_{i} \leq 1 \end{matrix} \tag{6}

The problem (6) is equivalent to

\begin{matrix} \min_{a_{i}} & 2 \mathrm{tr} (Z L_{A} Z^{T}) + λ_{2} {‖ A ‖}_{F}^{2} \\ \text{s.t.} & \forall i, a_{i}^{T} 1 = 1, 0 \leq a_{i} \leq 1 \end{matrix} \tag{7}

where λ₂ > 0 is the trade-off parameter and A is the affinity matrix with adaptive neighbors, whose i^th row is $a_{i}^{T}$. The Laplacian matrix is $L_{A} = D - (A + A^{T}) / 2$, where D is the diagonal degree matrix of $(A + A^{T}) / 2$, that is, $D_{i i} = \sum_{j} (a_{i j} + a_{j i}) / 2$.
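The equivalence between the pairwise form (6) and the trace form (7) rests on the identity $\sum_{i, j} {‖ z_{i} - z_{j} ‖}_{2}^{2} a_{i j} = 2 \mathrm{tr} (Z L_{A} Z^{T})$. The short NumPy check below illustrates it on random toy data, assuming that D is taken as the degree matrix of the symmetrized affinity $(A + A^{T}) / 2$; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
Z = rng.standard_normal((n, n))              # columns z_i play the role of data points
A = rng.random((n, n))
A /= A.sum(axis=1, keepdims=True)            # each row a_i is nonnegative and sums to one

S = (A + A.T) / 2.0                          # symmetrized affinity
L = np.diag(S.sum(axis=1)) - S               # Laplacian L_A = D - (A + A^T) / 2

# Left-hand side: sum_{i,j} ||z_i - z_j||^2 a_ij, with z_i the i-th column of Z.
diff = Z[:, :, None] - Z[:, None, :]         # diff[:, i, j] = z_i - z_j
sq_dist = (diff ** 2).sum(axis=0)            # sq_dist[i, j] = ||z_i - z_j||^2
lhs = (sq_dist * A).sum()

# Right-hand side: 2 tr(Z L_A Z^T).
rhs = 2.0 * np.trace(Z @ L @ Z.T)
```

Here `lhs` and `rhs` agree up to floating-point round-off even though A itself is not symmetric.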

Thus, the final multi-view clustering model can be written as follows

\begin{array}{ll} \min_{Z, A} & \sum_{v} {‖ X_{v} - X_{v} Z ‖}_{F}^{2} + λ \mathrm{tr} (Z L_{A} Z^{T}) + λ_{1} {‖ Z ‖}_{F}^{2} + λ_{2} {‖ A ‖}_{F}^{2} \\ \text{s.t.} & \forall i, a_{i}^{T} 1 = 1, a_{i} \geq 0 \end{array} \tag{8}

where λ, λ₁, and λ₂ are three nonnegative regularization parameters. Different from concatenating all features into one long vector, in our model, we explore the complementary and consistency information by each X _v and shared Z , respectively. Specifically, all views share the same subspace representation considering the consensus information and guaranteeing the clustering consistency across views. Also, we use the shared subspace representation to learn the affinity matrix, which improves the robustness. In the next section, we will provide an effective optimization algorithm to solve the model.

Optimization algorithm

To solve our challenging problem (8), we use an efficient iterative optimization strategy. Next, we also offer a convergence analysis.

Optimization procedure

Fixing A , update Z . When A is fixed, the problem (8) can be formulated as

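\min_{Z} \sum_{v} {‖ X_{v} - X_{v} Z ‖}_{F}^{2} + λ \mathrm{tr} (Z L_{A} Z^{T}) + λ_{1} {‖ Z ‖}_{F}^{2} \tag{9}

Setting the derivative of the objective in (9) with respect to Z to zero gives

(\sum_{v} X_{v}^{T} X_{v} + λ_{1} I) Z + λ Z L_{A} - \sum_{v} X_{v}^{T} X_{v} = 0 \tag{10}

which is a Sylvester equation. It has a unique solution because $\sum_{v} X_{v}^{T} X_{v} + λ_{1} I$ is positive definite for λ₁ > 0 and $λ L_{A}$ is positive semidefinite, so no eigenvalue of the first matrix can cancel an eigenvalue of the second.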
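One way to solve the Sylvester equation numerically is sketched below. It assumes that $L_{A}$ is symmetric, so both coefficient matrices can be diagonalized by orthogonal eigenvectors and the equation decouples entry by entry. The function name `update_Z` and the toy data are illustrative assumptions; the original experiments were run in MATLAB and the solver used there is not stated.

```python
import numpy as np

def update_Z(views, L, lam, lam1):
    """Solve (G + lam1 I) Z + lam Z L = G for Z, where G = sum_v X_v^T X_v.

    Assumes L is symmetric (true for L_A = D - (A + A^T) / 2).
    """
    G = sum(X.T @ X for X in views)
    n = G.shape[0]
    s, V = np.linalg.eigh(G + lam1 * np.eye(n))   # eigenvalues s_i >= lam1 > 0
    mu, U = np.linalg.eigh(L)                     # eigenvalues mu_j >= 0 for a Laplacian
    # In the rotated basis the equation decouples: (s_i + lam * mu_j) * Zt_ij = Gt_ij.
    Gt = V.T @ G @ U
    Zt = Gt / (s[:, None] + lam * mu[None, :])
    return V @ Zt @ U.T

rng = np.random.default_rng(0)
n = 10
views = [rng.standard_normal((d, n)) for d in (4, 6)]   # two toy views
A = rng.random((n, n))
A /= A.sum(axis=1, keepdims=True)                        # rows sum to one
S = (A + A.T) / 2.0
L = np.diag(S.sum(axis=1)) - S
Z = update_Z(views, L, lam=0.5, lam1=1.0)

# Residual of the Sylvester equation; it should be numerically zero.
G = sum(X.T @ X for X in views)
residual = (G + 1.0 * np.eye(n)) @ Z + 0.5 * Z @ L - G
```

A general-purpose alternative is `scipy.linalg.solve_sylvester(P, lam * L, G)` with `P = G + lam1 * I`, which does not rely on symmetry.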

Fixing Z , update A . While Z is fixed, A can be updated as follows

\begin{matrix} \min_{a_{i}} & λ \mathrm{tr} (Z L_{A} Z^{T}) + λ_{2} {‖ A ‖}_{F}^{2} \\ \text{s.t.} & \forall i, a_{i}^{T} 1 = 1, 0 \leq a_{i} \leq 1 \end{matrix} \tag{11}

which is equivalent to the following form

\begin{matrix} \min_{a_{i}} & \sum_{i, j} (d_{i j} a_{i j} + λ_{2} a_{i j}^{2}) \\ \text{s.t.} & \forall i, a_{i}^{T} 1 = 1, 0 \leq a_{i} \leq 1 \end{matrix} \tag{12}

where $d_{i j} = \frac{λ}{2} {‖ z_{i} - z_{j} ‖}_{2}^{2}$ and $d_{i} \in ℜ^{n \times 1}$ denotes the vector whose j^th element is d_ij. Because problem (12) is independent across different i, it can be solved separately for each a_i as

\min_{a_{i}^{T} 1 = 1, 0 \leq a_{i j} \leq 1} {‖ a_{i} + \frac{1}{2 λ_{2}} d_{i} ‖}_{2}^{2} \tag{13}
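The step from the summed form to the vector form follows from completing the square for a fixed i. The derivation below is a worked sketch using the symbols already defined in the text.

```latex
\sum_{j} \left( d_{ij} a_{ij} + \lambda_{2} a_{ij}^{2} \right)
  = \lambda_{2} \sum_{j} \left( a_{ij} + \frac{d_{ij}}{2 \lambda_{2}} \right)^{2}
    - \frac{1}{4 \lambda_{2}} \sum_{j} d_{ij}^{2}
  = \lambda_{2} \left\| a_{i} + \frac{1}{2 \lambda_{2}} d_{i} \right\|_{2}^{2}
    - \frac{\| d_{i} \|_{2}^{2}}{4 \lambda_{2}}
```

The last term does not depend on a_i and λ₂ > 0 only rescales the objective, so minimizing the left-hand side over the constraint set is the same as minimizing the squared norm.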

For each data point x_i, we set the number of its nearest neighbors to k and allow a point-specific parameter λ_2i in place of λ₂. The Lagrangian function is then given by

L (a_{i}, Λ, ϕ_{i}) = \frac{1}{2} {‖ a_{i} + \frac{1}{2 λ_{2 i}} d_{i} ‖}_{2}^{2} - Λ (a_{i}^{T} 1 - 1) - ϕ_{i}^{T} a_{i} \tag{14}

where Λ and ϕ_i are Lagrangian multipliers. According to the Karush–Kuhn–Tucker conditions, the optimal solution of a_i is

a_{i j} = {(- \frac{d_{i j}}{2 λ_{2 i}} + \frac{1}{k} + \frac{1}{2 k λ_{2 i}} \sum_{h = 1}^{k} d_{i h})}_{+} \tag{15}

where $(\cdot)_{+} = \max (\cdot, 0)$.

In addition, λ₂ can be determined using the property of adaptive neighbors. Since x_i has k neighbors, we have $a_{i j} > 0$ for all $1 \leq j \leq k$ and $a_{i, k + 1} = 0$. Therefore, we have

\frac{k}{2} d_{i k} - \frac{1}{2} \sum_{j = 1}^{k} d_{i j} < λ_{2 i} \leq \frac{k}{2} d_{i, k + 1} - \frac{1}{2} \sum_{j = 1}^{k} d_{i j} \tag{16}

where $d_{i 1}, d_{i 2},..., d_{i n}$ are sorted in ascending order. We set λ_2i to the upper bound in (16) so that a_i has exactly k nonzero elements. Finally, we average these λ_2i to determine the parameter λ₂:

λ_{2} = \frac{1}{n} \sum_{i = 1}^{n} (\frac{k}{2} d_{i, k + 1} - \frac{1}{2} \sum_{j = 1}^{k} d_{i j}) \tag{17}

Therefore, we do not need to tune the parameter λ₂ in the experiments. For more detailed information about adaptive neighbors, please refer to the study by Nie et al.²⁴
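The closed-form update (15) and the choice of λ₂ in (16) and (17) can be sketched as follows. This is a minimal NumPy illustration, not the original MATLAB code. It assumes that a point is excluded from its own neighbor list (the text does not say how the zero self-distance is handled) and it adds a machine-epsilon guard against ties; the function name `update_A` is hypothetical.

```python
import numpy as np

def update_A(Z, lam, k):
    """Row-wise closed-form affinity update with k adaptive neighbors.

    Z: (n, n) shared representation whose i-th column z_i describes point i.
    Returns the affinity matrix A and the averaged lambda_2 of equation (17).
    Assumption (not stated in the text): a point is not its own neighbor.
    """
    n = Z.shape[1]
    sq_norms = (Z ** 2).sum(axis=0)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Z.T @ Z)
    D = 0.5 * lam * np.maximum(sq_dist, 0.0)          # d_ij = (lam / 2) * ||z_i - z_j||^2
    A = np.zeros((n, n))
    lam2_rows = np.empty(n)
    eps = np.finfo(float).eps                         # guards against ties (assumption)
    for i in range(n):
        others = np.delete(np.arange(n), i)           # candidate neighbors j != i
        order = others[np.argsort(D[i, others])]      # ascending d_ij
        d_k = D[i, order[:k]]                         # k smallest distances
        d_k1 = D[i, order[k]]                         # (k+1)-th smallest distance
        lam2_rows[i] = 0.5 * k * d_k1 - 0.5 * d_k.sum()   # upper bound in (16)
        # With this lambda_2i, equation (15) reduces to the ratio below.
        A[i, order[:k]] = (d_k1 - d_k) / (k * d_k1 - d_k.sum() + eps)
    return A, lam2_rows.mean()

rng = np.random.default_rng(0)
Z = rng.standard_normal((12, 12))                     # toy representation, n = 12
A, lam2 = update_A(Z, lam=1.0, k=3)
```

Substituting the upper bound of (16) into (15) gives $a_{i j} = (d_{i, k + 1} - d_{i j}) / (k d_{i, k + 1} - \sum_{h = 1}^{k} d_{i h})$ for the k nearest neighbors, which is why each row sums to one without an explicit projection step.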

Algorithm 1: Algorithm for solving the model (8).

Input: Data $X = \{X_{1}, X_{2},..., X_{V}\}$, $X_{v} \in ℜ^{d^{v} \times n}$, parameters λ, λ₁ and λ₂, number of classes c.
1: Initialize $A_{(0)}$, t = 0;
2: Repeat
3: Update $Z_{(t)}$ by solving equation (9);
4: Update $A_{(t)}$ by solving equation (11);
5: t = t + 1;
6: Until convergence;
Output: Affinity matrix $A \in ℜ^{n \times n}$

By iteratively solving problem (8), we can get the final affinity matrix A . Then, we can perform the spectral clustering²³ on A to get final clustering results. We summarize the detailed procedure in algorithm 1.
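The alternating scheme of algorithm 1 can be sketched end to end as below. This is an illustrative NumPy version under stated assumptions rather than the authors' implementation: λ₂ is held fixed as an input, the A-step is solved exactly by Euclidean projection onto the probability simplex (a swapped-in technique, used here instead of the k-neighbor closed form), the diagonal of A is forced to zero, and the initial affinity is uniform. All function names are hypothetical.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1} (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def laplacian(A):
    S = (A + A.T) / 2.0
    return np.diag(S.sum(axis=1)) - S

def objective(views, Z, A, lam, lam1, lam2):
    """Value of the objective in problem (8)."""
    fit = sum(np.linalg.norm(X - X @ Z, "fro") ** 2 for X in views)
    smooth = lam * np.trace(Z @ laplacian(A) @ Z.T)
    reg = lam1 * np.linalg.norm(Z, "fro") ** 2 + lam2 * np.linalg.norm(A, "fro") ** 2
    return fit + smooth + reg

def update_Z(G, L, lam, lam1):
    """Solve (G + lam1 I) Z + lam Z L = G via two symmetric eigendecompositions."""
    s, V = np.linalg.eigh(G + lam1 * np.eye(G.shape[0]))
    mu, U = np.linalg.eigh(L)
    return V @ ((V.T @ G @ U) / (s[:, None] + lam * mu[None, :])) @ U.T

def update_A(Z, lam, lam2):
    """Exact row-wise minimizer for fixed lam2, with a_ii = 0 imposed (assumption)."""
    n = Z.shape[1]
    sq = (Z ** 2).sum(axis=0)
    D = 0.5 * lam * np.maximum(sq[:, None] + sq[None, :] - 2.0 * (Z.T @ Z), 0.0)
    A = np.zeros((n, n))
    for i in range(n):
        mask = np.arange(n) != i
        A[i, mask] = project_simplex(-D[i, mask] / (2.0 * lam2))
    return A

def solve_model(views, lam, lam1, lam2, tol=1e-6, max_iter=100):
    n = views[0].shape[1]
    G = sum(X.T @ X for X in views)
    A = (np.ones((n, n)) - np.eye(n)) / (n - 1)       # uniform initial affinity A_(0)
    history = []
    for _ in range(max_iter):
        Z = update_Z(G, laplacian(A), lam, lam1)      # step 3: fix A, update Z
        A = update_A(Z, lam, lam2)                    # step 4: fix Z, update A
        history.append(objective(views, Z, A, lam, lam1, lam2))
        # Relative change of the objective as the stopping rule.
        if len(history) > 1 and abs(history[-1] - history[-2]) / history[-2] < tol:
            break
    return Z, A, history

rng = np.random.default_rng(0)
views = [rng.standard_normal((d, 15)) for d in (5, 7)]   # two toy views, n = 15
Z, A, history = solve_model(views, lam=1.0, lam1=1.0, lam2=1.0)
```

Because both steps minimize their subproblems exactly over the same feasible set, the recorded objective values in `history` should be non-increasing, which gives a quick sanity check when experimenting with the updates.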

Convergence analysis

We prove the convergence of the proposed algorithm in this section. The objective function in problem (8) is denoted as J( Z , A ) for convenience.

For the Z updated by (9), we have

\begin{array}{rl} Z_{t + 1} & = \arg\min_{Z} \sum_{v} {‖ X_{v} - X_{v} Z ‖}_{F}^{2} + λ \mathrm{tr} (Z L_{A_{t}} Z^{T}) + λ_{1} {‖ Z ‖}_{F}^{2} \\ & = \arg\min_{Z} J (Z, A_{t}) \\ \Rightarrow & J (Z_{t + 1}, A_{t}) \leq J (Z_{t}, A_{t}) \end{array} \tag{18}

Similarly, for the A updated by (11), we can get

\begin{array}{rl} A_{t + 1} & = \underset{\forall i, a_{i}^{T} 1 = 1, a_{i} \geq 0}{\arg\min} λ \mathrm{tr} (Z_{t + 1} L_{A} Z_{t + 1}^{T}) + λ_{2} {‖ A ‖}_{F}^{2} \\ & = \underset{\forall i, a_{i}^{T} 1 = 1, a_{i} \geq 0}{\arg\min} J (Z_{t + 1}, A) \\ \Rightarrow & J (Z_{t + 1}, A_{t + 1}) \leq J (Z_{t + 1}, A_{t}) \end{array} \tag{19}

By combining (18) and (19), we have

J (Z_{t + 1}, A_{t + 1}) \leq J (Z_{t + 1}, A_{t}) \leq J (Z_{t}, A_{t}) \tag{20}

Consequently, the objective value is non-increasing at each iteration. Since every term of J is nonnegative, the objective is bounded below by zero, so the sequence of objective values converges. In addition, since J is convex in each variable when the other is fixed, the update rules (9) and (11) each attain the global minimum of their subproblem, and $J (Z_{t}, A_{t})$ decreases until algorithm 1 converges.

Experiments

In this section, we will compare our proposed method with other state-of-the-art multi-view clustering algorithms on four benchmark data sets.

Data set descriptions

3-Sources Text data set (http://mlg.ucd.ie/datasets/3sources.html) is collected from three online news sources: Reuters, BBC, and the Guardian. It contains 416 distinct news stories divided into six classes. In our experiments, each source is treated as an independent view, and we use 169 of the distinct news stories to form the three views.

Oxford Flowers data set is composed of 1360 images from 17 flower categories, with 80 images per category.²⁶ Each image is described by three visual features (color, shape, and texture). In our experiments, three views are constructed from the χ² distance matrices of the three visual features.

COIL-100 object data set is composed of 7200 images from 100 categories, with 72 images per category. The images of each object are captured at rotation intervals of five degrees. In our experiments, these images are divided into four views: view 1 $[0 °,85 °]$ , view 2 $[90 °,175 °]$ , view 3 $[180 °,265 °]$ , and view 4 $[270 °,355 °]$ .

Handwritten numerals (HW) data set contains 2000 examples of the digits 0 to 9, with 200 examples per digit. In our experiments, we use six public features (KAR, FOU, FAC, PIX, ZER, and MOR).

We summarize the information of four data sets in Table 1 including the number of data points, views, and clusters for each data set.

Table 1.

Data sets.

Data set	3-Sources	Oxford Flowers	COIL-100	HW
Number of data points	169	1360	7200	2000
Number of views	3	3	4	6
Number of clusters	6	17	100	10

HW: handwritten numerals.

Experiment setup

To evaluate our proposed algorithm, we compare it with the sparse subspace clustering (SSC) method²² applied to each single view and report the best single-view performance (SSC-BSV). We also run SSC²² on the concatenated features (SSC-Con). In addition, we compare with other multi-view clustering approaches: co-regularized spectral clustering (Co-reg),⁷ multimodal spectral clustering (MMSC),⁸ DiMSC,¹⁹ and multi-view learning with adaptive neighbors (MLAN).⁹ The detailed information is listed as follows:

SSC-BSV: SSC is a classic subspace clustering method. In this method, subspace representation is obtained at first, and then spectral clustering method is performed on the subspace representation. In our experiments, for each single view, we run the SSC method and report the best results.

SSC-Con: We first concatenate all features into one and then perform SSC method to obtain the clustering results.

Co-reg: This method uses a centroid-based co-regularization term to make the clustering results consistent for all of the views.

MMSC: A shared Laplacian matrix is explored by integrating multi-view heterogeneous image features in this method. In addition, this method uses a non-negative constraint and improves the model robustness.

DiMSC: This method utilizes a diversity term named HSIC to explore the complementarity information of multi-view data.

MLAN: This method simultaneously performs multi-view clustering and local structure learning in which the weight for each view is automatically determined and does not need additional parameters.

Our model has three parameters λ, λ₁, and λ₂. The parameter λ₂ is determined by the property of adaptive neighbors according to equation (17). Therefore, we only need to tune the parameters λ and λ₁. In our experiments, we select λ and λ₁ from the set $\{10^{-3}, 10^{-2}, \ldots, 10^{2}, 10^{3}\}$ and report the best results. The number of nearest neighbors for each example is set to 9. For the compared methods, we also tune their parameters to obtain the best results. Because these methods adopt spectral clustering, which relies on the k-means algorithm and therefore depends on random initialization, we run each experiment 10 times and report the average value on each data set. All experiments are performed in MATLAB on a computer with an Intel Xeon E5-2650 v2 CPU (2.6 GHz) and 32 GB RAM.
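The parameter search described above covers a 7 × 7 grid of (λ, λ₁) pairs. A minimal sketch of such a search is given below; `grid_search` and the `score` callable are hypothetical names, and the toy scoring function only stands in for a real evaluation such as the mean ACC over 10 runs.

```python
import itertools

def grid_search(score, exponents=range(-3, 4)):
    """Return (best_score, lam, lam1) over the grid {10^-3, ..., 10^3}^2.

    `score` is a hypothetical callable (lam, lam1) -> float to be maximized.
    """
    grid = [10.0 ** p for p in exponents]
    return max((score(lam, lam1), lam, lam1)
               for lam, lam1 in itertools.product(grid, grid))

def toy_score(lam, lam1):
    """Toy stand-in for a real evaluation; it peaks at lam = 0.01, lam1 = 10."""
    return -abs(lam - 0.01) - abs(lam1 - 10.0)

best_score, best_lam, best_lam1 = grid_search(toy_score)
n_pairs = len(list(itertools.product(range(-3, 4), repeat=2)))   # size of the grid
```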

Experiment results

We adopt two evaluation metrics, namely, accuracy (ACC) and normalized mutual information (NMI), to evaluate the clustering results. The clustering ACC and NMI on the four data sets are shown in Tables 2 and 3, respectively. In general, our proposed method, which utilizes the shared subspace information, clearly outperforms the single-view approaches. In terms of ACC, our method also outperforms the other state-of-the-art multi-view clustering approaches on all four data sets; the improvements over MLAN are about 1.2, 4.7, 9.3, and 0.4 percentage points on 3-Sources, Oxford Flowers, COIL-100, and HW, respectively. In terms of NMI, MLAN is slightly better than our method on the 3-Sources data set (0.656 versus 0.651), while our method achieves the best NMI on the other three data sets. Thus, the proposed method, which considers the consensus information, achieves superior clustering results and is relatively robust to parameters compared with other clustering methods.
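The text does not spell out how ACC is computed. A common convention in the clustering literature, assumed here, is to score predictions under the best one-to-one mapping between cluster labels and class labels. The sketch below implements that convention with SciPy's assignment solver; the function name `clustering_accuracy` and the toy labels are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """Fraction of points labeled correctly under the best one-to-one label mapping."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes, t = np.unique(y_true, return_inverse=True)
    clusters, p = np.unique(y_pred, return_inverse=True)
    counts = np.zeros((clusters.size, classes.size), dtype=int)
    np.add.at(counts, (p, t), 1)                  # contingency table (cluster x class)
    rows, cols = linear_sum_assignment(-counts)   # negate to maximize matched points
    return counts[rows, cols].sum() / y_true.size

# Toy example: cluster ids are permuted and one point is misassigned.
acc = clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 0])
```

In the toy example five of the six points can be matched, so `acc` equals 5/6.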

Table 2.

Clustering ACC (mean and standard deviation) of different methods.^a

Data set	3-Sources	Oxford Flowers	COIL-100	HW
SSC-BSV	0.353 (0.008)	0.356 (0.008)	0.727 (0.023)	0.767 (0.004)
SSC-Con	0.357 (0.004)	0.365 (0.006)	0.748 (0.020)	0.815 (0.009)
Co-reg	0.549 (0.030)	0.433 (0.011)	0.595 (0.017)	0.804 (0.059)
MMSC	0.694 (0.021)	0.442 (0.013)	0.559 (0.028)	0.840 (0.011)
DiMSC	0.617 (0.088)	0.431 (0.021)	0.607 (0.018)	0.531 (0.003)
MLAN	0.763 (0.000)	0.459 (0.001)	0.700 (0.002)	0.973 (0.000)
Ours	0.775 (0.000)	0.506 (0.006)	0.793 (0.018)	0.977 (0.002)

ACC: accuracy; HW: handwritten numerals; SSC-Con: SSC on concatenated features; Co-reg: co-regularized spectral clustering; MMSC: multimodal spectral clustering; DiMSC: diversity-induced multi-view subspace clustering; MLAN: multi-view learning with adaptive neighbors.

^aThe best results are in bold font.

Table 3.

Clustering NMI (mean and standard deviation) of different methods.^a

Data set	3-Sources	Oxford Flowers	COIL-100	HW
SSC-BSV	0.183 (0.012)	0.373 (0.005)	0.909 (0.009)	0.759 (0.001)
SSC-Con	0.156 (0.005)	0.410 (0.005)	0.924 (0.007)	0.850 (0.007)
Co-reg	0.510 (0.029)	0.423 (0.008)	0.870 (0.006)	0.778 (0.035)
MMSC	0.580 (0.018)	0.446 (0.004)	0.831 (0.010)	0.892 (0.008)
DiMSC	0.590 (0.047)	0.443 (0.011)	0.849 (0.005)	0.396 (0.003)
MLAN	0.656 (0.000)	0.476 (0.002)	0.895 (0.002)	0.939 (0.001)
Ours	0.651 (0.000)	0.541 (0.004)	0.941 (0.006)	0.944 (0.001)

NMI: normalized mutual information; HW: handwritten numerals; SSC-Con: SSC on concatenated features; Co-reg: co-regularized spectral clustering; MMSC: multimodal spectral clustering; DiMSC: diversity-induced multi-view subspace clustering; MLAN: multi-view learning with adaptive neighbors.

^aThe best results are in bold font.

In our experiments, the stopping criterion is given by

\frac{| J^{(t + 1)} - J^{(t)} |}{J^{(t)}} < 10^{- 6} \tag{21}

where J^(t) is the objective function value in the t^th iteration. In Figure 1, the convergence of our method is experimentally illustrated by curves on two data sets. From Figure 1, we can see that our method converges fast.

Figure 1.

Convergence curves of our method on two data sets (3-Sources and Oxford Flowers data sets).

Parameter selection

In our experiments, the two parameters are varied on a base-10 logarithmic scale. We discuss the selection of parameters according to Figure 2. For the 3-Sources data set, the accuracy rises sharply and peaks at λ₁ = 10, and the results are stable within the range $10^{- 3} \leq λ \leq 1$. For the Oxford Flowers data set, the results are high when $10^{- 2} \leq λ_{1} \leq 1$ and $10^{- 3} \leq λ \leq 1$. For the COIL-100 data set, the results fluctuate over the grid; however, our model performs well and the results change only slightly when $1 \leq λ_{1} \leq 10^{3}$. For the HW data set, our model is not sensitive to the parameters when $1 \leq λ_{1} \leq 10^{3}$ and $10^{- 3} \leq λ \leq 1$. For each data set, we list the best values of λ and λ₁ in Table 4.

Figure 2.

Clustering results (ACC) on four data sets (3-Sources, Oxford Flowers, COIL-100, HW) with two parameters λ and λ₁. ACC: accuracy; HW: handwritten numerals.

Table 4.

Major parameters.

Data set	3-Sources	Oxford Flowers	COIL-100	HW
λ	0.01	1	10	1
λ₁	10	0.1	10	10

HW: handwritten numerals.

The NMI results show a similar dependence on the parameters as the ACC results. Therefore, we do not discuss them separately in this section.

Conclusion

In this article, we design a novel model for multi-view clustering by simultaneously learning shared subspace representation and affinity matrix. Our proposed method is a self-representation-oriented multi-view subspace clustering method. In this work, we make all views share the same subspace representation, which explores the multi-view consensus information and guarantees the clustering consistency across all views. Then the affinity matrix with adaptive neighbors is learned by utilizing the shared subspace representation rather than raw data, which uncovers the subspace structure and improves the robustness. Additionally, we compare with other multi-view clustering approaches on four data sets. Experimental results validate that the proposed algorithm is more effective and can achieve superior clustering results.

Footnotes

Authors’ note

This article was presented in part at the CCF Chinese Conference on Computer Vision, Tianjin, 2017. This article was recommended by the program committee.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is funded by the National Natural Science Foundation of China (grant nos. 61402079, 61379151, and U1636219), the Foundation for Innovative Research Groups of the NSFC (grant no. 71421001), and the Open Project Program of the National Laboratory of Pattern.

References

1. Liu, Tao, Cheng. Multiview Hessian discriminative sparse coding for image annotation. Comput Vision Image Understand 2014; 118(1): 50–60.

2. Amini, Goutte. A co-classification approach to learning from multilingual corpora. Mach Learn 2010; 79(1): 105–121.

3. Tao. Large-margin multi-view information bottleneck. IEEE Trans Pattern Anal Mach Int 2014; 36(8): 1559–1572.

4. Selee, Kolda, Kegelmeyer. Extracting clusters from large datasets with multiple similarity measures using IMSCAND. In: Technical report, Sandia National Laboratories (SAND2007-7977). 2007, pp. 87–103.

5. Blaschko, Lampert. Correlational spectral clustering. In: Proceeding IEEE conference computer vision pattern recognition, Anchorage, Alaska, 2008, pp. 1–8. DOI: 10.1109/CVPR.2008.4587353.

6. Kumar, Daume III. A co-training approach for multi-view spectral clustering. In: Proceeding international conference machine learning, Washington, DC, USA, 2011, pp. 393–400.

7. Kumar, Rai, Daume III. Co-regularized multi-view spectral clustering. In: Proceeding neural information processing system, Granada, Spain, 2011, pp. 1413–1421.

8. Cai, Nie, Huang. Heterogeneous image feature integration via multi-modal spectral clustering. In: Proceeding IEEE conference computer vision pattern recognition, Providence, RI, 2011, pp. 1977–1984. DOI: 10.1109/CVPR.2011.5995740.

9. Nie, Cai. Multi-view clustering and semi-supervised classification with adaptive neighbors. In: Proceeding AAAI conference on artificial intell, San Francisco, CA, USA, 2017, pp. 2408–2414.

10. Tao. Geometric mean for subspace selection. IEEE Trans Pattern Anal Mach Int 2008; 31(2): 260–274.

11. Shu, Porikli, Ahuja. Robust orthonormal subspace learning: efficient recovery of corrupted low-rank matrices. In: Proceeding IEEE conference computer vision pattern recognition, Columbus, OH, USA, 2014, pp. 3874–3881. DOI: 10.1109/CVPR.2014.495.

12. Zhang, Wang. Cross-modal subspace learning via pairwise constraints. IEEE Trans Image Proc 2015; 24(12): 5543–5556.

13. Ding. Robust multi-view subspace learning through dual low-rank decompositions. In: Proceeding AAAI conference on artificial intell, Phoenix, AZ, USA, 2016, pp. 1181–1187.

14. Erfani, Baktashmotlagh, Moshtaghi. From shared subspaces to shared landmarks: a robust multi-source classification approach. In: Proceeding AAAI conference on artificial intell, San Francisco, CA, USA, 2017, pp. 1854–1860.

15. Peng, Kang, Cheng. Subspace clustering via variance regularized ridge regression. In: Proceeding IEEE conference computer vision pattern recognition, Honolulu, HI, USA, 2017, pp. 2931–2940. DOI: 10.1109/CVPR.2017.80.

16. Yin. Multi-view clustering via pairwise sparse subspace representation. Neurocomputing 2015; 156(C): 12–21.

17. Feng, Cai, Liu. Multi-view spectral clustering via robust local subspace learning. Soft Comput 2017; 21(8): 1937–1948.

18. Zhang, Liu. Low-rank tensor constrained multiview subspace clustering. In: Proceeding IEEE international conference computer vision, Santiago, 2015, pp. 1582–1590. DOI: 10.1109/ICCV.2015.185.

19. Cao, Zhang. Diversity induced multi-view subspace clustering. In: Proceeding IEEE conference computer vision pattern recognition, Boston, MA, USA, 2015, pp. 586–594. DOI: 10.1109/CVPR.2015.7298657.

20. Shi, Wang. Multi-view subspace clustering via a global low-rank affinity matrix. In: International conference on intelligent data engineering and automated learning, Jiangsu, China, 2016, pp. 321–331. DOI: 10.1007/978-3-319-46257-8_35.

21. Min, Zhao. Robust and efficient subspace segmentation via least squares regression. In: European conference computer vision, Firenze, Italy, 2012, pp. 347–360. DOI: 10.1007/978-3-642-33786-4_26.

22. Elhamifar, Vidal. Sparse subspace clustering: algorithm, theory, and applications. IEEE Trans Pattern Anal Mach Int 2013; 35(11): 2765–2781.

23. Jordan, Weiss. On spectral clustering: analysis and an algorithm. In: Proceeding neural information processing system, Vancouver, 2001, pp. 849–856.

24. Nie, Wang, Huang. Clustering and projected clustering with adaptive neighbors. In: ACM SIGKDD international conference on knowledge discovery and data mining, New York, USA, 2014, pp. 977–986. DOI: 10.1145/2623330.2623726.

25. Guo. Robust subspace segmentation by simultaneously learning data representations and their affinity matrix. In: Proceeding international joint conference artificial intell, Buenos Aires, Argentina, 2015, pp. 3547–3553.

26. Nilsback, Zisserman. A visual vocabulary for flower classification. In: Proceeding IEEE conference computer vision pattern recognition, New York, USA, 2006, pp. 1447–1454. DOI: 10.1109/CVPR.2006.42.