Abstract
This paper deals with clustering for multiview data. Multiview clustering has become a research hotspot in many domains and applications, such as information retrieval, biology, chemistry, and marketing. By exploiting information from multiple views, one can hope to find a clustering that is more accurate than the ones obtained from the individual views. The aim is to search for clustering patterns that form a consensus among the patterns from the different views. Inspired by variable weighting and the co-regularization strategy, this paper studies co-regularized weighting multiview clustering algorithms. Two such algorithms are proposed, based on pairwise co-regularization and centroid-based co-regularization, respectively. Experimental results obtained on both synthetic and real datasets show that the proposed algorithms outperform the main existing multiview clustering algorithms.
Introduction
The ever-growing complexity of data constitutes a central challenge for the data mining community. This complexity may concern various aspects, such as the size of the dataset, the complexity of the features, temporality, scalability, and, more generally, the multiplicity of representations of the data.
Multiview data are instances that have multiple views from different feature spaces. Data with multiple representations are quite common in scientific, economic, and social domains such as biology, medicine, marketing, and social networks. For example, the variables of the nucleated blood cell data1 were divided into views of density, geometry, "color," and texture, each representing a view of particular measurements on the nucleated blood cells. In the past decade, multiview data have attracted increasing interest in so-called multiview clustering.2–7 Multiview clustering needs to exploit the information from multiple views and take the differences among views into account in order to produce a more accurate and robust partitioning of the data.
A fundamental issue of multiview clustering is how to combine the clusterings obtained from the individual views. There are two main approaches in multiview learning: centralized and distributed. The distributed strategy is also known as the clustering-ensemble problem, or a posteriori fusion. Centralized algorithms make use of the multiple representations simultaneously to discover hidden patterns in the data. Most of the existing work in multiview clustering follows the centralized approach, with extensions to existing clustering algorithms.3–5
In semisupervised learning, co-regularization enforces that the hypotheses learned from different views of the data agree with each other on unlabeled data.8 The framework relies on two main assumptions: (1) the true objective functions in each view agree on the labels of the unlabeled data (compatibility), and (2) the views are independent given the class label (conditional independence). The compatibility assumption allows us to shrink the space of possible target hypotheses by searching only over compatible functions. The independence assumption makes it unlikely for compatible classifiers to agree on wrong labels. In the case of clustering, this means that a data point would, with high probability, be assigned to the correct cluster in both views.
In addition, most current multiview clustering algorithms treat every view as equally important; however, a single poor-quality view can seriously degrade the clustering results. For instance, if the data distribution in a view is too concentrated to be separated, or if a view contains noise or outliers, the clustering result obtained from that view will be worse than those from the other views. It is therefore essential to assign different weights to different views. Variable weighting has been introduced in clustering analysis.8–10 Recently, Tzortzis and Likas5 proposed a weighted combination of exemplar-based mixture models that assigns different weights to the views and learns those weights automatically. Chen et al.7 proposed the TW-k-means algorithm, which considers both view weights and variable weights.
In this paper, we propose two fuzzy clustering algorithms for multiview datasets that co-regularize the clustering hypotheses across views. Co-regularization is a well-known technique in the semisupervised literature; however, its use for unsupervised learning problems has received little attention.9,10 Inspired by the co-regularization and variable weighting strategies, we propose two novel fuzzy multiview clustering objective functions that implicitly combine multiple views of the data to achieve better clustering performance. Weights are introduced to distinguish the impacts of the different views on the clustering, and a co-regularization term is added to the objective function to mine the hidden patterns shared across views.
Related works on multiview clustering
In this section, we introduce some necessary notations and briefly introduce some multiview clustering approaches presented in the past. In the following, we consider a dataset of R views to be clustered where
Collaborative fuzzy clustering
Pedrycz11 uses the FCM model and derives the collaborative variant CoFC for the multiview context. The collaboration between the views only concerns the membership degree
Centralized method for multiple-view clustering (CoFKM)
Cleuziou and Exbrayat9 proposed CoFKM, an extension of the FCM method based on a centralized strategy. This method introduces a penalty term that aims at reducing the disagreement between the different views. The objective function of CoFKM is defined as follows
We can see that CoFKM overcomes the shortcomings of CoFC, but it treats every view as equally important. However, if there is a poor-quality view, which is very common in real-world datasets, this may degrade the clustering performance.
Co-regularized weighting multiview clustering (CoWMVC)
We assume that we are given data having multiple representations (i.e. views). Given a dataset of R views, let
With
Here, we propose two co-regularization-based approaches to make the clustering hypotheses on different views agree with each other. We construct an objective function by introducing a penalty term which aims at reducing the disagreement between different views.
Our first co-regularization scheme enforces that each view pair (v, w) has high pairwise similarity (pairwise co-regularization criterion). Our second scheme enforces the different views to look similar by regularizing them toward a common consensus (centroid-based co-regularization). This idea differs from previously proposed consensus clustering approaches, which commit to individual clusterings in a first step and then combine them into a consensus in a second step.
Pairwise CoWMVC
We propose a co-regularized approach based on FCM, which minimizes the within-view criterion in each view while penalizing the disagreement between all pairs of views. Thus, the criterion to be minimized can be written as
This co-regularized term penalizes our criterion. It can be regarded as a divergence between views: the lower Δ is, the lower the disagreement. The advantage of equation (5) is that the pairwise co-regularized term has the same order of magnitude as the local inertia, so the sum of these expressions can be considered a coherent global criterion
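To make the role of the disagreement term concrete, the following sketch computes one plausible instance of Δ, namely the sum of squared differences between the fuzzy membership matrices of all view pairs. The paper's exact expression is the one given in equation (5); this function is only an illustration of the pairwise-penalty idea.

```python
import numpy as np

def pairwise_disagreement(memberships):
    """Sum of squared differences between membership matrices of all view pairs.

    memberships: list of (n_samples, n_clusters) arrays, one per view.
    A lower value means the views agree more on the fuzzy partition.
    This is an illustrative form of the penalty, not the paper's exact equation.
    """
    R = len(memberships)
    total = 0.0
    for v in range(R):
        for w in range(v + 1, R):
            total += np.sum((memberships[v] - memberships[w]) ** 2)
    return total
```

Identical memberships across views give Δ = 0, and the penalty grows as the fuzzy partitions drift apart.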
Given that V and W are fixed,
Given that U and W are fixed,
Given that U and V are fixed,
Since the proposed method produces a fuzzy partition for each view, these results have to be merged in order to obtain a global clustering. We propose to build a global partition of the data using an assignment rule
and assign the data point xi to the cluster k corresponding to the maximum of
Therefore, the pairwise co-regularized weighting multiview clustering algorithm CoWMVC(P) is summarized as follows:
1. Set the cluster number C; initialize the membership matrices, cluster centers, and weights for all views; and set the termination criterion.
2. For each view, update the cluster centers using equation (9).
3. For each view, update the memberships using equation (8).
4. Compute the weight of the rth view using equation (10).
5. Repeat steps (2) to (4) until the termination criterion is satisfied.
6. Merge the results to obtain a global clustering using equation (11).
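The iterative scheme above can be sketched in Python. Because the paper's update equations (8)–(10) are not reproduced in this excerpt, the sketch substitutes standard FCM updates, a simple blending step toward the cross-view mean membership in place of the pairwise co-regularized update, and a heuristic inverse-inertia view weighting. It illustrates the structure of CoWMVC(P), not the authors' exact derivation.

```python
import numpy as np

def fcm_centers(X, U, m=2.0):
    # V_k = sum_i u_ik^m x_i / sum_i u_ik^m  (standard FCM center update)
    Um = U ** m
    return (Um.T @ X) / Um.sum(axis=0)[:, None]

def fcm_memberships(X, V, m=2.0):
    # standard FCM membership update from squared distances to the centers
    d = np.maximum(((X[:, None, :] - V[None, :, :]) ** 2).sum(-1), 1e-12)
    inv = d ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def cowmvc_pairwise(views, C, eta=0.3, m=2.0, iters=50, seed=0):
    """Illustrative pairwise CoWMVC loop: per-view FCM steps, a co-regularization
    blend that reduces cross-view disagreement, heuristic view weights, and a
    weighted merge followed by a max-membership assignment."""
    rng = np.random.default_rng(seed)
    R, n = len(views), views[0].shape[0]
    U0 = rng.dirichlet(np.ones(C), size=n)      # shared init keeps cluster
    U = [U0.copy() for _ in range(R)]           # indices aligned across views
    w = np.ones(R) / R
    for _ in range(iters):
        Vs = [fcm_centers(views[r], U[r], m) for r in range(R)]
        U = [fcm_memberships(views[r], Vs[r], m) for r in range(R)]
        # co-regularization: pull each view's memberships toward the views' mean
        mean_U = sum(U) / R
        U = [(1 - eta) * U[r] + eta * mean_U for r in range(R)]
        # view weights: lower fuzzy within-cluster inertia -> higher weight
        J = np.array([np.sum((U[r] ** m) *
                             ((views[r][:, None, :] - Vs[r][None, :, :]) ** 2).sum(-1))
                      for r in range(R)])
        w = np.exp(-J / J.max())
        w /= w.sum()
    # global partition: weighted merge of the fuzzy partitions, then argmax
    U_glob = sum(w[r] * U[r] for r in range(R))
    return U_glob.argmax(axis=1), w
```

On data with well-separated clusters visible in every view, the blending step drives the per-view partitions to agree, and the final weighted merge recovers the shared structure.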
Centroid-based CoWMVC
In this section, we present an alternative regularization scheme that regularizes each view-specific set of membership
We can rewrite the objective function as follows
Minimizing the objective function, we can obtain the updating equations of the cluster center, the membership, and the weights for views, respectively.
Given that V and W are fixed,
The updating equation of consensus membership is obtained as follows
Given that U and W are fixed,
Given that U and V are fixed,
In centroid-based CoWMVC algorithm, as we have got the consensus membership
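A minimal sketch of the consensus step can make the centroid-based idea concrete. The paper derives the consensus membership by minimizing its objective function; since that equation is not reproduced in this excerpt, the sketch below uses a weighted mean of the per-view memberships, which is the form a quadratic centroid penalty typically yields, together with the max-membership assignment rule. It is an assumption-laden illustration, not the authors' exact update.

```python
import numpy as np

def consensus_membership(memberships, weights=None):
    """Consensus U* as a (weighted) average of the per-view fuzzy memberships.

    memberships: list of (n_samples, n_clusters) arrays, one per view.
    weights: optional per-view weights summing to 1 (uniform if omitted).
    """
    R = len(memberships)
    if weights is None:
        weights = np.ones(R) / R
    U_star = sum(wr * Ur for wr, Ur in zip(weights, memberships))
    # renormalize rows so each sample's memberships still sum to 1
    return U_star / U_star.sum(axis=1, keepdims=True)

def assign(U_star):
    # hard global partition: each sample goes to its maximum-membership cluster
    return U_star.argmax(axis=1)
```

Because the consensus already aggregates all views, no separate merging step is needed to obtain the global clustering, unlike in the pairwise variant.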
Experiments
Datasets
We conduct experiments and validate our approach on two synthetic datasets and two real-world datasets from the UCI Machine Learning Repository. A brief description of each dataset is given below.
Synthetic data 1: Our first synthetic dataset consists of two views and is generated in a manner akin to Yi et al.,12 which first chooses the cluster ci each sample belongs to and then generates each of the views.
Synthetic data 2: Our second synthetic dataset consists of three views. Moreover, the features are correlated. Each view still has two clusters. The cluster means and the covariances for the three views are given below
Multiple feature dataset: The multiple features dataset consists of 2000 handwritten digits (digitized pictures) described by six different views (Fourier coefficients, profile correlations, Karhunen–Loève coefficients, pixel averages, Zernike moments, and morphological features). Ten homogeneous classes (200 objects per class) have to be recovered.
Image segmentation dataset: The image segmentation dataset consists of 2310 samples described by two different views. It has seven classes, and each sample has nine features. The two datasets are summarized in Table 1.
Evaluation
The evaluation of a clustering result is still an open problem, since the true labels of the objects are not always known. Here, we choose two well-known evaluation measures to assess the quality of the results of the compared approaches. The two evaluation criteria are described in Table 2.
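For reference, both measures can be computed directly from two labelings. The sketch below implements the standard definitions of normalized mutual information (mutual information normalized by the geometric mean of the entropies) and the Rand index (fraction of sample pairs on which the two labelings agree); library implementations such as scikit-learn's offer the same measures.

```python
import numpy as np
from math import comb, log

def rand_index(y_true, y_pred):
    """Rand index: fraction of pairs that are grouped consistently."""
    n = len(y_true)
    agree = 0
    for i in range(n):
        for j in range(i + 1, n):
            if (y_true[i] == y_true[j]) == (y_pred[i] == y_pred[j]):
                agree += 1
    return agree / comb(n, 2)

def nmi(y_true, y_pred):
    """NMI: mutual information normalized by sqrt(H(true) * H(pred))."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    mi = 0.0
    for c in np.unique(y_true):
        for k in np.unique(y_pred):
            n_ck = np.sum((y_true == c) & (y_pred == k))
            if n_ck == 0:
                continue
            n_c, n_k = np.sum(y_true == c), np.sum(y_pred == k)
            mi += (n_ck / n) * log(n * n_ck / (n_c * n_k))
    def entropy(y):
        return -sum((np.sum(y == v) / n) * log(np.sum(y == v) / n)
                    for v in np.unique(y))
    denom = (entropy(y_true) * entropy(y_pred)) ** 0.5
    return mi / denom if denom > 0 else 1.0
```

Both measures equal 1 for a perfect clustering, even when the cluster indices are permuted relative to the class labels.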
Descriptions of MF, IS dataset, and the composition of each view.
IS: image segmentation dataset; MF: multiple feature dataset. RGB is one view of the dataset.
Evaluation criterion.
NMI: normalized mutual information; RI: rand index.
Experiments on two synthetic datasets.
CoFCM: collaborative fuzzy clustering from multiple weighted views; CoFKM: centralized method for multiple-view clustering; CoWMVC(C): centroid-based co-regularized weighting multiview clustering; CoWMVC(P): pairwise co-regularized weighting multiview clustering; NMI: normalized mutual information; RI: rand index.
Experiments on two real-world datasets.
CoFCM: collaborative fuzzy clustering from multiple weighted views; CoFKM: centralized method for multiple-view clustering; CoWMVC(C): centroid-based co-regularized weighting multiview clustering; CoWMVC(P): pairwise co-regularized weighting multiview clustering; IS: image segmentation dataset; MF: multiple feature dataset; NMI: normalized mutual information; RI: rand index.
From Tables 3 and 4, we can see that, in most cases, the proposed CoWMVC(P) and CoWMVC(C) achieve better evaluation scores under the same conditions. That is, the clustering results of CoWMVC(P) and CoWMVC(C) are closer to the actual class labels than those of the other three methods. For the two real-world UCI datasets, it can also be noted that feature concatenation actually performs worse than a single view.
For all four datasets, we observe that the three multiview clustering algorithms show a very strong improvement over the single-view approach, which confirms the main benefit of the multiview approach.
Finally, it can be clearly seen from the results that the proposed algorithms can improve the performance of clustering.
We also test how the co-regularization parameter η affects the quality of the result. The reported results are for the pairwise CoWMVC algorithm (Figure 1); similar trends were observed for the centroid-based CoWMVC algorithm and are therefore not reported here. For the range of η shown in the plot, the results indicate that although the performance of our algorithms depends on the co-regularization parameter η, it remains reasonably stable across a wide range of η.
NMI scores of co-regularized weighting multiview clustering as a function of η for (a) the MF dataset and (b) the IS dataset. IS: image segmentation dataset; MF: multiple feature dataset; NMI: normalized mutual information.
Conclusion and future works
In this work, we studied the clustering problem for complex multiview data and proposed the CoWMVC approach. The approach uses the philosophy of co-regularization to make the clusterings in different views agree with each other. Experimental results show that the proposed algorithms exhibit better multiview clustering performance.
Many perspectives and generalizations will be studied in future work: for example, the possibility of dealing with a different number of clusters in each view, and extending the proposed framework to the case where some views have missing data.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the Specialized Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20130093110009) and the National Natural Science Foundation of China (Grant No. 61373055).
