Digraph Spectral Clustering with Applications in Distributed Sensor Validation

Abstract

In sensor networks, sensor performance varies significantly over time because of changes in the surrounding environment, device hardware, and so forth. Hence, monitoring sensor status is essential to sensor network maintenance. Spectral clustering has been employed as an enabling technique for this problem. However, traditional spectral clustering is developed for undirected graphs, and the naive generalization to directed graphs by symmetrizing the adjacency matrix loses network information and thus cannot efficiently detect bad sensor nodes when applied to sensor validation. In this paper, we develop a generalized digraph spectral clustering method. Instead of simply symmetrizing the adjacency matrix, our method takes the network circulation into consideration while clustering the sensors. Extensive simulation results demonstrate that our method outperforms the traditional spectral clustering method, improving the detection accuracy for bad nodes by 19% to 41%.

1. Introduction

Sensor networks, as an enabling technique, have been deployed in many places that are hard for human beings to reach, for example, wilderness areas, oceans, and battlefields. These sensor networks serve the important purpose of collecting information that helps people understand and monitor unreachable regions. For various unpredictable reasons, for example, mechanical problems, malfunction, damage, a switch to a reduced-function mode because of low battery power, or compromise, sensor performance may degrade over time. Thus, periodic validation of sensor status is needed. However, in many cases, it is not possible to physically reach the sensor network to find the problematic sensor nodes. Hence, a self-validation method is more practical, in which sensor nodes validate their goodness by monitoring the signals received from their neighbors.

In the literature, spectral clustering has been introduced as a key technique to identify these “bad sensor nodes” [1, 2]. However, because the traditional spectral clustering algorithm only works on symmetric matrices, [1] symmetrizes the asymmetric connectivity matrix among the sensor nodes and then applies the traditional algorithm. This symmetrization loses significant information about the directed connections between sensor nodes, which makes bad node detection highly inaccurate.

An $n \times n$ adjacency matrix A represents a finite graph $G = (V, E, A)$ with $n = | V |$ vertices, where V is the vertex set and E denotes the collection of all directed edges. Each entry $a_{i j}$ of the adjacency matrix represents the weight (or conductance) of the edge from vertex i to vertex j.

For undirected graphs, which are associated with a symmetric adjacency matrix A, random walk theory has been extensively studied in the literature, for example, in the theory of reversible Markov chains. Chung and Yau [3] define a normalized Laplacian matrix using the degree matrix of the undirected graph and provide several ways to derive the discrete Green's function. The effective resistance defined in [4–6] is proportional to the commute time of the random walk on the undirected graph and can be computed from the pseudoinverse of the graph Laplacian.
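As a small numerical illustration of the undirected case, the following sketch computes the effective resistance from the pseudoinverse of the combinatorial Laplacian $L = D - A$ and converts it to a commute time using the identity of Chandra et al. [4] (commute time equals the graph volume times the effective resistance). The three-node path graph and the function name `effective_resistance` are illustrative assumptions and are not taken from the evaluation in this paper.

```python
import numpy as np

def effective_resistance(A):
    """Pairwise effective resistance of a connected undirected weighted graph."""
    L = np.diag(A.sum(axis=1)) - A          # combinatorial Laplacian L = D - A
    L_pinv = np.linalg.pinv(L)              # Moore-Penrose pseudoinverse
    d = np.diag(L_pinv)
    # R_ij = L+_ii + L+_jj - 2 * L+_ij
    return d[:, None] + d[None, :] - 2.0 * L_pinv

# Hypothetical example: path graph 0 - 1 - 2 with unit edge weights.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
R = effective_resistance(A)
volume = A.sum()                            # sum of all degrees
commute = volume * R                        # commute time = vol(G) * R_ij
print(R[0, 2], commute[0, 2])
```

For the end points of the path, the two unit edges act like resistors in series, so the effective resistance is 2 and the commute time is 8 (up to floating-point rounding).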

In contrast, for directed graphs, many of the key properties listed above no longer hold. Because the adjacency matrix A of a directed graph is asymmetric, neither the in-degree distribution nor the out-degree distribution equals the stationary distribution of the random walk.
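The difference can be checked numerically. The sketch below, under the assumption of a small hand-made strongly connected digraph (the weights and the helper name `stationary_distribution` are hypothetical), computes the stationary distribution of $P = D^{-1} A$ and compares it with the normalized in-degree and out-degree vectors. It also confirms that, after symmetrization, the stationary distribution reduces to degree divided by volume.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution pi (pi^T P = pi^T, sum(pi) = 1) of an irreducible chain."""
    n = P.shape[0]
    # Stack the balance equations (I - P)^T pi = 0 with the normalization 1^T pi = 1.
    M = np.vstack([(np.eye(n) - P).T, np.ones((1, n))])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return pi

# Hypothetical strongly connected digraph on 4 nodes (weights chosen for illustration).
A = np.array([[0.0, 2.0, 0.0, 1.0],
              [0.0, 0.0, 3.0, 0.0],
              [1.0, 0.0, 0.0, 2.0],
              [4.0, 0.0, 0.0, 0.0]])
d_out = A.sum(axis=1)
d_in = A.sum(axis=0)
P = A / d_out[:, None]                      # P = D^{-1} A
pi = stationary_distribution(P)

print("stationary :", pi)
print("out-degree :", d_out / d_out.sum())
print("in-degree  :", d_in / d_in.sum())

# Undirected comparison: for A + A^T the stationary distribution is degree / volume.
A_sym = A + A.T
d_sym = A_sym.sum(axis=1)
pi_sym = stationary_distribution(A_sym / d_sym[:, None])
```

For this example the stationary distribution is (9, 6, 6, 7)/28, which matches neither degree vector.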

In this paper, we focus on strongly connected graphs, which correspond to irreducible Markov chains, and develop a digraph spectral clustering algorithm to solve the sensor node validation problem. The main contributions of this paper are summarized below.

(i) To our knowledge, this is the first work to investigate sensor node validation as a digraph spectral clustering problem. We develop theoretical results that lead to a digraph spectral clustering algorithm that does not lose the information of directed links among sensors.

(ii) By evaluating our algorithm and the traditional undirected-graph-based spectral clustering algorithm on randomly generated large-scale synthetic data, we show a significant improvement in sensor node detection accuracy.

This paper is organized as follows. Section 2 formally defines the sensor node validation problem. In Section 3, we introduce the generalized digraph spectral clustering algorithm. In Section 4, we provide extensive evaluation results that demonstrate the efficiency and effectiveness of the proposed algorithm. We briefly review related work in Section 5. We conclude the paper and outline future work in Section 6.

2. Problem Definition

Consider a sensor network with n static sensor nodes, denoted as $V = \{s_{1}, \dots, s_{n}\}$, which is strongly connected; that is, every sensor $s_{i}$ can reach every other sensor $s_{j}$ in a finite number of hops. The sensors periodically ping their one-hop neighbors, and the received signal strength (RSS) values reported by a sensor to one of its neighbors correlate with the matching degree of their antenna polarizations. Thus, using the RSS as the goodness of the connection between two sensor nodes, denoted as $a_{i j}$, the weighted adjacency matrix $A = [a_{i j}]$ captures the connectivity of the sensor nodes in the network. We denote by $G = (V, E, A)$ the directed graph established by the sensor network, with V as the set of all sensor nodes, E as the set of all directed edges between sensor nodes, and A as the weighted adjacency matrix.

We assume that properly working sensors with similar properties, such as radio and antenna characteristics, node environments, and power usage, report similar RSS measurements. Such working sensors are called “good sensors,” and sensors that do not report proper measurements are called “bad sensors.” We illustrate this terminology with a simple example in which sensors are indexed by their antenna orientations. In this case, nearby sensors are those that have similar antenna polarizations. Suppose that the RSS values reported by a sensor to one of its neighbors correlate with the matching degree of their antenna polarizations. Then, nearby sensors that are in good working condition are expected to report similar RSS measurements on the same neighbor.

The sensor node validation problem aims to find the “bad sensors” by detecting the anomalous connection patterns received from them. To be precise, to identify bad sensors, we need to determine whether a sensor belongs to a cluster of nearby sensors. We assume that there are enough sensors that clusters of nearby sensors are large. A sensor is considered bad if it is in a small cluster of its own or in a small out-of-place component of a large cluster. As illustrated in [1], this can be achieved by applying a spectral clustering algorithm. However, in contrast to the setting of [1], the adjacency matrix A is in general asymmetric in reality; namely, $a_{i j}$ is not necessarily equal to $a_{j i}$. Thus, the traditional spectral clustering algorithm developed for undirected graphs cannot be directly applied to our sensor node validation problem. In this work, we provide theoretical results that generalize the spectral clustering algorithm to the digraph case and apply the resulting algorithm to sensor node validation. In the next section, we introduce our digraph clustering algorithm.
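Once cluster labels are available, the first of the two criteria above (membership in a small cluster) can be sketched as follows. The function name `flag_small_clusters` and the threshold `min_size` are hypothetical choices for illustration; the paper does not specify a threshold, and the second criterion (small out-of-place components of a large cluster) is not covered here.

```python
import numpy as np

def flag_small_clusters(labels, min_size):
    """Return indices of nodes whose cluster has fewer than `min_size` members."""
    labels = np.asarray(labels)
    ids, counts = np.unique(labels, return_counts=True)
    small = ids[counts < min_size]           # cluster ids considered "small"
    return np.flatnonzero(np.isin(labels, small))

# Hypothetical labels for 10 sensors: cluster 0 is large, clusters 1 and 2 are small.
labels = [0, 0, 0, 0, 1, 0, 2, 2, 0, 0]
suspects = flag_small_clusters(labels, min_size=3)
print(suspects)   # [4 6 7]
```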

3. Digraph Spectral Clustering Algorithm

In this section, we develop a digraph spectral clustering algorithm for strongly connected digraphs by introducing a generalized objective function based on $\tilde{L}$. We first present the generalized objective function and then the design of the digraph spectral clustering algorithm.

3.1. Generalized Objective Function

While using $\tilde{L}$ as the objective function, we have

$$ f^{T} \tilde{L} f = \sum_{i, j = 1}^{n} F_{Π} (i, j) \, g (i) \, (g (i) - g (j)), \tag{1} $$

where $g (i) = f (i) π_{i}^{- 1 / 2}$ and $F_{Π} (i, j) = π_{i} p_{i j}$ is the circulation of edge $(i, j) \in E$. Detailed descriptions for the extension are presented in the following section.

3.2. Design of Digraph Spectral Clustering Algorithm

In this section, we will generalize the spectral clustering algorithm using the generalized random walk theory on digraphs.

Spectral clustering is one of the most popular modern clustering algorithms, with wide applications in distributed computing systems. For example, in [7, 8], spectral clustering is used to estimate the wireless transmission cost in wireless networks.

So far, most published spectral algorithms operate on symmetric matrices. However, there are important cases in which data points have pairwise relationships that are not symmetric. Link delivery probabilities in wireless networks, economic transactions, and internet communications are often asymmetric. The commonly used approach for spectral clustering of link data is to obtain a symmetric matrix $\tilde{A}$ from the original adjacency matrix A and then to apply spectral clustering techniques to $\tilde{A}$. Typical transformations used in the literature include $\tilde{A} = A + A^{T}$ and $\tilde{A} = A^{T} A$. Zhou et al. [9] design a symmetric combinatorial Laplacian matrix by symmetrizing the original unnormalized Laplacian of the digraph. There are two serious problems with these methods. First, in a digraph, given a partition $V = \{S, \bar{S}\}$, the commonly used edge cut, that is, $cut (\partial S) = cut (S, \bar{S}) = \sum_{i \in S, j \in \bar{S}} a_{i j}$, is not symmetric; that is, in general $cut (\partial S) \neq cut (\partial \bar{S})$. These approaches cannot distinguish the directional variations in digraph cases. Second, they all apply symmetrization to either the adjacency matrix or the Laplacian matrix of the digraph, and thus the results obtained represent the characteristics of the transformed undirected graphs instead of the original directed graphs.
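The first problem can be seen on a toy example. The sketch below, assuming a hypothetical four-node digraph and a helper named `edge_cut`, compares the edge cut in both directions before and after the symmetrization $\tilde{A} = A + A^{T}$.

```python
import numpy as np

def edge_cut(A, S):
    """Total weight of the edges that leave the node set S."""
    mask = np.zeros(A.shape[0], dtype=bool)
    mask[list(S)] = True
    return A[np.ix_(mask, ~mask)].sum()

# Hypothetical strongly connected digraph (weights chosen for illustration).
A = np.array([[0.0, 2.0, 0.0, 1.0],
              [0.0, 0.0, 3.0, 0.0],
              [1.0, 0.0, 0.0, 2.0],
              [4.0, 0.0, 0.0, 0.0]])
S, S_bar = {0, 1}, {2, 3}

cut_out = edge_cut(A, S)         # weight from S to S_bar: 4.0
cut_in = edge_cut(A, S_bar)      # weight from S_bar to S: 5.0

A_sym = A + A.T                  # naive symmetrization
sym_out = edge_cut(A_sym, S)     # 9.0
sym_in = edge_cut(A_sym, S_bar)  # 9.0
print(cut_out, cut_in, sym_out, sym_in)
```

The directed cuts differ (4 versus 5), while the symmetrized graph reports the same value in both directions, so the directional difference is no longer visible.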

Now, we are in the position to introduce how our generalized normalized Laplacian matrix can be used to address the above two problems in digraph based spectral clustering algorithms.

Given a strongly connected directed graph $G = (V, E, A)$, let D be the diagonal matrix of out-degrees $d_{i} = \sum_{j} a_{i j}$. Then $P = D^{- 1} A = [p_{i j}]$ is the transition probability matrix of the corresponding irreducible Markov chain. Let $π = (π_{1}, \dots, π_{n})^{T}$ denote its stationary distribution, that is, $π^{T} P = π^{T}$, and let $Π = \mathrm{diag} (π)$. A circulation F is a function that maps each directed edge $(i, j) \in E$ to a nonnegative real value, $F : E (G) \to \mathbb{R}^{+} \cup \{0\}$, such that, for each vertex k,

$$ \sum_{i} F (i, k) = \sum_{j} F (k, j) . \tag{2} $$

In [10], Chung proves that $F_{Π} (i, j) = π_{i} p_{i j}$ is in fact the circulation function of the digraph G.

The circulation captures the fact that, for each random walk state, the in-flow traffic equals the out-flow traffic, even though the digraph edge weights are not symmetric. Therefore, given a graph partition $V = \{S, \bar{S}\}$, one can easily check that the circulations between S and $\bar{S}$, defined as $F_{Π} (\partial S) = \sum_{i \in S, j \in \bar{S}} π_{i} p_{i j}$, are symmetric; that is, $F_{Π} (\partial S) = F_{Π} (\partial \bar{S})$. This property of the circulation motivates us to redefine the graph cut for directed graphs as follows.
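Both properties (flow conservation at every vertex and equal circulation across a cut in the two directions) can be verified numerically. The following is a minimal sketch on a hypothetical four-node digraph; the helper names `stationary_distribution` and `boundary_flow` are assumptions made for this illustration.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution pi (pi^T P = pi^T, sum(pi) = 1) of an irreducible chain."""
    n = P.shape[0]
    M = np.vstack([(np.eye(n) - P).T, np.ones((1, n))])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return pi

def boundary_flow(F, S):
    """Circulation leaving the node set S, i.e. the sum of F(i, j) for i in S, j not in S."""
    mask = np.zeros(F.shape[0], dtype=bool)
    mask[list(S)] = True
    return F[np.ix_(mask, ~mask)].sum()

# Hypothetical strongly connected digraph (weights chosen for illustration).
A = np.array([[0.0, 2.0, 0.0, 1.0],
              [0.0, 0.0, 3.0, 0.0],
              [1.0, 0.0, 0.0, 2.0],
              [4.0, 0.0, 0.0, 0.0]])
P = A / A.sum(axis=1, keepdims=True)
pi = stationary_distribution(P)
F = pi[:, None] * P                          # F_Pi(i, j) = pi_i * p_ij

inflow, outflow = F.sum(axis=0), F.sum(axis=1)
flow_out = boundary_flow(F, {0, 1})          # circulation from S to S_bar
flow_in = boundary_flow(F, {2, 3})           # circulation from S_bar to S
print(inflow, outflow, flow_out, flow_in)
```

Here the raw edge weights crossing the cut are 4 in one direction and 5 in the other, yet the circulation is 9/28 both ways.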

Definition 1 (CCut of digraph partition).

For a two-component digraph partition $V = \{S, \bar{S}\}$,

$$ CCut (S, \bar{S}) = \frac{F_{Π} (\partial S)}{F_{Π} (S)}, \tag{3} $$

where $F_{Π} (\partial S) = \sum_{i \in S, j \in \bar{S}} π_{i} p_{i j}$ and $F_{Π} (S) = \sum_{i, j \in S} π_{i} p_{i j}$.

For a digraph partition with k disjoint components $V = \{S_{1}, \dots, S_{k}\}$,

$$ CCut (S_{1}, \dots, S_{k}) = \sum_{m = 1}^{k} CCut (S_{m}, {\bar{S}}_{m}) . \tag{4} $$

Note that, when $k = 2$, $\min_{S \subseteq V} CCut$ corresponds to the Cheeger constant $h (G)$ of the digraph [10].
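Definition 1 translates directly into a few lines of code. The sketch below assumes a hypothetical four-node digraph; the helper names `ccut` and `ccut_partition` are illustrative, and the code assumes $F_{Π} (S) > 0$ for every component (a singleton without a self-loop would give a zero denominator).

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution pi (pi^T P = pi^T, sum(pi) = 1) of an irreducible chain."""
    n = P.shape[0]
    M = np.vstack([(np.eye(n) - P).T, np.ones((1, n))])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return pi

def ccut(F, S):
    """CCut(S, S_bar) = F(boundary of S) / F(S), as in Definition 1, equation (3)."""
    mask = np.zeros(F.shape[0], dtype=bool)
    mask[list(S)] = True
    boundary = F[np.ix_(mask, ~mask)].sum()  # circulation leaving S
    inside = F[np.ix_(mask, mask)].sum()     # circulation staying inside S
    return boundary / inside

def ccut_partition(F, parts):
    """k-way CCut: sum of CCut(S_m, complement of S_m) over all components, equation (4)."""
    return sum(ccut(F, S) for S in parts)

# Hypothetical strongly connected digraph (weights chosen for illustration).
A = np.array([[0.0, 2.0, 0.0, 1.0],
              [0.0, 0.0, 3.0, 0.0],
              [1.0, 0.0, 0.0, 2.0],
              [4.0, 0.0, 0.0, 0.0]])
P = A / A.sum(axis=1, keepdims=True)
F = stationary_distribution(P)[:, None] * P  # circulation matrix

c1 = ccut(F, {0, 1})
c2 = ccut(F, {2, 3})
total = ccut_partition(F, [{0, 1}, {2, 3}])
print(c1, c2, total)
```

For this graph the circulation across the cut is 9/28, the internal circulations are 6/28 and 4/28, and the resulting values are 1.5, 2.25, and 3.75.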

In digraphs, the generalized normalized Laplacian matrix $\tilde{L} = Π^{1 / 2} (I - P) Π^{- 1 / 2}$ is closely related to the circulation $F_{Π}$, as stated below.

Theorem 2.

For any column vector $f \in \mathbb{R}^{n}$, we have

$$ f^{T} \tilde{L} f = \sum_{i, j = 1}^{n} F_{Π} (i, j) \, g (i) \, (g (i) - g (j)), \tag{5} $$

where $g (i) = f (i) π_{i}^{- 1 / 2}$ and $F_{Π} (i, j) = π_{i} p_{i j}$ is the circulation of edge $(i, j) \in E$.

Proof.

Consider

$$ \begin{aligned} f^{T} \tilde{L} f &= f^{T} Π^{1 / 2} (I - P) Π^{- 1 / 2} f \\ &= \sum_{i = 1}^{n} f (i)^{2} - \sum_{i, j = 1}^{n} f (i) f (j) π_{i}^{1 / 2} π_{j}^{- 1 / 2} p_{i j} \\ &= \sum_{i = 1}^{n} π_{i} \left( \frac{f (i)}{\sqrt{π_{i}}} \right)^{2} - \sum_{i, j = 1}^{n} \frac{f (i)}{\sqrt{π_{i}}} \frac{f (j)}{\sqrt{π_{j}}} π_{i} p_{i j} \\ &= \sum_{i, j = 1}^{n} π_{i} p_{i j} \frac{f (i)}{\sqrt{π_{i}}} \left( \frac{f (i)}{\sqrt{π_{i}}} - \frac{f (j)}{\sqrt{π_{j}}} \right) \\ &= \sum_{i, j = 1}^{n} F_{Π} (i, j) \, g (i) \, (g (i) - g (j)) . \end{aligned} \tag{6} $$

The fourth equality uses $\sum_{j} p_{i j} = 1$ for every i.
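The identity in Theorem 2 can also be checked numerically. The following sketch evaluates both sides for a random vector f on a hypothetical four-node digraph; the graph, the random seed, and the helper name `stationary_distribution` are assumptions made for this illustration.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution pi (pi^T P = pi^T, sum(pi) = 1) of an irreducible chain."""
    n = P.shape[0]
    M = np.vstack([(np.eye(n) - P).T, np.ones((1, n))])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return pi

# Hypothetical strongly connected digraph (weights chosen for illustration).
A = np.array([[0.0, 2.0, 0.0, 1.0],
              [0.0, 0.0, 3.0, 0.0],
              [1.0, 0.0, 0.0, 2.0],
              [4.0, 0.0, 0.0, 0.0]])
n = A.shape[0]
P = A / A.sum(axis=1, keepdims=True)
pi = stationary_distribution(P)

# Generalized normalized Laplacian: Pi^{1/2} (I - P) Pi^{-1/2}.
L_tilde = np.diag(np.sqrt(pi)) @ (np.eye(n) - P) @ np.diag(1.0 / np.sqrt(pi))
F = pi[:, None] * P                          # circulation F_Pi(i, j) = pi_i * p_ij

rng = np.random.default_rng(0)
f = rng.standard_normal(n)                   # arbitrary test vector
g = f / np.sqrt(pi)                          # g(i) = f(i) * pi_i^{-1/2}

lhs = f @ L_tilde @ f
rhs = np.sum(F * g[:, None] * (g[:, None] - g[None, :]))
print(lhs, rhs)
```

The two printed values agree up to floating-point rounding.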

Given a partition of $V = \{S_{1}, \dots, S_{k}\}$, we can define k indicator vectors $r_{i} = (r_{1, i}, \dots, r_{n, i})^{T}$ ($1 \leq i \leq k$), by

$$ r_{i, j} = \begin{cases} \sqrt{\dfrac{π_{i}}{F_{Π} (S_{j})}} & \text{if } i \in S_{j}, \\ 0 & \text{otherwise.} \end{cases} \tag{7} $$

Then, we set the matrix $R \in \mathbb{R}^{n \times k}$ as the matrix containing those k indicator vectors as columns. Then, the following equations hold:

$$ \begin{aligned} r_{i}^{T} \tilde{L} r_{i} &= CCut (S_{i}, {\bar{S}}_{i}), \\ r_{i}^{T} \tilde{L} r_{i} &= {(R^{T} \tilde{L} R)}_{i i} . \end{aligned} \tag{8} $$

Hence, we have

$$ CCut (S_{1}, \dots, S_{k}) = \sum_{i = 1}^{k} r_{i}^{T} \tilde{L} r_{i} = Tr (R^{T} \tilde{L} R) . \tag{9} $$
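Equations (8) and (9) can be checked on a toy example by building the indicator matrix R from (7) and comparing the diagonal of $R^{T} \tilde{L} R$ with the CCut values computed directly from Definition 1. The graph, the partition, and the helper name `stationary_distribution` below are hypothetical choices for illustration.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution pi (pi^T P = pi^T, sum(pi) = 1) of an irreducible chain."""
    n = P.shape[0]
    M = np.vstack([(np.eye(n) - P).T, np.ones((1, n))])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return pi

# Hypothetical strongly connected digraph (weights chosen for illustration).
A = np.array([[0.0, 2.0, 0.0, 1.0],
              [0.0, 0.0, 3.0, 0.0],
              [1.0, 0.0, 0.0, 2.0],
              [4.0, 0.0, 0.0, 0.0]])
n = A.shape[0]
P = A / A.sum(axis=1, keepdims=True)
pi = stationary_distribution(P)
F = pi[:, None] * P
L_tilde = np.diag(np.sqrt(pi)) @ (np.eye(n) - P) @ np.diag(1.0 / np.sqrt(pi))

parts = [[0, 1], [2, 3]]                     # partition S_1, S_2
R = np.zeros((n, len(parts)))
for j, S in enumerate(parts):
    F_S = F[np.ix_(S, S)].sum()              # F_Pi(S_j): circulation inside S_j
    R[S, j] = np.sqrt(pi[S] / F_S)           # indicator entries from equation (7)

quad = np.diag(R.T @ L_tilde @ R)            # r_j^T L r_j for each component
trace = np.trace(R.T @ L_tilde @ R)          # equals the k-way CCut by (9)
print(quad, trace)
```

For this partition the diagonal entries are 1.5 and 2.25, which are exactly the two CCut values, and the trace is 3.75.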

Using an optimization formulation similar to that of [11], we can write the problem of minimizing $CCut (S_{1}, \dots, S_{k})$ as

$$ \min_{S_{1}, \dots, S_{k}} Tr (R^{T} \tilde{L} R) \quad \text{subject to } {(Π R)}^{T} (Π R) = I, \; R \text{ as in (7)} . \tag{10} $$

We relax the problem by allowing the entries of the matrix R to take arbitrary real values. Then, the relaxed problem becomes

$$ \min_{R \in \mathbb{R}^{n \times k}} Tr (R^{T} \tilde{L} R) \quad \text{subject to } {(Π R)}^{T} (Π R) = I . \tag{11} $$

This is the standard form of a trace minimization problem, and the Rayleigh-Ritz theorem [12] tells us that the solution is given by choosing R as the matrix whose columns are the k eigenvectors corresponding to the k smallest eigenvalues of $\tilde{L}$. Now, we need to convert the real-valued solution matrix back to a discrete partition. We apply the standard k-means algorithm [13] to the rows of R and obtain the clusters $S_{1}, \dots, S_{k}$.

Therefore, we obtain Algorithm 1.

Algorithm 1: Spectral clustering algorithm for digraphs.

(1) Construct the one-hop transition probability matrix $P = D^{- 1} A$ for graph G.

(2) Compute the stationary distribution π of P, form $Π = \mathrm{diag} (π)$, and compute the generalized normalized Laplacian $\tilde{L} = Π^{1 / 2} (I - P) Π^{- 1 / 2}$.

(3) Compute the k eigenvectors $u_{1}, \dots, u_{k}$ of $\tilde{L}$ that correspond to its k smallest eigenvalues.

(4) Let the indicator matrix R be the matrix containing the vectors $u_{1}, \dots, u_{k}$ as columns.

(5) Apply the k-means algorithm to the rows of R and cluster V into $S_{1}, \dots, S_{k}$.
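The steps of Algorithm 1 can be sketched in NumPy as follows. This is an illustrative sketch under several assumptions, not the implementation used in the evaluation: because $f^{T} \tilde{L} f$ depends only on the symmetric part of $\tilde{L}$, the sketch takes eigenvectors of $(\tilde{L} + \tilde{L}^{T}) / 2$ so that they are real; it uses plain orthonormal eigenvectors rather than enforcing the constraint in (11); and it replaces a library k-means with a small deterministic Lloyd iteration. All function names and the eight-node test graph are hypothetical.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution pi (pi^T P = pi^T, sum(pi) = 1) of an irreducible chain."""
    n = P.shape[0]
    M = np.vstack([(np.eye(n) - P).T, np.ones((1, n))])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return pi

def kmeans_rows(X, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):          # keep the old center if a cluster empties
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def digraph_spectral_clustering(A, k):
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)                     # step (1)
    s = np.sqrt(stationary_distribution(P))
    L_tilde = (s[:, None] * (np.eye(n) - P)) / s[None, :]    # step (2)
    # Assumption: use the symmetric part, which has the same quadratic form.
    L_sym = 0.5 * (L_tilde + L_tilde.T)
    _, vecs = np.linalg.eigh(L_sym)                          # eigenvalues in ascending order
    R = vecs[:, :k]                                          # steps (3) and (4)
    return kmeans_rows(R, k)                                 # step (5)

# Hypothetical test graph: two directed 4-node groups joined by two weak links.
n = 8
A = np.zeros((n, n))
for group in (range(0, 4), range(4, 8)):
    for i in group:
        for j in group:
            if i != j:
                A[i, j] = 1.0 if (j - i) % 4 == 1 else 0.5
A[3, 4] = 0.1
A[7, 0] = 0.1

labels = digraph_spectral_clustering(A, k=2)
print(labels)
```

On this toy graph the two groups are separated by the second eigenvector, so nodes 0 to 3 receive one label and nodes 4 to 7 the other.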

How to choose k is a general problem for all clustering algorithms. The goal is to choose k such that the eigenvalues $λ_{1}, \dots, λ_{k}$ of $\tilde{L}$ are all very small but $λ_{k + 1}$ is relatively large. One proposed heuristic is to use the eigengap of $\tilde{L}$ to determine k. We will address this problem as part of our future work.
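A minimal sketch of the eigengap heuristic is given below. It assumes that a list of real eigenvalues is already available (for example, from a symmetric matrix); the function name `choose_k_by_eigengap`, the bound `k_max`, and the sample eigenvalues are hypothetical.

```python
import numpy as np

def choose_k_by_eigengap(eigenvalues, k_max):
    """Pick k in 1..k_max that maximizes the gap lambda_{k+1} - lambda_k."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))
    gaps = np.diff(lam)[:k_max]              # gaps[k-1] = lambda_{k+1} - lambda_k
    return int(np.argmax(gaps)) + 1

# Hypothetical spectrum: three small eigenvalues followed by a large jump.
eigenvalues = [0.0, 0.01, 0.02, 0.9, 1.0]
k = choose_k_by_eigengap(eigenvalues, k_max=4)
print(k)   # 3
```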

4. Evaluation

In this section, we conduct extensive evaluations on the performance of our proposed sensor validation algorithm.

4.1. Evaluation Settings

We consider a sensor network with $n = 1000$ sensor nodes and randomly generate the topology among them such that the network is strongly connected. After obtaining a topology, we randomly generate RSS values between sensor pairs such that nearby sensors maintain similar antenna polarizations. Then, we randomly choose a set of $b < 1000$ sensors to be the bad nodes and significantly change their RSS values to their neighbors. Then, we apply our digraph spectral clustering algorithm and the traditional undirected-graph-based spectral clustering algorithm on the symmetrized matrix $A^{T} A$ and compare their detection accuracy. Given b as the number of “bad nodes,” we denote the number of correctly detected “bad nodes” by $b^{'}$ and use $r = b^{'} / b$ to evaluate the detection accuracy.
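The accuracy measure $r = b^{'} / b$ can be written as a short helper. The function name `detection_ratio` and the sample node indices are hypothetical and serve only to illustrate the definition.

```python
def detection_ratio(true_bad, flagged):
    """r = b' / b: fraction of the true bad nodes that were flagged."""
    true_bad, flagged = set(true_bad), set(flagged)
    if not true_bad:
        raise ValueError("the set of bad nodes must be non-empty")
    return len(true_bad & flagged) / len(true_bad)

# Hypothetical run: 4 bad nodes, 3 nodes flagged, 2 of them correct.
r = detection_ratio(true_bad=[1, 2, 3, 4], flagged=[2, 3, 9])
print(r)   # 0.5
```

Note that this measure counts only the bad nodes that are found; good nodes that are flagged by mistake (node 9 above) do not affect r.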

In the evaluation, we vary the number of bad nodes b from 50 to 500 and the degree of RSS change from 10% to 50%, where the degree of RSS change is the percentage by which the original RSS values are changed. For each configuration setting, we run the simulation 100 times with randomly generated topologies and randomly chosen sets of bad nodes to reduce the randomness in the results. Below, we present our evaluation results.

4.2. Evaluation Results

Figure 1 shows how the number of bad nodes impacts the detection accuracy of our digraph clustering algorithm and the traditional spectral clustering algorithm. We observe that, as the number of “bad” nodes increases, the detection accuracy decreases almost linearly for both algorithms. This happens because the more “bad nodes” exist, the harder it is for the clustering algorithms to detect them: malfunctioning nodes may coexist and dominate a neighborhood of sensor nodes, which makes detection harder. Overall, our method consistently outperforms the traditional method, with 19% to 41% higher detection accuracy.

Figure 1: The impact of the number of “bad” nodes.

Figure 2 shows how the change rate on RSS affects the detection accuracy of our digraph clustering algorithm and the traditional spectral clustering algorithm. We observe that, as the change rate on RSS increases from 0.1 to 0.5, the detection accuracy of both methods increases, because larger changes on RSS lead to higher dissimilarity between “bad” nodes and “good” nodes. Moreover, our method achieves 22% to 34% higher detection accuracy than the traditional method.

Figure 2: The impact of the change rate on RSS.

5. Related Work

In this paper, we develop a generalized digraph spectral clustering algorithm for sensor status validation in distributed sensor networks. The related work on sensor node status validation has been discussed in the previous sections; in this section, we primarily introduce state-of-the-art spectral clustering methods. Spectral clustering is mainly employed in the data mining and machine learning areas [14–19], and a few studies have attempted to extend spectral clustering algorithms to the digraph setting, for example, [7–9, 19–25]. However, when applied to the sensor status validation problem, these works have two fundamental drawbacks: (1) loss of information caused by symmetrizing the adjacency matrix or Laplacian matrix and (2) asymmetric cuts in digraphs.

The Loss of Information by Symmetrization of A or L. The algorithms proposed in [19, 20, 24] symmetrize the adjacency matrix or Laplacian matrix so that the traditional spectral clustering algorithm (for undirected graphs) can be used. However, by symmetrizing either the adjacency matrix or the Laplacian matrix of the digraph, they in fact apply existing undirected-graph-based spectral clustering algorithms to transformed undirected graphs, with $\tilde{A} = A^{T} + A$, $\tilde{A} = A^{T} A$, and so forth, instead of the original directed graphs. Therefore, the results lose some information from the original digraph, and the clustering obtained is not accurate. Several papers have explicitly addressed this problem, such as [23, 24].

In particular, Zhou et al. [9] define the same cut function, $CCut (S, \bar{S}) = \sum_{i \in S, j \in \bar{S}} π_{i} p_{i j}$, and obtain good performance. However, their algorithm still has problems. First, it uses the symmetrized Laplacian as the objective function, which loses information from the original digraph. Second, the symmetrized Laplacian and the cut definition are proposed as heuristics, without an explicit circulation interpretation of the algorithm.

Asymmetric Cuts in Digraphs. The directionality of directed links is crucial information, and [23–25] define the cluster cut in an asymmetric fashion. In a digraph, given a partition $V = \{S, \bar{S}\}$, the edge cuts defined in many papers (for instance, $cut (\partial S) = cut (S, \bar{S}) = \sum_{i \in S, j \in \bar{S}} a_{i j}$) are not symmetric; that is, $cut (\partial S) \neq cut (\partial \bar{S})$. These approaches cannot distinguish the directional difference in digraph cases.

6. Conclusion

In this paper, we propose a generalized digraph spectral clustering algorithm for validating sensor status in distributed sensor networks. The proposed algorithm considers the network flow circulation while clustering the sensor nodes; thus it preserves the directed link information that is lost in the traditional spectral clustering method. In our extensive simulations, the digraph spectral clustering algorithm achieves 19% to 41% higher detection accuracy than the traditional spectral clustering based method.

There are several future directions. First, we plan to explore the applicability of the digraph spectral clustering algorithm in other scenarios, for example, social network community detection. Second, when the sensor network scales up, a real-time statistical query for the number of “bad” nodes becomes time consuming. In this case, we will consider applying sampling techniques [26–34] to estimate it quickly and accurately. Last but not least, we are interested in applying our digraph spectral clustering method to detect node and link failures in large-scale cloud computing environments [35–38].

Footnotes

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is partially supported by the 863 National Hi-tech Research and Development Program (2011AA01A103).

References

1. Kung H. T., Vlah. A spectral clustering approach to validating sensors via their peers in distributed sensor networks. Proceedings of the 18th IEEE International Conference on Computer Communications and Networks (ICCCN '09), August 2009, pp. 1–7. doi:10.1109/ICCCN.2009.5235225. 2-s2.0-70449103476.

2. Kung H. T., Vlah. Validating sensors in the field via spectral clustering based on their measurement data. Proceedings of the 2009 IEEE Military Communications Conference (MILCOM '09), October 2009, pp. 1–10. doi:10.1109/MILCOM.2009.5379940. 2-s2.0-77951493537.

3. Chung, Yau. Discrete Green's functions. Journal of Combinatorial Theory A, 2000, 91(1-2), pp. 191–214. doi:10.1006/jcta.2000.3094. MR1779780. 2-s2.0-0034215984.

4. Chandra A. K., Raghavan, Ruzzo W. L., Smolensky. The electrical resistance of a graph captures its commute and cover times. Proceedings of the 21st Annual ACM Symposium on Theory of Computing, May 1989, pp. 574–586. 2-s2.0-0024864589.

5. Doyle P. G., Snell L. J. Random Walks and Electric Networks. Washington, DC, USA: Mathematical Association of America, 1984.

6. Tetali. Random walks and the effective resistance of networks. Journal of Theoretical Probability, 1991, 4(1), pp. 101–109. doi:10.1007/BF01046996. MR1088395. Zbl 0722.60070. 2-s2.0-0001418604.

7. Zhang Z.-L. Random walks on digraphs: a theoretical framework for estimating transmission costs in wireless routing. Proceedings of the 2010 Conference on Computer Communications (INFOCOM '10), IEEE, pp. 1–9.

8. Zhang Z.-L. Random walks and Green's function on digraphs: a framework for estimating wireless transmission costs. IEEE/ACM Transactions on Networking, 2013, 21(1), pp. 135–148. doi:10.1109/TNET.2012.2191158. 2-s2.0-84873995938.

9. Zhou, Huang, Schölkopf. Learning from labeled and unlabeled data on a directed graph. Proceedings of the 22nd International Conference on Machine Learning (ICML '05), New York, NY, USA: ACM, pp. 1036–1043.

10. Chung. Laplacians and the Cheeger inequality for directed graphs. Annals of Combinatorics, 2005, 9(1), pp. 1–19. doi:10.1007/s00026-005-0237-z. MR2135772. 2-s2.0-17444366585.

11. von Luxburg. A tutorial on spectral clustering. Technical Report TR-149, Max Planck Institute for Biological Cybernetics, 2006.

12. Lutkepohl. Handbook of Matrices. Wiley, 1997. MR1433592.

13. Hartigan J. A., Wong M. A. A K-means clustering algorithm. Applied Statistics, 1979, 28, pp. 100–108.

14. Chen, Wang, Zhang Z. L. Influence diffusion dynamics and influence maximization in social networks with friend and foe relationships. Proceedings of the 6th ACM International Conference on Web Search and Data Mining (WSDM '13), February 2013, New York, NY, USA: ACM, pp. 657–666. doi:10.1145/2433396.2433478. 2-s2.0-84874263116.

15. Chen, Wang, Zhang Z.-L. Voter model on signed social networks. Internet Mathematics, 2014, 99, pp. 1–50.

16. Zhang, Bao. Mutual or unrequited love: identifying stable clusters in social networks with uni- and bidirectional links. Proceedings of the 9th Workshop on Algorithms and Models for the Web Graph (WAW '12), 2012, pp. 113–125.

17. Zhang Z.-L., Boley. The routing continuum from shortest-path to all-path: a unifying theory. Proceedings of the 31st International Conference on Distributed Computing Systems (ICDCS '11), July 2011, IEEE, pp. 847–856. doi:10.1109/ICDCS.2011.57. 2-s2.0-80051914608.

18. Zhang Z.-L., Boley. From shortest-path to all-path: the routing continuum theory and its applications. IEEE Transactions on Parallel and Distributed Systems, 2014, 25(7), pp. 1745–1755. doi:10.1109/TPDS.2013.203.

19. Zhou, Burges C. J. C. Spectral clustering and transductive learning with multiple views. Proceedings of the International Conference on Machine Learning (ICML '07), June 2007, New York, NY, USA, pp. 1159–1166. doi:10.1145/1273496.1273642. 2-s2.0-34547970997.

20. Chi, Song, Zhou, Hino, Tseng B. L. Evolutionary spectral clustering by incorporating temporal smoothness. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '07), August 2007, San Jose, Calif, USA, pp. 153–162. doi:10.1145/1281192.1281212. 2-s2.0-36849005505.

21. Zhang. Random walks on digraphs, the generalized digraph Laplacian and the degree of asymmetry. Algorithms and Models for the Web-Graph, Lecture Notes in Computer Science, vol. 6516, 2010, pp. 74–85. doi:10.1007/978-3-642-18009-5_8. MR2786422.

22. Zhang. Digraph Laplacian and the degree of asymmetry. Internet Mathematics, 2012, 8(4), pp. 381–401. doi:10.1080/15427951.2012.708890. MR3010000. Zbl 1258.05072.

23. Meila, Pentney. Clustering by weighted cuts in directed graphs. Proceedings of the 7th SIAM International Conference on Data Mining (SDM '07), April 2007, SIAM, pp. 135–144. 2-s2.0-70449095774.

24. Veloso M. M., Kambhampati. Spectral Clustering of Biological Sequence Data. AAAI Press/The MIT Press, 2005.

25. Yan, Huang, Jordan M. I. Fast approximate spectral clustering. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09), July 2009, ACM, pp. 907–915. doi:10.1145/1557019.1557118. 2-s2.0-70350657266.

26. Gjoka, Butts C. T., Kurant, Markopoulou. Multigraph sampling of online social networks. IEEE Journal on Selected Areas in Communications, 2011, 29(9), pp. 1893–1905. doi:10.1109/JSAC.2011.111012. 2-s2.0-80053654964.

27. Golnari, Zhang Z.-L. What drives the growth of YouTube? Measuring and analyzing the evolution dynamics of YouTube video uploads. Proceedings of the 6th ASE International Conference on Social Computing (SocialCom '14), 2014, pp. 1–10.

28. Steiner, Bao, Wang, Zhu. Region sampling and estimation of geosocial data with dynamic range calibration. Proceedings of the 30th International Conference on Data Engineering (ICDE '14), 2014, pp. 1–12.

29. Steiner, Wang, Zhang Z.-L., Bao. Dissecting foursquare venue popularity via random region sampling. CoNEXT'12 Student Workshop: The 8th International Conference on Emerging Networking Experiments and Technologies, 2012, pp. 21–22.

30. Steiner, Wang, Zhang Z.-L., Bao. Exploring venue popularity in foursquare. Proceedings of the 5th IEEE International Workshop on Network Science for Communication Networks (NetSciCom '13), 2013.

31. Mohaisen, Luo, Kim, Zhang. Measuring bias in the mixing time of social graphs due to graph sampling. Proceedings of the 2012 IEEE Military Communications Conference (MILCOM '12), November 2012, pp. 1–6. doi:10.1109/MILCOM.2012.6415714. 2-s2.0-84874294472.

32. Ribeiro, Towsley. Estimating and sampling graphs with multidimensional random walks. Proceedings of the 10th Internet Measurement Conference (IMC '10), November 2010, New York, NY, USA: ACM, pp. 390–403. doi:10.1145/1879141.1879192. 2-s2.0-78650856898.

33. Yuan, Deng, Zeng, Wang, Dai, Yang. Oceanst: a distributed analytic system for large-scale spatiotemporal mobile broadband data. VLDB 2014 Demo, 2014, pp. 1–4.

34. Zhou, Adhikari V. K., Zhang Z.-L. Counting YouTube videos via random prefix sampling. Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference (IMC '11), November 2011, Berlin, Germany, pp. 371–379. doi:10.1145/2068816.2068851. 2-s2.0-82955197339.

35. Qian, Huang, Chen. Swan: end-to-end orchestration for cloud network and wan. Proceedings of the IEEE 2nd International Conference on Cloud Networking (CloudNet '13), 2013, IEEE, pp. 236–242.

36. Qian, Medhi. On energy-aware aggregation of dynamic temporal demand in cloud computing. Proceedings of the 4th IEEE International Conference on Communication Systems and Networks (COMSNETS '12), January 2012, pp. 1–6. doi:10.1109/COMSNETS.2012.6151370. 2-s2.0-84858033255.

37. Qian, Medhi. Server operational cost optimization for cloud computing service providers over a time horizon. Proceedings of the 11th USENIX Conference on Hot Topics in Management of Internet, Cloud, and Enterprise Networks and Services, 2011.

38. Qian, Medhi, Trivedi T. T. A hierarchical model to evaluate quality of experience of online services hosted by cloud computing. Proceedings of the 12th IFIP/IEEE International Symposium on Integrated Network Management (IM '11), May 2011, pp. 105–112. doi:10.1109/INM.2011.5990680. 2-s2.0-80052713647.