The comparison of significance of fuzzy community partition across optimization methods

Abstract

The analysis of fuzzy(overlapping) community structure in complex networks is an important problem in data mining of network data sets. However, due to the exist of random factors and error edges in real networks, how to measure the significance of community structure efficiently is a crucial question. In this paper, we present a novel statistical framework comparing the significance of fuzzy community structure across various optimization models. Different from the universal approaches, we calculate the similarity between a given node and its leader and employ the distribution of link tightness to derive the significance score, instead of a direct comparison to a randomized model. Based on the distribution of community tightness, a new “p-value” form significance measure is proposed for community structure analysis. Specially, the well-known approaches and their corresponding quality functions are unified to a novel general formulation, which facilitate providing a detail comparison across them. To determine the position of leaders and their corresponding followers, an efficient algorithm is proposed based on the spectral theory. Finally, we apply the significance analysis to some famous benchmark networks and the good performance verified the effectiveness and efficiency of our framework.

Keywords

Fuzzy community statistical significance optimization methods quality functions benchmark networks

1 Introduction

A common feature observed in real networks is the presence of community structures [1–5 , 18], i.e. subgraphs which are densely connected to each other while less connected to the subgraphs outside. In many scenarios, nodes in a network can belong to more than one community, called fuzzy(overlapping) communities [2 –9]. In order to estimate how much a decomposition of a network which is found by a community detection algorithm is meaningful, we need a quality measure. Consequently, for a particular measure, the community detection algorithms can be ranked. To this end, various measures have been proposed in the literature, so far. The most prevalent measure which has been used extensively in the literature is due to Newman and Girvan [18]. This measure, called modularity, quantifies how much the density of the edges inside identified communities differs from the expected edge density in an equivalent network with similar number of vertices and edges but randomized edge placement, which is taken as the null model for statistical tests. Considering the modularity measure, the community detection problem is transformed to the modularity maximization problem. Modularity function can naturally extended to fuzzy form, which used to detect overlapping communities.

Recently, some optimization algorithms based on Potts models which used to detect community structure have attracted attention. Communities correspond to Potts model spin states, and the associated system energy indicates the quality of a candidate partition. It models an inhomogeneous ferromagnetic system where each node is viewed as a labeled spin in the network. Let A be the adjacency matrix of graph G and let σ_i denote the label of the community that node i belongs to. Furthermore, the Kronecker Delta function is defined by δ (σ_i, σ_j) =1 if σ_i = σ_j and δ (σ_i, σ_j) =0, otherwise. Having the community membership labels σ, Reichardt and Bornholdt (RB) [16] proposed a generalized Hamiltonian as the core energy function, $H_{RB} ({σ}) = - \frac{1}{2} \sum_{i \neq j} (a_{ij} - γ_{RB} p_{ij}) δ (σ_{i}, σ_{j}) .$ (1) where γ_RB is the resolution parameter, $p_{ij} \in ℝ$ is the random form of adjacent matrix A = (a_ij). The Potts dynamical model is a powerful tool which has been wildly applied to uncover the dynamics of community structure in networks [8, 9].

Label propagation is another famous algorithm for community detection [27]. Briefly, the algorithm starts with randomly assigning a community label to each node. Then, each node updates its label by replacing it by the label most used by its neighbors. The other well-known optimization approaches used in community detection problem are Simulated Annealing (SA) [25], extermal optimization (DA) [13], expectation maximization [20], Bayesian inference [17], and variational Bayes [15]. For a comprehensive and comparative review on this topic we refer the reader to [4].

Although a lot of optimization method and their functions are proposed, How to determine the hidden properties of a given community [9] effectively remain unclearly answered. To answers these crucial questions, in this paper, we present a novel statistical framework comparing the significance of soft community structure across various optimization methods. Different from the universal approaches, we calculate the similarity of a given node to its leader and employ the distribution of link tightness to derive the significance score, instead of a direct comparison to a randomized model. A small example is shown in Fig. 1a, which illustrates that tighter the following nodes link to its leader, more significant the community is. Based on the distribution of community tightness, a new “p-value” form significance measure is proposed for community structure analysis. Specially, the well-known approaches and their corresponding quality functions are unified to a novel general fuzzy formulation, to provide a detail comparison across them. Then, we can choose the most suitable form of the function by set the parameters properly. To determine the position of leaders and their corresponding followers, an efficient fuzzy detection algorithm is proposed based on the spectral theory. Finally, we apply the significance analysis to some famous benchmark networks and the good performance verified the effectiveness and efficiency of our framework. The detailed procedure can be observed in Fig. 1b.

2 Community structure and the leader

Leader-driven algorithms [11, 12] constitute a special case of seed-centric approaches. These methods show that, in many real world, especial the social networks, nodes of a network are usually classified into two categories: leaders and followers. For example, considering the famous Karate network [28], nodes 1 and 33 are two significant leaders and corresponding communities are built around them (see Fig. 2a). If two leaders are removed, these communities will be split up, as they link to most followers and keep the community together. Since community are consequence of information spreading, a given community can be defined as the area in which a leader has most influence. So, one can uncover the community partition by finding all natural leaders and their corresponding followers on which they influence. We believe if followers are more tightly linked to the leader, or leader spreads more influence on their followers, this community are more significant or robust. When we use a given optimization method to evolve the community configure, the significance of communities also evolves correspondingly, which shown in Fig. 2b.

3 The fuzzy community detection algorithm based on leader position

In this study, the relative positions of leader and corresponding followers are crucial to analyze the significance situation. In order to obtain the leader of corresponding community, we extract the candidate fuzzy membership by minimizing the following objective function $J_{m} = \sum_{i = 1}^{n} \sum_{j = 1}^{k} x_{ij} {∥ d_{i} - c_{j} ∥}^{2},$ (2) where variables x_ij is the fuzzy membership that node i in community j, with ∑_jx_ij = 1. This method is similar as the famous k-means method and can be obtain both center and assignment iteratively. d_i is the ith n-dimensional data point, c_j is the n-dimensional center(leader) of the community j, and ∥* ∥ is any norm expressing the similarity between a given node and the center. One can use an iterative optimization of the objective function shown above, to obtain the network partition by the update of fuzzy membership x_ij and the community leaders c_j. This procedure converges to a local minimum or a saddle point of J_m.

Suppose K is the upper bound of number of clusters and A = (a_ij) _n×n is the adjacent matrix of a network, then, the detailed algorithm is shown in Algorithm 1 and stated straightforwardly as follows:

Step 1: for a given K

(i) Calculate the diagonal matrix D = (d_ii), where d_ii = ∑_ka_ik.

(ii) Computing the top K eigenvectors based on generalized eigensystem Ax = tDx, and then establish the eigenvector matrix E_K = [e₁, e₂, . . . , e_K].

Step 2: for each number of communities 2 ≤ k ≤ K:

(i) Establish the matrix E_K = [e₂, e₃, . . . , e_K] from the matrix E_K.

(ii) Normalize the rows of E_K to unit length using Euclidean distance norm.

(iii) Cluster the row vectors of E_K using any community detection method by minimizing Equation.(2) to obtain a membership matrix X_k and corresponding leaders.

Step 3: Maximizing the modular function: Pick the optimal number of communities k and the corresponding partition X_k that maximizes Q (X_k).

In step 1, given the adjacent matrix A = (a_ij) _n×n and a diagonal matrix D = (d_ii), d_ii = ∑_ka_ik, two matrices D^-1/2AD^-1/2 and D^-1A are used. This is motivated by Ref. [7], which uses the top K eigenvectors of the generalized eigensystem Ax = tDx instead of the K eigenvectors of the adjacent matrix. It shows that after normalizing the rows using Euclidean norm, their eigenvectors are mathematically identical and emphasize that this is a numerically more stable method. In step 2, we choose the initial the starting centers to be as orthogonal as possible which already used in k-means clustering method [6, 26]. This way of choosing centers(leaders) does not cost additional time complexity, and also improve the quality of the partition, thus at the same time reduces the need for restarting the random initialization process. In step 3, the Q function measures the quality of a given community structure organization of a network and can be used to automatically select the optimal number of communities k according to the maximum Q value [26], we will discuss the multiple optimization methods and their corresponding Q function in detail in the following section.

Algorithm 1 The fuzzy community detection algorithm

Require: Graph G with size n and volume m, the algorithm parameters, i.e. $f_{μ}^{+}$ , $f_{μ}^{-}$ and R_μ which shown in Equation. (3)

Ensure The fuzzy membership matrix X;

For a given number of communities K

repeat

Calculate the top K eigenvector matrix E_K = [e₁, e₂, . . . , e_K] and initiate the community membership X (0) = E_K.

Update the position of center and corresponding community membership matrix X to minimize the Equation.(2)

Until exceeding the maximum number of iterations

Select the optimal number of communities K and corresponding community membership according to the maximum of Q defined in Equation. (3)

4 The general and expanded formation of function Q

For many community detection algorithms, the target function Q is critical. Here, Q can be tried to be optimized has the following general fuzzy form:

$\begin{matrix} Q_{i μ} & = & \sum_{j = 1}^{n} f_{μ}^{+} a_{ij} x_{j μ} \\ - \sum_{j = 1}^{n} f_{μ}^{-} (1 - a_{ij}) x_{j μ} + R_{i μ}, \end{matrix}$ (3) and choose R_iμ such that ∂R_iμ/∂x_iμ = 0 and $R_{mu} = \sum_{i = 1}^{n} R_{i μ}$ , e.g. $R_{i μ} = \frac{2}{l_{μ}} R_{mu}$ . Interestingly, when all x_iμ are in fuzzy membership state, the H function with Q_iμ defined as Equation. (3) can be reduced to well-known measures by following considerations:

(1) Hofman and Wiggins [15]

$\begin{matrix} f_{μ}^{+} & = & log \frac{p^{in}}{p^{out}}, f_{μ}^{-} = log \frac{1 - p^{out}}{1 - p^{in}}, \\ R_{μ} & = & l_{μ} log π_{μ} . \end{matrix}$ (4) (2) Ronhovde and Nussinov [23] $f_{μ}^{+} = 1, f_{μ}^{-} = min_{μ} p_{in, μ}, R_{μ} = 0 .$ (5) (3) RB Potts model (Erdős-R $\overset{´}{e}$ nyi null model) [16] $f_{μ}^{+} = 1 - γ_{RB} p, f_{μ}^{-} = γ_{RB} p, R_{μ} = 0 .$ (6) (4) RB Potts model (Configuration null model)[16]

$\begin{matrix} f_{μ}^{+} & = & 1 - \frac{γ_{RB}}{2 m}, f_{μ}^{-} = \frac{γ_{RB}}{2 m}, \\ R_{μ} & = & \sum_{i > j} \frac{γ_{RB}}{2 m} (k_{i} k_{j} - 1) x_{i μ} x_{j μ} . \end{matrix}$ (7) where k_i is the degree of node i and m is the number of all edges in the network.

(5) Modularity [18] $f_{μ}^{+} = 1, f_{μ}^{-} = \frac{k_{i} k_{j}}{2 m}, R_{μ} = \sum_{i > j} \frac{1}{2 m} (k_{i} k_{j} - 1) x_{i μ} x_{j μ} .$ (8) where k_i is the degree of node i and m is the number of all edges in the network.

(6) Label propagation [27] $f_{μ}^{+} = 1, f_{μ}^{-} = 0, R_{μ} = 0 .$ (9) where k_i is the degree of node i and m is the number of all edges in the network.

5 Significance of community structure

It is essential to establish a detail framework analyzing the significance of community structure, since real networks own specific characteristics [5 , 29]. In this section, we discuss these important characteristics and give a detailed introduction of the framework.

Node similarity. We define the similarity of nodes i and j, sim (i, j), as the ratio between the intersection and the union of their neighborhoods Γ (i) and Γ (j), $sim (i, j) = \frac{| Γ (i) \cap Γ (j) |}{| Γ (i) \cup Γ (j) |},$ (10) By employing Equation. (10), we can calculate the expected similarity between a given node and the community leader z, $E [sim (x, z)] = \int_{ℝ^{M}} sim (x, z) Q (x | z) dx,$ (11) where Q (x|z) is a distribution of nodes in a community with leader z.

Next, using the maximum entropy principle, the statistical unbiased distribution fulfilling constraint can be obtained using the maximum entropy principle: $Q (x | z, η) = \frac{1}{Z_{η}} P_{0} (x) e^{η sim (x, z)} dx,$ (12) where P₀ (x) is the background distribution used to contrast with an alternative hypothesis: node x being part of a community, a group of nodes distinguished by enhanced mutual similarity. Z_η is the normalisation constant depends on the value of the scoring parameter η: $\frac{\partial}{\partial η} log Z_{η} = E [sim (x, z)] .$ (13)η is the parameter which used to control the “width” of a community and the larger the value of η, the smaller the expected width or scale of a given community. Specially, the distribution Q (x|z, η) is the same as the background model P₀ (x) when η = 0.

Log-likelihood score and community tightness. We define the log-likelihood score as the deviations of the community distribution from the null model $s (x | z, η) \equiv log \frac{Q (x | z, η)}{P_{0} (x)} = η sim (x, z) - log Z_{η} .$ (14) By Equation. (14), nodes which are more likely to be in a community with center z and scoring parameter η own larger positive value, than in the null background model. Given a community with nodes set {1, . . . , N}, for a given leader z and a scoring parameter η, the log-likelihood scores s (i|z, η) are positive. The community tightness is the sum of the scores of the community elements, $S (1, . . ., N | z, η) = \sum_{i} max [s (i | z, η), 0] .$ (15)

However, we can’t determine the scoring parameter η easily. Here, the tightness function of Equation.(15) can be simplified as: $S (1, . . ., N | z, η) = \sum_{i = 1}^{N} max [s (i | z) - μ, 0],$ (16) where s (i|z) = sim (i, z). By this transformation, one can control the width of community using parameter μ simply. The community tightness is determined both by the number of elements and by their similarities with the leader, that is, tighter communities with fewer elements own comparable more tightness to looser but larger communities.

Calculation of Significance score. We can the quantified the quality of the true and random communities by characterize the distribution of the tightness score p (S) from the background distribution. A new “p-value” form measure [14] can be used to define the statistical significance of score S₀, as the probability that a random chosen nodes set contains a community with score greater than or equal to S₀. This “p-value” form significance can be explained by a null hypothesis: “These nodes are drawn from the background distribution”. To test this hypothesis, we compute the statistical significance of score S₀: low value suggests that the null hypothesis is unlikely and allows for rejecting it. This method provides a new connection between statistical p-value theory and network analysis and then get an interesting significance measure.

If the network is large enough, according to the mean field theory, s_i = s (i|z) owns an approximate Gaussian-distribution with variance M, $P (s (i | z)) = \sqrt{1 / (2 M π)} exp {- s^{2} / (2 M)}$ . The distribution of the tightness S can be calculated straightforwardly using the derivation. Specifically, we need to compute the following quality function:

$\begin{matrix} Z_{c} (β, μ) = \int_{ℝ^{N}} e^{β S (1, . . ., N | z, η)} P (s_{1}) . . . P (s_{N}) {ds}_{1} . . . {ds}_{N} \\ = {[\int_{- \infty}^{+ \infty} e^{β max [s_{i} - μ, 0]} P (s) ds]}^{N} \\ = {[\int_{- \infty}^{μ} P (s) ds + \int_{μ}^{+ \infty} e^{β (s_{i} - μ)} P (s) ds]}^{N} \\ = {[(1 - H (μ)) + e^{\frac{(β)^{2}}{2} - β μ} H (μ - β)]}^{N}, \end{matrix}$ (17) where $H (x) = \int_{x}^{+ \infty} \frac{1}{\sqrt{2 π}} e^{- \frac{1}{2} y^{2}}$ is the complementary cumulative Gaussian distribution. In Equation.(17), two intervals are divided: below the score threshold μ, the score is zero, which contributes the cumulative distribution $\int_{- \infty}^{μ} ds / (2 π)^{1 / 2} exp [- s^{2} / 2]$ to the generating function. Above μ, the score is positive, which generates a contribution of $\int_{μ}^{+ \infty} ds / (2 π)^{1 / 2} exp [- s^{2} / 2 + β (s - μ)]$ . The free energy function reads $- β f (β, μ) = log [(1 - H (μ)) + e^{\frac{(β)^{2}}{2} - β μ} H (μ - β)],$ (18) and the entropy is $ω (s, μ) = - max_{β} [β s + β f (β, μ)] .$ (19) Using the distribution of community tightness, there is $log p (S, μ) ≃ N ω (S / N, μ) - \frac{1}{2} log N .$ (20)

Given a specifical community, we can calculated the significance score F using the probability that the community tightness S, p (S), larger than or equal to S, $F (S, μ) = \int_{S}^{+ \infty} p (S^{'}, μ) {dS}^{'} .$ (21) Furthermore, from the perspective of the whole network, we use the average significance score 〈F〉_Q to indicate the robustness of a partition, defined as the average value among F values of all communities partitioned by maximizing a particular quality function Q shown in Section 4.

6 Experiments on benchmark network

In this section, we will test the validity of our framework on some famous benchmark network and real networks.

GN benchmark network. First, we apply to the classical Girven-Newman benchmark [21], where the network with n = 128 nodes are divided into four 32 nodes communities. According to the establish mechanism, the community structure will fuzzier and thus when 〈k^out〉 increases, it is more difficult to identify them correctly. Hence, the significance of communities will tend to be weaker and the value of F index will also decrease. The comparison results of F value corresponding to all five optimization algorithms are shown in Fig. 3a when μ = 0.3. It can be observed that the index F has a great performance on GN benchmark: when 〈k^out〉 approaching 0, the community structure is quite strong and all corresponding 〈F〉 value is close to 1; while when the network is fuzzy enough, the corresponding 〈F〉 value of all algorithm is low, extremely for Modularity optimization method and Label propagation method, only near 0.2–0.3.

Moreover, by comparing five algorithms, we find in Fig. 3a that the 〈F〉 values corresponding to Hofman & Wiggins method is largest, and the Label propagation method is the lowest. This may because Label propagation method emphasize the simplicity of calculation too much while ignoring the accuracy of results. Furthermore, the 〈F〉 values between Modularity optimization method and Label propagation method are similar when 〈k^out〉 becomes lower. This result is similar as Ref. [22], which verifies the inner correlation between these two methods. These observations are no evidence of overall superiority of one method over another, but an example of how to compare the significance and use the different partitioning algorithms on a given network.

Furthermore, when 〈k^out〉 increases, the topology becomes fuzzier and the sizes of communities will become more and more smaller correspondingly. At the same time, as the width parameter μ increases, the significance will favor tighter communities with fewer elements. We test the Hofman & Wiggins method and Label propagation method in Fig. 3b, the value of 〈F〉 corresponding to μ = 0.3 will be larger than μ = 0.1 for all two examples. As a conclusion, we argue that when the corresponding 〈F〉 is smaller than 0.3 on average (〈k^out〉≈4), it is not safe to say there exists significant community structure for a given network.

LFR benchmark network. We also test the index on the more challenging LRF benchmark presented by Lancichinetti, Fortunato and Radicchi [3]. In this network, the average degree k = 20, maximum degree is 50 and P (k) ∝ k^γ. Maximum and minimum community sizes are 50 and 20 respectively. The significance score changes when we adjust the value of θ in LFR benchmark, and numerical results in the LFR-benchmark are shown in Fig. 4a. It can be observed that with the augment of θ, F decreases for all five optimization methods when μ = 0.3. Same as GN network, the 〈F〉 values corresponding to Hofman & Wiggins method is largest at the begining, and the Label propagation method is the lowest. However, the 〈F〉 values corresponding to Ronhovde & Nussinov method will exceed Hofman & Wiggins method when when θ is larger than 0.4. Furthermore, when θ larger than 0.32, the 〈F〉 value corresponding to Label propagation method is close to Modularity optimization method. In addition, from Fig. 4b, it can be observed the value of 〈F〉 corresponding to μ = 0.3 will larger than μ = 0.1 when we take the Hofman & Wiggins method and Label propagation method as examples.

Stochastic block model. Furthermore, we consider the famous stochastic block model which used to detect community structure by Decelle and Zhang et al. [1 , 24]. In this benchmark, ɛ = c_out/c_in is the parameter used to control the fuzziness of generated network. To verify the performance on sparse stochastic block model with low average degree, we generate a large network with N = 5000 nodes and q = 10 groups with average degree c = 8, which shown in Fig. 5. Each point in curves is the result averaged by testing 100 times. When ɛ is close to 0, it can be observed the community structure is quite strong and the corresponding 〈F〉 value of all five algorithms are very high when μ = 0.3. In contrast, when ɛ is increased close to 0.8, the network is nearly a fuzzy random one, and all 〈F〉 values are very low, near 0.1–0.3. Furthermore, we find that the 〈F〉 value of Hofman & Wiggins method will larger than all others when ɛ < 0.81, while lower than Ronhovde & Nussinov method when ɛ > 0.81. Specifically, we argue that for all algorithm when the corresponding 〈F〉 is nearly larger than 0.3 on average(ɛ ≈ 0.4), there exists significant community structure which may detectable [1]. From the results, the F shows a great ability in characterizing the significant modular structure for optimization methods as we adjust the parameter ɛ.

Real network. Finally, we show significance can also be used to rank the real network partitions obtained by different algorithmic strategies. Zachary karate club network, Collage football network and Political books network are employed as the examples. Table 1 presents the results estimated from three algorithms and we observed that they are coincided with the analysis in artificial networks. These observations are no evidence of overall superiority of one method over another, but an example of how to compare the significance and use the different partitioning algorithms on a given network.

7 Discussion

In this paper, we present a novel framework comparing the significance of fuzzy community structure revealed by multiple optimization functions. Based on the distribution of community tightness, a new “p-value” form significance measure is proposed for analysis. As part of the future work, it is necessary to take a deeper look into how different similarity measures impact the results of this method. Additionally, this framework can be easily extended to a weighted and directed form, which only needs to modify the formation of the quality function Q. As a conclusion, this method shows a great performance and deserves more attention from us.

Acknowledgments

We are grateful to the anonymous reviewers for their valuable suggestions which are very helpful for improving the manuscript. The authors are separately supported by NSFC grants 71401194, 91324203, 11131009 and “121” Youth Development Fund of CUFE grants QBJ1410.

References

Decelle

Krzakala

Moore

Zdeborová

2011

Phase transition in the detection of modules in sparse networks

Physical Review Letters 107 6 065701

Decelle

Krzakala

Moore

Zdeborová

2011

Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications

Physical Review E 84 6 066106

Lancichinetti

Fortunato

Radicchi

2008

Benchmark graphs for testing community detection algorithms

Physical Review E 78 4 046110 046115

Lancichinetti

Fortunato

2009

Community detection algorithms: A comparative analysis

Physical Review E 80 5 056117

Lancichinetti

Radicchi

Ramasco

Fortunato

2011

Finding statistically significant communities in networks

PloS One 6 4 e18961

Jordan

Weiss

2002

On spectral clustering: Analysis and an algorithm

Advances in Neural Information Processing Systems 2 849 856

Verma

Meila

2003

A comparison of spectral clustering algorithms

University of Washington Tech Rep UWCSE 1 1-18 030501

Fortunato

2010

Community detection in graphs

Physics Reports 486 3-5 75 174

Wang

Chen

2015

Measuring robustness of community structure in complex networks

Europhysics Letters 108 6 68009

10.

Daniels

2015

Social significance of community structure: Statistical view

Physical Review E 91 1 012801

11.

Wang

Zhang

2012

Potts model based on a Markov process computation solves the community structure problem effectively

Physical Review E 86 1 016109

12.

Zhang

2013

Analysis of stability of community structure across multiple hierarchical levels

Europhysics Letters 103 5 58002

13.

Duch

Arenas

2005

Community detection in complex networks using extremal optimization

Physical Review E 72 2 027104

14.

Wilson

Wang

Mucha

Bhamidi

Nobel

2013

A testing based extraction algorithm for identifying significant communities in network

The Annals of Applied Statistics 8 3 1853 1891

15.

Hofman

Wiggins

2008

Bayesian approach to network modularity

Physical Review Letters 100 25 258701

16.

Reichardt

Bornholdt

2006

Statistical mechanics of community detection

Physical Review E 74 1 016110

17.

Hastings

2006

Community detection as an inference problem

Physical Review E 74 3 035102

18.

Newman

MEJ

Girvan

2004

Finding and evaluating community structure in networks

Physical Review E 69 2 026113

19.

Newman

MEJ

2004

Fast algorithm for detecting community structure in networks

Physical Review E 69 6 066133

20.

Newman

MEJ

Leicht

2007

Mixture models and exploratory analysis in networks

Proceedings of the National Academy of Sciences of the United States of America 104 23 9564 9569

21.

Girvan

Newman

MEJ

2002

Community structure in social and biological networks

Proceedings of the National Academy of Sciences of the United States of America 99 12 7821 7826

22.

Barber

Clark

2009

Detecting network communities by propagating labels under constraints

Physical Review E 80 2 026129

23.

Ronhovde

Nussinov

2010

Local resolution-limit-free Potts model for community detection

Physical Review E 81 4 046114

24.

Zhang

Moore

2014

Scalable detection of statistically significant communities and hierarchies, using message-passing for modularity

Proceedings of the National Academy of Sciences of the United States of America 111 18144

25.

Guimera

Amaral

LAN

2005

Functional cartography of complex metabolic networks

Nature 433 7028 895 900

26.

Zhang

Wang

Zhang

2007

Identification of overlapping community structure in complex networks using fuzzy c-means clustering

Physica A: Statistical Mechanics and its Applications 374 1 483 490

27.

Raghavan

Albert

Kumara

2007

Near linear time algorithm to detect community structures in large-scale networks

Physical Review E 76 3 036106

28.

Zachary

1977

An information flow model for conflict and fission in small groups

Journal of Anthropological Research 1 452 473

29.

Nie

Yang

Cheng

Fan

2010

Measuring the significance of community structure in complex networks

Europhysics Letters 82 6 066106