
Abstract

Revealing the structural features of social networks is vitally important to both scientific research and practice, and the explosive growth of online social networks in recent years has greatly advanced our understanding of social structures. Here we propose a community detection approach based on user interaction behavior in weighted dynamic online social networks. We first study interaction behaviors in online social networks and build a directed, unweighted network model from the Weibo following relationships between individuals. To refine the description of interaction behavior, a level one fuzzy comprehensive evaluation model is employed to describe how closely individuals are connected to each other, and the resulting intimacy degrees are attached as weights to the previously unweighted model. Second, a heuristic community detection algorithm for dynamic networks is provided, based on an improved version of modularity called module density. As the heuristic rule, we choose a greedy strategy and feed the algorithm only the parts that changed between neighboring time slices. Experimental results show that the proposed algorithm obtains high accuracy and, at the same time, comparatively lower time complexity than some typical algorithms. More importantly, our algorithm needs no a priori conditions.

1. Introduction

Research on complex networks has attracted wide scientific interest. Some scholars focus on semantic networks [1], which represent relationships in human knowledge [2, 3], while other researchers develop algorithms for community detection, which is a fundamental problem in network analysis. Detecting communities is of great importance in sociology, biology, and computer science, disciplines where systems are often represented as graphs. The problem is so complex that no satisfying solution has been found so far, despite the huge effort of a large interdisciplinary community of scientists working on it over the past few years [4].

Traditional methods treat community detection as a graph partitioning or clustering problem. The Kernighan-Lin (KL) algorithm [5] is an early graph partitioning method that uses a greedy strategy to optimize the division according to the edges between communities; its evaluation function is defined as the number of edges inside the two communities minus the number of edges connecting them. The KL algorithm can divide a target network into subsets of specified size, but the number of subsets must be known a priori. Like the KL algorithm, the majority of graph partitioning methods require too much prior knowledge, and most variants of the graph partitioning problem are NP-hard. More importantly, the partitioning results usually consist of clusters of similar size. Taking all these shortcomings into consideration, graph partitioning algorithms are not fully suitable for detecting communities in big social networks [6].

Unlike graph partitioning methods, clustering does not need much prior knowledge, so clustering based methods can be more effective for community detection. For example, hierarchical clustering algorithms are known to be suitable for detecting communities in social networks. Social networks naturally have hierarchical structure, which means that they may display several levels of grouping of the nodes: small clusters are contained within large ones, which are in turn included in larger ones, and so on [7]. Some clustering algorithms try to divide a target network naturally into many parts based on the similarity and intensity of each node, and these works can be classified into two kinds: aggregation algorithms and divisive algorithms. For example, the Newman fast algorithm [8] is a typical aggregation method based on a greedy strategy, mainly targeting networks with numerous vertices. A typical divisive algorithm is the method in [9], which treats the network as one huge community and then divides it step by step. The GN algorithm proposed by Girvan and Newman [10] is another example of a divisive method. Its main idea is to remove the edges with maximal betweenness [11], which provides an edge detection rule for the small communities in the network.

Besides these traditional settings, community detection in weighted, directed, or dynamic networks has also been widely discussed, but mostly for each feature separately. Few works take these features into consideration together, and, compared with the research on unweighted and undirected networks, no significant achievements have been made. Most networks in the real world can be seen as heterogeneous, because the nodes have different characteristics and a relationship between two nodes usually influences each of them differently. In fact, the heterogeneity of nodes can be represented in weighted, directed network models. On the other hand, dynamic network analysis has drawn a lot of attention in the past few years. As far as we know, a dynamic community detection method was first proposed by Hopcroft et al. [12]; it divided the dynamic network into several static parts and then obtained hierarchical clustering results using cosine similarity. They selected the parts of the results that changed only slightly as natural communities and studied dynamic network evolution on that basis. Chakrabarti et al. [13] tried to analyze dynamic networks via continuous iteration; the concepts of snapshot quality and history cost were proposed in this work to satisfy both of the following: “clustering results should keep pace with the current network topology as far as possible” and “the results should minimize the difference between the clustering processes.” In order to understand the dynamics of social network structure, Shan et al. built a mathematical model [14] and also proposed an incremental identification method, which analyzes the nodes incrementally rather than redividing all nodes in the network, to improve the practical efficiency of the algorithm. Ning et al.'s work is also a valuable attempt [15]; they put forward the incidence vector concept to represent the dynamic process of the network through an incrementally updated characteristic value system. Chen et al. proposed a novel multiobjective algorithm for detecting communities in dynamic networks, which provides a solution that balances accuracy and efficiency [16]. In Yu et al.'s work, a feasible algorithm for the identification of overlapping modules in PPI networks is proposed, which detects communities with more flexibility [17].

As we can see, excellent research has been done on community structure detection; however, the existing algorithms either need too much a priori knowledge or are not good enough to handle complex network models. Research on weighted, directed, and dynamic networks may provide more ways to understand real networks, and designing fast and reliable community detection algorithms will continue to be an attractive area in the future.

On the other hand, with regard to community detection methods, quality functions are very important, because an algorithm needs a stopping criterion. Nowadays, the most commonly known function is called network modularity [18], which represents one of the first attempts to achieve a first principle understanding of the clustering problem, and it embeds in its compact form all essential ingredients and questions, from the definition of community to the choice of a null model and to the expression of the strength of communities and partitions [4]. Because of this, we are going to employ modularity in this paper to evaluate the division results and we take online social network as our target network; a detailed analysis on the dynamic process of how an online social network evolves will be shown.

The rest of this paper is organized as follows. In Section 2 we build a weighted social network based on an analysis of interaction behavior in online social networks. Section 3 gives a community detection algorithm for dynamic directed weighted networks. Section 4 compares our method with well-known algorithms through experiments. Finally, Section 5 concludes the paper.

2. Social Network Weighted Method Based on User Interaction Behavior Research

When studying network structure, we first focus on the interactions between individuals and define an approach for weighting the edges, so that more information from the online social network is brought into our network model. We assume that edge weights represent the interaction intensity between nodes, which reflects both parameters that exist in the physical world and calculated features that we define ourselves. The calculated features reveal the underlying characteristics of online social networks. For a network containing complex social relationships such as similarity and intimacy, we transform one or more of the interaction attributes into weights that measure the intimacy between users. The weights are obtained with a fuzzy synthetic evaluation model according to the interactions with followed users and the frequency of interaction over a period of time. The theoretical basis of the weighting method is the fuzzy comprehensive evaluation method [19], which transforms qualitative evaluation into quantitative evaluation according to fuzzy membership degree theory. In short, it is a method that provides an overall evaluation of objects that are restricted by multiple factors [20].

In dynamic online social networks, such as Facebook [21] and microblog [22], the behaviors, for example, “comment,” “forward,” “like,” and “at,” can reflect the proximity between users during a period of time. In our comprehensive evaluation method, we select level one comprehensive evaluation model [23] as the method to measure the relationship. Here we assume B as the collection of user behaviors:

\begin{matrix} B = \{comment, forward, like, at\} . \end{matrix}

(1)

It is crucial to allocate the weights of the above factors. The weights of the behaviors are determined statistically. The data used in the paper come from the NLP laboratory of Beijing Institute of Technology, which collected two years of Sina Microblog data [22]. The dataset contains 1.5 million users and 50 million microblogs. There are 86,341,256 interaction behaviors, including 21,510,466 “comments,” 36,538,478 “like,” 23,299,712 “forward,” and 4,992,600 “at.” Dividing each count by the total gives the following vector A, whose components are the proportions of each kind of interaction, in the order the counts are listed above:

\begin{matrix} A = (0.2491,0.4232,0.2699,0.0578) . \end{matrix}

(2)
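As a quick check of formula (2), the proportions can be recomputed from the interaction counts quoted above. This is only an illustrative sketch; the dictionary name `counts` and the rounding to four decimals are assumptions made here.

```python
# Interaction counts quoted in the text, in the order they are listed there.
counts = {
    "comment": 21_510_466,
    "like": 36_538_478,
    "forward": 23_299_712,
    "at": 4_992_600,
}

total = sum(counts.values())

# Share of each behaviour in all interactions, rounded to four decimals.
A = [round(c / total, 4) for c in counts.values()]

print(total)  # 86341256
print(A)      # [0.2491, 0.4232, 0.2699, 0.0578]
```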

Then we assume a $4 \times n$ evaluation matrix R for a single factor, which is formulated as follows:

\begin{matrix} R_{x} = \begin{bmatrix} r_{11} & r_{12} & \dots & r_{1 n} \\ r_{21} & r_{22} & \dots & r_{2 n} \\ r_{31} & r_{32} & \dots & r_{3 n} \\ r_{41} & r_{42} & \dots & r_{4 n} \end{bmatrix} . \end{matrix}

(3)

Here n represents the total number of users that user x follows. As defined in formula 4, $r_{1 n}$ denotes the share of user x's “comment” actions that are directed at user n, $r_{2 n}$ the share of “forward” actions, $r_{3 n}$ the share of “like” actions, and $r_{4 n}$ the share of “at” actions, and $n | x$ indicates that user n is followed by user x. User x's behavior and connection relationship with other users are shown in Table 1.

Table 1

Evaluation of user x's behaviors.

Behaviors	Followed
Behaviors	Followed_id₁	Followed_id₂	⋯	Followed_idₙ
Comment	$r_{11}$	$r_{12}$	⋯	$r_{1 n}$
Forward	$r_{21}$	$r_{22}$	⋯	$r_{2 n}$
Like	$r_{31}$	$r_{32}$	⋯	$r_{3 n}$
At	$r_{41}$	$r_{42}$	⋯	$r_{4 n}$

Matrix R for user x can be calculated as follows:

\begin{matrix} r_{1 n} = \frac{commentnum (n | x)}{\sum_{i = 1}^{n} commentnum (i | x)}, \\ r_{2 n} = \frac{forwardednum (n | x)}{\sum_{i = 1}^{n} forwardednum (i | x)}, \\ r_{3 n} = \frac{likenum (n | x)}{\sum_{i = 1}^{n} likenum (i | x)}, \\ r_{4 n} = \frac{atnum (n | x)}{\sum_{i = 1}^{n} atnum (i | x)}, \end{matrix}

(4)

where $commentnum (n | x)$ counts how many times user x “comments” on user n, $forwardednum (n | x)$ counts how many times user x “forwards” n's articles, $likenum (n | x)$ counts how many times user x sets a “like” tag on user n's articles, and $atnum (n | x)$ counts how many times user x did “at” actions to user n when posting their own articles or comments.

Then we explain the evaluation model with Figure 1, which is generated from nodes of Sina Microblog. We take user 003 as an example; detailed information can be discovered in Table 2.

Table 2

Statistics of user 003.

User_id	Attention_id	Commentnum.	Transmitnum.	Likenum.	Atnum.
003	009	1	1	4	1
003	025	0	2	5	0
003	032	5	11	22	7

Figure 1

The network of test example.

We can calculate the evaluation matrix $R_{003}$ of node 003; for example, the user did “at” behavior to others for eight times during a period of time, particularly doing “at” to user 032 for seven times. So we can get the following data:

\begin{array}{l} r_{4 (032 | 003)} = \frac{atnum (032 | 003)}{atnum (009 | 003) + atnum (025 | 003) + atnum (032 | 003)} \\ = \frac{7}{1 + 0 + 7} = 0.875 . \end{array}

(5)

If we calculate all the data according to the above method, we can get the matrix as follows:

\begin{matrix} R_{003} = \begin{bmatrix} 0.1667 & 0.0714 & 0.1290 & 0.1250 \\ 0 & 0.1429 & 0.1613 & 0 \\ 0.8333 & 0.7857 & 0.7097 & 0.8750 \end{bmatrix} . \end{matrix}

(6)

Finally, we can get level one comprehensive evaluation model $B_{003}$ :

\begin{array}{l} B_{003} = A \cdot R_{003}^{'} \\ = \begin{pmatrix} 0.2491 & 0.4232 & 0.2699 & 0.0578 \end{pmatrix} \\ \cdot \begin{bmatrix} 0.1667 & 0 & 0.8333 \\ 0.0714 & 0.1429 & 0.7857 \\ 0.1290 & 0.1613 & 0.7097 \\ 0.1250 & 0 & 0.8750 \end{bmatrix} \\ = \begin{pmatrix} 0.1138 & 0.1040 & 0.7822 \end{pmatrix} . \end{array}

(7)

The result gives the comprehensive weights of the edges from user 003 to the three followed users 009, 025, and 032, that is, 0.1138, 0.1040, and 0.7822.
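The worked example for user 003 can be reproduced in a few lines. The sketch below starts from the counts in Table 2, normalises each behaviour column as in formula (4), and applies the vector A positionally, exactly as formula (7) does. The function names `evaluation_rows` and `edge_weights` are hypothetical helpers introduced only for this illustration.

```python
# Interaction counts of user 003 toward each followed user (Table 2).
# Columns follow Table 2: comment, forward (transmit), like, at.
table2 = {
    "009": [1, 1, 4, 1],
    "025": [0, 2, 5, 0],
    "032": [5, 11, 22, 7],
}

# Vector A from formula (2), applied in the same positional order as formula (7).
A = [0.2491, 0.4232, 0.2699, 0.0578]


def evaluation_rows(counts):
    """Divide every count by its behaviour total (formula (4))."""
    users = list(counts)
    width = len(counts[users[0]])
    totals = [sum(counts[u][b] for u in users) for b in range(width)]
    return {
        u: [counts[u][b] / totals[b] if totals[b] else 0.0 for b in range(width)]
        for u in users
    }


def edge_weights(counts, a):
    """Level one comprehensive evaluation: weighted sum over behaviours."""
    rows = evaluation_rows(counts)
    return {u: sum(x * r for x, r in zip(a, row)) for u, row in rows.items()}


weights = edge_weights(table2, A)
print({u: round(w, 4) for u, w in weights.items()})
```

Because A sums to one and every behaviour column of the evaluation matrix sums to one, the three weights also sum to one.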

3. Community Partition Algorithm for Dynamic Directed Weighted Networks

This paper proposes an algorithm that needs no a priori conditions. The algorithm first sets up heuristic rules for the dynamic changes of the social network based on the basic nature of community structure, and then employs an improved module density function D, adapted to weighted networks, to ensure the accuracy of the algorithm [24].

We give some basic definitions which will be used in the remaining part.

For directed weighted network $G^{w} = (V, E, W)$ , where V is the collection of nodes inside $G^{w}$ , E is the collection of edges inside $G^{w}$ , $e_{i j} \in E$ is an edge from node i to node j, W is the collection of weights of $G^{w}$ 's edges, and $w_{i j}$ is the weight of edge $e_{i j}$ , we define the following.

$(1)$ The in-weight and out-weight of nodes in the network: the symbol $w_{in}^{i}$ represents the sum of all edges weights ending with node i, and $w_{out}^{i}$ denotes the sum of all edges weights starting with node i.

$(2)$ The inside-edge weight and outside-edge weight of a community: $W_{in} (c_{i})$ is the total weight of edges inside a community, and $W_{out} (c_{i})$ is the total weight of edges outside a community, as shown in formula 8. An edge is inside a community when both of its ends fall into the community:

\begin{matrix} W_{in} (c_{i}) = \sum_{i, j \in c_{i}} w_{i j}, \\ W_{out} (c_{i}) = \sum_{i \notin c_{i}, j \notin c_{i}} w_{i j} . \end{matrix}

(8)

$(3)$ Function L in directed weighted networks: L is an overloaded function with two forms for directed and weighted networks. The first form is as follows:

\begin{matrix} L (c_{1}, c_{2}) = \sum_{i \in c_{1}, j \in c_{2}} w_{i j}, \end{matrix}

(9)

L calculates the total weight of the edges from community $c_{1}$ to community $c_{2}$ , where $c_{1}$ and $c_{2}$ are two disjoint collections of nodes, that is, two communities with no overlapping part between them.

The second form, which itself has two variants, is as follows:

\begin{matrix} L (i, c_{i}) = \sum_{j \in c_{i}} w_{i j}, \end{matrix}

(10)

where i represents a node. Function L measures the connections between node i and its neighbors inside community $c_{i}$ , or equivalently the amount gained inside community $c_{i}$ when a new node i is added into the community. Furthermore, $L_{in} (i, c_{i})$ counts the edges from node i to community $c_{i}$ , whereas $L_{out} (i, c_{i})$ counts the edges from community $c_{i}$ to node i.
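The two forms of L translate directly into sums over a dictionary of directed weighted edges. The sketch below is illustrative only: the edge dictionary `w`, its toy weights, and the function names are assumptions, and the weighted sums follow formulas (9) and (10) rather than plain edge counts.

```python
# Directed weighted edges as (source, target) -> weight.
# Toy values chosen for illustration, not taken from the paper's data.
w = {
    ("a", "b"): 0.5,
    ("b", "a"): 0.2,
    ("a", "c"): 0.3,
    ("c", "d"): 0.4,
    ("d", "a"): 0.1,
}


def L_between(c1, c2, w):
    """Formula (9): total weight of edges running from set c1 to set c2."""
    return sum(x for (i, j), x in w.items() if i in c1 and j in c2)


def L_in(i, c, w):
    """Total weight of edges from node i to members of community c."""
    return sum(x for (s, t), x in w.items() if s == i and t in c)


def L_out(i, c, w):
    """Total weight of edges from members of community c to node i."""
    return sum(x for (s, t), x in w.items() if t == i and s in c)


print(L_between({"a", "b"}, {"c", "d"}, w))
print(L_in("a", {"b", "c"}, w))
print(L_out("a", {"b", "d"}, w))
```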

$(4)$ $D^{w d} (c_{i})$ formulates module density of community $c_{i}$ ; it can be calculated by the following:

\begin{matrix} D^{w d} (c_{i}) = \frac{W_{in} (c_{i}) - W_{out} (c_{i})}{2 n} . \end{matrix}

(11)

Here n is the number of the nodes of the network.

Then we define $Δ D^{w d}$ , which means the increased module density when a node i is added into community $c_{i}$ to constitute a new community $c_{i}^{'}$ ; it can be calculated by the following:

\begin{matrix} Δ D^{w d} = D_{t_{n}}^{w d} - D_{t_{n - 1}}^{w d} . \end{matrix}

(12)
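Formulas (11) and (12) can be evaluated directly on a small graph. The following is a minimal sketch under one stated assumption: an outside edge is taken to be an edge with exactly one endpoint in the community, which is the reading that makes the later update rule meaningful. The toy graph and the helper names are hypothetical.

```python
def w_in(c, w):
    """Total weight of edges with both ends inside community c."""
    return sum(x for (i, j), x in w.items() if i in c and j in c)


def w_out(c, w):
    """Assumption: an outside edge has exactly one endpoint in c."""
    return sum(x for (i, j), x in w.items() if (i in c) != (j in c))


def module_density(c, w, n):
    """Formula (11): (W_in - W_out) / (2n), with n the number of nodes."""
    return (w_in(c, w) - w_out(c, w)) / (2 * n)


def delta_density(node, c, w, n):
    """Formula (12): density after the node joins minus density before."""
    return module_density(c | {node}, w, n) - module_density(c, w, n)


# Toy directed weighted graph on four nodes (illustrative values only).
w = {("a", "b"): 0.6, ("b", "a"): 0.4, ("b", "c"): 0.5, ("c", "d"): 0.25}
n = 4

print(module_density({"a", "b"}, w, n))      # (1.0 - 0.5) / 8
print(delta_density("c", {"a", "b"}, w, n))  # gain when c joins {a, b}
```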

To deal with the effect of edge direction in the network, consider a node i with high out-degree and low in-degree and a node j with the opposite profile. If there is an edge between node i and node j, it most probably points from node i to node j. Similarly, if there is an edge from node i to node j, the module density of the community is greatly influenced, which means that node i tends to be a member of community $c_{i}$ . If instead the edge points from node j to node i, it has only a marginal influence on the whole community and can be ignored.

So in this paper we use a parameter β $(0 \leq β < 1)$ to represent the weight of in-degree. Consider a node i that is waiting to join community $c_{i}$ and is connected to it by a directed edge. If the edge is an outgoing edge of node i, the importance of node i to the community is 1; otherwise, it is β. In other words, for a node already in the community, an incoming edge has more influence on the community than an outgoing edge. Now assume that node i joins community $c_{i}$ to form a new community $c_{i}^{'}$ . In the following, $W_{in}^{t} (c_{i})$ denotes $W_{in} (c_{i})$ at time t, and $W_{out}^{t} (c_{i})$ denotes $W_{out} (c_{i})$ at time t:

\begin{array}{l} W_{in}^{t + 1} (c_{i}) = W_{in}^{t} (c_{i}) + β L_{in}^{t} (i, c_{i}) + L_{out}^{t} (i, c_{i}), \\ W_{out}^{t + 1} (c_{i}) = W_{out}^{t} (c_{i}) - (β + 1) L_{in}^{t} (i, c_{i}) \\ - (β + 1) L_{out}^{t} (i, c_{i}) + β (w_{in}^{i} + w_{out}^{i}) . \end{array}

(13)
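Read as code, formula (13) is a constant-time update of the two community totals when node i joins. The sketch below is a literal transcription of the formula as printed; the function name, the argument names, and the sample numbers (including β = 0.5) are assumptions chosen only so the arithmetic is easy to follow.

```python
def update_on_join(W_in, W_out, L_in, L_out, w_in_i, w_out_i, beta):
    """Transcription of formula (13) for a node i joining community c.

    W_in, W_out      -- inside and outside weights of c at time t
    L_in, L_out      -- L_in(i, c) and L_out(i, c) at time t
    w_in_i, w_out_i  -- in-weight and out-weight of node i
    beta             -- in-degree weight, 0 <= beta < 1
    """
    new_W_in = W_in + beta * L_in + L_out
    new_W_out = (
        W_out
        - (beta + 1) * L_in
        - (beta + 1) * L_out
        + beta * (w_in_i + w_out_i)
    )
    return new_W_in, new_W_out


# Illustrative numbers only.
print(update_on_join(W_in=2.0, W_out=1.5, L_in=0.5, L_out=0.25,
                     w_in_i=0.5, w_out_i=1.0, beta=0.5))  # (2.5, 1.125)
```

The point of the incremental form is that only quantities local to node i are needed, so the community totals never have to be recomputed from scratch.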

So we can calculate the module density following formula 14. To make the structure of the new community $c_{i}^{'}$ more reasonable, we calculate the increased module density $Δ D^{w d}$ as follows:

\begin{matrix} Δ D^{w d} = \frac{W_{out} - W_{in} + 3 β L_{in} + 3 L_{out} - β d_{in} - d_{out}}{2 n}, \end{matrix}

(14)

where n represents the number of nodes of the network. Formula 14 gives the evaluation rule, which is also the key of the algorithm in this paper. Most of the algorithms are based on greedy strategy or approximation algorithm to improve the efficiency of calculation.

In daily life, we can easily observe that a friend of my friend is very likely to be my friend as well. So we prefer to place node i in the community that its neighbors already belong to. Thus, we put forward the following principle, combined with a heuristic greedy strategy: any node i in the network is preferably put into the community that contains the most neighbors of node i. The strategy relies on two key properties: the greedy selection property and optimal substructure.

Our algorithm deals with directed and weighted networks. However, for a real network, it is almost impossible for the degree of all nodes to be even. Then, the structure of the network is an undirected and unlooped graph with a corresponding matroid. The greedy strategy is one of the situations of the matroid, so the idea based on module density with greedy strategy is reasonable.

There are mainly four kinds of events that cause a real network to change dynamically. We summarize them as follows.

(1) Adding a Node $(V + v)$ . When a new node joins the network, it may or may not bring new edges with it.

(2) Deleting a Node $(V - v)$ . When a node is deleted, then all of the edges connected with this node are also removed.

(3) Adding an Edge $(E + e)$ . Connect two nodes in the network to produce a new edge.

(4) Deleting an Edge $(E - e)$ . Remove an edge between two nodes.

A dynamic community changes its structure in accordance with the above four situations. In most cases the first and the third situations occur often, while the second and the fourth are rare.

3.1. Analysis of Community Structure Variations

The four kinds of changes in a dynamic network cause the community structure to evolve, too. In the networks we confront in real life, there are only six cases in which a community structure changes.

(1) Community Generation. New nodes are added to a network dynamically, and these nodes may form a new community as shown in Figure 2(a).

Figure 2

The six changes of a community.

(2) Community Extinction. Nodes of a community are removed gradually, and finally the community disappeared as shown in Figure 2(b).

(3) Community Expansion. The community will expand when many new nodes or edges are joined as shown in Figure 2(c).

(4) Community Decrease. The community will decrease with the removing of part of nodes or edges and it makes the community smaller and smaller as shown in Figure 2(d).

(5) Community Combination. Two communities are merged into one community dynamically as shown in Figure 2(e).

(6) Community Split. A community is split into two or more communities dynamically as shown in Figure 2(f).

It can be easily proved that the changing results are among the above six situations.

3.2. Algorithm Description

We analyze the four possible dynamic behaviors and design the algorithm as follows.

(1) Adding a Node. When the new node has no connection with any other nodes, it will be a new independent community. If the new node has connection with other nodes, we put the node into the community which can get the highest gain in module density. The pseudocode of the algorithm is designed as shown in Algorithm 1 ( $c^{t}$ denotes the community structure at time t).

Algorithm 1: AddNode.

Input: the structure of a community at time t, and a new node v awaiting to be added

Output: the updated structure at time $t + 1$

// $d_{v}$ is the degree of node v.

If $(d_{v} = 0)$ then

Update $c^{t + 1} = c^{t} \cup \{v\}$

Else

For each existing community, put v into it

Calculate $Δ D^{w d}$

End For

c = the community with the maximal $Δ D^{w d}$ when node v joins it

Update $c^{t + 1} = (c^{t} ∖ c) \cup (c \cup \{v\})$

End If
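A compact Python reading of Algorithm 1 is sketched below. It is not the authors' implementation: the gain is computed as the plain difference of formula (11) before and after the join (formula (12)), without the β weighting, and an outside edge is assumed to be an edge with exactly one endpoint in the community. The data layout (a list of sets plus an edge dictionary) and all names are hypothetical.

```python
def density(c, w, n):
    """Formula (11), assuming an outside edge has exactly one end in c."""
    inside = sum(x for (i, j), x in w.items() if i in c and j in c)
    outside = sum(x for (i, j), x in w.items() if (i in c) != (j in c))
    return (inside - outside) / (2 * n)


def add_node(partition, w, v, n):
    """Return the partition at time t+1 after node v has been added.

    partition -- list of sets (communities at time t)
    w         -- edge dict (source, target) -> weight, already holding v's edges
    n         -- number of nodes in the network
    """
    if not partition or not any(v in e for e in w):
        # Degree zero: v becomes a new independent community.
        return partition + [{v}]
    # Try v in every existing community and keep the largest gain.
    gains = [density(c | {v}, w, n) - density(c, w, n) for c in partition]
    best = max(range(len(partition)), key=gains.__getitem__)
    return [c | {v} if k == best else c for k, c in enumerate(partition)]


# Toy example: v links mostly to {a, b}, weakly to {c, d}.
w = {
    ("a", "b"): 1.0, ("b", "a"): 1.0, ("c", "d"): 1.0,
    ("v", "a"): 0.5, ("v", "b"): 0.5, ("v", "c"): 0.25,
}
print(add_node([{"a", "b"}, {"c", "d"}], w, "v", n=5))
```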

(2) Deleting a Node. If the degree of the removed node is 1, just delete the edge connected to the node; otherwise, delete all the edges connected to the node and reassign the neighbor nodes of the deleted node to the community which can achieve a higher module density. If the connected edges are all within the community, just keep the neighbor node in the original community. The pseudocode of deleting a node is given as shown in Algorithm 2.

Algorithm 2: DeleteNode.

Input: the structure of a community at time t, and the removed node v

Output: the updated structure at time $t + 1$

$k = 1$ ;

If (node v's degree is 1) then

$c^{t + 1} = (c^{t} ∖ c) \cup (c ∖ v) \cup \{v\}$

Else

While (node v still has neighbors) do

$S = \{$ nodes that have no connections with nodes outside $c \}$

End while

End if

Put the nodes in S into the best community, that is, the one with the maximal $Δ D^{w d}$

Update $c^{t + 1}$
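One way to read the node deletion rule is sketched below, following the prose description: the node and its edges are removed, neighbors whose remaining edges all stay inside their community are left alone, and every other neighbor is moved to whichever community gives the highest total module density. This is a simplified interpretation under stated assumptions, not the authors' code: density follows formula (11) without β, an outside edge has exactly one endpoint in the community, and communities are stored as a node-to-label dictionary (a hypothetical layout).

```python
def density(c, w, n):
    """Formula (11), assuming an outside edge has exactly one end in c."""
    inside = sum(x for (i, j), x in w.items() if i in c and j in c)
    outside = sum(x for (i, j), x in w.items() if (i in c) != (j in c))
    return (inside - outside) / (2 * n)


def total_density(member, w, n):
    """Sum of module densities over all communities in the labelling."""
    groups = {}
    for node, label in member.items():
        groups.setdefault(label, set()).add(node)
    return sum(density(c, w, n) for c in groups.values())


def delete_node(member, w, v, n):
    """Remove v, then re-place each former neighbour of v where density is best."""
    neighbours = {j for (i, j) in w if i == v} | {i for (i, j) in w if j == v}
    neighbours.discard(v)
    w = {e: x for e, x in w.items() if v not in e}
    member = {u: lab for u, lab in member.items() if u != v}
    for u in sorted(neighbours):
        crosses = any(member[a] != member[b] for (a, b) in w if u in (a, b))
        if not crosses:
            continue  # all remaining edges are inside: keep u where it is
        best_label, best = member[u], total_density(member, w, n)
        for label in sorted(set(member.values())):
            score = total_density({**member, u: label}, w, n)
            if score > best:
                best_label, best = label, score
        member[u] = best_label
    return member, w


# Toy example: x sat in community 0 only because of v.
member = {"a": 0, "b": 0, "v": 0, "x": 0, "c": 1, "d": 1}
w = {
    ("a", "b"): 1.0, ("b", "a"): 1.0, ("v", "a"): 1.0, ("v", "x"): 1.0,
    ("x", "v"): 1.0, ("x", "c"): 0.5, ("c", "d"): 1.0, ("d", "c"): 1.0,
}
new_member, new_w = delete_node(member, w, "v", n=5)
print(new_member)
```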

(3) Adding an Edge. When node i and node j of the new edge belong to the same community ( $c_{i} = c_{j}$ ), the new edge from i to j increases the value of function D according to the definition of module density, so an inside edge will not break up the original community. However, if nodes i and j are not in the same community ( $c_{i} \neq c_{j}$ ), the new edge $e_{i j}$ is an outside edge and decreases the module density of the community. In that case, the new edge either leaves the community structure unchanged or makes one node in $c_{i}$ or $c_{j}$ break away from its original community and join the other one so as to obtain a higher module density.

When a new edge $e_{i j}$ joined network, we assume $X = c_{i}$ , $Y = c_{j}$ to judge if the new edge will increase the module density for community Y. The algorithm is designed as shown in Algorithm 3, where $Δ D_{i x y}^{w d}$ and $Δ D_{j x y}^{w d}$ are the gain of module density when putting nodes i and j into X and Y.

Algorithm 3: AddEdge.

Input: the structure of community at time t, and the new edge $e_{i j}$ .

Output: the updated structure at time $t + 1$

If (node i and node j are new nodes) then

Update $c^{t + 1} = c^{t} \cup \{i, j\}$

Else if ( $c_{i} \neq c_{j}$ ) then

If ( $Δ D_{i x y}^{w d} < 0 \land Δ D_{j x y}^{w d} < 0$ )

Update $c^{t + 1} = c^{t}$

Else

Node v = ( $Δ D_{i x y}^{w d} > Δ D_{j x y}^{w d}$ ) ? i : j

Move v to the new community

For all neighbors of v

Put it into the best community which has the maximal $Δ D^{w d}$

End for

Update $c^{t + 1}$

End if

End if
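The move decision at the heart of Algorithm 3 can be sketched as follows. This is a simplified illustration, not the authors' implementation: the gain of moving a node is taken as the change in the summed densities (formula (11), no β) of the two communities involved, the re-evaluation of the moved node's neighbors is omitted, and both endpoints are assumed to be either new or already assigned. All function names and the toy graph are hypothetical.

```python
def density(c, w, n):
    """Formula (11), assuming an outside edge has exactly one end in c."""
    inside = sum(x for (i, j), x in w.items() if i in c and j in c)
    outside = sum(x for (i, j), x in w.items() if (i in c) != (j in c))
    return (inside - outside) / (2 * n)


def move_gain(node, src, dst, w, n):
    """Density change of src and dst together if node moves from src to dst."""
    before = density(src, w, n) + density(dst, w, n)
    after = density(src - {node}, w, n) + density(dst | {node}, w, n)
    return after - before


def add_edge(partition, w, i, j, weight, n):
    """Return (partition, edges) at time t+1 after adding edge i -> j."""
    w = {**w, (i, j): weight}

    def find(v):
        return next((c for c in partition if v in c), None)

    ci, cj = find(i), find(j)
    if ci is None and cj is None:
        return partition + [{i, j}], w      # two new nodes form a community
    if ci is None or cj is None:
        raise ValueError("add the new node first (see Algorithm 1)")
    if ci is cj:
        return partition, w                 # inside edge: structure unchanged
    gain_i = move_gain(i, ci, cj, w, n)
    gain_j = move_gain(j, cj, ci, w, n)
    if gain_i < 0 and gain_j < 0:
        return partition, w
    node, src, dst = (i, ci, cj) if gain_i > gain_j else (j, cj, ci)
    rest = [c for c in partition if c is not src and c is not dst]
    moved = [c for c in (src - {node}, dst | {node}) if c]
    return rest + moved, w


# Toy example: c already points to d; a second edge c -> e pulls c over.
w = {
    ("a", "b"): 1.0, ("b", "a"): 1.0, ("a", "c"): 0.2,
    ("c", "d"): 1.0, ("d", "e"): 1.0, ("e", "d"): 1.0,
}
partition = [{"a", "b", "c"}, {"d", "e"}]
new_partition, new_w = add_edge(partition, w, "c", "e", 1.0, n=5)
print(new_partition)
```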

(4) Deleting an Edge. When the nodes i and j of an edge come from different communities $(c_{i} \neq c_{j})$ , removing the edge weakens the linkage of intercommunity. Removing an outside edge makes a higher module density, so it will keep the original community structure.

When the two nodes (i and j) of the removed edge belong to the same community $(c_{i} = c_{j})$ , if the degree of node i or node j is 1, then one of the two nodes will be isolated node after removing the edge. The node will separate from the original community, and the new separate community will contain only one node. If the node degree of both i and j is 1, then two new isolated communities emerged.

When the two nodes (i and j) of the removed edge belong to the same community $(c_{i} = c_{j})$ and neither node has degree 1, removing the inside edge $e_{i j}$ results in a lower module density. Then the original structure either remains unchanged or divides into two communities.

Thus, we borrow the idea of the GN algorithm [10]: we regard the community $(c_{i} = c_{j})$ as a small network and calculate the betweenness value of each edge. If the betweenness value of the removed edge $e_{i j}$ is comparatively large, we redivide the community when removing the edge; otherwise, we keep the original community structure unchanged. The algorithm is designed as shown in Algorithm 4.

Algorithm 4: DeleteEdge.

Input: the structure of community at time t and the removed edge $e_{i j}$

Output: the updated structure at time $t + 1$

If ( $e_{i j}$ is a single edge) then

$c^{t + 1} = (c^{t} ∖ \{i, j\}) \cup \{i\} \cup \{j\}$

Else if (the degree of either i or j is 1) then

$c^{t + 1} = (c^{t} ∖ c (j)) \cup \{i\} \cup \{c (i) ∖ i\}$

Else if (node i and node j don't belong to the same community) then

$c^{t + 1} = c^{t}$

Else

Int z = Get the betweenness of $e_{i j}$

If (z is the top n biggest betweenness of the network)

Put other nodes into the considered best community which holds the maximal $Δ D^{w d}$

End if

End if

Update $c^{t + 1}$
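The degree-based branches of Algorithm 4 are easy to express in code. The sketch below covers only those branches and returns a flag telling the caller when the betweenness test of the last branch would be needed; the betweenness computation itself is left out. The node-to-label dictionary with integer labels, the returned flag, and the function name are assumptions made for this illustration.

```python
def delete_edge(member, w, i, j):
    """Remove edge i -> j and handle the degree-based cases of Algorithm 4.

    member -- dict node -> integer community label at time t
    w      -- edge dict (source, target) -> weight
    Returns (member at t+1, remaining edges, needs_betweenness_check).
    """
    w = {e: x for e, x in w.items() if e != (i, j)}
    if member[i] != member[j]:
        # Outside edge removed: the community structure is kept.
        return dict(member), w, False
    member = dict(member)
    next_label = max(member.values()) + 1
    isolated = False
    for node in (i, j):
        has_edges = any(node in e for e in w)
        alone = all(lab != member[node] for u, lab in member.items() if u != node)
        if not has_edges:
            isolated = True
            if not alone:
                # The isolated node leaves and forms a one-node community.
                member[node] = next_label
                next_label += 1
    # If nobody was isolated, the caller should apply the betweenness test.
    return member, w, not isolated


member = {"a": 0, "b": 0, "c": 0}
w = {("a", "b"): 1.0, ("b", "a"): 1.0, ("b", "c"): 1.0}

print(delete_edge(member, w, "b", "c"))  # c becomes its own community
print(delete_edge(member, w, "a", "b"))  # flag True: betweenness test needed
```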

Then we analyze the time complexity of every step in the algorithm.

(1) Initialization. Initialize N nodes and at most N communities, which takes the time complexity of $O (N)$ .

(2) Iteration. Iterations take place for k times, and for each iteration, the maximal $L (i, c_{i})$ are sought for all of the N nodes. For node i, the $d (i)$ neighbors are assigned to N communities, so there are at most $\sum_{i = 1}^{N} d (i)$ times satisfying $L (i, c_{i}) \neq 0$ for each iteration. So it takes time complexity of $O (k \sum_{i = 1}^{N} d (i))$ for searching maximal $L (i, c_{i})$ . It must be estimated for each iteration to make sure whether the constraint condition is satisfied or not for all of the nodes, which takes time complexity of $O (k N)$ .

(3) Modification. Algorithm 1 to Algorithm 3 (described in Section 3) are simple linear calculation on given variables, which takes time complexity $O (k N)$ ; linear calculation on neighbors of every variable brings complexity $O (k \sum_{i = 1}^{N} d (i))$ .

The total time complexity is $O (N + k \sum_{i = 1}^{N} d (i) + k N + k N + k \sum_{i = 1}^{N} d (i))$ , which is simplified as $O (k N + k \sum_{i = 1}^{N} d (i))$ . Assume that the edge number of the network is E; then $\sum_{i = 1}^{N} d (i) = 2 E$ , so the final time complexity is $O (k N + k E)$ .

From the derivation above, the time complexity of the algorithm is $O (k N + k E)$ , where N denotes the number of nodes, E the number of edges in the network, and k the number of iterations. Consequently, the value of k is vital to the running time: the smaller k is, the faster the algorithm runs. According to the heuristic principle derived from the basic property of community structure, the algorithm should start detecting a community from a node with as many neighbor nodes as possible. The more nodes a community contains, the easier it is for the algorithm to detect, which lets the main community grow during the dynamic community evolution. This also conforms to the objective rule of “the rich get richer.”

In practice, the number of iterations is acceptable for most networks, so the algorithm is efficient. In particular, the more obvious the community structure is, the fewer iterations are needed and the more efficient the algorithm is. If the community structure is not clear, however, community detection makes little sense in the first place.

4. Experiment and Analysis

In this section, we carry out experiments on a Sina Microblog dataset with 5,000 users, from which we establish four networks with 100, 500, 1000, and 5000 nodes, respectively. To show the results conveniently, we choose some representative nodes to build a small network. Then we compare the results with those of other algorithms.

First, to determine the value of β, we run the experiment at different network scales (dataset sizes). For each dataset, we look for the value of β at which the communities obtain the maximum module density. The results are shown in Table 3.

Table 3

Relation between β and module density.

Dataset size	β
Dataset size	0.40	0.45	0.50	0.55	0.60	0.65	0.70	0.75	0.80
$S_{100}$	0.399	0.420	0.442	0.459	0.467	0.471	0.456	0.433	0.384
$S_{500}$	0.435	0.473	0.494	0.509	0.522	0.519	0.498	0.492	0.478
$S_{1000}$	0.511	0.537	0.554	0.568	0.577	0.585	0.575	0.560	0.542
$S_{5000}$	0.553	0.592	0.627	0.658	0.683	0.697	0.680	0.657	0.621

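From Table 3, we can conclude that the module density reaches its maximal value at β = 0.65 for three of the four datasets; the exception is $S_{500}$ , which peaks at β = 0.60 with a value only slightly above that at 0.65 (0.522 versus 0.519). Thus, we choose the approximate value of $β = 0.65$ for the remaining experiments.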
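The choice of β can be read off Table 3 mechanically by taking the column with the largest module density in each row. The snippet below does this with the values copied from Table 3; the variable names are hypothetical.

```python
from collections import Counter

betas = [0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80]

# Module density per dataset size, copied from Table 3.
table3 = {
    "S100": [0.399, 0.420, 0.442, 0.459, 0.467, 0.471, 0.456, 0.433, 0.384],
    "S500": [0.435, 0.473, 0.494, 0.509, 0.522, 0.519, 0.498, 0.492, 0.478],
    "S1000": [0.511, 0.537, 0.554, 0.568, 0.577, 0.585, 0.575, 0.560, 0.542],
    "S5000": [0.553, 0.592, 0.627, 0.658, 0.683, 0.697, 0.680, 0.657, 0.621],
}

# Beta with the highest module density for each dataset.
best = {
    name: betas[max(range(len(row)), key=row.__getitem__)]
    for name, row in table3.items()
}
print(best)  # {'S100': 0.65, 'S500': 0.6, 'S1000': 0.65, 'S5000': 0.65}

# The value that wins most often is the one used in the remaining experiments.
chosen = Counter(best.values()).most_common(1)[0][0]
print(chosen)  # 0.65
```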

Comparing with extremal optimization algorithm [25] and improved CNM algorithm [26], we do experiments to get the time overhead and the module density. The comparison results are listed in Tables 4 and 5.

Table 4

Time overhead comparison of different algorithms.

Algorithms	Dataset size
Algorithms	$S_{100}$	$S_{500}$	$S_{1000}$	$S_{5000}$
Our algorithm	0.285	1.962	7.450	105.317
Extremal optimization	0.317	2.231	7.983	132.250
Improved CNM algorithm	0.279	2.195	7.964	125.725

Table 5

Modularity comparison of different algorithms.

Algorithm	$S_{100}$	$S_{500}$	$S_{1000}$	$S_{5000}$
Our algorithm	0.369	0.433	0.527	0.573
Extremal optimization	0.362	0.405	0.485	0.510
Improved CNM algorithm	0.347	0.398	0.451	0.494

From Tables 4 and 5, we can conclude that the proposed algorithm has a lower time overhead than extremal optimization [25] and the improved CNM algorithm [26], and that this advantage grows as the network scale increases. The only exception is $S_{100}$, where the improved CNM algorithm is marginally faster (0.279 against 0.285). The proposed algorithm also obtains the highest value in Table 5 for every network size.

To demonstrate the quality of the partition, we evaluate the module density of the algorithms. Figure 3 shows how module density changes as the network scale increases: the larger the network, the higher the module density. Since module density reflects the quality of the network partition, the proposed algorithm is suitable for large-scale networks.
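To make the time comparison concrete, the relative saving of the proposed algorithm over each baseline can be computed from Table 4. The sketch below uses the values exactly as reported, in whatever time unit the table uses. The names `TIME` and `relative_saving` are illustrative only.

```python
# Network sizes, in the column order of Table 4.
SIZES = ["S_100", "S_500", "S_1000", "S_5000"]

# Time overhead per algorithm, transcribed from Table 4.
TIME = {
    "ours": [0.285, 1.962, 7.450, 105.317],
    "extremal_optimization": [0.317, 2.231, 7.983, 132.250],
    "improved_cnm": [0.279, 2.195, 7.964, 125.725],
}


def relative_saving(baseline, ours):
    """Fraction of the baseline time saved; negative if the baseline is faster."""
    return (baseline - ours) / baseline


# Saving of the proposed algorithm against each baseline, per network size.
savings = {
    name: [relative_saving(b, o) for b, o in zip(values, TIME["ours"])]
    for name, values in TIME.items()
    if name != "ours"
}

for name, row in savings.items():
    for size, value in zip(SIZES, row):
        print(f"{name:22s} {size:7s} {value:+.1%}")
```

On the 5000-node network the saving is roughly 20% against extremal optimization and roughly 16% against the improved CNM algorithm, while on the 100-node network the saving against the improved CNM algorithm is slightly negative.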

Figure 3

Changing tendency of module density.
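As an illustration of how a module density score rewards dense communities, the sketch below evaluates a tunable module density in the spirit of the quantitative function of [24] on a weighted directed graph. Several points are assumptions made only for this example: the weighting parameter is taken to play the role of β, only outgoing edge weights count as external, and the toy graph is invented. The definition given in Section 3 remains the authoritative one.

```python
def module_density(edges, communities, beta=0.5):
    """Tunable module density of a partition (illustrative form, see lead-in).

    edges:       iterable of (source, target, weight) triples (directed).
    communities: iterable of disjoint, non-empty sets of nodes.
    beta:        trade-off between internal weight (rewarded) and
                 outgoing weight (penalised); assumed to stand for beta.
    """
    edges = list(edges)
    total = 0.0
    for members in communities:
        # Total weight of edges that start and end inside the community.
        internal = sum(w for u, v, w in edges if u in members and v in members)
        # Total weight of edges that leave the community.
        outgoing = sum(w for u, v, w in edges if u in members and v not in members)
        total += (2 * beta * internal - 2 * (1 - beta) * outgoing) / len(members)
    return total


# Hypothetical toy graph: two tightly connected pairs joined by one weak edge.
toy_edges = [
    ("a", "b", 1.0), ("b", "a", 1.0),
    ("b", "c", 0.5),
    ("c", "d", 2.0), ("d", "c", 2.0),
]
split = [{"a", "b"}, {"c", "d"}]
merged = [{"a", "b", "c", "d"}]

density_split = module_density(toy_edges, split, beta=0.5)
density_merged = module_density(toy_edges, merged, beta=0.5)
```

On this toy graph the two-community partition scores higher than the single merged community, which is the behavior a density-based measure is meant to capture.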

The experiments above were conducted on static networks. To evaluate the dynamic case, we randomly apply the four kinds of dynamic network variation described in Section 3 and compare the results with those of the other algorithms. Because the community adjustment is driven by the increments of the network and is conducted locally, its time cost is negligible even in the most complex situations. The results are shown in Figure 4.

Figure 4

Changing tendency of modularity.
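The incremental idea behind the dynamic experiment can be sketched as follows: each of the four kinds of change is applied to the weighted directed graph, and only the nodes touched by a change are reported for local re-evaluation. This is a minimal sketch under stated assumptions, not the paper's Algorithms 1 to 4. The class `DynamicGraph`, the function `apply_changes`, and the event tuples are hypothetical names introduced for illustration.

```python
class DynamicGraph:
    """Weighted directed graph; weights[u][v] is the weight of edge u -> v."""

    def __init__(self):
        self.weights = {}

    def add_node(self, u):
        self.weights.setdefault(u, {})
        return {u}

    def delete_node(self, u):
        # Successors of u lose an incoming edge.
        touched = set(self.weights.pop(u, {}))
        # Predecessors of u lose an outgoing edge.
        for v, out in self.weights.items():
            if u in out:
                del out[u]
                touched.add(v)
        return touched

    def add_edge(self, u, v, w):
        self.weights.setdefault(u, {})[v] = w
        self.weights.setdefault(v, {})
        return {u, v}

    def delete_edge(self, u, v):
        self.weights.get(u, {}).pop(v, None)
        return {u, v}


def apply_changes(graph, changes):
    """Apply one time slice of changes; return the surviving touched nodes."""
    handlers = {
        "add_node": graph.add_node,
        "delete_node": graph.delete_node,
        "add_edge": graph.add_edge,
        "delete_edge": graph.delete_edge,
    }
    touched = set()
    for op, *args in changes:
        touched |= handlers[op](*args)
    # Nodes deleted during the slice no longer need a community.
    return touched & set(graph.weights)


g = DynamicGraph()
first = apply_changes(g, [("add_edge", "a", "b", 0.8),
                          ("add_edge", "b", "c", 0.3),
                          ("add_node", "d")])
second = apply_changes(g, [("delete_node", "b")])
```

Only the nodes in the returned set would need their community membership reconsidered in the next step, so the work per time slice depends on the size of the change rather than on the size of the whole network.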

We can conclude that, as the network changes, the module density values of all three algorithms fluctuate, but only within a relatively small range (from 0.26 to 0.37). As shown in Figure 4, the proposed algorithm consistently maintains the highest module density in the dynamic network after a certain time (time = 4.5 s).

In brief, the proposed algorithm achieves better results than the extremal optimization and improved CNM algorithms.

5. Conclusions

This paper presents a weighting approach based on user interaction behavior and a community detection algorithm based on module density for online social networks. Both are designed for dynamic social network modeling and community detection, taking into account the characteristics of social networks and the shortcomings of most current community partition algorithms. In view of the real-time and dynamic characteristics of social networks, we first weighted the network edges according to user behaviors. We then designed a new community partition algorithm that adopts a greedy heuristic strategy and adapts the module density function D to weighted and directed networks, using it as the measure function for evaluating community detection and for the dynamic update rules in real social networks. The algorithm obtains high accuracy and low time complexity without any a priori condition. Moreover, its complexity depends only on the number of iterations, that is, on the clarity of the community structure of the network, regardless of the scale of the network, so it is suitable for dynamic and large-scale network analysis.

Footnotes

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is supported by the Central Universities Fundamental Research Funds of China under Grant no. N120317003.

References

1. Luo, Chen. Building association link network for semantic link on web resources. IEEE Transactions on Automation Science and Engineering, 2011, 8(3), 482–494. doi:10.1109/TASE.2010.2094608. Scopus 2-s2.0-79960112691.

2. Wei, Luo, Liu, Mei, Chen. Knowle: a semantic link network based system for organizing large scale online news events. Future Generation Computer Systems, 2015, 43-44, 40–50. doi:10.1016/j.future.2014.04.002.

3. Luo, Wang. Incremental building association link network. Computer Systems Science and Engineering, 2011, 26(3), 153–162. Scopus 2-s2.0-80051710371.

4. Fortunato. Community detection in graphs. Physics Reports, 2010, 486(3–5), 75–174. doi:10.1016/j.physrep.2009.11.002. MR2580414. Scopus 2-s2.0-74049087026.

5. Kernighan B. W., Lin. An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal, 1970, 49(2), 291–307. doi:10.1002/j.1538-7305.1970.tb01770.x.

6. Williams R. J., Martinez N. D. Simple rules yield complex food webs. Nature, 2000, 404(6774), 180–183.

7. Malliaros F. D., Vazirgiannis. Clustering and community detection in directed networks: a survey. Physics Reports, 2013, 533(4), 95–142. doi:10.1016/j.physrep.2013.08.002. MR3132042. Scopus 2-s2.0-84888062097.

8. Newman M. E. J. Fast algorithm for detecting community structure in networks. Physical Review E, 2004, 69(6), article 066133, 5 pages. doi:10.1103/physreve.69.066133.

9. Albert, Barabási A.-L. Statistical mechanics of complex networks. Reviews of Modern Physics, 2002, 74(1), 47–97. doi:10.1103/revmodphys.74.47. MR1895096. Scopus 2-s2.0-0036013593.

10. Girvan, Newman M. E. J. Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America, 2002, 99(12), 7821–7826. doi:10.1073/pnas.122653799. MR1908073. Scopus 2-s2.0-0037062448.

11. Newman M. E. J. Analysis of weighted networks. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 2004, 70(5), 9 pages. doi:10.1103/physreve.70.056131. Scopus 2-s2.0-41349096036.

12. Hopcroft, Khan, Kulis, Selman. Tracking evolving communities in large linked networks. Proceedings of the National Academy of Sciences of the United States of America, 2004, 101(1), 5249–5253. doi:10.1073/pnas.0307750100. Scopus 2-s2.0-1842688009.

13. Chakrabarti, Kumar, Tomkins. Evolutionary clustering. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006, New York, USA: ACM, 554–560. doi:10.1145/1150402.1150467.

14. Shan, Jiang S.-X., Zhang, Gao J.-Z. IC: Incremental algorithm for community identification in dynamic social networks. Journal of Software, 2009, 20(supplement), 184–192. Scopus 2-s2.0-77951261729.

15. Ning, Chi, Gong, Huang. Incremental spectral clustering with application to monitoring of evolving blog communities. SIAM International Conference on Data Mining, 2007, 261–272.

16. Chen, Wang, Wei. A new multiobjective evolutionary algorithm for community detection in dynamic complex networks. Mathematical Problems in Engineering, 2013, 2013, article 161670, 7 pages. doi:10.1155/2013/161670. MR3062935.

17. G.-H., Huang J.-F. MOfinder: a novel algorithm for detecting overlapping modules from protein-protein interaction network. Journal of Biomedicine and Biotechnology, 2012, 2012, article 103702, 10 pages. doi:10.1155/2012/103702. Scopus 2-s2.0-84863287368.

18. Newman M. E. J. Modularity and community structure in networks. Proceedings of the National Academy of Sciences of the United States of America, 2006, 103(23), 8577–8582. doi:10.1073/pnas.0601602103. Scopus 2-s2.0-33745012299.

19. Barabási A. L., Bonabeau. Scale-free networks. Scientific American, 2003, 288(1), 60–69.

20. Guo. Comprehensive Evaluation Theory and Method. Beijing, China: Press of Science and Technology, 2002.

21. Facebook. https://www.facebook.com/

22. Sinamicroblog. http://us.weibo.com/gb

23. Fortunato, Barthélemy. Resolution limit in community detection. Proceedings of the National Academy of Sciences of the United States of America, 2007, 104(1), 36–41. doi:10.1073/pnas.0605965104.

24. Zhang, Wang R.-S., Zhang X.-S., Chen. Quantitative function for community detection. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 2008, 77(3), article 036109. doi:10.1103/physreve.77.036109. Scopus 2-s2.0-40949161791.

25. Duch, Arenas. Community detection in complex networks using extremal optimization. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 2005, 72(2), article 027104. doi:10.1103/physreve.72.027104. Scopus 2-s2.0-27244441304.

26. Wakita, Tsurumi. Finding community structure in mega-scale social networks. Proceedings of the 16th International Conference on World Wide, 2007, New York, NY, USA: ACM, 1275–1276. doi:10.1145/1242572.1242805.

A Community Finding Method for Weighted Dynamic Online Social Network Based on User Behavior
