Identifying Missing and Spurious Interactions in Directed Networks

Abstract

In recent years, studies of link prediction have focused overwhelmingly on undirected networks. By comparison, how to identify missing and spurious interactions in directed networks has received less attention and is still not well understood. In this paper, we take classical link prediction indices for undirected networks, adapt them to directed versions that can predict both the existence and the direction of an arc between two nodes, and investigate their prediction ability on five real-world directed networks. Experimental results demonstrate that these modified indices perform quite well in directed networks. Compared with the bifan predictor, some of them provide more accurate predictions.

1. Introduction

Networks are an effective and efficient tool for describing real-world complex systems [1, 2], such as social, biological, traffic, and information systems, where nodes represent individuals, proteins, airports, web pages, and so forth, and links denote the relations and interactions between them. The study of complex networks is an important and growing field that attracts attention from many branches of science. While making great efforts to understand the structural features and evolutionary mechanisms of networks, scientists have gradually realized that the inaccuracy and incompleteness of data sets are a significant obstacle to this research [3, 4]. To address this issue, link prediction algorithms have been adopted to extract missing information, identify spurious interactions, and reconstruct networks.

The link prediction problem aims at estimating the likelihood that a link exists between two nodes in a given network, based on the observed links [5], and it has a wide range of applications in the real world. For example, in online social networks, very likely but not-yet-existent links can be recommended as promising friendships, which helps users find new friends; in biological networks, compared with blindly checking all possible protein-protein interactions, accurate prediction of the most likely ones can dramatically reduce the experimental cost; in e-commerce, with the help of recommendation systems, sellers increase their sales by watching customers' purchases and recommending other goods in which they may be interested [6]; in the security domain, link prediction methods can be used to assist in identifying groups of terrorists or criminals [7].

Since the link prediction problem is relevant for various domains, lots of algorithms have been proposed to solve it. The most widely used link prediction indices are the local similarity measures [8–12], for example, Common Neighbors, Jaccard index, Adamic-Adar index, and Resource Allocation index, which only require consideration of the local structure of the networks. They are favored for low complexity, low time consumption, and relatively high accuracy especially when the network is highly clustered [13].

Algorithms based on statistical inference form another branch of link prediction research, in which some organizing principle of the network structure, such as hierarchical organization or community structure, is often presupposed [7, 14, 15]. Using Bayes' theorem, one can infer the underlying structure from the observed network and use that knowledge to reliably identify both missing and spurious interactions. Methods of this type are usually more accurate and robust, but they have an obvious drawback: their high computational complexity makes them infeasible for large-scale networks.

All the algorithms mentioned above are designed for undirected networks. However, asymmetric interactions are widespread in the real world, for example in the World Wide Web, food webs, neural networks, email networks, and citation networks. Unfortunately, little research has focused on how to identify missing and spurious interactions in directed networks. Only very recently has a new mechanism for the local organization of directed networks, potential theory, been proposed [16]. By combining potential theory with the clustering and homophily mechanisms, the bifan structure, which consists of 4 nodes and 4 directed links, is deduced to be the most favored local motif, and it can be used directly as a well-performing missing link predictor in directed networks.

In this paper, we focus on the link prediction problem in directed networks. Our contribution is twofold: (1) Instead of proposing a brand-new directed link predictor, we make the most of classical prediction indices for undirected networks. We extend the most representative measures to directed versions, to meet the requirement of predicting both the existence and the direction of an arc between two nodes. (2) To investigate the performance of these modified indices, we design simulation experiments on five real biological and technological directed networks, the results of which demonstrate their prediction ability.

The remainder of this paper is organized as follows. In Section 2, we describe the link prediction problem in directed networks and the standard metric for performance evaluation. Then, we present how the adapted indices work in asymmetric networks and assess their prediction ability in Sections 3 and 4, respectively. Finally, conclusions are drawn in Section 5.

2. Problem Description and Evaluation Metric

Consider a directed network $G(V, E)$, where V and E are the sets of nodes and directed links, respectively; multiple links and self-connections are not allowed. The fundamental task of a link prediction algorithm is to rank all nonobserved links in the set $U ∖ E$, where U is the universal set containing all $|V|(|V| - 1)$ possible directed links. The mainstream method is to assign each nonobserved link a score, with higher-scoring links ranked first. The top-ranked links are regarded as the most likely missing (or future) interactions.

To evaluate the algorithmic performance, the set of observed links E is divided into two parts: the training set $E^{T}$ is treated as known information, while the probe set $E^{P}$ is used for testing, and no information therein may be used for prediction. Clearly, $E = E^{T} \cup E^{P}$ and $E^{T} \cap E^{P} = \emptyset$.
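The division of E into a training set and a probe set can be sketched as follows. This is only an illustration under our own assumptions: arcs are stored as `(i, j)` tuples, the split is uniformly random, and the function name `split_links` and its `seed` parameter are hypothetical helpers that do not come from the paper.

```python
import random


def split_links(arcs, f, seed=None):
    """Split the observed arcs E into a training set E^T and a probe set E^P.

    `f` is the fraction |E^P| / |E|. `seed` is a hypothetical parameter that
    only makes the random split reproducible.
    """
    unique_arcs = sorted(set(arcs))      # drop duplicates, fix a base order
    rng = random.Random(seed)
    rng.shuffle(unique_arcs)
    n_probe = round(f * len(unique_arcs))
    probe = set(unique_arcs[:n_probe])   # hidden from the predictor
    train = set(unique_arcs[n_probe:])   # the only information a predictor sees
    return train, probe


arcs = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0),
        (3, 4), (4, 0), (4, 1), (1, 4), (2, 4)]
train, probe = split_links(arcs, f=0.2, seed=1)
print(len(train), len(probe))
```

By construction the two sets are disjoint and their union is E, matching the two conditions stated above.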

We adopt the AUC metric [17] (area under the receiver operating characteristic curve) to quantify the prediction accuracy. It evaluates the predictor's performance over the whole ranked list and can be interpreted as the probability that a randomly chosen missing link (i.e., a link in $E^{P}$) is given a higher score than a randomly chosen nonexistent link (i.e., a link in $U ∖ E$). If, among n independent comparisons, the missing link has the higher score $n'$ times and the two scores are equal $n''$ times, the AUC value is

$$\mathrm{AUC} = \frac{n' + 0.5\, n''}{n}. \tag{1}$$
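As an illustration of (1), the sketch below estimates the AUC from two lists of scores, one for probe links and one for nonexistent links. The function names, the default of 10000 comparisons, and the `seed` parameter are our own assumptions; the exhaustive variant simply compares every pair instead of sampling.

```python
import random


def auc_by_sampling(probe_scores, nonexistent_scores, n=10000, seed=0):
    """Estimate the AUC of Eq. (1) from n independent random comparisons."""
    rng = random.Random(seed)
    higher = ties = 0
    for _ in range(n):
        s_missing = rng.choice(probe_scores)
        s_nonexistent = rng.choice(nonexistent_scores)
        if s_missing > s_nonexistent:
            higher += 1          # contributes to n'
        elif s_missing == s_nonexistent:
            ties += 1            # contributes to n''
    return (higher + 0.5 * ties) / n


def auc_exact(probe_scores, nonexistent_scores):
    """Same quantity, computed by comparing every (probe, nonexistent) pair."""
    higher = ties = 0
    for a in probe_scores:
        for b in nonexistent_scores:
            if a > b:
                higher += 1
            elif a == b:
                ties += 1
    return (higher + 0.5 * ties) / (len(probe_scores) * len(nonexistent_scores))


print(auc_exact([3, 2], [1, 2]))   # three wins and one tie out of four pairs
```

A predictor that scores links purely by chance gives a value close to 0.5, which is the baseline used in the experiments.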

Similarly, a link prediction algorithm can also be used to identify spurious interactions. In this case, the task is to score all the observed links in E and rank them in ascending order; the top-ranked links are then suspected to be the most likely spurious interactions. To test the algorithmic performance, $E^{T'} = E \cup E^{P'}$ is treated as the set of “hypothetical observed” links, where $E^{P'}$ is made up of randomly selected nonexistent links from $U ∖ E$. Now the AUC represents the probability that a randomly chosen spurious link (i.e., a link in $E^{P'}$) is given a lower score than a randomly chosen real link (i.e., a link in E).
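The spurious-link test can be sketched in the same spirit. This is a minimal illustration, assuming arcs are `(i, j)` tuples and that `score` is a dictionary mapping every hypothetical observed arc to its score; the names `add_spurious_links` and `spurious_auc` are hypothetical, and the sampling fails if more spurious links are requested than nonexistent links are available.

```python
import random


def add_spurious_links(arcs, nodes, f, seed=None):
    """Build the hypothetical observed set E ∪ E^P', with |E^P'| = f * |E|."""
    observed = set(arcs)
    nonexistent = [(i, j) for i in nodes for j in nodes
                   if i != j and (i, j) not in observed]
    rng = random.Random(seed)
    n_spurious = round(f * len(observed))
    spurious = set(rng.sample(nonexistent, n_spurious))
    return observed | spurious, spurious


def spurious_auc(score, hypothetical, spurious):
    """Probability that a spurious link scores lower than a real link."""
    real = hypothetical - spurious
    lower = ties = 0
    for s in spurious:
        for r in real:
            if score[s] < score[r]:
                lower += 1
            elif score[s] == score[r]:
                ties += 1
    return (lower + 0.5 * ties) / (len(spurious) * len(real))


nodes = range(4)
arcs = [(0, 1), (1, 2), (2, 3), (3, 0)]
hypothetical, spurious = add_spurious_links(arcs, nodes, f=0.5, seed=3)
# A perfect (oracle) scorer, used only to show the best attainable value.
oracle = {arc: (0 if arc in spurious else 1) for arc in hypothetical}
print(spurious_auc(oracle, hypothetical, spurious))
```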

In short, both for predicting missing links and for identifying spurious links, the higher the AUC value, the better the prediction algorithm works.

3. Directed Link Predictors

In this section, we first review some classical prediction indices for undirected networks and then present how to extend them to directed versions.

(1) Preferential Attachment (PA). This is a basic link prediction algorithm corresponding to the preferential attachment phenomenon observed in many real-world networks [18, 19]. Preferential attachment means that the more connected a node is, the more likely it is to receive new links: nodes with higher degree have a stronger ability to attract links added to the network. The PA index therefore takes the probability that a link will connect i and j to be proportional to the product of their numbers of neighbors; that is,

$$S_{ij}^{PA} = k_{i} \times k_{j} = |Γ(i)| \times |Γ(j)|, \tag{2}$$

where $Γ(i) = \{j \mid (i, j) \in E \lor (j, i) \in E\}$ is the set of node i's neighbors and $k_{i} = |Γ(i)|$ denotes its degree.

In directed networks, taking the link direction into consideration, $Γ_{out}(i) = \{j \mid (i, j) \in E\}$ is the set of outgoing neighbors of node i, while $Γ_{in}(i) = \{j \mid (j, i) \in E\}$ is the set of its incoming neighbors. Accordingly, $k_{i}^{out} = |Γ_{out}(i)|$ and $k_{i}^{in} = |Γ_{in}(i)|$ represent the out-degree and in-degree of node i, respectively. Usually, $k_{i}^{out}$ and $k_{i}^{in}$ are not equal, so we can take advantage of this asymmetry to extend the PA index. For an arc $i \to j$ in a directed network, the PA score can be defined as

$$S_{i \to j}^{PA^{*}} = k_{i}^{out} \times k_{j}^{in} = |Γ_{out}(i)| \times |Γ_{in}(j)|. \tag{3}$$

The meaning of (3) is quite straightforward. From the perspective of information dissemination, the more outgoing neighbors i has, the more channels it has to get a message out; the more incoming neighbors j attracts, the stronger its ability to gather information. Both make it more likely that a connection is built from i to j.
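A minimal sketch of (3) is given below, assuming the training arcs are `(i, j)` tuples; the function name `directed_pa_scores` is ours. It scores every nonobserved arc and shows that the two directions between the same pair of nodes can receive different scores.

```python
from collections import Counter


def directed_pa_scores(arcs, nodes):
    """Score every nonobserved arc i -> j by k_out(i) * k_in(j), as in Eq. (3)."""
    out_deg = Counter(i for i, _ in arcs)   # k_i^out
    in_deg = Counter(j for _, j in arcs)    # k_j^in
    observed = set(arcs)
    return {(i, j): out_deg[i] * in_deg[j]
            for i in nodes for j in nodes
            if i != j and (i, j) not in observed}


arcs = [(1, 2), (1, 3), (2, 3), (4, 3), (4, 1)]
scores = directed_pa_scores(arcs, nodes=[1, 2, 3, 4])
print(scores[(4, 2)], scores[(2, 4)])   # the two directions differ
```

Here node 4 has two outgoing arcs and node 2 has one incoming arc, so the arc from 4 to 2 scores 2, whereas the reverse arc scores 0 because node 4 has no incoming arcs.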

(2) Common Neighbors (CN). The common neighbors of i and j are the nodes that are connected with both i and j. This concept is so important that nearly all local similarity measures are based on it.

The triadic closure principle in social network analysis states that if two people A and B have a common friend C, they are very likely to be friends too; these three nodes then form a closed triangle ABC. For undirected networks, it is natural to assume that the more common neighbors two nodes share, the more similar they are and the more likely they are to be connected. So the CN index is defined as

$$S_{ij}^{CN} = |Γ(i) \cap Γ(j)|. \tag{4}$$

It is not hard to see that $S_{ij}^{CN} = (A^{2})_{ij}$, where $A^{2} = A \cdot A$ and A is the adjacency matrix, whose elements are defined as

$$A_{ij} = \begin{cases} 1 & \text{if } i \text{ is connected with } j, \\ 0 & \text{otherwise}, \end{cases} \tag{5}$$

and obviously $A_{ij} = A_{ji}$.

However, in directed networks, the adjacency matrix $\tilde{A}$ becomes asymmetric; that is,

$$\tilde{A}_{ij} = \begin{cases} 1 & \text{if there is a directed arc } i \to j, \\ 0 & \text{otherwise}. \end{cases} \tag{6}$$

Now the meaning of $(\tilde{A}^{2})_{ij}$ changes: it equals the number of distinct two-step paths from i to j. In other words, $(\tilde{A}^{2})_{ij}$ counts the “transit nodes” in the network that can forward a message received from i on to j. The more such transit nodes there are, the more likely it is that a directed arc $i \to j$ will form. Thus the extension of the CN index to directed networks can be naturally defined as

$$S_{i \to j}^{CN^{*}} = (\tilde{A}^{2})_{ij} = |Γ_{out}(i) \cap Γ_{in}(j)|. \tag{7}$$
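The two expressions in (7) can be checked against each other on a small example. The sketch below is only illustrative: it assumes nodes are labeled 0 to n-1, and the helper names `directed_cn` and `two_step_paths` are ours.

```python
def directed_cn(arcs, i, j):
    """|Γ_out(i) ∩ Γ_in(j)|: the number of transit nodes z with i -> z -> j."""
    out_i = {v for u, v in arcs if u == i}
    in_j = {u for u, v in arcs if v == j}
    return len(out_i & in_j)


def two_step_paths(arcs, n, i, j):
    """Entry (i, j) of the squared directed adjacency matrix."""
    a = [[0] * n for _ in range(n)]
    for u, v in arcs:
        a[u][v] = 1
    return sum(a[i][z] * a[z][j] for z in range(n))


arcs = [(0, 1), (0, 2), (1, 3), (2, 3)]
print(directed_cn(arcs, 0, 3), two_step_paths(arcs, 4, 0, 3))   # arc 0 -> 3
print(directed_cn(arcs, 3, 0), two_step_paths(arcs, 4, 3, 0))   # arc 3 -> 0
```

In this example nodes 1 and 2 both forward from 0 to 3, so the arc from 0 to 3 scores 2 while the reverse arc scores 0; this asymmetry is what lets the index predict direction.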

As shown in Figure 1, from a geometric point of view, the CN index can be interpreted as the number of triadic closures in undirected networks, while in directed networks it measures the formation of feedforward structures.

Figure 1

Geometric interpretation of CN index in different networks. (a) Triadic closure; (b) feedforward structure.

In fact, (7) is not the only way to extend the CN index. Alternative indicators include the following.

(i) Co-citation index: $|Γ_{out}(i) \cap Γ_{out}(j)|$.

(ii) Bibliographic coupling index: $|Γ_{in}(i) \cap Γ_{in}(j)|$.

(iii) Feedback structure: $|Γ_{in}(i) \cap Γ_{out}(j)|$.

From Figure 2 we can see that the co-citation index and the bibliographic coupling index cannot distinguish between $i \to j$ and $j \to i$: both expressions are symmetric in i and j, so the two arcs always receive the same score. These two indices are therefore not suitable for link prediction in directed networks. We choose the feedforward structure rather than the feedback structure as the indicator because empirical research shows that the former is more common in real-world networks. Milo et al. [20] studied the network motifs in networks from biochemistry, neurobiology, ecology, and engineering and found that the feedforward structure appears in all the observed networks.
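The symmetry argument can be verified numerically. The following sketch computes the four candidate indicators exactly as they are written in the text for both directions of a node pair; the function name `candidate_indices` and the example graph are our own.

```python
def candidate_indices(arcs, i, j):
    """Evaluate the four candidate extensions of CN for the arc i -> j."""
    out_i = {v for u, v in arcs if u == i}
    in_i = {u for u, v in arcs if v == i}
    out_j = {v for u, v in arcs if u == j}
    in_j = {u for u, v in arcs if v == j}
    return {
        "feedforward": len(out_i & in_j),              # Eq. (7)
        "feedback": len(in_i & out_j),                 # indicator (iii)
        "co_citation": len(out_i & out_j),             # indicator (i), as defined in the text
        "bibliographic_coupling": len(in_i & in_j),    # indicator (ii), as defined in the text
    }


arcs = [(0, 2), (1, 2), (0, 3), (3, 1), (4, 0), (4, 1)]
forward = candidate_indices(arcs, 0, 1)    # candidate arc 0 -> 1
backward = candidate_indices(arcs, 1, 0)   # candidate arc 1 -> 0
print(forward)
print(backward)
```

Indicators (i) and (ii) return the same value for both directions, whereas the feedforward score of one direction equals the feedback score of the other.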

Figure 2

Alternative indicators for CN index.

(3) Stochastic Block Model (BM) [14]. The stochastic block model is one of the most general network models: nodes are partitioned into groups, and the probability that two nodes are connected depends only on the groups to which they belong.

In undirected networks, consider a block model $M = (P, Q)$, in which P is the partition of nodes into groups and the matrix $Q = (Q_{αβ})$ gives the probabilities of linkage between groups. The likelihood of the observed network is then

$$p(A \mid P, Q) = \prod_{α \leq β} Q_{αβ}^{l_{αβ}} (1 - Q_{αβ})^{γ_{αβ} - l_{αβ}}, \tag{8}$$

where A is the adjacency matrix, $l_{αβ}$ is the number of links between nodes in groups α and β, and $γ_{αβ}$ is the maximum number of such links. Using Bayes' theorem, the reliability of an individual link is

$$S_{ij}^{BM} = p(A_{ij} = 1 \mid A) = \frac{1}{Z} \sum_{P \in \mathcal{P}} \int_{[0, 1]^{G}} p(A_{ij} = 1 \mid P, Q)\, p(A \mid P, Q)\, p(P, Q)\, dQ, \tag{9}$$

where $\mathcal{P}$ is the space of all partitions, G is the number of distinct group pairs, and Z is a normalizing constant ($Z = \sum_{P \in \mathcal{P}} \int_{[0, 1]^{G}} p(A \mid P, Q)\, p(P, Q)\, dQ$).

In directed networks, the interactions between two nodes are no longer reciprocal, which implies that the underlying matrix Q becomes asymmetric; that is, $Q_{αβ}$ determines the probability that nodes in group α link to nodes in group β, while $Q_{βα}$ governs the chance of links from group β to group α. $Q_{αβ}$ and $Q_{βα}$ are not necessarily the same. In addition, links of different directions between two groups should be counted separately. Thus, the likelihood of the directed network structure is

$$p^{*}(\tilde{A} \mid P, Q) = \prod_{α, β} Q_{αβ}^{l_{αβ}^{*}} (1 - Q_{αβ})^{γ_{αβ}^{*} - l_{αβ}^{*}}. \tag{10}$$

Notice that $l_{αβ}^{*}$ is now the number of directed arcs from group α to group β, and $γ_{αβ}^{*}$ is the corresponding maximum number of such arcs:

$$γ_{αβ}^{*} = \begin{cases} |α| (|α| - 1) & α = β, \\ |α| |β| & α \neq β. \end{cases} \tag{11}$$

Then, by replacing $p(A \mid P, Q)$ with $p^{*}(\tilde{A} \mid P, Q)$ in (9), we can easily get the score for the arc $i \to j$:

$$S_{i \to j}^{BM^{*}} = p(\tilde{A}_{ij} = 1 \mid \tilde{A}) = \frac{1}{Z^{*}} \sum_{P \in \mathcal{P}} \int_{[0, 1]^{G}} p(\tilde{A}_{ij} = 1 \mid P, Q)\, p^{*}(\tilde{A} \mid P, Q)\, p(P, Q)\, dQ, \tag{12}$$

where $Z^{*} = \sum_{P \in \mathcal{P}} \int_{[0, 1]^{G}} p^{*}(\tilde{A} \mid P, Q)\, p(P, Q)\, dQ$.
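Evaluating (12) requires summing over partitions and integrating over Q, which is beyond a short example. As a partial illustration, the sketch below evaluates only the logarithm of the directed likelihood (10), with $γ_{αβ}^{*}$ taken from (11), for one fixed partition and one fixed matrix Q. The function name, the dictionary-based inputs, and the restriction to probabilities strictly between 0 and 1 are our own assumptions.

```python
import math
from collections import Counter


def directed_block_log_likelihood(arcs, group_of, q):
    """log p*(A~ | P, Q) of Eq. (10) for a fixed partition and a fixed Q.

    `group_of` maps node -> group label (the partition P); `q` maps an
    ordered pair of labels (a, b) to Q_ab. Assumes 0 < Q_ab < 1 throughout.
    """
    sizes = Counter(group_of.values())
    l_star = Counter((group_of[i], group_of[j]) for i, j in arcs)
    log_lik = 0.0
    for a in sizes:
        for b in sizes:                          # ordered pairs: (a, b) != (b, a)
            if a == b:
                gamma = sizes[a] * (sizes[a] - 1)    # Eq. (11), same group
            else:
                gamma = sizes[a] * sizes[b]          # Eq. (11), different groups
            links = l_star[(a, b)]
            prob = q[(a, b)]
            log_lik += links * math.log(prob)
            log_lik += (gamma - links) * math.log(1.0 - prob)
    return log_lik


group_of = {0: "a", 1: "a", 2: "b"}
arcs = [(0, 1), (0, 2)]
q_sym = {(x, y): 0.5 for x in "ab" for y in "ab"}
q_asym = dict(q_sym)
q_asym[("a", "b")], q_asym[("b", "a")] = 0.9, 0.1
print(directed_block_log_likelihood(arcs, group_of, q_sym))
print(directed_block_log_likelihood(arcs, group_of, q_asym))
```

Because the only arc between the two groups runs from group a to group b, a Q with a larger entry for that direction fits the data better than the same Q with the two entries swapped, which is the asymmetry the directed model is meant to capture.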

Among all the link prediction measures for undirected networks, the above three indices are chosen for adaptation for the following reasons. Firstly, their design philosophies differ from each other, and each has its own advantages. The CN index follows natural intuition and has a wide range of practical applications. The PA index reflects the rich-get-richer mechanism and stands out for requiring the least information, while the BM method takes network community structure into account, rests on solid mathematical foundations, and returns excellent results. Secondly, these three indices are fairly representative, so the way they are modified is instructive and can be extended to other indices. For example, inspired by the modified CN index, other local similarity measures can be extended in a similar way. Table 1 lists a few common local similarity indices and their corresponding extensions.

Table 1

Some local similarity measures and their corresponding extensions for directed networks.

Index	Undirected network	Directed network
Jaccard	$\frac{|Γ(i) \cap Γ(j)|}{|Γ(i) \cup Γ(j)|}$	$\frac{|Γ_{out}(i) \cap Γ_{in}(j)|}{|Γ_{out}(i) \cup Γ_{in}(j)|}$
Hub Promoted	$\frac{|Γ(i) \cap Γ(j)|}{\min\{k(i), k(j)\}}$	$\frac{|Γ_{out}(i) \cap Γ_{in}(j)|}{\min\{k_{out}(i), k_{in}(j)\}}$
Hub Depressed	$\frac{|Γ(i) \cap Γ(j)|}{\max\{k(i), k(j)\}}$	$\frac{|Γ_{out}(i) \cap Γ_{in}(j)|}{\max\{k_{out}(i), k_{in}(j)\}}$
Salton	$\frac{|Γ(i) \cap Γ(j)|}{\sqrt{k(i) \times k(j)}}$	$\frac{|Γ_{out}(i) \cap Γ_{in}(j)|}{\sqrt{k_{out}(i) \times k_{in}(j)}}$
Leicht-Holme-Newman	$\frac{|Γ(i) \cap Γ(j)|}{k(i) \times k(j)}$	$\frac{|Γ_{out}(i) \cap Γ_{in}(j)|}{k_{out}(i) \times k_{in}(j)}$
Sørensen	$\frac{2 |Γ(i) \cap Γ(j)|}{k(i) + k(j)}$	$\frac{2 |Γ_{out}(i) \cap Γ_{in}(j)|}{k_{out}(i) + k_{in}(j)}$
Adamic-Adar	$\sum_{z \in Γ(i) \cap Γ(j)} \frac{1}{\log k(z)}$	$\sum_{z \in Γ_{out}(i) \cap Γ_{in}(j)} \frac{1}{\log k_{out}(z)}$
Resource Allocation	$\sum_{z \in Γ(i) \cap Γ(j)} \frac{1}{k(z)}$	$\sum_{z \in Γ_{out}(i) \cap Γ_{in}(j)} \frac{1}{k_{out}(z)}$
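The directed column of Table 1 can be sketched in a few lines. Two conventions in this sketch are our own assumptions and are not specified in the table: a ratio with a zero denominator is set to 0, and transit nodes with $k_{out}(z) = 1$ are skipped in the Adamic-Adar sum because $\log 1 = 0$. The function name `directed_local_indices` is also ours.

```python
import math


def directed_local_indices(arcs, i, j):
    """Directed extensions from Table 1 for the candidate arc i -> j."""
    out_nbrs, in_nbrs = {}, {}
    for u, v in arcs:
        out_nbrs.setdefault(u, set()).add(v)
        in_nbrs.setdefault(v, set()).add(u)
    out_i = out_nbrs.get(i, set())
    in_j = in_nbrs.get(j, set())
    common = out_i & in_j                       # transit nodes z with i -> z -> j
    cn, k_out_i, k_in_j = len(common), len(out_i), len(in_j)

    def ratio(num, den):
        return num / den if den else 0.0        # assumption: 0 when undefined

    def k_out(z):
        return len(out_nbrs.get(z, set()))

    return {
        "jaccard": ratio(cn, len(out_i | in_j)),
        "hub_promoted": ratio(cn, min(k_out_i, k_in_j)),
        "hub_depressed": ratio(cn, max(k_out_i, k_in_j)),
        "salton": ratio(cn, math.sqrt(k_out_i * k_in_j)),
        "lhn": ratio(cn, k_out_i * k_in_j),
        "sorensen": ratio(2 * cn, k_out_i + k_in_j),
        # assumption: skip z with k_out(z) == 1, where 1 / log(1) is undefined
        "adamic_adar": sum(1.0 / math.log(k_out(z)) for z in common if k_out(z) > 1),
        "resource_allocation": sum(1.0 / k_out(z) for z in common),
    }


arcs = [(0, 1), (0, 2), (1, 3), (2, 3), (2, 4)]
forward = directed_local_indices(arcs, 0, 3)
backward = directed_local_indices(arcs, 3, 0)
print(forward["resource_allocation"], backward["resource_allocation"])
```

All of these indices vanish whenever the arc has no transit node, as for the arc from 3 to 0 here.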

To analyze the performance of these modified measures, we next introduce the bifan predictor as a comparison index. The bifan structure is found to be quite widespread in the real world [20] and has proven to be a well-performing predictor for directed networks.

(4) Bifan Predictor [16]. The potential theory is proposed as a microscopic organizing principle for directed networks, which assumes that each directed link corresponds to a decrease of a unit potential and subgraphs with definable potential values for all nodes are preferred. Combining the clustering and homophily mechanisms with potential theory, it is deduced that bifan subgraph is the most favored local structure in directed networks (see Figure 3).

Figure 3

Bifan structure consists of 4 nodes and 4 directed links. All the links are equivalent to each other and nodes are of two different potentials—the potentials of two black nodes are a unit higher than that of the grey ones.

This special structure can be used directly as a link predictor, under the assumption that a link that generates more bifan subgraphs is more significant and thus has a higher probability of appearing. So the bifan index is determined by the number of bifan subgraphs that the link $i \to j$ could generate, which can be calculated by

$$S_{i \to j}^{Bifan^{*}} = \vec{r}_{i} \Big(\sum_{s} \vec{r}_{k_{s}}^{T}\Big), \tag{13}$$

where $\vec{r}_{i} = (a_{i1}, a_{i2}, \dots, a_{i|V|})$ is the ith row vector of the adjacency matrix $\tilde{A}$ and the $\vec{r}_{k_{s}}$ are the rows that satisfy $a_{k_{s} j} \neq 0$.
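Equation (13) can be evaluated directly from the rows of the adjacency matrix, as in the sketch below. It assumes nodes are labeled 0 to n-1 and that the matrix is built from the training arcs; the function name `bifan_score` is ours. Rearranging the sums shows that the same value is the (i, j) entry of $\tilde{A} \tilde{A}^{T} \tilde{A}$.

```python
def bifan_score(arcs, n, i, j):
    """Eq. (13): dot product of row i with the sum of all rows k having a_kj != 0."""
    a = [[0] * n for _ in range(n)]
    for u, v in arcs:
        a[u][v] = 1
    total = 0
    for k in range(n):
        if a[k][j] != 0:                         # k already points to j
            # common out-neighbors of i and k = r_i . r_k
            total += sum(a[i][m] * a[k][m] for m in range(n))
    return total


# Nodes 0 and 1 both point to node 2, and node 1 also points to node 3.
arcs = [(0, 2), (1, 2), (1, 3)]
print(bifan_score(arcs, 4, 0, 3))   # adding 0 -> 3 would close one bifan
print(bifan_score(arcs, 4, 3, 0))   # the reverse arc closes none
```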

4. Results

4.1. Data Sets

Our experiments are carried out on five real-world directed networks drawn from biological and technological fields. These networks are representative and are often used in simulation experiments to validate proposed methods.

(i) FoodWeb1 (FW1) [21]: a food web representing the predator-prey relations between 69 species living in Everglades Graminoids during the wet season.

(ii) FoodWeb2 (FW2) [22]: a food web consisting of 97 species living in Mangrove Estuary during the wet season.

(iii) FoodWeb3 (FW3) [23]: a food web consisting of 128 species living in Florida Bay during the wet season.

(iv) C. elegans (CE) [24]: the neural network of the nematode worm C. elegans, in which an edge joins two neurons if they are connected by either a synapse or a gap junction.

(v) Political blogs (PB) [25]: a directed network of hyperlinks between weblogs on US politics.

For convenience, we eliminate all the loops and multiedges of the above networks. The basic topological features of these five networks are summarized in Table 2.

Table 2

The basic topological features of the five directed networks. $|V|$ and $|E|$ are the numbers of nodes and links. $k_{\max}^{in}$ and $k_{\max}^{out}$ are the maximum in-degree and out-degree over all nodes, $〈k〉$ is the average degree of the network, $〈d〉$ is the average shortest distance between pairs of nodes, and C is the clustering coefficient of the directed network.

Networks	$|V|$	$|E|$	$k_{\max}^{in}$	$k_{\max}^{out}$	$〈k〉$	C	$〈d〉$
FW1	69	911	63	44	13.275	0.309	2.168
FW2	97	1492	90	46	15.381	0.261	2.185
FW3	128	2106	110	63	16.453	0.177	2.412
CE	297	2345	134	39	7.896	0.174	3.992
PB	1222	19021	337	256	15.565	0.219	3.39

4.2. Experimental Results

In this section, we investigate the prediction ability of the indices presented in Section 3. Comparisons of the algorithms' accuracy are displayed in Figures 4 and 5. The x-axis denotes the fraction $f = |E^{P}| / |E|$ for missing interactions ($f = |E^{P'}| / |E|$ for spurious interactions), ranging from 0.05 to 0.95 in steps of 0.05. The dashed line indicates the baseline accuracy obtained when each link is scored purely by chance. Each AUC value is obtained by averaging over 100 independent realizations, except for PB, for which 50 realizations are used.
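The protocol described above (a sweep of f from 0.05 to 0.95 with each AUC averaged over independent random splits) can be sketched as follows for the missing-link case. This is a simplified illustration rather than the code used for the figures: it uses the directed PA index (3) as the predictor, estimates each AUC from 1000 sampled comparisons, and all function names, defaults, and the toy graph are our own assumptions.

```python
import random
from collections import Counter


def pa_scorer(train):
    """Directed PA score of Eq. (3), computed from the training arcs only."""
    out_deg = Counter(i for i, _ in train)
    in_deg = Counter(j for _, j in train)
    return lambda arc: out_deg[arc[0]] * in_deg[arc[1]]


def auc_one_realization(arcs, nodes, f, rng, n_comparisons=1000):
    """One random split at fraction f, followed by a sampled AUC estimate."""
    shuffled = sorted(set(arcs))
    rng.shuffle(shuffled)
    n_probe = max(1, round(f * len(shuffled)))   # keep at least one probe arc
    probe, train = shuffled[:n_probe], shuffled[n_probe:]
    observed = set(shuffled)
    nonexistent = [(i, j) for i in nodes for j in nodes
                   if i != j and (i, j) not in observed]
    score = pa_scorer(train)
    higher = ties = 0
    for _ in range(n_comparisons):
        s_missing = score(rng.choice(probe))
        s_nonexistent = score(rng.choice(nonexistent))
        if s_missing > s_nonexistent:
            higher += 1
        elif s_missing == s_nonexistent:
            ties += 1
    return (higher + 0.5 * ties) / n_comparisons


def sweep_fractions(arcs, nodes, realizations=100, seed=0):
    """Average AUC for f = 0.05, 0.10, ..., 0.95."""
    rng = random.Random(seed)
    fractions = [round(0.05 * k, 2) for k in range(1, 20)]
    return {f: sum(auc_one_realization(arcs, nodes, f, rng)
                   for _ in range(realizations)) / realizations
            for f in fractions}


nodes = list(range(6))
arcs = [(0, 1), (0, 2), (1, 2), (3, 2), (4, 2), (4, 5), (5, 0), (3, 4)]
curve = sweep_fractions(arcs, nodes, realizations=5)
print(sorted(curve)[:3])
```

Any other index from Section 3 could be plugged in by replacing `pa_scorer` with a function that builds its scores from the training arcs alone.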

Figure 4

Experimental results for extended local similarity indices.

Figure 5

Experimental results for different kinds of measures.

4.2.1. Comparison of Local Similarity Measures

Figure 4 illustrates the experimental results for 10 extended local similarity indices. It is worth noting that PA index, which is regarded as the worst predictor in undirected networks, surprisingly performs quite well in the experiments. It outperforms all the other local similarity measures in identifying both missing and spurious links. Hence the relatively good prediction ability together with the least information requirement makes PA index more competitive in directed networks.

Note that, in finding missing links, the other nine local similarity measures based on common neighbors perform nearly the same, especially in the CE and PB networks. This is because all these extended indices are built on the feedforward structure: the score of arc $i \to j$ is larger than 0 if and only if there is at least one “transit node” between i and j. So when the average shortest distance $〈d〉$ is relatively large, many arcs receive a score of zero under all these measures, which leaves little difference in their prediction ability.

4.2.2. Comparison of Different Kinds of Measures

In Figure 5, we choose PA and CN as representatives of local similarity measures and compare their performance with BM and bifan. In finding missing links, BM clearly outperforms CN and PA and works even better than the bifan predictor in most networks. The advantage of BM over the others is usually remarkable, except for network PB, in which BM, bifan, and PA perform almost equally well. This outcome demonstrates that community structure also plays an important role in directed networks. Compared with the other three measures, which use only local structural information, the BM index, which takes advantage of the organizing principles of the whole network, obtains more accurate predictions.

Figure 5 also shows that the prediction accuracy of all four indices declines as f increases, but at different rates. In contrast to the linear decrease of the CN index, the AUC of the other indices declines slowly until it reaches a “turning point,” after which the links remaining in $E^{T}$ are not enough to infer the underlying structure of the network, so the prediction accuracy drops sharply. For example, in all three food webs, the prediction accuracy falls dramatically after f reaches 0.75. In network PB, 0.65 can be regarded as the “turning point”: before it the AUC values for PA, BM, and bifan remain nearly unchanged, but after it the prediction accuracy drops quickly. Note that in all five networks the AUC of bifan becomes even worse than that of PA once f exceeds some value. The bifan index is defined as the number of bifan subgraphs that the link $i \to j$ could generate. As f increases, more and more links are removed from $E^{T}$ and it becomes more and more difficult to build bifan structures, which reduces the discriminating power of $S_{i \to j}^{Bifan^{*}}$. The increase of f does not affect the PA index as much, so the prediction accuracy of bifan is lower than that of PA when many links are removed.

As illustrated in Figure 5, in identifying spurious interactions, BM and bifan are the two best predictors, followed by PA, while the CN index again has the worst overall performance. Note that, for each index, unlike the results for finding missing links, the AUC values are relatively high and remain essentially unchanged as f increases. However, high accuracy alone is not sufficient for spurious link detection algorithms. If just a few unexpectedly important links are incorrectly removed, the structural and dynamical properties of the network may change dramatically [26]. So when applied to identifying spurious links, the prediction indices should be evaluated meticulously; this problem, however, is beyond the scope of this paper.

5. Conclusion

This paper studied how to identify both missing and spurious interactions in directed networks. We have shown how to extend classical link prediction indices to directed versions, making them able to predict both the existence and the direction of an arc between two nodes. Simulation experiments on five real-world directed networks have demonstrated the prediction ability of these modified measures.

The purpose of this paper is not to present a better directed link predictor, but to give an example of how to use the valuable knowledge about undirected networks to solve problems in directed networks. For directed networks, the direction of links is a double-edged sword. It provides additional valuable information for link prediction but also leads to greater difficulty: determining the direction of unknown links is a tough problem. The extended CN index uses asymmetric similarity to determine the link direction, while BM benefits from the asymmetric likelihood of linkage between groups. In brief, asymmetry is the key. However, prediction methods based solely on asymmetry are incomplete, and other properties of directed networks should be taken into consideration. For example, in neural networks and some technological networks, information or resources are often collected by basal nodes, transmitted through directed links, and eventually delivered to top nodes. Such a macroscopic flow direction may help to determine the direction of unknown links.

On the other hand, each prediction index has its own advantages and is suited to different kinds of networks. How to choose the most appropriate algorithm according to the features of a network is also a problem worth studying.

Footnotes

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The work is supported by Basic Research Project of National University of Defense Technology and the Open Fund from HPCL no. 201403-01.

References

1. Lin, Duan, Zhao. Systems Science: Methodological Approaches. Boca Raton, Fla, USA: CRC Press; 2013.

2. Cheng, Znati. The complexity of channel scheduling in multi-radio multi-channel wireless networks. Proceedings of the IEEE INFOCOM; April 2009; Rio de Janeiro, Brazil. pp. 1512–1520. doi: 10.1109/INFCOM.2009.5062068.

3. Schafer J. L., Graham J. W. Missing data: our view of the state of the art. Psychological Methods 2002; 7(2): 147–177. doi: 10.1037//1082-989x.7.2.147.

4. Kossinets. Effects of missing data in social networks. Social Networks 2006; 28(3): 247–268. doi: 10.1016/j.socnet.2005.07.002.

5. Lü, Zhou. Link prediction in complex networks: a survey. Physica A: Statistical Mechanics and Its Applications 2011; 390(6): 1150–1170. doi: 10.1016/j.physa.2010.11.027.

6. Lü, Medo, Yeung C. H., Zhang Y.-C., Zhang Z.-K., Zhou. Recommender systems. Physics Reports 2012; 519(1): 1–49. doi: 10.1016/j.physrep.2012.02.006.

7. Clauset, Moore, Newman M. E. J. Hierarchical structure and the prediction of missing links in networks. Nature 2008; 453(7191): 98–101. doi: 10.1038/nature06830.

8. Liben-Nowell, Kleinberg. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology 2007; 58(7): 1019–1031. doi: 10.1002/asi.20591.

9. Newman M. E. J. Clustering and preferential attachment in growing networks. Physical Review E 2001; 64(2): 025102. doi: 10.1103/PhysRevE.64.025102.

10. Jaccard. Etude comparative de la distribution florale dans une portion des Alpes et des Jura. Bulletin de la Société Vaudoise des Sciences Naturelles 1901; 37: 547–579.

11. Adamic L. A., Adar. Friends and neighbors on the Web. Social Networks 2003; 25(3): 211–230. doi: 10.1016/S0378-8733(03)00009-1.

12. Zhou, Lü, Zhang Y.-C. Predicting missing links via local information. European Physical Journal B 2009; 71(4): 623–630. doi: 10.1140/epjb/e2009-00335-8.

13. Feng, Zhao J. C. Link prediction in complex networks: a clustering perspective. European Physical Journal B 2012; 85(1), article 3. doi: 10.1140/epjb/e2011-20207-x.

14. Guimerà, Sales-Pardo. Missing and spurious interactions and the reconstruction of complex networks. Proceedings of the National Academy of Sciences of the United States of America 2009; 106(52): 22073–22078. doi: 10.1073/pnas.0908366106.

15. Zhang, Wang, Zhao, Xie. Degree-corrected stochastic block models and reliability in networks. Physica A: Statistical Mechanics and its Applications 2014; 393: 553–559. doi: 10.1016/j.physa.2013.08.061.

16. Zhang Q.-M., Lü, Wang W.-Q., Zhu Y.-X., Zhou. Potential theory for directed networks. PLoS ONE 2013; 8(2): e55437. doi: 10.1371/journal.pone.0055437.

17. Hanley J. A., McNeil B. J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143(1): 29–36. doi: 10.1148/radiology.143.1.7063747.

18. Barabasi A. L., Albert. Emergence of scaling in random networks. Science 1999; 286(5439): 509–512. doi: 10.1126/science.286.5439.509.

19. Xie Y.-B., Zhou, Wang B.-H. Scale-free networks without growth. Physica A 2008; 387(7): 1683–1688. doi: 10.1016/j.physa.2007.11.005.

20. Milo, Shen-Orr, Itzkovitz, Kashtan, Chklovskii, Alon. Network motifs: simple building blocks of complex networks. Science 2002; 298(5594): 824–827. doi: 10.1126/science.298.5594.824.

21. Ulanowicz R. E., Heymans J. J., Egnotovich M. S. Network analysis of trophic dynamics in South Florida ecosystems, FY 99: the graminoid ecosystem. 2000. TS-191-99.

22. Baird, Luczkovich, Christian R. R. Assessment of spatial and temporal variability in ecosystem attributes of the St Marks national wildlife refuge, Apalachee Bay, Florida. Estuarine, Coastal and Shelf Science 1998; 47(3): 329–349. doi: 10.1006/ecss.1998.0360.

23. Ulanowicz R. E., Bondavalli, Egnotovich M. S. Network analysis of trophic dynamics in South Florida ecosystem, FY 97. The Florida Bay Ecosystem. 1998. CBL 98-123.

24. White J. G., Southgate, Thomson J. N., Brenner. The structure of the nervous system of the nematode Caenorhabditis elegans. Philosophical Transactions of the Royal Society B: Biological Sciences 1986; 314(1165): 1–340. doi: 10.1098/rstb.1986.0056.

25. Adamic, Glance. The political blogosphere and the 2004 US election. Proceedings of the Workshop on the Weblogging Ecosystem (WWW '05); 2005.

26. Zeng, Cimini. Removing spurious interactions in complex networks. Physical Review E 2012; 85(3): 036101. doi: 10.1103/physreve.85.036101.