Study on similarity indices for link prediction in opportunistic networks

Abstract

Link prediction estimates whether links exist between nodes, using information about network structure and node properties. To account for the node mobility, intermittent contact, and high delay that characterize opportunistic networks, novel similarity indices are constructed from the common neighbor (CN), Adamic/Adar (AA), and resource allocation (RA) indices, which do not consider the historic information of networks. First, the similarity indices T_CN, T_AA, and T_RA, based on temporal characteristics, are proposed; these take the historic information of network evolution into consideration. Then, using both the historic information of the evolution of opportunistic networks and the 2-hop neighbor information of the nodes, the similarity indices O_CN, O_AA, and O_RA, based on temporal-spatial characteristics, are proposed. On the imote traces cambridge (ITC) and detected social network (DSN) datasets, the experimental results indicate that O_CN, O_AA, and O_RA outperform CN, AA, and RA, and that O_AA performs best among them.

Keywords

Opportunistic network, link prediction, similarity indices, temporal-spatial characteristics, local information

Introduction

An opportunistic network (ON)¹ is a self-organizing network that does not require complete connectivity, because communication opportunities between nodes arise from their movement. Data transmission in an ON uses a “store-carry-forward” routing mechanism. ON has an advantage in many applications compared to the established fully connected networks. Owing to its continued development and low cost, ON has been applied across many fields, such as vehicular ad hoc networks,² mobile data diversion,³ information sharing,⁴ and mobile computing.⁵

The dynamic characteristics of ON make route selection difficult. The key to solving this problem is to predict links by capturing the pattern of change in the network topology, so that the routing algorithm can be supported.

Link prediction estimates the probability of a connection between nodes at the next moment from node attributes, network topology, and historical network information. Taking node mobility and intermittent connectivity into account, this article studies similarity indices for link prediction. Section “Related work” analyzes the studies reported in the literature to date. Section “Problem description” formulates the link prediction problem. Section “Improvement of similarity indices” constructs improved similarity indices. Section “Experiments and analysis” describes the experimental data and analysis. Section “Conclusion” draws conclusions.

Related work

The methods of link prediction for ON primarily fall into the following categories: predictions based on similarity indices, node moving models, mixed frameworks, matrix and tensor decompositions, and machine learning. The study of similarity indices can be divided into local information similarity indices and path similarity indices.

Based on local information similarity indices

The similarity indices based on local information calculate the similarity between nodes using local information such as node degree and common neighbors. Because of their low computational complexity, they are suitable for large-scale networks.

The simplest index is the common neighbor (CN) index.⁶,⁷ The more common neighbors nodes x and y share, the more similar they are considered to be. The CN index is defined in equation (1)

S_{xy}^{CN} = | Γ (x) \cap Γ (y) |

(1)

where $Γ (x)$ is the neighbor set of node x, and $Γ (y)$ is the neighbor set of node y. The CN index does not consider the influence of the node degree. Several similarity indices build on CN and do consider the node degree, such as the Jaccard’s Coefficient (JC) index,⁸ the Salton index,⁹ and the Leicht-Holme-Newman-I (LHN-I) index.¹⁰ These similarity indices are defined in equations (2)–(4), respectively, where $k_{x}$ and $k_{y}$ denote the degrees of nodes x and y

S_{xy}^{JC} = \frac{| Γ (x) \cap Γ (y) |}{| Γ (x) \cup Γ (y) |}

(2)

S_{xy}^{Salton} = \frac{| Γ (x) \cap Γ (y) |}{\sqrt{k_{x} k_{y}}}

(3)

S_{xy}^{LHN - I} = \frac{2 \times | Γ (x) \cap Γ (y) |}{k_{x} + k_{y}}

(4)
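
As a minimal illustration of equations (1)–(3), the following Python sketch computes the CN, JC, and Salton indices on a small toy graph. The graph `GRAPH` and the function names are hypothetical and introduced only for this example; they are not taken from the experiments in this article.

```python
import math

# Hypothetical toy graph: each node maps to its neighbor set Γ(node).
GRAPH = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c", "e"},
    "e": {"d"},
}


def cn(graph, x, y):
    """Equation (1): number of common neighbors of x and y."""
    return len(graph[x] & graph[y])


def jaccard(graph, x, y):
    """Equation (2): common neighbors divided by the size of the union."""
    union = graph[x] | graph[y]
    return len(graph[x] & graph[y]) / len(union) if union else 0.0


def salton(graph, x, y):
    """Equation (3): common neighbors divided by sqrt(k_x * k_y)."""
    k_x, k_y = len(graph[x]), len(graph[y])
    return cn(graph, x, y) / math.sqrt(k_x * k_y) if k_x and k_y else 0.0


print(cn(GRAPH, "a", "b"), jaccard(GRAPH, "a", "e"), salton(GRAPH, "a", "e"))
```

In this toy graph, nodes a and b share both of their neighbors, so all three indices rate the pair (a, b) higher than the pair (a, e).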

The Adamic/Adar (AA) index¹¹,¹² and the resource allocation (RA) index¹³,¹⁴ take into account the degree of the common neighbors. The AA index assumes that a common neighbor with a small degree contributes more than a common neighbor with a large degree. Each common neighbor z is therefore given a weight according to its degree $k_{z}$, and AA is defined in equation (5)

S_{xy}^{AA} = \sum_{z \in Γ (x) \cap Γ (y)} \frac{1}{\log k_{z}}

(5)

The RA index is the same as the AA index, except that each common neighbor is weighted by the reciprocal of its degree rather than by the reciprocal of the logarithm of its degree. It is defined in equation (6)

S_{xy}^{RA} = \sum_{z \in Γ (x) \cap Γ (y)} \frac{1}{k_{z}}

(6)
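
The difference between the two weightings in equations (5) and (6) can be seen in a short Python sketch. The toy graph and function names are hypothetical, and the natural logarithm is an assumption, since the base of the logarithm is not specified here.

```python
import math

# Hypothetical toy graph: each node maps to its neighbor set Γ(node).
GRAPH = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c", "e"},
    "e": {"d"},
}


def adamic_adar(graph, x, y):
    """Equation (5): sum of 1 / log(k_z) over common neighbors z.

    A common neighbor of two distinct nodes has degree >= 2,
    so log(k_z) > 0 (natural log assumed).
    """
    return sum(1.0 / math.log(len(graph[z])) for z in graph[x] & graph[y])


def resource_allocation(graph, x, y):
    """Equation (6): sum of 1 / k_z over common neighbors z."""
    return sum(1.0 / len(graph[z]) for z in graph[x] & graph[y])


# Common neighbors of a and b are c (degree 3) and d (degree 4).
print(adamic_adar(GRAPH, "a", "b"))          # 1/log(3) + 1/log(4)
print(resource_allocation(GRAPH, "a", "b"))  # 1/3 + 1/4
```

Because 1/k falls off faster than 1/log k, RA penalizes high-degree common neighbors more strongly than AA does.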

The AA and RA indices differ only in the weight given to each common neighbor. If the network clustering coefficient is small, link prediction depends primarily on whether the nodes have common neighbors, so the AA and RA indices give similar prediction results. If the network clustering coefficient is large and the network is highly heterogeneous, the two indices differ significantly because of their different weighting schemes. The literature¹⁵,¹⁶ suggests that the RA index is better at depicting weighted networks and community networks.

The study in the literature¹⁷ analyzes the betweenness centrality of the nodes in combination with the CN index. By considering both the network topology features and the local information of the nodes, and by optimizing the weighting of the common neighbors, it proposes an improved B-CN index that raises prediction accuracy. The study in the literature¹⁸ builds on the AA index: it combines the contact frequency between nodes with the degree of the common neighbors to form the similarity index and extracts the characteristics of the historical network structure using kernel regression, which improves the prediction accuracy of the link. Another study¹⁹ based on the AA index proposes a hypothesis for network data flow regions and estimates the similarity among nodes using three stream-based methods. The literature²⁰ distinguishes three kinds of links: periodic and frequent connections, non-periodic but frequent connections, and non-frequent connections. Links are then predicted using periodic pattern mining, a decision tree, and the AA index, and the prediction efficiency and accuracy are improved by selecting the optimal threshold, jitter tolerance ratio, and time slice length.

Based on path similarity indices

The similarity indices based on paths include the local path (LP) index,²¹ the Katz index,²² and the LHN-II index.¹⁰

The LP index adds information about paths of length 3 to the CN index. It is defined in equation (7)

S^{LP} = A^{2} + α \cdot A^{3}

(7)

where $α$ is an adjustable parameter, $A$ is the adjacency matrix of the network, and the similarity $S_{xy}^{LP}$ is the entry in row x and column y of $S^{LP}$.
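
Equation (7) can be evaluated directly from powers of the adjacency matrix. The following Python sketch does so on a hypothetical five-node graph using plain lists; the matrix `ADJ`, the node order, and the value α = 0.01 are assumptions made only for illustration.

```python
def matmul(p, q):
    """Multiply two square matrices given as lists of rows."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def local_path(adj, alpha):
    """Equation (7): A^2 + alpha * A^3, returned as a full matrix."""
    a2 = matmul(adj, adj)
    a3 = matmul(a2, adj)
    n = len(adj)
    return [[a2[i][j] + alpha * a3[i][j] for j in range(n)] for i in range(n)]


# Hypothetical toy graph; rows and columns follow the order a, b, c, d, e.
ADJ = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0],
]

S_LP = local_path(ADJ, alpha=0.01)
print(S_LP[0][1])  # score of the pair (a, b)
```

For the pair (a, b), the entry of A² is the CN value of 2, and the A³ term adds a small contribution from the two walks of length 3 between the nodes.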

The Katz index takes paths of every length into account, and its computational cost grows with the path length considered. It is defined in equation (8)

S_{xy}^{Katz} = \sum_{l = 1}^{\infty} α^{l} \cdot | \mathrm{paths}_{x, y}^{\langle l \rangle} |

(8)

where $α > 0$ is an adjustable parameter that controls the weight of the paths, and $| \mathrm{paths}_{x, y}^{\langle l \rangle} |$ is the number of paths of length $l (l = 1, 2, \dots)$ between nodes x and y. For the series to converge, the parameter α must be less than the reciprocal of the largest eigenvalue of the adjacency matrix. In that case, the Katz index can also be written in matrix form as in equation (9)

S^{Katz} = {(I - α \cdot A)}^{- 1} - I

(9)

where $I$ is the identity matrix and $S_{xy}^{Katz}$ is the entry in row x and column y of $S^{Katz}$. When the parameter $α$ is very small, long paths contribute very little, so predictions based on the Katz index are similar to those based on the LP index.
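
The relation between the series in equation (8) and the closed form in equation (9) can be checked numerically. The sketch below truncates the series at a finite length and verifies that (I − αA)(I + S) is close to I, which holds when the series has converged. It assumes the usual reading of the Katz index in which powers of A count walks; the toy matrix, α = 0.1, and the truncation length are hypothetical choices.

```python
def matmul(p, q):
    """Multiply two square matrices given as lists of rows."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def katz_truncated(adj, alpha, max_len):
    """Equation (8) truncated at max_len: sum of alpha^l * A^l for l = 1..max_len."""
    n = len(adj)
    scores = [[0.0] * n for _ in range(n)]
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0
    weight = 1.0
    for _ in range(max_len):
        power = matmul(power, adj)   # A^l
        weight *= alpha              # alpha^l
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
    return scores


def closed_form_residual(adj, alpha, scores):
    """Largest entry of (I - alpha*A)(I + S) - I; near zero if S matches equation (9)."""
    n = len(adj)
    left = [[float(i == j) - alpha * adj[i][j] for j in range(n)] for i in range(n)]
    right = [[float(i == j) + scores[i][j] for j in range(n)] for i in range(n)]
    prod = matmul(left, right)
    return max(abs(prod[i][j] - float(i == j)) for i in range(n) for j in range(n))


# Hypothetical toy graph; maximum degree is 4, so alpha = 0.1 < 1/4 converges.
ADJ = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0],
]

S_KATZ = katz_truncated(ADJ, alpha=0.1, max_len=60)
print(closed_form_residual(ADJ, 0.1, S_KATZ))
```

Since the largest eigenvalue of an adjacency matrix never exceeds the maximum degree, choosing α below the reciprocal of the maximum degree is a simple sufficient condition for convergence.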

A link prediction algorithm based on matrix and tensor decomposition for intermittently connected wireless networks has been proposed in the literature.²³ It uses the Katz index to predict the existence of a link in the $(T + 1)$th time slice. Experiments demonstrate the effectiveness of this method.

Based on regular equivalence, the LHN-II index has been proposed in the literature.¹⁰ If the neighbors of node x are similar to node y, then nodes x and y are similar; that is, node similarity is transitive. The LHN-II index is defined in equation (10)

S_{xy}^{LHN - II} = 2 M λ_{1} D^{- 1} {(I - \frac{ϕ}{λ_{1}} A)}^{- 1} D^{- 1}

(10)

where $D$ is the degree matrix, $ϕ$ is an adjustable parameter with $0 < ϕ < 1$, $λ_{1}$ is the largest eigenvalue of the adjacency matrix $A$, and M is the total number of edges in the network.

Similarity indices based on local information and on paths show good prediction performance for static networks and for social networks whose topology changes slowly. However, they are not suitable for an ON, whose topology changes over time. In this article, to account for the node mobility, intermittent contact, and high delay of ON, similarity indices based on temporal characteristics are proposed on top of the local information similarity indices; they take the historic information of network evolution into account. Considering the 2-hop neighbor information of nodes, similarity indices based on spatial characteristics are calculated using the neighborhood expansion method.²⁴ Combining the temporal and spatial characteristics, similarity indices based on temporal-spatial characteristics are proposed, which obtain better link prediction performance in the ON.

Problem description

An ON is a typical dynamic network: its structure changes over time and is characterized by a sequence of graphs. To depict these dynamics, the ON is defined as $G (V, E)$, where $V = \{v_{1}, v_{2}, v_{3}, \dots, v_{m}\}$ is the set of all nodes in the network, m is the number of nodes, and $E \subseteq \{\langle v_{a}, v_{b} \rangle \mid v_{a}, v_{b} \in V\} (a = 1, 2, \dots, m, b = 1, 2, \dots, m, a \neq b)$ is the set of edges in the network. $G = (G_{1}, G_{2}, \dots, G_{n})$ is the ordered sequence of snapshots in the time slice T, where $G_{i} = (V_{i}, E_{i}) (i = 1, 2, 3, \dots, n)$ is the topology graph of $G (V, E)$ at time i, $V_{i}$ is its set of vertices, and $E_{i}$ is its set of edges.

Link prediction in an ON predicts the network topology at time $T + 1$ from the nodes and network topology information in the time slice T. It can be defined as follows: given a sequence of network snapshots $G = (G_{1}, G_{2}, \dots, G_{n})$, the link prediction method captures the pattern of change across the snapshots and produces the network topology at time $T + 1$, as shown in Figure 1.

Figure 1.

ON link evolution.

In equation (11), $A$ is an adjacency matrix of $G (V, E)$ , which is defined below

A_{i, j} = \begin{cases} 1, & \langle v_{i}, v_{j} \rangle \in E \\ 0, & \langle v_{i}, v_{j} \rangle \notin E \end{cases}

(11)
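
As a hedged sketch of how the snapshot sequence and equation (11) fit together, the following Python code builds one adjacency matrix per time window from a list of contact records. The record layout `(u, v, begin, end)`, the sample data, and the rule that an edge exists in a snapshot whenever a contact overlaps its window are assumptions made for illustration; they are not specified in this article.

```python
def snapshot_adjacency(nodes, contacts, t_start, t_end):
    """Adjacency matrix (equation (11)) of the snapshot covering [t_start, t_end)."""
    index = {v: i for i, v in enumerate(nodes)}
    adj = [[0] * len(nodes) for _ in nodes]
    for u, v, begin, end in contacts:
        # Assumption: a contact creates an edge if it overlaps the window.
        if u != v and begin < t_end and end > t_start:
            i, j = index[u], index[v]
            adj[i][j] = adj[j][i] = 1
    return adj


def snapshots(nodes, contacts, t0, window, count):
    """Ordered sequence (G_1, ..., G_n) of adjacency matrices."""
    return [
        snapshot_adjacency(nodes, contacts, t0 + k * window, t0 + (k + 1) * window)
        for k in range(count)
    ]


# Hypothetical contact records: (node, node, contact begin, contact end).
NODES = ["a", "b", "c"]
CONTACTS = [("a", "b", 0, 5), ("b", "c", 8, 12), ("a", "c", 21, 25)]

SNAPSHOTS = snapshots(NODES, CONTACTS, t0=0, window=10, count=3)
for adj in SNAPSHOTS:
    print(adj)
```

The contact between b and c spans the boundary at time 10, so under the overlap rule it appears in both the first and the second snapshot.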

The similarity indices based on local information assume that two nodes that communicate frequently behave similarly in the network, which conforms to the communication characteristics of ON.

Improvement of similarity indices

Based on the local information similarity indices, this article proposes improved similarity indices using the temporal, spatial, and temporal-spatial characteristics of ON.

The temporal characteristics of similarity indices

The indices CN, AA, and RA perform well for link prediction when the network topology changes slowly. However, the topology of an ON changes frequently and its link status varies over time. Therefore, the similarity indices are improved by taking the historic information of network evolution into account.

As shown in Figure 2, if only the network structure at time $t_{3}$ is used to predict the link at the next moment, the possibility of generating a link between nodes S and D is small. However, if the network topologies at times $t_{1}$, $t_{2}$, and $t_{3}$ are combined, the possibility of generating a link between nodes S and D becomes larger. Therefore, the prediction accuracy of the link can be improved by using the historic information of multiple moments in the process of network evolution.

Figure 2.

Link change.

Based on the local information similarity indices CN, AA, and RA, the improved similarity indices T_CN, T_AA, and T_RA are proposed. They take into account the following historic information from the evolution of the network topology: the total length of contact time, the frequency of contact, and the most recent contact time.

Figure 3 takes the time slice T, $[t_{S}, t_{E}]$, as an example, where $t_{S}$ is the beginning of the time slice, $t_{E}$ is the end of the time slice, $t_{0}$ is the beginning of a contact, and $t_{1}$ is the end of that contact.

Figure 3.

Temporal characteristic of link.

1. The total length of contact time

In equation (12), l denotes the total length of contact time, which is defined below

l = \sum_{i, j = 1}^{n} (t_{i} - t_{j})

(12)

where the sum runs over the contacts of the node pair within the time slice, with $t_{i}$ the end and $t_{j}$ the beginning of the same contact $(i, j = 1, 2, 3, \dots, n)$. As shown in Figure 3, $l = (t_{1} - t_{0}) + (t_{3} - t_{2})$. If $t_{0} < t_{S}$, the node pair was already connected before the time slice T began, so the contact is counted from $t_{S}$, that is, $t_{0} = t_{S}$. Similarly, if $t_{1} > t_{E}$, then $t_{1} = t_{E}$.

2. The frequency of contact

In equation (13), f denotes the frequency of contact, which is defined below

f = sum (t_{i})

(13)

where $t_{i}$ is a time at which a contact of node pair $(x, y)$ begins in the time slice T, and f counts these times. As shown in Figure 3, the node pair $(x, y)$ makes contact twice, beginning at $t_{0}$ and $t_{2}$, so $f = 2$.

3. The most recent contact time

c is the most recent contact time. As shown in Figure 3, the node pair $(x, y)$ makes contact twice, and the last contact ends at $t_{3}$, so $c = t_{3}$.
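
The three temporal parameters can be extracted from a list of contact intervals as sketched below. The function name, the interval representation, and the sample values are hypothetical; the clipping of contacts to the time slice follows the rule stated for equation (12), and taking c as the end of the last clipped contact follows the Figure 3 example.

```python
def temporal_parameters(contacts, t_s, t_e):
    """Return (l, f, c) for one node pair in the time slice [t_s, t_e].

    contacts: list of (begin, end) contact intervals for the pair.
    l: total length of contact time, f: number of contacts,
    c: end of the most recent contact (None if there was no contact).
    """
    clipped = []
    for begin, end in contacts:
        # Contacts that start before t_s or end after t_e are cut to the slice.
        begin, end = max(begin, t_s), min(end, t_e)
        if begin < end:
            clipped.append((begin, end))
    l = sum(end - begin for begin, end in clipped)
    f = len(clipped)
    c = max((end for _, end in clipped), default=None)
    return l, f, c


# Hypothetical example: the first contact started before the slice began.
print(temporal_parameters([(-5, 20), (40, 70)], t_s=0, t_e=100))
```

With these values, the first contact is counted from the start of the slice, giving a total contact length of 20 + 30 = 50 across two contacts, the last of which ends at time 70.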

Based on the above analysis, the influence coefficient of the time parameters on the link, HIC, is defined in equation (14)

HIC = I_{l} + I_{f} + I_{c}

(14)

Taking node pair $(x, y)$ as an example, $I_{l}$ is the coefficient of the total length of contact time, which is defined in equation (15)

I_{l} = \frac{l}{(t_{E} - t_{S})}

(15)

$I_{f}$ is the coefficient of the frequency of contact, which is defined in equation (16)

I_{f} = \frac{f}{(k_{x} + k_{y})}

(16)

where $k_{x}$ is the degree of node x, and $k_{y}$ is the degree of node y. $I_{c}$ is the coefficient of the most recent contact time, which is defined in equation (17)

I_{c} = \frac{(c - t_{0})}{(t_{E} - t_{S})}

(17)
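
Equations (14)–(17) combine into a single coefficient per node pair, as in the following Python sketch. The function name and sample values are hypothetical. Two points are assumptions rather than statements of this article: $t_{0}$ is read as the beginning of the first contact after clipping to the time slice, and a coefficient is set to zero when its denominator or its input is missing.

```python
def hic(l, f, c, t_0, t_s, t_e, k_x, k_y):
    """Influence coefficient HIC = I_l + I_f + I_c (equations (14)-(17))."""
    span = t_e - t_s
    i_l = l / span                                     # equation (15)
    i_f = f / (k_x + k_y) if (k_x + k_y) else 0.0      # equation (16)
    i_c = (c - t_0) / span if c is not None else 0.0   # equation (17)
    return i_l + i_f + i_c


# Hypothetical pair: 50 time units of contact in a slice of length 100,
# two contacts, last contact ending at 70, both nodes of degree 2.
print(hic(l=50, f=2, c=70, t_0=0, t_s=0, t_e=100, k_x=2, k_y=2))
```

Here the three coefficients are 0.5, 0.5, and 0.7, so a pair with long, frequent, and recent contact receives a larger HIC.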

This article combines the influence coefficient of the time parameters on the link with the indices CN, AA, and RA. Based on temporal characteristics, improved similarity indices T_CN, T_AA, and T_RA are proposed, which are defined in equations (18)–(20)

S_{xy}^{T_CN} = | Γ (x) \cap Γ (y) | \cdot HIC_{xy}

(18)

S_{xy}^{T_AA} = \sum_{z \in Γ (x) \cap Γ (y)} \frac{HIC_{xy}}{\log k_{z}}

(19)

S_{xy}^{T_RA} = \sum_{z \in Γ (x) \cap Γ (y)} \frac{HIC_{xy}}{k_{z}}

(20)
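
Since $HIC_{xy}$ does not depend on the common neighbor z, equations (18)–(20) amount to scaling CN, AA, and RA by the pair's influence coefficient. A minimal Python sketch follows; the toy graph, the function names, the HIC value of 1.7, and the use of the natural logarithm are assumptions for illustration only.

```python
import math

# Hypothetical toy graph: each node maps to its neighbor set Γ(node).
GRAPH = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c", "e"},
    "e": {"d"},
}


def t_cn(graph, x, y, hic_xy):
    """Equation (18): CN scaled by the influence coefficient."""
    return len(graph[x] & graph[y]) * hic_xy


def t_aa(graph, x, y, hic_xy):
    """Equation (19): AA with HIC in the numerator (natural log assumed)."""
    return sum(hic_xy / math.log(len(graph[z])) for z in graph[x] & graph[y])


def t_ra(graph, x, y, hic_xy):
    """Equation (20): RA with HIC in the numerator."""
    return sum(hic_xy / len(graph[z]) for z in graph[x] & graph[y])


print(t_cn(GRAPH, "a", "b", 1.7), t_aa(GRAPH, "a", "b", 1.7), t_ra(GRAPH, "a", "b", 1.7))
```

A pair with no contact history in the time slice would have a small HIC and therefore a small temporal score, even if it shares many neighbors.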

The topology of an ON is complex and changes frequently over time. The improved similarity indices T_CN, T_AA, and T_RA consider both the similarity of the network structure and the historic information of the link, and can therefore predict links in the ON effectively.

The spatial characteristics of similarity indices

In the ON, node mobility and intermittent contact make the network sparse. In a specific time slice, only a few node pairs are connected, so the spatial structure information is insufficient and the accuracy of link prediction is low. To address this problem, the neighborhood expansion method²⁴ has been proposed: the 2-hop neighbor information of the nodes is used to improve the indices CN, AA, and RA.

Because the computational complexity increases with the number of neighbors considered, this article considers only the 1-hop and 2-hop neighbors of each node.

For an arbitrary node x, the 1-hop neighbors are defined in equation (21)

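Γ_{1} (x) = \{ z \mid \min (D_{zx}) = 1 \}

(21)

where $D_{zx}$ is the distance between nodes z and x, so that $\min (D_{zx})$ is the length of the shortest path between them. The 2-hop neighbors can be defined in equation (22)

Γ_{2} (x) = \{ z \mid \min (D_{zx}) = 2 \}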

(22)
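
The two neighbor sets in equations (21) and (22) can be computed with one expansion step from the 1-hop neighbors, as in this Python sketch. The toy graph and the function name are hypothetical.

```python
# Hypothetical toy graph: each node maps to its set of direct neighbors.
GRAPH = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c", "e"},
    "e": {"d"},
}


def hop_neighbors(graph, x):
    """Return (Γ1(x), Γ2(x)): nodes at shortest distance 1 and exactly 2 from x."""
    first = set(graph[x])
    second = set()
    for z in first:
        second |= graph[z]   # everything reachable in two steps
    second -= first          # drop nodes that are already 1-hop neighbors
    second.discard(x)        # drop x itself
    return first, second


print(hop_neighbors(GRAPH, "a"))
print(hop_neighbors(GRAPH, "d"))
```

Node d is adjacent to every other node in this toy graph, so its 2-hop neighbor set is empty.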

The parameters $θ, β, γ$ are the weights given to neighbors of different orders. The similarity indices E_CN, E_AA, and E_RA, based on spatial characteristics, are defined in equations (23)–(25)

\begin{matrix} S_{xy}^{E_CN} = (1 - 2 θ) | Γ_{1} (x) \cap Γ_{1} (y) | \\ + θ | Γ_{1} (x) \cap Γ_{2} (y) | \\ + θ | Γ_{2} (x) \cap Γ_{1} (y) | \end{matrix}

(23)

\begin{matrix} S_{xy}^{E_AA} = (1 - 2 β) \sum_{z \in Γ_{1} (x) \cap Γ_{1} (y)} \frac{1}{\log k_{z}} \\ + β \sum_{z \in Γ_{1} (x) \cap Γ_{2} (y)} \frac{1}{\log k_{z}} \\ + β \sum_{z \in Γ_{2} (x) \cap Γ_{1} (y)} \frac{1}{\log k_{z}} \end{matrix}

(24)

\begin{matrix} S_{xy}^{E_RA} = (1 - 2 γ) \sum_{z \in Γ_{1} (x) \cap Γ_{1} (y)} \frac{1}{k_{z}} \\ + γ \sum_{z \in Γ_{1} (x) \cap Γ_{2} (y)} \frac{1}{k_{z}} \\ + γ \sum_{z \in Γ_{2} (x) \cap Γ_{1} (y)} \frac{1}{k_{z}} \end{matrix}

(25)

where the range of $θ, β, γ$ is [0, 0.5].
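
Equations (23)–(25) share one structure and differ only in the per-neighbor weight, which the following Python sketch makes explicit. The toy graph, function names, and the weight value 0.2 are hypothetical. Returning zero for a degree-1 node in the AA weight is an assumption added here to avoid dividing by log 1 = 0, a case the equations do not address.

```python
import math

# Hypothetical toy graph: each node maps to its set of direct neighbors.
GRAPH = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c", "e"},
    "e": {"d"},
}


def hop_neighbors(graph, x):
    """Return (Γ1(x), Γ2(x)) as in equations (21) and (22)."""
    first = set(graph[x])
    second = set().union(*(graph[z] for z in first)) - first - {x}
    return first, second


def expanded_index(graph, x, y, w, weight):
    """Shared form of equations (23)-(25); weight maps a degree k_z to a score."""
    x1, x2 = hop_neighbors(graph, x)
    y1, y2 = hop_neighbors(graph, y)

    def total(nodes):
        return sum(weight(len(graph[z])) for z in nodes)

    return (1 - 2 * w) * total(x1 & y1) + w * total(x1 & y2) + w * total(x2 & y1)


def cn_weight(k):
    return 1.0                                   # E_CN, equation (23)


def aa_weight(k):
    return 1.0 / math.log(k) if k > 1 else 0.0   # E_AA, equation (24); guard is an assumption


def ra_weight(k):
    return 1.0 / k                               # E_RA, equation (25)


print(expanded_index(GRAPH, "a", "e", 0.2, cn_weight))
print(expanded_index(GRAPH, "a", "e", 0.2, ra_weight))
```

Setting the weight parameter to 0 recovers the original CN, AA, or RA index, while larger values shift more of the score onto the 2-hop terms.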

The temporal-spatial characteristics of similarity indices

Similarity indices based on temporal characteristics take the historic information of network evolution into account, whereas similarity indices based on spatial characteristics consider the network spatial structure. Considering the historic information of ON evolution and 2-hop neighbor information of nodes, similarity indices based on temporal-spatial characteristics O_CN, O_AA, and O_RA are proposed.

Combining equations (18)–(20) with equations (23)–(25), the indices O_CN, O_AA, and O_RA are defined in equations (26)–(28)

S_{xy}^{O_CN} = \left( \begin{matrix} (1 - 2 θ) | Γ_{1} (x) \cap Γ_{1} (y) | \\ + θ | Γ_{1} (x) \cap Γ_{2} (y) | \\ + θ | Γ_{2} (x) \cap Γ_{1} (y) | \end{matrix} \right) \cdot HIC_{xy}

(26)

S_{xy}^{O_AA} = \left( \begin{matrix} (1 - 2 β) \sum_{z \in Γ_{1} (x) \cap Γ_{1} (y)} \frac{1}{\log k_{z}} \\ + β \sum_{z \in Γ_{1} (x) \cap Γ_{2} (y)} \frac{1}{\log k_{z}} \\ + β \sum_{z \in Γ_{2} (x) \cap Γ_{1} (y)} \frac{1}{\log k_{z}} \end{matrix} \right) \cdot HIC_{xy}

(27)

S_{xy}^{O_RA} = \left( \begin{matrix} (1 - 2 γ) \sum_{z \in Γ_{1} (x) \cap Γ_{1} (y)} \frac{1}{k_{z}} \\ + γ \sum_{z \in Γ_{1} (x) \cap Γ_{2} (y)} \frac{1}{k_{z}} \\ + γ \sum_{z \in Γ_{2} (x) \cap Γ_{1} (y)} \frac{1}{k_{z}} \end{matrix} \right) \cdot HIC_{xy}

(28)

where the range of $θ, β, γ$ is $[0, 0.5]$ , $k_{z}$ is the degree of node z, $Γ_{1} (x) \cap Γ_{1} (y)$ is the common neighbor between 1-hop neighbors of nodes x and y, $Γ_{1} (x) \cap Γ_{2} (y)$ is the common neighbor between 1-hop neighbors of node x and 2-hop neighbors of node y, and $Γ_{2} (x) \cap Γ_{1} (y)$ is the common neighbor between 2-hop neighbors of node x and 1-hop neighbors of node y.
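
To show how a temporal-spatial index would be used for prediction, the following Python sketch scores every unconnected pair of a toy graph with O_CN (equation (26)) and ranks the pairs. The graph, the per-pair HIC values, and the choice θ = 0.2 are hypothetical and serve only as an example.

```python
# Hypothetical toy graph: each node maps to its set of direct neighbors.
GRAPH = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c", "e"},
    "e": {"d"},
}


def hop_neighbors(graph, x):
    """Return the 1-hop and 2-hop neighbor sets of x."""
    first = set(graph[x])
    second = set().union(*(graph[z] for z in first)) - first - {x}
    return first, second


def o_cn(graph, x, y, theta, hic_xy):
    """Equation (26): spatially expanded CN multiplied by HIC_xy."""
    x1, x2 = hop_neighbors(graph, x)
    y1, y2 = hop_neighbors(graph, y)
    spatial = ((1 - 2 * theta) * len(x1 & y1)
               + theta * len(x1 & y2)
               + theta * len(x2 & y1))
    return spatial * hic_xy


# Hypothetical influence coefficients for the currently unconnected pairs.
HIC = {("a", "b"): 1.7, ("a", "e"): 0.5, ("b", "e"): 0.2, ("c", "e"): 1.0}

SCORES = {pair: o_cn(GRAPH, pair[0], pair[1], 0.2, value)
          for pair, value in HIC.items()}
RANKED = sorted(SCORES, key=SCORES.get, reverse=True)
print(RANKED)
```

The pairs at the top of the ranking are the ones predicted as most likely to be connected at time T + 1.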

Experiments and analysis

In this article, the area under the receiver operating characteristic (ROC) curve (AUC) and precision are adopted as evaluation metrics. Comparison experiments on different datasets are used to verify the improved similarity indices.
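
The formulas for the two metrics are not spelled out here, so the following Python sketch assumes the definitions commonly used in link prediction: AUC as the probability that a link appearing at time T + 1 scores higher than a nonexistent link (ties counting one half), and precision as the fraction of true links among the top-L ranked pairs. The scores and pair labels are hypothetical.

```python
def auc(scores, positives, negatives):
    """Fraction of (positive, negative) comparisons won by the positive pair."""
    total = 0.0
    for p in positives:
        for q in negatives:
            if scores[p] > scores[q]:
                total += 1.0
            elif scores[p] == scores[q]:
                total += 0.5   # ties count as half a win
    return total / (len(positives) * len(negatives))


def precision_at(scores, positives, top_l):
    """Fraction of the top_l highest-scoring pairs that are true links."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_l]
    return sum(1 for pair in ranked if pair in positives) / top_l


# Hypothetical similarity scores for four candidate pairs.
SCORES = {("a", "b"): 0.9, ("a", "e"): 0.6, ("b", "e"): 0.4, ("c", "e"): 0.4}
POSITIVES = {("a", "b"), ("b", "e")}   # links that appear at time T + 1
NEGATIVES = {("a", "e"), ("c", "e")}   # pairs that stay unconnected

print(auc(SCORES, POSITIVES, NEGATIVES), precision_at(SCORES, POSITIVES, 2))
```

This version compares every positive pair with every negative pair; on large networks the same quantity is usually estimated by sampling comparisons.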

Experimental datasets

The representative real trace datasets imote traces cambridge (ITC) and detected social network (DSN) are selected for the experiments. ITC is a dense dataset, which records the movement and interconnection of students on campus. DSN is a sparse dataset, which records network communication on campus scenes (Table 1).

Table 1.

Information of datasets.

Data set	ITC	DSN
Device	iMote	tMote
Mobile nodes	50	27
Duration (days)	12	79
Network type	Bluetooth	Wireless
Sample interval (s)	10	6.67

The ITC dataset records a 12-day trace of students on the Cambridge University campus. The experiment collects movement trajectories and interconnections using 50 mobile iMote devices. The dataset is short in duration and dense. A series of network snapshots covering 2 days is selected from the ITC dataset, as shown in Figure 4.

Figure 4.

Snapshots of ITC.

The DSN dataset records 79 days of data from students on the St Andrews campus. The experiment collects communication over a wireless network using 27 mobile tMote devices. The dataset is longer in duration and sparse. A series of network snapshots covering 2 days is selected from the DSN dataset, as shown in Figure 5.

Figure 5.

Snapshots of DSN.

Experimental environment

MATLAB integrates numerical analysis, matrix calculation, scientific data visualization, and the modeling and simulation of nonlinear dynamic systems into a single environment. In this article, MATLAB is used to conduct comparative experiments between the indices CN, AA, and RA and the improved similarity indices. For both the ITC and DSN datasets, 80% of the data is used for training and 20% for testing.

Experimental results and analysis

In the ITC and DSN datasets, the improved similarity indices are compared to the indices CN, AA, and RA, and O_CN, O_AA, and O_RA are also compared.

Comparison with indices CN, AA, and RA

The AUC and precision are adopted to evaluate the performance of these similarity indices.

1. Comparison with index CN

In the ITC and DSN datasets, time slices of different lengths are extracted, and the AUC and precision of CN, T_CN, E_CN, and O_CN are calculated. The results are shown in Figures 6–9.

Figure 6.

AUC comparison of CN and improved similarity index in ITC dataset.

Figure 7.

AUC comparison of CN and improved similarity index in DSN dataset.

Figure 8.

Precision comparison of CN and improved similarity index in ITC dataset.

Figure 9.

Precision comparison of CN and improved similarity index in DSN dataset.

Figures 6–9 show that, in the ITC dataset, the prediction accuracy of the indices CN, E_CN, and O_CN decreases as the time slice lengthens, whereas in the DSN dataset the prediction accuracy of all four indices increases as the time slice lengthens.

The prediction accuracy of the T_CN index increases with the time slice in both the ITC and DSN datasets, which indicates that the T_CN index is less affected by how dense the network is. E_CN has little effect on the similarity of nodes in sparse networks but a significant effect in dense networks. In general, compared with the CN index, the similarity indices T_CN, E_CN, and O_CN perform better, and the prediction accuracy of O_CN is the highest.

2. Comparison with index AA

In the ITC and DSN datasets, time slices of different lengths are extracted, and the AUC and precision of AA, T_AA, E_AA, and O_AA are calculated. The results are shown in Figures 10–13.

Figure 10.

AUC comparison of AA and improved similarity index in ITC dataset.

Figure 11.

AUC comparison of AA and improved similarity index in DSN dataset.

Figure 12.

Precision comparison of AA and improved similarity index in ITC dataset.

Figure 13.

Precision comparison of AA and improved similarity index in DSN dataset.

Figures 10–13 show that, in the ITC dataset, the prediction accuracy of the indices AA, E_AA, and O_AA decreases as the time slice lengthens, whereas in the DSN dataset the prediction accuracy of all four indices increases as the time slice lengthens.

The prediction accuracy of the T_AA index increases with the time slice in both the ITC and DSN datasets, which indicates that the T_AA index is less affected by how dense the network is. E_AA has little effect on the similarity of nodes in sparse networks but a significant effect in dense networks. In general, compared with the AA index, the similarity indices T_AA, E_AA, and O_AA perform better, and the prediction accuracy of O_AA is the highest.

3. Comparison with index RA

In the ITC and DSN datasets, time slices of different lengths are extracted, and the AUC and precision of RA, T_RA, E_RA, and O_RA are calculated. The results are shown in Figures 14–17.

Figure 14.

AUC comparison of RA and improved similarity index in ITC dataset.

Figure 15.

AUC comparison of RA and improved similarity index in DSN dataset.

Figure 16.

Precision comparison of RA and improved similarity index in ITC dataset.

Figure 17.

Precision comparison of RA and improved similarity index in DSN dataset.

Figures 14–17 show that, in the ITC dataset, the prediction accuracy of the indices RA, E_RA, and O_RA decreases as the time slice lengthens, whereas in the DSN dataset the prediction accuracy of all four indices increases as the time slice lengthens.

The prediction accuracy of the T_RA index increases with the time slice in both the ITC and DSN datasets, which indicates that the T_RA index is less affected by how dense the network is. E_RA has little effect on the similarity of nodes in sparse networks but a significant effect in dense networks. In general, compared with the RA index, the similarity indices T_RA, E_RA, and O_RA perform better, and the prediction accuracy of O_RA is the highest.

The above experiments suggest that the improved similarity indices perform better than the indices CN, AA, and RA in both the ITC and DSN datasets. In particular, the similarity indices O_CN, O_AA, and O_RA have the best performance.

Comparison with indices O_CN, O_AA, and O_RA

In the ITC and DSN datasets, time slices of different lengths are extracted, and the AUC and precision of O_CN, O_AA, and O_RA are calculated. The results are shown in Figures 18–21.

Figure 18.

AUC comparison of similarity index in ITC dataset.

Figure 19.

AUC comparison of similarity index in DSN dataset.

Figure 20.

Precision comparison of similarity index in ITC dataset.

Figure 21.

Precision comparison of similarity index in DSN dataset.

Figures 18–21 show that the AUC and precision of the O_AA index are better than those of O_CN and O_RA in both the ITC and DSN datasets. Therefore, the O_AA index performs best in the ON.

Finally, based on the above experimental results, we can conclude that the similarity indices O_CN, O_AA, and O_RA are superior to CN, AA, and RA, and that O_AA is the best index in the ON.

Conclusion

In this article, the similarity indices for link prediction of ON are analyzed. According to the temporal-spatial characteristics of ON, improved similarity indices are proposed, which outperform the indices CN, AA, and RA. Furthermore, index O_AA has the superior performance. The primary works in this article are as follows: (1) analyzing the similarity indices based on local information and on path, (2) describing an ON model and defining its network snapshots, (3) proposing similarity indices based on the temporal characteristics, T_CN, T_AA, and T_RA, which take historic information of network evolution into account, (4) proposing similarity indices based on the temporal-spatial characteristics, O_CN, O_AA, and O_RA, which consider the historic information of ON evolution and 2-hop neighbor information of the nodes, and (5) verifying these similarity indices using ITC and DSN datasets.

The shortcomings of this research concern time slice segmentation and the choice of weight parameters for the similarity indices based on temporal-spatial characteristics. Therefore, future research directions include (1) the method of time slice segmentation, since an appropriate time slice improves the accuracy of the similarity indices, and (2) the selection of weight parameters, since appropriate weight parameters make the similarity indices represent the similarity between nodes more properly.

Footnotes

Handling Editor: Nicolas Garcia-Aracil

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (61762065,61363015, 61262020), the Natural Science Foundation of Jiangxi Province (20171ACB20018, 20171BAB202009, 20171BBH80022), and the Innovation Foundation for Postgraduate Student of Jiangxi Province (YC2017-S344).

ORCID iD

Xulin Cai

References

1. Jadhav, Satao. A survey on opportunistic routing protocols for wireless sensor networks. Procedia Comput Sci 2016; 79: 603–609.

2. Kitsis, Datta. Layer 3 enhancements for vehicular ad hoc networks. Procedia Comput Sci 2018; 130: 628–635.

3. Han, Hui, Kumar, et al. Mobile data offloading through opportunistic communication and science participation. IEEE Trans Mobile Comput 2011; 11: 821–834.

4. Jung, Lee, Chang, et al. Bluetorrent: cooperative content sharing for bluetooth users. Pervasive Mob Comput 2007; 3: 609–634.

5. Wang. Can mobile cloudlets support mobile applications? In: Proceedings of IEEE INFOCOM’14, Toronto, ON, Canada, 27 April–2 May 2014, pp.1060–1068. New York: IEEE.

6. Zhou, Liu, Zhao, et al. Link prediction algorithm based on local centrality of common neighbor nodes using multi-attribute ranking. In: Proceedings of the 12th international conference on computer science and education, Houston, TX, 22–25 August 2017, pp.506–511. New York: IEEE.

7. Yao, Wang, Pan, et al. Link prediction based on common-neighbors for dynamic social network. Procedia Comput Sci 2016; 83: 82–89.

8. Lin, Wang, et al. Link prediction with node clustering coefficient. Physica A 2016; 452: 1–8.

9. Yin. A similarity index algorithm for link prediction. In: Proceedings of the 12th international conference on intelligent systems and knowledge engineering, Nanjing, China, 24–26 November 2017, pp.1–6. New York: IEEE.

10. Leicht, Holme, Newman ME. Vertex similarity in networks. Phys Rev E 2006; 73: 026026–026120.

11. Adamic, Adar. Friends and neighbors on the web. Soc Networks 2003; 25: 211–230.

12. Moradabadi, Meybodi MR. Link prediction in weighted social networks using learning automata. Eng Appl Artif Intel 2018; 70: 16–24.

13. Zhou, Lü, Zhang YC. Predicting missing links via local information. Eur Phys J B 2009; 71: 623–630.

14. Liu, et al. Extended resource allocation index for link prediction of complex network. Physica A 2017; 479: 174–183.

15. Wang, Zhou, Shi, et al. Empirical analysis of dependence between stations in Chinese railway network. Physica A 2009; 388: 2949–2955.

16. Pan, Liu, et al. Detecting community structure in complex networks via node similarity. Physica A 2010; 389: 2849–2857.

17. Liu, Xie, et al. An improvement of Link prediction by combining local information and betweenness. In: Proceedings of the 11th international conference on natural computation, Zhangjiajie, China, 15–17 August 2015, pp.456–461. New York: IEEE.

18. Huang, Zhang, Hui, et al. Link pattern prediction in opportunistic networks with kernel regression. In: Proceedings of the 7th international conference on communication systems and networks, Bangalore, India, 6–10 January 2015, pp.1–8. New York: IEEE.

19. Shao. Data exchange similarity based on flow field for link prediction problem. In: Proceedings of the 6th international conference on information science & technology, Dalian, China, 6–8 May 2016, pp.84–89. New York: IEEE.

20. Zhang. Combo-pre: a combination link prediction method in opportunistic networks. In: Proceedings of the 24th international conference on computer communication and networks, Las Vegas, NV, 3–6 August 2015, pp.1–6. New York: IEEE.

21. Pei, Liu, Jiao. Link prediction in complex networks based on an information allocation index. Physica A 2017; 470: 1–11.

22. Behnaz, Mohammad RM. Link prediction in stochastic social networks: learning automata approach. J Comput Sci 2018; 24: 313–328.

23. Zayani, Gauthier, Slama, et al. Tracking topology dynamicity for link prediction in intermittently connected wireless networks. In: Proceedings of the 8th wireless communications and mobile computing conference, Limassol, Cyprus, 27–31 August 2012, pp.469–474. New York: IEEE.

24. Zheng. The study of link prediction methods in complex networks. Wuhan, China: Wuhan University of Technology, 2012.