Abstract
In large-scale networks, the structure of the underlying network changes frequently, and thus the power iteration method for Personalized PageRank computation cannot deal with this kind of dynamic network efficiently. In this paper, we design a Monte Carlo-based incremental method for Personalized PageRank computation. In a dynamic network, first, we do a random walk starting from each node and save the performed walks into a fingerprint database; second, we update the fingerprint database in a fixed time interval with our proposed update algorithm; finally, when a query is issued by a user, we estimate the Personalized PageRank vector by our proposed approximation algorithm. Experiments on real-world networks show that our method can handle multichanges of the underlying network at a time and is more efficient than related work, so it can be used in real incremental Personalized PageRank-based applications.
1. Introduction
Large-scale networks are ubiquitous in today's world, such as World Wide Web, online social networks, and huge search and query-click logs regularly collected and processed by search engines. PageRank [1], which is the stability distribution of a random walk in a Markov chain, has emerged as an effective measure for ranking on large-scale networks. In addition, Personalized PageRank [2, 3], which is the stability distribution of a random walk with restart in a Markov chain, has been proved to be an effective method of personalized ranking in many applications, such as link prediction [4–6], friend recommendation [7–9], and personalized searching [10, 11]. Given a network, there are two main methods for computing the PageRank or Personalized PageRank vector: one is power iteration applying the linear algebra proposed by Page et al. [1] and the other is the Monte Carlo approximation methods proposed by Litvak [12] and Fogaras and Rácz [13].
In practice, the structure of the underlying network is dynamic, such that Webpages join or leave frequently in World Wide Web and following relations appear or disappear frequently in an online social network. Recomputing PageRank or Personalized PageRank for every change, that appears or disappears in an edge, of the structure is impossible in large-scale networks, so incremental PageRank or Personalized PageRank algorithms are necessary for this situation.
1.1. Background
Given a graph
where
The transition possibility matrix
where the weights on the outgoing edges of each node sum up to 1.
PageRank is the stability distribution of a random walk in a Markov chain. The main idea of random walk on a graph is that starting from a source node, at each step, jump to a random node with a probability
Personalized PageRank is the same as PageRank, except that the random walk jumps to the source node with a probability
where
1.2. Problem Statement
In this paper, we study the incremental computation of Personalized PageRank based on the Monte Carlo method; that is, given a graph, we do a random walk of fixed length starting from each node update the performed walks incrementally in fixed time interval, and approximate Personalized PageRank with the performed walks whenever needed.
Here, the core of the problem is designing an incremental update algorithm and a Personalized PageRank approximation algorithm.
1.3. Our Contribution
First, we design an approximation method for Personalized PageRank based on Monte Carlo method. We do a random walk of fixed length starting from each node, and save the performed walks in a fingerprint database.
Second, we propose an incremental algorithm, which updates the performed walks in fixed time interval, while the structure of underlying graph changes.
Third, we propose an approximation algorithm for computing Personalized PageRank vector based on the performed walks.
Finally, we do some experiments on real world graphs, and the experiments show that our proposed method is more efficient than related researches in estimating Personalized PageRank on dynamic networks.
2. Related Work
3. Monte Carlo-Based Incremental Personalized PageRank Computation
In this section, we describe our proposed system for incremental Personalized PageRank computation and illustrate the system in detail.
3.1. System Overall Design
Our system answers users’ Personalized PageRank queries or recommend Personalized PageRank results to the specific users, and the architecture of the system is in Figure 1. First, we initialize the fingerprints and save them in a fingerprint database, and then, in a fixed time interval, we update the fingerprint database. These two phases are done automatically by the system. When a query is issued by a personalized user, the searching engine retrieves related fingerprints from the fingerprint database, estimates the Personalized PageRank vector, and then returns nodes with highest score to the user. Some of the main phases of the system work as follows.

System architecture.
3.2. Initializing Fingerprint Database
Given a graph
node (1) (2) Let (3) (4) (5) (6) (7) (8) (9) Return
After doing the performed random walks, we save the performed walks in the fingerprint database.
3.3. Updating Fingerprint Database
In dynamic networks, Bahmani et al. [20] propose a model to update the fingerprint database whenever there is a joining edge or leaving edge. However, the structure of the underlying graph changes frequently, so it is impossible to update the fingerprint database for very change. In this paper, we propose an update algorithm that handles multichanges while updating the fingerprint database.
In a time interval
and (1) Reorder (2) Let (3) (4) Let (5) (6) (7) (8) (9) (10) Let (11) (12) (13) (14) Let (15) Let (16) (17) Let (18) (19) (20) (21) (22) (23) Let (24) (25) (26) (27) (28) Let of (29) (30) Update
Given a series of changes
For the
For the
Finally, for both
3.4. Computing Personalized PageRank
In this subsection, we propose an approximation method for estimating Personalized PageRank vector based on the performed walks, and the details are in Algorithm 3.
Personalized PageRank, and (1) Let (2) (3) Scan from (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) Return
For each fingerprint
4. Experiments
In this section, we do some experiments to validate the efficiency of our proposed method. We use the method proposed by the Bahmani et al. [20] as baseline algorithm, and compare it with our method in execution time.
4.1. Dataset
To validate the performance of both algorithms, we use two real-world datasets of networks, Dianping [33] and Gowalla [34]. The details of the datasets are in Table 1.
Details of datasets.
4.2. Experimental Setup
Our experimental environment is a personal computer with a 4-core processor and 4 GB memory. All algorithms are implemented by Java on the Ubuntu operation system. Our toolkit for dealing with graph is Jung [35], and the database for storing fingerprints is MySQL.
4.3. Experimental Results
For each graph
First, we compare the update time between our method with the method proposed by Bahmani et al. We update the fingerprint database in a fixed time interval from 1 to 10 second, and the results for two datasets are in Figures 2 and 3. From these two figures we can see that our update algorithm executes only once, but the Bahmani's algorithm need to update the fingerprint database multitimes that is proportional to the time interval. Moreover, we find that the time used in updating the fingerprint database contains a nearly constant time retrieving the related fingerprints.

Comparison of update time (Dianping).

Comparison of update time (Gowalla).
Second, we compare the execution time of Personalized PageRank approximation between the proposed method and Bahmani's method. Both of the algorithms retrieve fingerprints from the fingerprint database first and then estimate the Personalized PageRank vector with the fingerprints. The Bahmani's algorithm retrieves related fingerprints starting from the personalized node, and there are millions of fingerprints, so the searching process takes a lot of time. Our algorithm returns all fingerprints, and it needs only one pass of scanning the whole fingerprints to compute the Personalized PageRank vector. We randomly choose 10 personalized nodes and compare the average approximation time of Personalized PageRank, and the details are in Figure 4.

Average approximation time comparison of Personalized PageRank (PPR) approximation.
Third, we compare the overall performance between the two methods. The update time and Personalized PageRank approximation time of our method are all shorter than the Bahmani's method, and thus it is obvious that the overall time used by our method should be shorter. Here, using the multithreading technology of Java, we compare the throughput of the two methods, and the results are in Figure 5. In this experiment, one thread is used to generate changes of the structure of underlying graph, one thread is used to update fingerprints, and other threads are generated dynamically to compute the Personalized PageRank vector. We use the number of finished Personalized PageRank vectors per minute as the throughput. From the figure we can see that our proposed method is more efficient than the Bahmani's method in throughput.

Comparison of throughput.
Finally, we compare the accuracy of the two algorithms. To measure the quality of the results, for a node
where

Bucket versus

Bucket versus

Bucket versus

Bucket versus
5. Conclusion
Personalized PageRank is very important for complex networks, and it has been proved to be effective in many real applications, such as link prediction, entity recommendation, and social searching. However, in these applications, the underlying networks are usually dynamic, and traditional power iteration method cannot deal with them efficiently.
In this paper, we design a Monte Carlo-based incremental method for Personalized PageRank approximation in dynamic networks. We propose an efficient update algorithm to handle changes of the structure of the underlying network, and an efficient Personalized PageRank approximation algorithm with the performed walks. Experiments on real-world networks show that our method can handle multichanges of the underlying network structure at one time and is more efficient than related work, so can be used in real incremental Personalized PageRank-based applications.
Footnotes
Acknowledgments
This work was supported in part by the National Significant Science and Technology Special Project of China and the National Natural Science Foundation of China.
