Abstract
Location publication in check-in services of geo-social networks raises serious privacy concerns due to rich sources of background information. This article proposes a novel destination prediction approach Destination Prediction specially for the check-in service of geo-social networks, which not only addresses the “data sparsity problem” faced by common destination prediction approaches, but also takes advantages of the commonly available background information from geo-social networks and other public resources, such as social structure, road network, and speed limits. Further considering the Destination Prediction–based attack model, we present a location privacy protection method Check-in Deletion and framework Destination Prediction + Check-in Deletion to help check-in users detect potential location privacy leakage and retain confidential locational information against destination inference attacks without sacrificing the real-time check-in precision and user experience. A new data preprocessing method is designed to construct a reasonable complete check-in subset from the worldwide check-in data set of a real-world geo-social network without loss of generality and validity of the evaluation. Experimental results show the great prediction ability of Destination Prediction approach, the effective protection capability of Check-in Deletion method against destination inference attacks, and high running efficiency of the Destination Prediction + Check-in Deletion framework.
Introduction
Driven by the explosive increase of online social media and modern mobile devices with integrated position sensors, such as smart phones and tablet PC, geo-social networks (GeoSNs) have attracted millions of users in recent years. GeoSNs are online social networks (OSNs) that combine location-reporting capabilities with traditional social network functionality, 1 such as Facebook Places, Foursquare, Google Latitude, Gowalla, and Flickr. One of the most popular services in GeoSNs is check-in service. It allows users to publicly report their arrival at a location (i.e. Point Of Interest, POI) and share their experiences, which not only provides a platform for social interactions and self-expression, but also material benefits to users, such as special offers or discounts from the business providers. Thus, from this aspect, GeoSNs users tend to check in as frequently as possible to benefit from GeoSNs’ check-in services.
However, along with the aforementioned pleasure and benefits check-in services bring, publication of users’ locations from the services do raise serious privacy threats.2–4 On one hand, revealing a user’s exact location would lead to privacy leakage when the place itself is sensitive, such as home or work place, or it may allow adversary to infer sensitive information if the user visits, such as a hospital and a night club. One may argue that the user should notice the potential privacy violations and avoid checking in such sensitive positions. Indeed, this is inevitable according to findings of several sociological and psychological studies,5,6 which prove that the vast majority of people underestimate the risks. On the other hand, continuously shared locations constitute trajectories, which enable intrusive inferences of users’ movement pattern and locations that may be misused for stalking, mugging, or determining empty houses for burglaries. 3 Consequently, practical location privacy protection mechanisms are badly needed in GeoSNs.
Currently, there are some existing location privacy protection approaches based on different attack models.7–9 Unfortunately, the protection of location privacy in GeoSNs still remains a big challenge. 3 (1) Many approaches only consider limited attack models, ignoring the commonly available background information that may be used by advanced adversaries to violate location privacy, such as social information and geographical information. (2) Because of the inevitable “data sparsity problem,” 10 the prediction ability of the existing Bayesian-based location inference algorithms is considerably restricted when the available historical trajectories are too sparse to cover all the possible trajectories. (3) Existing location privacy protection methods mainly focus on processing user’s real-time check-in by location obfuscation or position dummies, 11 which badly affect user experience. Furthermore, the efficiency of existing location privacy protection frameworks is too low to meet real-time check-in demand. 12
Our work was motivated by the above limitations. In this article, we focus on the destination prediction attack and propose a new attack model named DesPre (Destination Prediction) and corresponding protection method named CkiDel (Check-in Deletion) and framework named DPCD (Destination Prediction + Check-in Deletion) specially for check-in services. We first construct the personalized historical trajectories data set for each target user based on social relationships and user similarities. Then, to conquer the “data sparsity problem,” we decompose the historical trajectories in the user’s personalized trajectories data set to construct a Markov model to calculate transition probabilities offline. Afterwards, by directly fetching those transition probabilities, we compute the destination probability and then the top-
A novel approach to personalized destination prediction, named DesPre, is proposed specially for check-in services in GeoSNs, which not only addresses the “data sparsity problem” faced by other destination prediction approaches, but also takes advantage of the commonly available background information from the GeoSNs so that the most probable locations to be visited by the target user can be inferred accurately even when the available historical trajectories are too sparse to cover all the possible trajectories.
Making use of the DesPre approach as the privacy attack model, we present a new privacy protection method CkiDel, which prevents adversaries from obtaining the correct patterns of users’ movements by deleting the smallest number of users’ historical check-ins so that the real sensitive destinations of target users would not be available without sacrificing the real-time check-in precision and user experience.
Based on the DesPre inference approach and CkiDel protection method, the location privacy protection framework DPCD is proposed to guard against destination inference attack. It achieves both effective protection ability and high running efficiency since the inference approach DesPre transfers large quantities of time-consuming calculations to the offline training phase and utilizes the geographical and time constraints to filter the unreachable destinations to avoid further complex calculations, which is proved by extensive experiments.
Without loss of generality and validity of the experimental evaluation, a new data preprocessing method is designed to construct a reasonable complete check-in subset from the worldwide check-in data set of Gowalla, a real-world GeoSN.
The remainder of this article is structured as follows. We discuss related work in section “Related work.” The problem this article concerned is formalized in section “Problem formalization.” In section “DesPre-based attack model,” DesPre approach is presented in detail along with DesPre-based attack model. Section “Privacy protection method CkiDel and framework DPCD” describes our privacy protection method CkiDel and framework DPCD. Experimental evaluations are discussed in section “Experimental evaluation.” Section “Conclusion and future work” concludes the article and presents our future work.
Related work
Past work on privacy in GeoSNs includes identifying privacy threats and proposing privacy protection approaches. 13 Freni et al. 14 first formalized the notions of location and absence privacy in GeoSNs and proposed a cloaking algorithm to enforce them. CR Vicente et al. 1 complemented the privacy preservation concepts and identified four specific classes of privacy threats systematically in GeoSNs. Mascetti et al. 15 designed two protocols for preserving privacy by encrypting location information to prevent collection of a user’s geographical information. L Siksnys et al. 16 proposed a client–server solution to achieve private and flexible proximity detection in GeoSNs. G Zhong et al. 17 introduced three distributed protocols to guarantee location privacy in location-based services (LBSs). However, some of the above solutions1,14 could not meet the demands of exact location and real-time publication in check-in services, while some others15–17 were specially designed for the proximity services of GeoSNs, which did not apply to location publication of check-in services in GeoSNs.
Check-in services in GeoSNs could be considered as a combination of general LBSs and OSNs. So the most intuitive idea is to apply the existing privacy-related findings in LBSs and OSNs to GeoSNs setting. Unfortunately, direct application is unfeasible. On one hand, research works on privacy in OSNs mainly focus on profile information, while the biggest challenge in check-in services is posed by the dynamic geo-spatial information which bridge the online world and offline world. On the other hand, techniques in general LBSs are mostly based on the assumption of user anonymity, 18 which does not correspond with the reality in most existing GeoSNs. 14 Furthermore, there is no much non-positional information like social relationships in general LBSs, while these information plays a key role in GeoSNs’ location privacy leakage.
Although the achievements in LBSs cannot be applied to GeoSNs setting directly, we could still draw lessons from them. Some scholars who work on destination prediction approaches in general LBSs 11 are motivated mainly by the application in location-based targeted advertising or POI recommendation. Proper utilization of these location inference approaches could benefit both users and business providers, nevertheless, abuse of them by adversaries could pose great threat to location privacy. The attack model proposed in this article is inspired by these ideas. Since continuously shared locations in check-in services constitute trajectories, those historical trajectory-based destination prediction approaches could be transplanted into check-in services of GeoSNs if only taking the GeoSNs’ features into consideration. Furthermore, the literature 19 showed that the best predictors of human behavior are based on his friends, relatives, and other related people. Cho et al. 20 found that, by jointly analyzing GeoSNs social relationships and user mobility, more than half of the users’ movements are affected by their friends. These findings further support the idea of this article since the commonly available social structures in GeoSNs could be used to improve the accuracy of the inference model as well as the attacking ability of the attack model.
The most related work to ours is presented by Xue et al. 10 Nevertheless, our destination inference model is different from theirs in the following four aspects:
Our model is specially designed for check-in services of GeoSNs, whose unique features pose us a new problem, while the prediction algorithm by Xue et al. 10 mainly focuses on general LBSs.
When formalizing the destination prediction problem, we consider the time factor and decompose the prediction into two main issues, which is more in line with reality. However, they only concern the time-independent destination probabilities, which is just one of the two main issues we concern.
Our destination inference approach is individual-oriented, which means each target user would have his or her own personalized model. However, the inference model for different users is exactly the same in the work by Xue et al. 10 which may introduce inaccuracy since human moving patterns vary from person to person. 20
Our model makes full use of commonly available information by utilizing historical and social information to calculate posterior probabilities and considering geographical information to filter unreachable locations, which contributes to the accuracy and efficiency of the model, respectively. But Xue’s model only took the historical trajectories into account.
Besides, considering user’s need of self-expression and social interaction, our privacy protection method CkiDel would keep the real-time check-in from being removed to the greatest extent, while the End-Points Generation Method in the work by Xue et al. 10 may easily cancel users’ current check-in, which dampens user experience and discounts the functionalities of GeoSNs.
Problem formalization
Terms and notions
Check in is the process whereby a user
In previous studies on destination prediction,10,21–24 a uniform grid is commonly used to help represent the data set. Following this paradigm, we divide the entire interested area into square cells with sides of

Sample grid graph and trajectory.
Problem and goal
In this article, we focus on the basic location privacy problem in GeoSNs and limit our attention to destination prediction attack, a sort of location privacy attack from which the adversary could infer the most likely place to be visited at certain time in the near future by the target user. Given the historical trajectories data set, the user-specified query trajectory
Note that, given a definite
In equation (2), the former probability
According to equation (2), the destination prediction problem could be decomposed into two main issues. One is to calculate the reachability probability of each grid cell in the training data set, and the other is to compute the time-independent destination probability. The goal of our proposed inference approach DesPre is to list the top-
DesPre-based attack model
Background information of attackers
It is widely accepted that the GeoSN service providers are un-trusted.
12
They may either analyze users’ data or share them with un-trusted third parties. We assume that adversaries have access to all the resources published on all the users’ GeoSN profile, including users’ basic attributes, check-in records, and articulated lists of friends. This assumption may be conservative, but for the sake of security, we should never underestimate the enemies. Furthermore, since geographical information is generally available, adversaries could easily obtain their interested road networks and related geographical resources, such as the maximum speed of road sections. Taking these geographical constraints into consideration, the attacking ability of the adversaries would be enhanced since the field of potential destinations can be narrowed and the accuracy and efficiency of destination inference can be improved. In sum, the background information the adversary possesses is as follows (n is the total number of users in GeoSN): (1) historical information: the set
Destination prediction approach: DesPre
Based on the abovementioned background information the adversaries may have, the destination prediction approach DesPre is proposed, which mainly includes two phases: the offline training phase and the online prediction phase. Detailed descriptions are as follows.
DesPre—offline training
The offline training phase of DesPre aims to prepare the probabilities for efficient online computing of destination probabilities, and it includes three steps.
Personalized historical trajectories filtering
Human movement exhibits structural patterns, which may vary from person to person due to geographical and social constraints. Thus, the practice of utilizing the whole historical trajectory data set to construct one inference model for all users by Xue et al. 10 would introduce inaccuracy when predicting destinations. Given this, our DesPre approach first filters the historical trajectories to construct personalized historical data set for individual-oriented destination prediction. The filter rules we employ are as follows:
1. Filtering by friend closeness. Previous studies indicate that more than half of the users’ movements are affected by their friends.
20
The closer the two persons are, the more likely they hang out together, resulting in more similar visiting behaviors. From this aspect, the historical check-ins of the target user’s friends count for much in the user’s movement prediction. Thus, a variable friend closeness, denoted by clos, is used to identify the target user
Then, the first filter rule we employed could be formalized as
It means that only when clos meets the given friend closeness filter threshold
2. Filtering by user similarity. Similar users who have common interests and tastes tend to visit similar places. 12 Thus, similar users’ historical trajectories may contribute a lot to the target user’s destination prediction. In order to estimate user similarity, check-in vector 25 is introduced to measure a user’s visiting possibility to a set of grid cells.
Definition 1. (check-in vector)
Given a grid cell sequence
Thus, given two users
where
It means that only when sim meets the given user similarity threshold
By applying the two filter rules to the whole data set, we can obtain an individual-oriented historical trajectories training set
Markov model construction
In order to make full use of the selected trajectories and conquer the data sparsity problem, we decompose each trajectory in
Specifically, each state in Markov model corresponds to a grid cell in
Using equation (6), we can get the one-step transition probabilities of each pair of adjacent grid cells in
Take the scenario in Figure 1 as an example; suppose that the three trajectories shown in the grid graph are all the trajectories in
Total transition probability matrix formation
The probabilities stored in
where ds is the length of the shortest path between
DesPre—online prediction
The online Prediction phase of DesPre aims to list the top-
Reachability probability
Human’s movement may be circumscribed by geographical factors, such as road connectivity, maximum moving speed of road sections, and available time. Taking these geographical constraints into consideration, the attacking power of the adversaries would be enhanced, because the field of potential destinations can be narrowed and the accuracy and efficiency of destination inference can be improved.
Reachability probability is a variable indicating whether a certain grid cell is reachable or not from the target use’s current location within the time interval
It is mainly used for filtering the unreachable grid cells to avoid further complex calculations of time-independent destination probabilities. If the reachability probability equals 1, the DesPre model will continue to calculate the time-independent destination probability. Otherwise, the current candidate grid cell will be eliminated.
Time-independent destination probability
Time-independent destination probability
where
where
In equation (9),
where
where every
Online prediction algorithm
Given the sample grid cells set
As shown in Algorithm 1, we first calculate the trajectory probability
Privacy protection method CkiDel and framework DPCD
Privacy protection method: CkiDel
As indicated above, people’s movement patterns are hidden in their historical check-in records, and destination prediction approaches exactly utilizes users’ historical check-ins to mine the behavior patterns and thus to infer destinations. Given these, to guard against destination inference attack, the most intuitive idea is to prevent adversaries from obtaining the correct patterns of users’ movements so that the most likely destinations would not be available. This is exactly the basic idea of our proposed location privacy protection method CkiDel. In practice, users with high security awareness tend to manually remove some of the published historical contents on their OSN profiles at regular intervals, which exactly proves the feasibility of our idea. In CkiDel, we act appropriately to break the users’ original behavior patterns by deleting some records of their historical check-ins so that the sensitive destination of the query trajectory would no longer be predicted as the top-
CkiDel—check-in nodes removing strategy
According to the analysis in sections “Problem and goal” and “Destination prediction approach: DesPre,” given a specific query trajectory
which shows that the value of the destination probability only lies on the starting grid cell
All in all, given an individual-oriented DesPre inference model and a user-specified query trajectory, destination probabilities of potential positions will not change unless the endpoints of the trajectories are removed. So, the strategy of our proposed CkiDel method is to delete the endpoints of the query trajectory until the calculated destination probability of the sensitive actual destination meets the user-specified privacy threshold. Moreover, since the current grid cell
CkiDel algorithm
Based on the above idea and strategy, the algorithm of CkiDel is shown in Algorithm 2.
As shown in Algorithm 2, given the query trajectory
Privacy protection framework: DPCD
Based on the above inference model and protection method, we propose a location privacy protection framework DPCD to guard against destination inference attacks. DPCD consists of three main components: mobile user device, trusted proxy server, and GeoSN server. We assume the centralized trusted proxy server lies between GeoSN users and GeoSN server. The architecture of DPCD is shown in Figure 2.

Architecture of the DPCD framework.
The trusted proxy server has three modules: DesPredictor, PrvProtector, and CkiProcessor. Based on the DesPre approach, DesPredictor is in charge of preparing all the intermediate parameters offline and calculating the most likely destinations online for target users. Given the user-specified sensitive locations and predicted destinations from DesPredictor, PrvProtector is responsible for designing the privacy preserving check-in proposals by applying the CkiDel method. According to the proposals made by PrvProtector, CkiProcessor acts on behalf of target users to publish the real-time check-in on GeoSN and deletes the chosen historical check-in records. To clearly articulate the working process of DPCD, Figure 3 shows a sequence diagram to present the flow of a check-in event under the DPCD framework.

Sequence diagram of the check-in event under the DPCD framework.
The proposed protection method and framework actually tend to supply a privacy alert and suggestive solution rather than a compelling measure. This is the exact reason why the DPCD framework is designed to involve multiple interactions between GeoSN users and the trusted proxy server. As mentioned in section “Privacy protection method: CkiDel,” whether it is convenient or not to remove certain check-in nodes is quite subjective. Thus, the best thing we could do is to remind the users of the potential threats they may face if they insist checking in certain nodes.
Experimental evaluation
Data preprocessing and properties analysis
Data preprocessing method
The data set we utilized is a real-world check-in data set from a GeoSN named Gowalla, which was collected by Cho et al. 20 from February 2009 to October 2010. It is a worldwide data set containing both public check-in data and an explicit social network. For simplicity, only the resident users of California (i.e. the study area) and related check-ins are studied by this article, and a subset selection method is needed.
As far as we know, the commonly used subset selection methods in such studies tend to directly keep the check-in records located in the interest study area 12 and then the corresponding users would be selected as the experimental subjects in turn. Frankly, this method seems brutal, and quite a number of trajectories would be artificially broken, resulting in the breaking of users’ moving patterns. Destination predictions and performance evaluations based on such broken trajectories make no sense. To overcome the deficiency, we propose a data preprocessing method to construct a reasonable check-in subset from the worldwide check-in data set of Gowalla. Detailed preprocessing method is discussed as follows.
First, we select the check-in records whose POI is located in California, namely, California-related check-ins. Then, users involved in the California-related check-ins, namely, California-related users, would be certain. Note that, among the California-related users, there must be the ones whose main activity area was beyond our study area and only short visits were paid to California. That is to say, for such users, our check-in selection breaks their movement patterns, so the destination prediction based on the broken trajectory data would make no sense. Thus, these kinds of users should be abandoned. Specifically, for each California-related user, we calculate the ratio of the number of California-related check-ins to the total number of the user’s check-ins in the original data set. Then, taking the calculated check-in ratio as the filter, all the California-related users whose check-in ratio is less than 1 would be removed from our data set so that the resident users of California, namely, significant users, would be fixed.
Our new method could guarantee all the needed trajectories of the selected experimental subject (i.e. a target GeoSN user) are completely reserved and are never artificially broken. In other words, for a given target user, no matter the ultimately used data set is the worldwide one or the tailored one obtained through our preprocessing method, trajectories related to him or her (i.e. those needed to construct his or her prediction model) would always be the same, and the subsequent model construction processes as well as the model’s performance would not be different. This implies that the proposed data preprocessing method will not lead to the loss of the experimental evaluation’s validity or generality.
Properties analysis
The tailored data set obtained through the aforementioned preprocessing method includes 5397 users and 214,961 check-in records. We further calculate the time interval and distance between two consecutive check-ins, and the distributions of them are shown in Figure 4.

The resulting data set’s attributes.
Besides the data set of Gowalla, we also obtain the road network data of California 28 for the calculation of reachability probabilities, which contains 21,693 edges (i.e. road sections).
Effectiveness evaluation
Evaluation metrics
Two performance metrics, Predictive Accuracy and Aggregated Distance Error, are used to evaluate the effectiveness of our proposed approaches. Predictive Accuracy measures how accurately the prediction models can predict the actual destination of the users. For instance, accuracy of 0.8 means that in 80% of the cases, the actual destination is among the top-
where
Baseline models
We employ three non-trivial baseline models for effectiveness evaluation.
1. Most Frequent Visit (MFV) model
The MFV model
29
assigns the probability of a user
2. ZMDB model
The ZMDB model, which is named after the authors’ name of literature, 24 is based on the Bayesian inference framework, specifically
The posterior probability is defined as
where
3. Sub-Trajectory Synthesis (SubSyn) model
The SubSyn model
10
mainly focuses on the solution of data sparsity problem faced by historical trajectories–based destination inference models. It also follows the Bayesian inference framework, where the destination probability and the prior probability
Prediction capability evaluation of DesPre
Evaluation with the varied time span of training set
Human’s movement pattern may change over time, so the options of historical trajectories we utilize influence the performance of inference models. To study the models’ effectiveness when the time span of training set varies, we separate all the check-in records over the time span of 20 months into 13 time buckets; thus each time bucket covers approximately 1.5 months, and the timestamp at the end of the

Data set partition of the
In each round of experiments, we first utilize the corresponding raw training set to construct individual-oriented Markov model for each target user, where each user’s check-in sequence recorded in 1 day is regarded as a trajectory. Then, we randomly choose five check-ins for each target user from the testing set as the current check-in

Performance of inference models with varied time span of training set.
According to Figure 6, all the models roughly witness a decreasing trend in accuracy and an increasing trend in Aggregated Distance Error along with an increase in training set’s time span, which could be explained by the increasing number of appeared unique check-ins. DesPre performs better than the baselines for all the 13 rounds of experiments, whose best performance occurred in the second round, for example, the time span of the training set is approximately 2.5 months. In the first round of experiments, DesPre performs relatively worse for the reason that few patterns could be mined and utilized to predict destinations without sufficient data, so does SubSyn. SubSyn ranks second to DesPre, but it shows the highest decreasing rate in accuracy due to its processing method of taking all the users’ check-in records to construct one inference model, which would obfuscate the users’ personalized movement pattern in the long run. Without impressive accuracy and precision, the ZMDB model performs stable except in the first round where its effectiveness was seriously affected by the data sparsity problem since there are no enough historical trajectories in a training set of short time span. MFV simply predicts the destination as the most frequently checked-in location in history, ignoring the short-term effect so that it could not distinguish which one is more significant in the long history, which leads to continuous decrease in Predictive Accuracy and large margin of error. All in all, DesPre outperforms SubSyn, ZMDB, and MFV in both Predictive Accuracy and Aggregated Distance Error, and the optimal time span of the training set is approximately 2.5 months in the scenario of this article.
Evaluation with the varied size of grid cell
The size of grid cell may influence both precision and accuracy of prediction models. Specifically, a coarse grid, on one hand, may lead to low prediction precision because the area covered by each grid cell is large, and on the other hand, it may improve the prediction accuracy because more trajectories in the training set would fall into the identical grid cells, thus increasing the number of the matching query trajectories. While a fine grid may lead to high prediction precision but low accuracy relatively, furthermore, it may also introduce inefficiency into prediction models. Thus, we need to study the impact of varied grid size on the performance of inference models to find a balanced and compromised grid cell’s side length so that the best prediction performance could be achieved.
We carry out four rounds of experiments by changing the side length

Performance of inference models with varied size of grid cell.
As shown in Figure 7, Predictive Accuracy of all the prediction models generally shows a rising trend with an increase in grid cell’s side length, which dovetail with the above analysis that sparse grid leads to high accuracy. Among all the models, DesPre achieves the highest accuracy with the lowest Aggregated Distance Error being 1.2 km, since it utilizes rich sources of background information to mine the movement patterns of users, while MFV performs the worst for it simply predicts the destination as the most frequently location in history and cannot distinguish which one is more important over a period of time. ZMDB approach suffers the most from the data sparsity problem caused by the fine grid, especially in the case that the side length of grid cell is very short, for example, 200 m, while SubSyn and DesPre were not much affected. As for the Aggregated Distance Errors, all the models show a general increase in trend with the increase in λ after the global minimum point where λ equals 400 m. In sum, DesPre has the overwhelming advantage in both Predictive Accuracy and Aggregated Distance Error, and 400 m should be the optimal value of grid cell’s side length for the data set utilized by this article.
Protection ability evaluation of CkiDel
Given that the protection ability of DPCD is consistent with that of the CkiDel method, so we only discuss the protective performance of CkiDel in the following. Our evaluation methodology is to take the incomplete trajectories processed by CkiDel as the input of several non-trivial destination inference models to see whether the inference models could still obtain the real destinations or not. If the answer is yes, it means the CkiDel is invalid, if no, it means the CkiDel achieves the protection goal of guarding against destination prediction attacks.
Our evaluation is based on the experiments conducted in subsection “Evaluation with the varied size of grid cell.” We take the resulting incomplete trajectories obtained by deleting a portion of check-ins of the original historical trajectories as the input of several destination inference attack models, including the DesPre-based, SubSyn-based, ZMDB-based, and MFV-based attack models. Then, we record the predictive results of them.
The aforementioned performance metrics, Predictive Accuracy and Aggregated Distance Error, are still used to evaluate the prediction performance of the aforementioned destination inference models. It is shown that the Predictive Accuracy of all the attack models is equal to 0, and the Aggregated Distance Error of them is pretty high (no less than 32.57 km, and nearly 37.41% of the check-in nodes in the query trajectories being removed on average). This exactly implies the invalidity of the attack models in front of CkiDel, and thus proves the protection capability of the CkiDel method and the DPCD framework. In fact, these results could be summarized from theoretical analysis. CkiDel is designed based on the DesPre model, and a removing list of historical check-ins could be accepted if and only if the inference model should be incapable of predicting the real destination using the resulting trajectory. Furthermore, according to the proofs in subsection “Prediction capability evaluation of DesPre,” DesPre, as the the most powerful inference model, has utilized the most sufficient and effective background information for destination prediction, but it still fails to obtain the correct results. Not to mention the other weaker ones whose input information is poorer than DesPre. From this point of view, CkiDel and DPCD’s achieving of the protection goal becomes easy to understand.
Efficiency evaluation
Besides the effectiveness of the proposed approaches, the operating efficiency of the DPCD framework is also significant for it needs to run online and answer real-time queries instantaneously so as to achieve GeoSNs’ check-in service and guarantee user experiences. Given the architecture of DPCD, we, respectively, study the running efficiencies of DesPre and CkiDel on an enterprise server with Intel Xeon CPU E5-2650 and 64 GB RAM.
Efficiency of DesPre
Running efficiency of DesPre offline training phase
We carry out the evaluation with respect to various grid sizes, that is, four rounds of experiments by changing the side length
Average runtime of DesPre-training phase.
As we can see from Table 1, the efficiency of DesPre’s offline training phase is not high, because we choose to transfer large quantities of time-consuming calculations to the offline training phase in order to guarantee the efficiency of the online prediction. Fortunately, the DesPre-training algorithm will not be evoked for each online query, and the transition probabilities generated by the offline training phase need not be updated instantaneously. In other words, DesPre-training algorithm only needs to run occasionally, for example, once in a few months. Therefore, the relatively long runtimes are totally acceptable, especially the one corresponding to the optimal side length of 400 m, whose average runtime is around 2 h.
Running efficiency of DesPre online prediction phase
As for the online prediction phase of DesPre, we record the runtime of the algorithm and compare it with SubSyn and ZMDB, for they all follow the Bayesian inference framework and have a relatively independent and complete online prediction phase. Results show that the average runtime of DesPre’s online prediction algorithm is approximately 10−2 ms for each query, while that of SubSyn is 10−1 ms and ZMDB is more than 102 ms. Detailed average runtime of different online prediction algorithms is shown in Table 2. The worst performance of ZMDB could be explained by that in order to compute the posterior probability, the time-consuming trajectory matching process is obligatory, while DesPre and SubSyn could fetch most of the needed probabilities directly from the existing matrices generated by the offline training phase. The reason why SubSyn performs worse than DesPre is that SubSyn needs to compute all the POIs’ destination probabilities to find the top-
Average runtime of online prediction algorithms.
Efficiency of CkiDel
We conduct the evaluation with respect to number of check-ins in query trajectories, while the grid cell’s side length is 400 m and the time span of the training data set is 2.5 months. The average runtimes of CkiDel are listed in Table 3. Results show that the average runtimes do not change too much with an increase in check-ins in query trajectories.
Average running time of CkiDel.
The average runtime of CkiDel is approximately 10−2 ms, which is one magnitude faster than the privacy protection method by Xue et al. 10 and far more acceptable than the existing privacy alert method by Huo et al. 12 whose average response time is in minute level. The efficiency advantage of CkiDel mainly benefits from the quick response of online destination inference, which is due to the following two strategies. The first one is to transfer large quantities of time-consuming calculations to the offline training phase, and the second one is to utilize the geographical and time constraints to filter the unreachable destinations to avoid further complex calculations of destination probabilities.
Conclusion and future work
Privacy risks in location publication are high due to rich sources of background information. In this article, we propose a novel destination prediction approach specially for the check-in service of GeoSNs and design a protection method CkiDel to guard against destination inference attack. Based on DesPre and CkiDel, a new privacy protection framework DPCD is proposed to help the GeoSN users to detect potential location privacy leakage and retain confidential location information without sacrificing real-time check-in precision and user experience. We also design a data preprocessing method to construct reasonable and complete data subset from the real-world GeoSN data set for performance evaluation. Experimental results prove the effective prediction ability of the DesPre approach, high protection capability of the CkiDel algorithm, and high running efficiency of the DPCD framework.
Currently, the DPCD framework mainly counters the threat of destination inference attack. In the future, we will consider more potential location privacy leakage threats to extend our location privacy protection framework.
Footnotes
Academic Editor: Geng Yang
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Nature Science Foundation of Jiangsu China under Grant No. BK20131069.
