Sage Journals: Discover world-class research

Abstract

With the growing popularity of fifth-generation-enabled Internet of Things devices with localization capabilities, and with fifth-generation mobile networks still being built out, location privacy is raising increasingly frequent and extensive concerns. To continuously enjoy location-based applications, users need to share their location information with the corresponding service providers. However, continuously shared location information gives rise to significant privacy issues because of the temporal correlation between locations. To address this, we apply practical local differential privacy to private continuous location sharing. First, we introduce a novel definition of $(ε, δ)$-local differential privacy to capture the temporal correlations between locations. Second, we present a generalized randomized response mechanism that achieves $(ε, δ)$-local differential privacy for location privacy preservation and attains the upper bound of error, and we use it as the basic building block of a unified private continuous location sharing framework with an untrusted server. Finally, we conduct experiments on the real-world Geolife dataset to evaluate our framework. The results show that generalized randomized response significantly outperforms the planar isotropic mechanism in terms of utility.

Keywords

Location privacy, continuous location sharing, temporal correlation, local differential privacy, randomized response

Introduction

With the development of fifth-generation (5G) wireless communication technology, the 5G-enabled Internet of Things (5G IoT), with its ubiquitous connectivity, is expected to support a wide variety of applications.¹ Thanks to the connectivity and mobility of 5G IoT devices, location-based applications have grown significantly alongside the popularity of devices with localization capabilities (e.g. smartphones and self-driving vehicles). To enjoy the relevant services, users have to contribute their locations to the corresponding service providers or third parties. For example, a user who wants to access mobile edge computing (MEC) services, a crucial part of 5G technology, needs to share location information with the MEC service provider. However, location information gives rise to significant privacy issues.^2,3 Location information is itself sensitive and can easily be associated with other sensitive information, so sharing it can leak user privacy.

In the last decade, a large body of location privacy preservation mechanisms (LPPMs)^4,5 has emerged in the setting of location-based services (LBSs) or location sharing, where a user shares his or her location with an unknown or untrusted server or other party to enjoy some service while keeping that location private. One solution is cryptography-based LPPMs, which prevent individual locations from being revealed.⁶ However, such methods tend to be computationally expensive. Most LPPMs concentrate on location obfuscation, which transforms the real location of an individual into a region (e.g. location generalization)⁷ or a perturbed location (e.g. location perturbation).⁸ The commonly used location obfuscation methods depend on syntactic privacy models (e.g. k-anonymity); however, they do not provide a stringent and provable privacy guarantee. Some of them perturb the location only at a single timestamp, without considering the temporal correlations among a user’s locations. Therefore, these methods are vulnerable to various inference attacks.

Differential privacy⁹ is a state-of-the-art, stringent, and provable privacy notion that is independent of an adversary’s background knowledge and computing power. A large number of works have adopted differential privacy for location privacy preservation.^1,10 Most of them, for example, Al-Hussaeni et al.,¹¹ Qardaji et al.,¹² and Fanaeepour and Rubinstein,¹³ apply traditional differential privacy (or centralized differential privacy, CDP)⁹ to location or trace data for data publishing or aggregation with a trusted server. As privacy concerns are raised ever more frequently, local differential privacy (LDP)¹⁴ has been considered in many application fields because of its more stringent and practical privacy guarantee. In the LDP model, a user does not trust anybody else and requires that his or her own data be adequately protected before it is shared. In other words, the user’s exact data can be accessed only by the user. From a modeling perspective, LDP is therefore better suited to private location sharing. Randomized response (RR) is the principal mechanism of LDP and is suitable for perturbing categorical data (e.g. discrete location data).

Adopting LDP for location privacy preservation is still in its infancy. Recently, a few works^15,16 have applied LDP to location data for data aggregation, but without considering the temporal correlation between locations. Earlier works, for example, Andrés et al.⁸ and Bordenabe et al.,¹⁷ adopted the LDP model in the setting of private location sharing. Andrés et al.⁸ proposed the privacy notion of geo-indistinguishability for LBSs, which ensures that any two geographically close locations within a circle are nearly indistinguishable. In fact, geo-indistinguishability is a special notion of event-level differential privacy in which the neighborhood is defined by Euclidean distance. It is implemented by the planar Laplacian mechanism, which injects noise drawn from a two-dimensional Laplace distribution into the real location. Bordenabe et al.¹⁷ proposed an optimal geo-indistinguishability mechanism that improves utility by means of linear programming techniques. However, geo-indistinguishability is vulnerable to inference attacks if an adversary can learn the correlation between locations over time.

Motivations and contributions

In this article, we study the problem of continuous private location sharing under LDP. As shown in Figure 1, a user whose mobile device generates a sequence of sensitive locations shares these locations on the fly with an untrusted server to enjoy LBSs, while guaranteeing his or her location privacy at every timestamp. A user’s real locations are known only to the user. The perturbed locations generated by the privacy preservation mechanism are visible to the untrusted server and to adversaries. To this end, we apply LDP to private continuous location sharing under temporal correlations, which cannot be hidden from adversaries and hence are assumed to be public. Compared with the most similar existing work, by Xiao and Xiong,¹⁸ which proposed a framework for continuous location sharing based on traditional differential privacy, our main contributions are summarized as follows.

Figure 1.

Continuous private location sharing.

First, we propose a novel definition of $(ε, δ)$-LDP to protect the real location at each timestamp, which differs from the differential privacy definition based on the $δ$-location set.¹⁸ We are motivated by two ideas: the definition of $(ε, τ)$-practical local differential privacy ($(ε, τ)$-PLDP),¹⁵ in which the “possible set” based on a safe region considers only spatial correlation and the real location is hidden among the other locations within the safe region; and the notion of the $δ$-location set,¹⁸ which is based on a Markov model and can capture all possible locations under temporal correlation. We therefore incorporate the $δ$-location set into LDP to protect location privacy under temporal correlation. Accordingly, we propose a novel definition of $(ε, δ)$-LDP to capture the temporal correlations for private continuous location sharing, where the parameter $δ$ is used to construct the “possible set” of LDP, rather than the “neighborhood databases” of traditional differential privacy that are used to determine the sensitivity hull.¹⁸

Second, we present an efficient generalized randomized response (GRR) mechanism to achieve $(ε, δ)$-LDP for location perturbation, and use it as the basic building block of a unified continuous private location sharing framework with an untrusted server. In addition, we theoretically analyze the privacy and utility guarantees of the mechanism; the analysis shows that GRR in our location setting achieves the upper bound of error, rather than the lower bound of error of the planar isotropic mechanism (PIM).¹⁸

Third, we experimentally evaluate the performance of GRR over real-world datasets through the continuous location sharing framework and the results show that GRR provides superior location utility, compared with PIM.

Related work

Location privacy

A large number of works have studied location privacy preservation. In general, location privacy preservation methods can be classified into three categories: cryptography, anonymization, and obfuscation.¹⁹ One of the most widely used definitions of location privacy is the notion of k-anonymity. Gruteser and Grunwald²⁰ first introduced k-anonymity into location privacy, and the notion has been widely employed in numerous works. The key idea is to construct an anonymity set: a user reports an anonymity set consisting of at least k users instead of his or her real location, so an adversary cannot distinguish the user from at least k–1 other users. However, the main drawback of location privacy preservation methods based on k-anonymity and its variants (e.g. mix zones) is that they do not always provide stringent and provable privacy preservation. Most of them do not consider the temporal correlation among locations and are vulnerable to a variety of inference attacks.

The other significant solution for preserving location privacy is location obfuscation (e.g. spatial cloaking), which maps a user’s real location to a region or to one or more perturbed locations. Andrés et al.⁸ proposed the notion of geo-indistinguishability for location privacy, which protects a location within a small radius and requires that the closer two locations are, the more indistinguishable they are. Nevertheless, such a location perturbation mechanism may not be appropriate for continuous location privacy because it does not consider the correlation among locations. To prevent the loss of privacy caused by the correlation between locations in a trace, Chatzikokolakis et al.²¹ proposed a predictive differential privacy mechanism that reduces the rate of privacy budget consumption for trace obfuscation. Similarly, Ma et al.²² proposed the AGENT mechanism for continuous location privacy preservation, which introduces an R-tree to reuse previously perturbed locations. Both mechanisms satisfy geo-indistinguishability and use previously reported locations to save privacy budget for the obfuscated trace. However, both works use the correlation only to reduce the privacy budget and improve utility, not to prevent inference attacks.

Many works have adopted a Markov model for modeling users’ mobility and reasoning about their locations or traces. Ardagna et al.²³ proposed an approach for preserving sensitive location information in continuous LBSs. The approach first models a user’s movement preferences with a Markov chain and then develops a simple obfuscation method based on these preferences. Shokri et al.²⁴ presented a framework for quantifying location privacy, which uses a Markov model to reconstruct an adversary’s prior knowledge (user mobility) for use in various attacks. In the work most similar to ours,¹⁸ Xiao and Xiong proposed PIM for single-user location sharing under temporal correlations modeled by a Markov chain. Hence, we also adopt the commonly used Markov model for modeling users’ mobility. In fact, PIM is similar to the Laplace mechanism and is suited to perturbing numerical data. Since the Markov model is a discrete-state model and RR is a perturbation mechanism for categorical (discrete) data, PIM, which is based on noise addition, incurs a greater loss of utility than RR when users’ mobility is modeled by a Markov model.

Local differential privacy

Differential privacy is the state-of-the-art, stringent, and provable privacy technique. It was initially proposed to protect aggregated statistics in the trusted-server setting, which is referred to as centralized differential privacy (CDP). Recent works, however, have focused on LDP, which offers stronger privacy preservation because each user perturbs his or her data locally and sends only the perturbed data to an untrusted server. In recent years, Google has demonstrated its applicability through RAPPOR,²⁵ a practical LDP solution for protecting users’ settings in the browser. Subsequently, Apple²⁶ and Microsoft²⁷ applied LDP in iOS 10 and Windows 10, respectively. LDP has become the most promising privacy preservation technique.

Different from CDP, which mainly uses the Laplace mechanism and the exponential mechanism, LDP mainly adopts randomized response (W-RR)²⁸ as the fundamental perturbation mechanism, which was first proposed by Warner in 1965. However, since W-RR is only suitable for binary variables, Kairouz et al.²⁹ proposed a staircase mechanism, also known as k-randomized response (K-RR), which can be used for variables with more than two values $(d > 2)$. In fact, W-RR is the special case of K-RR with $d = 2$. To distinguish these from the previous notions, Wang et al.³⁰ generalized W-RR and K-RR into GRR, where the domain size is greater than or equal to 2 $(d \geq 2)$. In other work, Wang et al.³¹ conducted a set of analyses of LDP mechanisms for the frequency estimation problem. This work makes it possible to select the optimal mechanism and corresponding parameters for LDP-based applications, and shows that when the domain size satisfies $d < 3 e^{ε} + 2$, GRR outperforms the other mechanisms compared.

Most works on LDP, for example, Kairouz et al.,²⁹ Bassily and Smith,³² and Nguyên et al.,³³ have concentrated on simple data, namely, a single numeric or categorical attribute. Recently, some studies have applied LDP to more complex data. Qin et al.³⁴ applied LDP to set-valued data, in which each record is a set of items (values), for heavy hitter estimation. Wang et al.³⁵ proposed an optimized LDP mechanism for set-valued data aggregation. Several recent works have adopted LDP to protect location privacy. Chen et al.¹⁵ were the first to apply LDP to the private location (spatial) data aggregation problem. Sei and Ohsuga¹⁶ proposed a Bayesian-based multiple-dummies method against an untrusted server, which satisfies LDP. The method can also be used to protect the privacy of a single location. Our contribution is to extend LDP to continuous location sharing for a single user whose locations are temporally correlated. To the best of our knowledge, we are the first to apply LDP to continuous location sharing. Our approach aims to achieve higher utility in practice while guaranteeing users’ location privacy over time.

Preliminaries and privacy definition

We use normal letters, bold lowercase letters, and bold uppercase letters to denote scalar variables, vectors, and matrices, respectively. The summary of some significant symbol notations is shown in Table 1.

Table 1.

Summary of some significant symbol notations.

Notation	Description
$c_{i}$	a cell in the divided map, $i = 1, \dots, m$
$s, l$	location in state and map coordinates
$s^{*}, l^{*}$	real location in state and map coordinates
$z$	perturbed location in state coordinates
$p_{t}^{-}, p_{t}^{+}$	prior and posterior probability
$Δ S, | Δ S |$	$δ$-location set and its size
$M$	public and personal transition matrix

To represent a location for the Markov model, we use a state coordinate system. The space domain is denoted by $C = \{c_{1}, \dots, c_{m}\}$, where each $c_{i}$ denotes a cell, and the cells are the finest granularity obtained by dividing the space. In state coordinates, $c_{i}$ is an m-dimensional unit vector with the ith element being 1 and the others being 0. Each cell represents a state (location). By contrast, the geometric space is usually seen as a map with latitude and longitude, so a location l is represented by a two-dimensional vector with two elements l [1] and l [2]. An example of the two coordinate systems is shown in Figure 2. Assuming that a location is in $c_{9}$, its two coordinates are as follows

\begin{aligned} s &= c_{9} = [\begin{matrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & \dots & 0 \end{matrix}] \\ l &= [\begin{matrix} 3 & 2 \end{matrix}] \quad \text{with } l [1] = 3 \text{ and } l [2] = 2 \end{aligned}

Figure 2.

Coordinate systems.

Since the two representations can be converted into each other, we omit the conversion procedure. Over time, a trajectory can be represented by a sequence of locations, $l_{1}, \dots, l_{t}$ in map coordinates or $s_{1}, \dots, s_{t}$ in state coordinates.

Mobility and inference model

In our problem setting, a user’s real locations are only observable by himself, but not by others. The perturbed locations generated by the privacy mechanism are observable to the service provider who might be untrusted. From the perspective of an adversary, the above-mentioned process is a Hidden Markov Model (HMM).

Let the vector $p_{t}$ denote the probability distribution of a user’s location at timestamp t. Formally

p_{t} [i] = \Pr (s_{t}^{*} = c_{i})

where $p_{t} [i]$ is the ith element in $p_{t}$ and $c_{i} \in C$ .

Transition probability

$M$ is the state transition probability matrix, where the element $m_{ij}$ denotes the probability that a user moves from location $l_{i}$ to location $l_{j}$. Given the probability vector $p_{t - 1}$, the probability at timestamp t becomes $p_{t} = p_{t - 1} M$. We assume that the transition matrix $M$ is given in our work.

Emission probability

Given a real location $s_{t}^{*}$, a mechanism generates a perturbed location $z_{t}$, and the probability $\Pr (z_{t} | s_{t}^{*} = c_{i})$ is referred to as the emission probability in the HMM, which establishes the relationship between states and observations. It is worth noting that this probability is determined by the obfuscation mechanism.

Inference and evolution

$p_{t}^{-}$ and $p_{t}^{+}$ denote the prior and posterior probabilities of a user’s location at timestamp t, that is, before and after observing the perturbed location $z_{t}$, respectively. At the current timestamp t, the prior probability can be computed from the posterior probability at the previous timestamp t–1 and the state transition matrix of the model, that is, $p_{t}^{-} = p_{t - 1}^{+} M$. According to Bayes’ rule, given $z_{t}$, the posterior probability can be computed as follows. For each cell $c_{i}$

p_{t}^{+} [i] = \Pr (s_{t}^{*} = c_{i} | z_{t}) = \frac{\Pr (z_{t} | s_{t}^{*} = c_{i}) p_{t}^{-} [i]}{\sum_{j} \Pr (z_{t} | s_{t}^{*} = c_{j}) p_{t}^{-} [j]}

(1)

The inference process at each timestamp can be readily derived by the forward–backward algorithm for HMMs.
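The prediction step $p_{t}^{-} = p_{t - 1}^{+} M$ and the Bayesian update in formula (1) can be illustrated with a minimal Python sketch. The function names and the three-cell toy model below are hypothetical and chosen only for illustration; the emission vector is an arbitrary example, not the output of any particular mechanism.

```python
def predict_prior(posterior_prev, M):
    """Prior at time t: row vector p_{t-1}^+ times the row-stochastic matrix M."""
    m = len(M)
    return [sum(posterior_prev[i] * M[i][j] for i in range(m)) for j in range(m)]


def bayes_posterior(prior, emission):
    """Formula (1): emission[i] stands for Pr(z_t | s_t^* = c_i)."""
    unnormalized = [e * p for e, p in zip(emission, prior)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("the observation has zero probability under the prior")
    return [u / total for u in unnormalized]


# Toy model with three cells (illustrative values only).
M = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
posterior_prev = [1.0, 0.0, 0.0]          # the user was certainly in the first cell
prior = predict_prior(posterior_prev, M)  # mass spreads along the first row of M
emission = [0.6, 0.2, 0.2]                # example values of Pr(z_t | s_t^* = c_i)
posterior = bayes_posterior(prior, emission)
```

In this toy case the prior is the first row of $M$, and the observation shifts the posterior toward the first cell because its emission probability is the largest.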

Combining the characteristic of LDP model, we consider two different types of M in the light of data source which is used to construct the inference model: one is Public M which is derived with public data and the other is Personal M which is learned from personal data. Nevertheless, different inference models (adversaries) may take different effects on the utility of shared locations. We will also contrast the two inference models in later experiments.

LDP and RR mechanism

LDP was proposed for the local setting, in which an untrusted server is not allowed to access the private data. In general, users share their information with a service provider only to enjoy the corresponding services. Yet users trust no one but themselves and want their privacy guaranteed at the source: the information must be properly sanitized by the users themselves before it is shared. LDP requires that no matter which data value an individual user possesses, the data collector (e.g. an untrusted server) should get almost identical information. Therefore, an adversary with any background knowledge cannot distinguish an individual’s real value by accessing the sanitized information. Formally, the definition of LDP is given below.

Definition 1: (Local differential privacy)

A randomized algorithm $A$ satisfies $ε$-LDP, where $ε \geq 0$, if and only if for any pair of values $l, l' \in L$ and for all $O \subseteq Range (A)$

\Pr [A (l) \in O] \leq e^{ε} \cdot \Pr [A (l') \in O]

where $Range (A)$ denotes the set of all possible outputs of algorithm $A$ and the probability is over the coin flips of $A$ .

LDP provides stringent privacy preservation in the local setting, but its utility is bounded by the privacy domain. If the privacy domain size $| L |$ is large, it is difficult to achieve a good trade-off between privacy and utility under LDP. A prominent property of LDP is that a user can take full control of his or her privacy by independently perturbing data to a certain range that meets his or her own privacy preference.

Randomized response

RR is the fundamental mechanism for achieving LDP. We illustrate it with a simple example. Given a binary attribute, a user reports the real value with probability $p$ and the flipped value with probability $1 - p$, where $p > 1/2$. This satisfies $\ln (p / (1 - p))$-LDP.
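As a small illustration of the binary case, the following Python sketch reports a bit truthfully with probability $p$ and converts between $p$ and the privacy level $\ln (p / (1 - p))$. The function names are hypothetical, and the restriction to $p$ strictly between 1/2 and 1 is an assumption made so that the privacy level is positive and finite.

```python
import math
import random


def binary_rr(bit, p, rng=random):
    """Report the true bit with probability p and the flipped bit otherwise."""
    if not 0.5 < p < 1.0:
        raise ValueError("p must lie strictly between 0.5 and 1")
    return bit if rng.random() < p else 1 - bit


def rr_epsilon(p):
    """Privacy level ln(p / (1 - p)) of binary RR with truth probability p."""
    return math.log(p / (1.0 - p))


def rr_truth_probability(epsilon):
    """Inverse relation: p = e^eps / (e^eps + 1)."""
    return math.exp(epsilon) / (math.exp(epsilon) + 1.0)


eps = rr_epsilon(0.75)  # equals ln(3)
report = binary_rr(1, 0.75, random.Random(0))
```

For example, telling the truth three times out of four corresponds to $ε = \ln 3 \approx 1.1$.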

Composability

In our problem setting, we release only one perturbed location at each timestamp, so the sequential composition property does not come into play. For multiple releases at a timestamp, however, the composition property applies. In addition, a privacy guarantee for the whole trajectory (e.g. releasing a set of perturbed locations $\{z_{1}, \dots, z_{t}\}$ from timestamp 1 to t) is not considered.

$δ$ -location set

To capture the temporal correlations between locations, we introduce the concept of the $δ$-location set to obtain a user’s possible locations at each timestamp. At any timestamp $t$, the prior probability of the user’s current location is denoted by $p_{t}^{-} [i] = \Pr (s_{t}^{*} = c_{i} | z_{t - 1}, \dots, z_{1})$. We set a parameter $δ$ to derive the possible locations of the user; the $δ$-location set $Δ S_{t}$ is the set containing the minimum number of locations whose prior probabilities sum to no less than $1 - δ$

Δ S_{t} = \min \{c_{i} | \sum_{c_{i}} p_{t}^{-} [i] \geq 1 - δ\}

For instance, suppose the prior probability is $p_{t}^{-} = [0.2, 0.3, 0.1, 0.02, 0.03, 0.35]$, corresponding to $[c_{1}, c_{2}, c_{3}, c_{4}, c_{5}, c_{6}]$. If $δ = 0.2$, then $Δ S = \{c_{6}, c_{2}, c_{1}\}$, since $0.35 + 0.3 + 0.2 = 0.85 \geq 0.8$; if $δ = 0.06$, then $Δ S = \{c_{6}, c_{2}, c_{1}, c_{3}\}$, since $0.85 + 0.1 = 0.95 \geq 0.94$. Note that if $δ = 0$, then $Δ S$ includes all possible locations.
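One straightforward way to build a $δ$-location set is to add cells greedily in decreasing order of prior probability until the accumulated mass reaches $1 - δ$. The Python sketch below follows that reading of the definition; the function name, the 0-based indexing, and the small floating-point tolerance are assumptions made for illustration.

```python
def delta_location_set(prior, delta, tol=1e-12):
    """Return 0-based indices of a smallest set of cells with prior mass >= 1 - delta.

    Cells are taken in decreasing order of prior probability; `tol` guards
    against floating-point round-off when the mass is compared with 1 - delta.
    """
    order = sorted(range(len(prior)), key=lambda i: prior[i], reverse=True)
    chosen, mass = [], 0.0
    for i in order:
        if mass >= 1.0 - delta - tol:
            break
        chosen.append(i)
        mass += prior[i]
    return chosen


prior = [0.2, 0.3, 0.1, 0.02, 0.03, 0.35]  # cells c1..c6 from the example
names = lambda idx: ["c%d" % (i + 1) for i in idx]

set_a = names(delta_location_set(prior, 0.2))   # needs mass >= 0.80
set_b = names(delta_location_set(prior, 0.06))  # needs mass >= 0.94
set_c = names(delta_location_set(prior, 0.0))   # needs all of the mass
```

Taking the definition literally for the six-cell prior, three cells ($c_{6}, c_{2}, c_{1}$, total 0.85) suffice for $δ = 0.2$, four cells (adding $c_{3}$, total 0.95) are needed for $δ = 0.06$, and all six cells are returned for $δ = 0$.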

Privacy definition

The idea of $(ε, τ)$-PLDP¹⁵ is to “hide” a real location in a safe region so that an adversary is unable to distinguish the real location from any other location in the safe region $τ$. This addresses only spatial correlation, rather than the spatial-temporal correlation we consider, because the temporal correlation arising from a user’s moving patterns is not taken into account when the safe region is constructed. It is therefore vulnerable to inference attacks by an adversary with background knowledge. In addition, the size of the safe region is critical to data utility. Motivated by the notion of the $δ$-location set, which reflects the set of possible locations where the user is likely to appear (by filtering out locations with small probability), we incorporate the $δ$-location set into LDP and propose a novel definition of LDP for the continuous private location sharing setting. We define LDP based on the $δ$-location set so that an adversary cannot distinguish the real location from the other locations in the $δ$-location set.

Definition 2: ( $(ε, δ)$ -Local differential privacy)

At any timestamp t, a randomized mechanism $A$ satisfies $(ε_{t}, δ_{t})$-LDP on the $δ_{t}$-location set $Δ S_{t}$ if, for any output $z_{t} \in Range (A)$ and any locations $l_{t}, l_{t}^{'} \in Δ S_{t}$, the following holds

\Pr (A (l_{t}) = z_{t}) \leq e^{ε_{t}} \cdot \Pr (A (l_{t}^{'}) = z_{t})

The above definition makes the real location protected within the $δ_{t}$ -location set (temporal safe region) at each timestamp. To be specific, the perturbed location $z_{t}$ is LDP at timestamp t for the setting of continuous private location sharing. For other application settings, such as preserving individual trajectory, we will investigate the setting in future work.

Proposed framework

GRR mechanism

The GRR mechanism is a generalized version of the RR mechanism. In the particular case where the value is binary, that is, $d = | D | = 2$, $A_{GRR} (v)$ keeps the value unchanged with probability $e^{ε} / (e^{ε} + 1)$ and flips it with probability $1 / (e^{ε} + 1)$. In the general case where the value is multi-ary, namely, $d = | D | > 2$, the perturbation mechanism is defined as follows

\Pr [A_{GRR} (v) = z] = \begin{cases} \frac{e^{ε}}{e^{ε} + d - 1}, & \text{if } z = v \\ \frac{1}{e^{ε} + d - 1}, & \text{if } z \neq v \end{cases}

Algorithm 1 shows the pseudocode of the GRR mechanism for locations. Given a privacy budget $ε$, a real location $l^{*}$, and a location set $Δ S$, the algorithm returns a perturbed location $z$ that belongs to $Δ S$. Specifically, the perturbed location is the real location with probability $e^{ε} / (e^{ε} + | Δ S | - 1)$; otherwise, the perturbed location is selected uniformly at random from the set $Δ S \setminus \{l^{*}\}$. In other words, each dummy location is output with probability $1 / (e^{ε} + | Δ S | - 1)$. The following theorem provides the theoretical guarantee of Algorithm 1.

Algorithm 1. Generalized Randomized Response (GRR).
Input: $ε, l^{*}, Δ S$
Output: $z$
$b \sim$ Bern( $e^{ε} / (e^{ε} + | Δ S | - 1)$ );
if $b = 1$ then
    $z = l^{*}$ ;
else
    $z \sim$ Uniform( $Δ S \setminus \{l^{*}\}$ );
end if
return $z$ ;
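Algorithm 1 translates almost line by line into Python. The sketch below is illustrative rather than the authors' implementation (which the paper states was written in MATLAB); the function names are hypothetical, and the handling of a one-element location set is an added assumption, since the pseudocode does not cover that case.

```python
import math
import random


def grr_keep_probability(epsilon, d):
    """Probability e^eps / (e^eps + d - 1) of reporting the real value."""
    e = math.exp(epsilon)
    return e / (e + d - 1)


def grr(epsilon, real, location_set, rng=random):
    """Return `real` with the keep probability, otherwise a uniformly random other element."""
    candidates = list(location_set)
    if real not in candidates:
        raise ValueError("the real location must belong to the location set")
    others = [c for c in candidates if c != real]
    if not others:  # assumption: a singleton set can only report itself
        return real
    if rng.random() < grr_keep_probability(epsilon, len(candidates)):
        return real
    return rng.choice(others)


cells = ["c1", "c2", "c3", "c4"]
z = grr(math.log(3), "c2", cells, random.Random(1))
```

With $ε = \ln 3$ and four candidate cells, the real cell is kept with probability $3 / (3 + 3) = 0.5$, and each of the other three cells is reported with probability $1 / 6$, so the ratio between the two cases is $e^{ε} = 3$.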

Theorem 1

Algorithm 1 satisfies $(ε, δ)$ -LDP, where the parameter $δ$ determines the location set $Δ S$ .

Proof

$δ$ -location set is a set of locations that a user often visits. Assume that the $δ$ -location set is denoted by $Δ S$ . For any locations $l, l' \in Δ S$ and output $z$ , we have

\frac{\Pr [A_{GRR} (l) = z]}{\Pr [A_{GRR} (l') = z]} \leq \frac{p}{q} = \frac{\frac{e^{ε}}{(e^{ε} + | Δ S | - 1)}}{\frac{1}{(e^{ε} + | Δ S | - 1)}} = e^{ε}

This completes the proof.

Continuous private location sharing framework

The location obfuscation algorithm is shown in Algorithm 2. At any timestamp t, we compute the prior probability vector $p_{t}^{-}$. If the location needs to be shared, we construct a $δ$-location set $Δ S_{t}$. Then, a locally differentially private mechanism is used to release a perturbed location $z_{t}$. The perturbed $z_{t}$ is also used to update the posterior probability $p_{t}^{+}$, which in turn is used to compute the prior probability for the next timestamp $t + 1$. At timestamp $t + 1$, the above process is repeated.

Algorithm 2. Location Obfuscation Algorithm.
Input: $ε_{t}, δ_{t}, M, p_{t - 1}^{+}, l_{t}^{*}$
Output: $p_{t}^{+}, z_{t}$
$p_{t}^{-} = p_{t - 1}^{+} M$ ;
if the current location needs to be shared then
    Compute $Δ S_{t}$ based on $p_{t}^{-}$ ;
    if $l_{t}^{*} \notin Δ S_{t}$ then
        Add $l_{t}^{*}$ to $Δ S_{t}$ ;
    end if
    $z_{t} = GRR (l_{t}^{*}, Δ S_{t}, ε_{t})$ ;
    Compute posterior probability $p_{t}^{+}$ by formula (1);
    return $p_{t}^{+}, z_{t}$
end if

A drawback of the $δ$-location set is that a real location with small probability may be excluded. This leads to a “drift” phenomenon, which may also occur when the Markov model is not accurate enough in practice because of its limited predictability. To solve this problem, our method uses the real location itself as the “surrogate”, adding it to $Δ S$, instead of the “surrogate” approach¹⁸ in which the surrogate location is the cell in $Δ S$ with the shortest distance to the real location. This achieves better utility while still guaranteeing stringent privacy.

Note that the surrogate method does not leak any information about the real location: because $l^{*}$ is always included in $Δ S$, $l^{*}$ is protected within $Δ S$. We formally prove the privacy guarantee in Theorem 2.

Theorem 2

At any timestamp t, Algorithm 2 satisfies $(ε_{t}, δ_{t})$ -LDP.

Proof

In our method, although a drift happens, the real location is still in $Δ S_{t}$ . According to Theorem 1, $z_{t}$ satisfies $ε_{t}$ -LDP. Therefore, Algorithm 2 is $(ε_{t}, δ_{t})$ -LDP.

The above algorithm shares a single location privately. The framework for continuous private location sharing is shown in Algorithm 3. To start the framework, we first initialize the prior probability $p_{1}^{-}$ and the posterior probability $p_{1}^{+}$. We assume that $p_{1}^{-} = p_{1}^{+} = 1$ and that the starting location $(t = 1)$ is not shared. Then, at each timestamp $(t > 1)$, we run Algorithm 2 to privately share the current location.

Algorithm 3. Continuous Private Location Sharing Framework.
Require: $ε, δ, M, l^{*}$ ;
initialize prior and posterior probability $p_{1}^{-} = p_{1}^{+} = 1$ ;
for $t = 2 : T$
    $p_{t}^{+}, z_{t}$ = Algorithm 2( $ε_{t}, δ_{t}, M, p_{t - 1}^{+}, l_{t}^{*}$ );
end for

Note that the two privacy parameters $(ε_{t}, δ_{t})$ are specified by the user, so the user fully controls his or her location privacy over time and the protection can be enforced at any timestamp. The input parameters $ε, δ$ are vectors holding the corresponding parameters for all timestamps. For simplicity, we assume that the parameters $ε_{t}, δ_{t}$ are the same at every timestamp.
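To make the interaction between Algorithms 1 to 3 concrete, the following Python sketch chains them for a toy Markov model. It is a sketch under several assumptions rather than the authors' implementation: cells are 0-based integers, only the sharing branch of Algorithm 2 is covered, the initialization $p_{1}^{+} = 1$ is read as a one-hot vector on the starting cell, and the GRR emission probability is applied to every cell exactly as the formula is written. All function names are hypothetical.

```python
import math
import random


def delta_location_set(prior, delta, tol=1e-12):
    """Greedy delta-location set: most probable cells until mass >= 1 - delta."""
    order = sorted(range(len(prior)), key=lambda i: prior[i], reverse=True)
    chosen, mass = [], 0.0
    for i in order:
        if mass >= 1.0 - delta - tol:
            break
        chosen.append(i)
        mass += prior[i]
    return chosen


def grr(eps, real, cells, rng):
    """Algorithm 1: keep `real` w.p. e^eps / (e^eps + |cells| - 1), else pick another cell."""
    keep = math.exp(eps) / (math.exp(eps) + len(cells) - 1)
    others = [c for c in cells if c != real]
    if not others or rng.random() < keep:
        return real
    return rng.choice(others)


def obfuscate_step(eps, delta, M, posterior_prev, real, rng):
    """Sharing branch of Algorithm 2: returns (posterior, perturbed cell)."""
    m = len(M)
    prior = [sum(posterior_prev[i] * M[i][j] for i in range(m)) for j in range(m)]
    cells = delta_location_set(prior, delta)
    if real not in cells:          # surrogate step: always include the real cell
        cells.append(real)
    z = grr(eps, real, cells, rng)
    e = math.exp(eps)
    denom = e + len(cells) - 1
    # Emission probability of GRR, applied to every cell, then formula (1).
    unnorm = [(e if i == z else 1.0) / denom * prior[i] for i in range(m)]
    total = sum(unnorm)
    return [u / total for u in unnorm], z


def share_trajectory(eps_seq, delta_seq, M, real_trace, rng=None):
    """Algorithm 3: the first cell is not shared; later cells go through Algorithm 2."""
    rng = rng or random.Random()
    posterior = [1.0 if i == real_trace[0] else 0.0 for i in range(len(M))]
    released = []
    for t in range(1, len(real_trace)):
        posterior, z = obfuscate_step(eps_seq[t], delta_seq[t], M,
                                      posterior, real_trace[t], rng)
        released.append(z)
    return released, posterior


M = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
trace = [0, 1, 2, 2]
T = len(trace)
released, posterior = share_trajectory([1.0] * T, [0.1] * T, M, trace, random.Random(7))
```

Passing the privacy parameters as per-timestamp sequences mirrors the vectors $ε, δ$ described above, so a user could tighten or relax protection at individual timestamps without changing the loop.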

Privacy and performance analysis

The privacy analysis is presented in Theorem 2. We will present the utility of GRR.

Theorem 3

The upper bound of error for Algorithm 2 is O(Diam( $Δ S$ )), where Diam( $Δ S$ ) is the diameter of $Δ S$ .

Proof

$Δ S$ is a point set, which is a set of two or more points. If its size is two, the error is the distance between two points; if the size is more than two, the error is the diameter of its convex hull.
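The bound can be checked numerically: because both the real location and the perturbed location lie in $Δ S$, their distance can never exceed the largest pairwise distance in the set. The Python sketch below computes that diameter for a hypothetical point set in map coordinates, assuming Euclidean distance.

```python
import math
from itertools import combinations


def diameter(points):
    """Largest pairwise Euclidean distance among the points (0.0 for fewer than two)."""
    return max((math.dist(p, q) for p, q in combinations(points, 2)), default=0.0)


# Hypothetical map coordinates of the cells in a location set.
delta_set_coords = [(0, 0), (3, 4), (1, 1)]
worst_case_error = diameter(delta_set_coords)
```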

Moving model construction and location inference

We discuss the construction (learning) of Markov model. Maximum likelihood estimation and expectation maximization method in HMM can be adopted to obtain the transition matrix. However, depending on the power of adversaries, two typical $M$ can be learned: (1) Public $M$ can be learned from public data and (2) Personal $M$ can be derived with personal data. For simplicity, we consider the maximum likelihood estimation method to estimate the matrix. The public matrix is regarded as public background knowledge, for example, road network. The personal matrix is seen as the personal background knowledge, for example, user’s moving pattern. The knowledge is transparent for an adversary. However, the utility of perturbed locations may vary with different background knowledge of the adversary. We will compare the two models (two different background knowledge) in later experimental evaluations.
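For fully observed training trajectories, the maximum likelihood estimate of the transition matrix reduces to counting transitions and normalizing each row. The Python sketch below illustrates this; the function name is hypothetical, cells are 0-based integers, and giving a never-visited cell a self-loop is an assumption made only so that every row remains a probability distribution.

```python
def estimate_transition_matrix(trajectories, m):
    """Row-normalized transition counts over m cells (0-based cell indices)."""
    counts = [[0] * m for _ in range(m)]
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1
    M = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Assumption: a cell with no observed departures keeps the user in place.
            M.append([1.0 if j == i else 0.0 for j in range(m)])
        else:
            M.append([c / total for c in row])
    return M


# Two toy trajectories over four cells.
trajectories = [[0, 1, 1, 2], [0, 0, 1]]
M_est = estimate_transition_matrix(trajectories, 4)
```

Under this sketch, a public matrix would be estimated from the trajectories of all users and a personal matrix from the trajectories of a single user, using the same function.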

The inference method depends on the specific perturbation algorithm. We implement the inference for GRR according to formula (1) and calculate the probability $\Pr (z_{t} | s_{t}^{*} = c_{i})$ as follows

\Pr (z_{t} | s_{t}^{*} = c_{i}) = \begin{cases} \frac{e^{ε_{t}}}{e^{ε_{t}} + | Δ S | - 1}, & \text{if } z_{t} = c_{i} \\ \frac{1}{e^{ε_{t}} + | Δ S | - 1}, & \text{if } z_{t} \neq c_{i} \end{cases}

Experimental evaluation

This section describes the experimental evaluation. All algorithms are implemented in MATLAB R2018a on a PC with a 3.6 GHz Intel Core i3 CPU and 8 GB RAM. We compare the performance of GRR with that of PIM, which is the mechanism most relevant to our work.

Experimental setup

Datasets

We use the public real-world Geolife dataset³⁶ as the experimental dataset. The dataset was gathered from 182 users over 3 years. It records a wide range of users’ mobility behaviors as records consisting of a timestamp, latitude, and longitude. The location data were updated at a relatively high frequency, every 1 to 60 s. We extract all trajectories roughly within the third ring of Beijing (116.3017E~116.4577E, 39.848N~39.9680N) to train the Markov model. The area is divided into cells of $0.34 \times 0.34\ \mathrm{km}^{2}$ , namely, $50 \times 50$ state cells.
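Mapping a GPS record to one of the $50 \times 50$ state cells could look like the sketch below. The article does not describe the discretization procedure, so the uniform split of the bounding box in degrees, the row-major state index, and the name `to_state` are all assumptions made for illustration.

```python
# Bounding box and grid resolution quoted in the text above.
LON_MIN, LON_MAX = 116.3017, 116.4577
LAT_MIN, LAT_MAX = 39.848, 39.968
GRID = 50  # cells per side


def to_state(lat, lon):
    """Return a row-major cell index in [0, GRID * GRID), or None if outside the box."""
    if not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX):
        return None
    # clamp so that points exactly on the upper edge fall into the last cell
    col = min(int((lon - LON_MIN) / (LON_MAX - LON_MIN) * GRID), GRID - 1)
    row = min(int((lat - LAT_MIN) / (LAT_MAX - LAT_MIN) * GRID), GRID - 1)
    return row * GRID + col
```

Records that fall outside the box return `None`, so a trajectory can be cut or discarded at that point before the Markov model is trained.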

Metrics

We evaluate the performance of GRR and PIM with the following metrics. First, since our privacy definition depends on the privacy budget $ε$ and the $δ$-location set $Δ S$ , we measure the size of $Δ S$ to learn how $Δ S$ changes. Second, we measure utility by the distance between the perturbed location and the real location, namely, $d (x_{i} - {\tilde{x}}_{i})$ , which is independent of specific location-based applications, and by the average distance between the locations in the real trace and the perturbed trace, namely, $dis (X, \tilde{X}) = \sum_{i} d (x_{i} - {\tilde{x}}_{i}) / | X |$ , where $X$ and $\tilde{X}$ are the real trace and the perturbed trace, respectively.
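The average-distance metric can be sketched as follows. The sketch assumes that both traces are equal-length lists of planar map coordinates and that $d$ is the Euclidean distance; the article does not state which distance function was used, and the name `average_distance` is hypothetical.

```python
import math


def average_distance(real_trace, perturbed_trace):
    """Mean Euclidean distance between aligned points of two equal-length traces."""
    if not real_trace or len(real_trace) != len(perturbed_trace):
        raise ValueError("traces must be non-empty and of equal length")
    total = sum(
        math.hypot(x[0] - y[0], x[1] - y[1])
        for x, y in zip(real_trace, perturbed_trace)
    )
    return total / len(real_trace)
```

For instance, a two-point trace whose first point is displaced by 5 units and whose second point is unchanged has an average distance of 2.5 units.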

Evaluation results

1. Performance over time

To show how a perturbation mechanism performs while a user moves over time, including how $Δ S$ changes and how useful the perturbed location remains, we run a batch of experiments on a single trajectory, using a personal matrix learned from one user who has a number of trajectories and a public matrix learned from all users. We randomly choose a test trajectory from that user’s trajectories; it consists of 1405 timestamps. The real trace, in the form of states and map coordinates, is shown in Figure 3.

Figure 3. Real trace.

We evaluate both GRR and PIM at each timestamp with $ε = 1$ and $δ = 0.01$ under the public and the personal matrix. We run each mechanism 10 times and report the average. Figures 4(a) and (b) and Figures 5(a) and (b) show the perturbed locations at each timestamp under the public and the personal matrix, respectively. The perturbed locations generated by GRR are closer to the real locations than those generated by PIM.

Figure 4. Performance over time under public matrix: (a) PIM perturbed trace, (b) GRR perturbed trace, (c) size of $Δ S$ , and (d) distance.

Figure 5. Performance over time under personal matrix: (a) PIM perturbed trace, (b) GRR perturbed trace, (c) size of $Δ S$ , and (d) distance.

The size of $Δ S$ over time under the public and the personal matrix is shown in Figures 4(c) and 5(c), respectively. First, the size of $Δ S$ fluctuates from timestamp to timestamp. The reason is that selecting the $δ$-location set through the inference mechanism changes not only the probabilities of the locations in $Δ S$ but also the probabilities of other locations. Nevertheless, the size of $Δ S$ can be regarded as stable overall, since the change is about 4 across all timestamps. Second, the sizes of $Δ S$ for PIM and GRR are quite close, which indicates that the parameter $δ$ has almost the same effect on the size of $Δ S$ for both mechanisms.

The distance over time under the public and the personal matrix is shown in Figures 4(d) and 5(d), respectively, which show that GRR provides better utility than PIM. The distance between the perturbed location and the real location stays within a few hundred meters over time.

To show the GRR perturbed trace under different values of $δ$ , we run a comparative experiment with $δ = 0.01$ and $δ = 0.005$ under the public matrix, with the default value $ε = 1$ . Figure 6 shows that a higher value of $δ$ brings better utility. However, $δ$ cannot be made arbitrarily large. The detailed analysis is presented in the discussion of parameter influence below.

2. Parameters influence

Figure 6. GRR perturbed trace with public matrix under different parameter $δ$ : (a) $δ = 0.01$ and (b) $δ = 0.005$ .

We further evaluate the overall performance and the influence of the parameters. Each evaluation is run 10 times and the average is reported. The results are shown in Figure 7 (public matrix) and Figure 8 (personal matrix). Unless otherwise stated, the default parameter values are $ε = 1$ and $δ = 0.01$ .

$ε$ versus size of $Δ S$ and utility. We evaluate the relationship between $ε$ and the size of $Δ S$ , as well as the average distance, by comparing PIM and GRR. Figures 7(a) and (b) and Figures 8(a) and (b) show how the size of $Δ S$ and the average distance change with $ε$ , respectively. Note that the size of $Δ S$ and the distance are averaged over the whole trace. Figures 7(a) and 8(a) show that the size of $Δ S$ shrinks as $ε$ grows. Since the degree of privacy preservation decreases as $ε$ increases, a larger $ε$ makes the released locations more informative and thus improves the inference result. Figures 7(b) and 8(b) show how the distance changes with varying $ε$ . GRR is better than PIM in terms of average distance (utility).

$δ$ versus size of $Δ S$ and utility. We evaluate the relationship between $δ$ and the size of $Δ S$ , as well as the average distance, by comparing PIM and GRR. Figures 7(c) and (d) and Figures 8(c) and (d) show how the size of $Δ S$ and the average distance change with $δ$ , respectively. In general, the size of $Δ S$ is mainly determined by $δ$ . PIM and GRR have similar sizes of $Δ S$ , which indicates that the real location is hidden in candidate location sets of similar size. Figures 7(c) and 8(c) show that the size of $Δ S$ decreases as $δ$ grows, because more unlikely locations are filtered out. However, $δ$ cannot be too large, since the perturbation mechanism hardly protects privacy if the size of $Δ S$ is equal to 1. Hence, to balance privacy and utility, we set $δ = 0.01$ so that the sizes of $Δ S$ for PIM and GRR are larger than 2 and 4, respectively. Figures 7(d) and 8(d) show how the distance changes with varying $δ$ . GRR is better than PIM in terms of average distance (utility).

Different Markov models. Comparing Figure 7 (public matrix) with Figure 8 (personal matrix) shows the influence of the Markov model. GRR achieves better utility than PIM under both the public and the personal model, and it achieves better utility under the personal model than under the public model.

Figure 7. Parameters influence under public matrix: (a) $ε$ versus size, (b) $ε$ versus average distance, (c) $δ$ versus size, and (d) $δ$ versus average distance.

Figure 8. Parameters influence under personal matrix: (a) $ε$ versus size, (b) $ε$ versus average distance, (c) $δ$ versus size, and (d) $δ$ versus average distance.

Conclusion and future work

This article investigated the problem of locally differentially private continuous location sharing. Specifically, we proposed a novel definition, $(ε, δ)$-LDP, to capture the temporal correlations between locations, and presented a GRR mechanism that achieves $(ε, δ)$-LDP, together with an upper bound on its error. We used GRR as the building block of a continuous private location sharing framework and conducted a set of experiments showing that the GRR-based framework outperforms the PIM-based framework in terms of utility.

As future work, we are interested in applying LDP to general continuous data sharing and continuous location aggregation problems. In addition, we will investigate how more advanced mobility models affect the proposed framework.

Footnotes

Handling Editor: Feng Ye

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by National Natural Science Foundation of China (Grant No. 61872431), Major Technical Innovation Project of Hubei (Grant No. 2018AAA046), and Applied Basic Research Project of Wuhan (Grant No. 2017060201010162).

ORCID iD

Xingxing Xiong

References

1. Song, Zhu. Social-feature enabled communications among devices toward the smart IoT community. IEEE Commun Mag 2019; 57: 130–137.

2. Lin, Shen. Toward edge-assisted internet of things: from security and efficiency perspectives. IEEE Network 2019; 33: 50–57.

3. Akpakwu, Silva, Hancke, et al. A survey on 5G networks for the internet of things: communication technologies and challenges. IEEE Access 2018; 6: 3619–3647.

4. Primault, Boutet, Mokhtar, et al. The long road to computational location privacy: a survey. IEEE Commun Surv Tutor. Epub ahead of print 5 October 2018. DOI: 10.1109/COMST.2018.2873950.

5. Bettini. Privacy protection in location-based services: a survey. In: Gkoulalas-Divanis, Bettini (eds) Handbook of Mobile Data Privacy. Cham: Springer, 2018, pp.73–96.

6. Ghinita, Kalnis, Khoshgozaran, et al. Private queries in location based services: anonymizers are not necessary. In: Proceedings of the 2008 ACM SIGMOD international conference on management of data, Vancouver, BC, Canada, 9–12 June 2008, pp.121–132. New York: ACM.

7. Palanisamy. Reversecloak: protecting multi-level location privacy over road networks. In: Proceedings of the 24th ACM international on conference on information and knowledge management, Melbourne, VIC, Australia, 18–23 October 2015, pp.673–682. New York: ACM.

8. Andrés, Bordenabe, Chatzikokolakis, et al. Geo-indistinguishability: differential privacy for location-based systems. In: Proceedings of the 2013 ACM SIGSAC conference on computer and communications security (CCS), Berlin, 4–8 November 2013, pp.901–914. New York: ACM.

9. Dwork, McSherry, Nissim, et al. Calibrating noise to sensitivity in private data analysis. In: Proceedings of the theory of cryptography conference, New York, NY, 4–7 March 2006, pp.265–284. Berlin: Springer.

10. Machanavajjhala. Analyzing your location data with provable privacy guarantees. In: Gkoulalas-Divanis, Bettini (eds) Handbook of mobile data privacy. Cham: Springer, 2018, pp.97–127.

11. Al-Hussaeni, Fung BCM, Iqbal, et al. SafePath: differentially-private publishing of passenger trajectories in transportation systems. Comput Netw 2018; 143: 126–139.

12. Qardaji, Yang. Differentially private grids for geospatial data. In: Proceedings of the 2013 IEEE 29th international conference on data engineering (ICDE), Brisbane, QLD, Australia, 8–12 April 2013, pp.757–768. New York: IEEE.

13. Fanaeepour, Rubinstein BIP. Differentially private counting of users’ spatial regions. Know Inform Syst 2018; 54(1): 5–32.

14. Kasiviswanathan, Lee, Nissim, et al. What can we learn privately? SIAM J Comput 2011; 40(3): 793–826.

15. Chen, Qin, et al. Private spatial data aggregation in the local setting. In: Proceedings of the 2016 IEEE 32nd international conference on data engineering (ICDE), Helsinki, 16–20 May 2016, pp.289–300. New York: IEEE.

16. Sei, Ohsuga. Differential private data collection and analysis based on randomized multiple dummies for untrusted mobile crowdsensing. IEEE T Inform Foren Secur 2017; 12(4): 926–939.

17. Bordenabe, Chatzikokolakis, Palamidessi. Optimal geo-indistinguishable mechanisms for location privacy. In: Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, Scottsdale, AZ, 3–7 November 2014, pp.251–262. New York: ACM.

18. Xiao, Xiong. Protecting locations with differential privacy under temporal correlations. In: Proceedings of the 22nd ACM SIGSAC conference on computer and communications security (CCS), Denver, CO, 12–16 October 2015, pp.1298–1309. New York: ACM.

19. Liu, Zhou, Zhu, et al. Location privacy-preserving mechanisms. In: Liu, Wang, Zhu, et al. (eds) Location privacy in mobile applications. Singapore: Springer, 2018, pp.17–31.

20. Gruteser, Grunwald. Anonymous usage of location-based services through spatial and temporal cloaking. In: Proceedings of the 1st international conference on Mobile systems, applications and services, San Francisco, CA, 5–8 May 2003, pp.31–42. New York: ACM.

21. Chatzikokolakis, Palamidessi, Stronati. A predictive differentially-private mechanism for mobility traces. In: Proceedings of the international symposium on privacy enhancing technologies symposium, Amsterdam, 16–18 July 2014, pp.21–41. Cham: Springer.

22. et al. AGENT: an adaptive geo-indistinguishable mechanism for continuous location-based service. Peer-to-peer Network Appl 2018; 11(3): 473–485.

23. Ardagna, Livraga, Samarati. Protecting privacy of user information in continuous location-based services. In: Proceedings of the 2012 IEEE 15th international conference on computational science and engineering, Nicosia, 5–7 December 2012, pp.162–169. New York: IEEE.

24. Shokri, Theodorakopoulos, Le Boudec, et al. Quantifying location privacy. In: Proceedings of the 2011 IEEE symposium on security and privacy (SP), Berkeley, CA, 22–25 May 2011, pp.247–262. New York: IEEE.

25. Erlingsson Ú, Pihur V, Korolova. RAPPOR: randomized aggregatable privacy-preserving ordinal response. In: Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, Scottsdale, AZ, 3–7 November 2014, pp.1054–1067. New York: ACM.

26. Differential Privacy Team Apple. Learning with privacy at scale, https://www.wired.com/2016/06/apples-differential-privacy-collecting-data/ (2016, accessed 12 December 2018).

27. Ding, Kulkarni, Yekhanin. Collecting telemetry data privately. In: Proceedings of the advances in neural information processing systems (NIPS), Long Beach, CA, 4–9 December 2017, pp.3571–3580. New York: Curran Associates Inc.

28. Warner. Randomized response: a survey technique for eliminating evasive answer bias. J Am Stat Assoc 1965; 60(309): 63–69.

29. Kairouz, Viswanath. Extremal mechanisms for local differential privacy. In: Proceedings of the Advances in neural information processing systems (NIPS), Montréal, QC, Canada, 8–13 December 2014, pp.2879–2887. New York: Curran Associates Inc.

30. Wang, Jha. Locally differentially private frequent itemset mining. In: Proceedings of the 2018 IEEE symposium on security and privacy (SP), San Francisco, CA, 20–24 May 2018, pp.127–143. New York: IEEE.

31. Wang, Blocki, et al. Locally differentially private protocols for frequency estimation. In: Proceedings of the 26th USENIX security symposium (USENIX), Vancouver, BC, Canada, 14–18 August 2017, pp.729–745. Berkeley: USENIX Association.

32. Bassily, Smith A. Local, private, efficient protocols for succinct histograms. In: Proceedings of the forty-seventh annual ACM symposium on theory of computing, Portland, OR, 14–17 June 2015, pp.127–135. New York: ACM.

33. Nguyên, Xiao, Yang, et al. Collecting and analyzing data from smart device users with local differential privacy, https://arxiv.org/abs/1606.05053

34. Qin, Yang, et al. Heavy hitter estimation over set-valued data with local differential privacy. In: Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, Vienna, 24–28 October 2016, pp.192–203. New York: ACM.

35. Wang, Huang, Nie, et al. PrivSet: set-valued data analyses with locale differential privacy. In: Proceedings of the IEEE INFOCOM 2018-IEEE conference on computer communications, Honolulu, HI, 16–19 April 2018, pp.1088–1096. New York: IEEE.

36. Zheng, Xie. Geolife: a collaborative social networking service among user, location and trajectory. IEEE Data Eng Bulletin 2010; 33(2): 32–39.

Locally differentially private continuous location sharing with randomized response
