Abstract
Mobile traces collected by vehicular sensor networks (VSNs) facilitate various business applications and services. However, the traces can be used to trace and identify drivers or passengers, which raises significant privacy concerns. Existing privacy protecting techniques may not be suitable, because they give inadequate consideration to the data accuracy requirements of different applications and to the adversary's knowledge and strategies. In this paper, we analyze data privacy issues in VSNs with a game-theoretic model in which a defender uses privacy protecting techniques against the attack strategies of an adversary. We study both passive and active attack scenarios, and in each scenario we consider the effect of different data accuracy requirements on the performance of defense measures. Through analysis of real-world traffic data, we show that inserting more bogus traces or deleting more recorded samples performs better when the cost of defense measures is small, whereas doing nothing becomes the best strategy when that cost is very large. In addition, we present the optimal defense strategy that provides the defender with the maximum utility when the adversary implements the optimal attack strategy.
1. Introduction
With the advances and wide adoption of wireless communication technologies, vehicles are now often equipped with wireless devices that allow them to communicate with each other (V2V) as well as with roadside infrastructure (V2I). V2V and V2I communications make driving safer and improve the driving experience. Such communication networks are called Vehicular Ad Hoc Networks (VANETs). With the increasing need for sensing and data acquisition in cities, VANETs have evolved into Vehicular Sensor Networks (VSNs) [1]. VSNs exploit vehicles and passengers as sensors to capture events such as traffic volume, road surface conditions, and chemical and radiation levels. The location traces in the resulting traffic-related data enable various new business applications and services, such as map drawing [2], traffic prediction [3], city planning, and mobile network analysis [4].
However, the places that a driver or passenger has visited in these location traces may reveal his or her sensitive information, such as traffic law violations, political affiliations, and medical conditions [5, 6]. Although vehicular mobility traces are often collected anonymously, an adversary can reidentify the true owner of a trace: the location information of drivers and passengers can be openly observed in public places and can also be disclosed voluntarily or inadvertently by the drivers and passengers themselves, for example, through a casual conversation or through published media such as news articles or web blogs [4]. An adversary with partial knowledge of the whereabouts of drivers or passengers (called victims) can infer the traces' true owners with high probability by exploiting vehicular mobility constraints and spatiotemporal correlation [4, 7].
To reduce spatiotemporal correlation, frequently proposed privacy protecting techniques reduce the resolution of the recorded data [8, 9] or introduce noise into the data [10–12]. However, these techniques may not be suitable for privacy preservation in VSNs because they give inadequate consideration to the adversary's knowledge and attack strategies. Moreover, they cannot meet the different data accuracy requirements of different applications and services [13]. To address these challenges, we must first analyze how the adversary's knowledge, its attack strategies, and the accuracy requirements affect the performance of defense strategies.
In this paper, we use a game-theoretic model to study the effect of the adversary's knowledge and strategies, as well as the data accuracy requirements, on the performance of defense measures. More specifically, we first present the location privacy issues in VSNs, including the abilities and goals of the adversary and the defender. Then, we define a game-theoretic model, the attack and defense game, which models the strategy selection behavior of the adversary and the defender. In this game, the adversary implements its attack strategies in both passive and active attack scenarios, while the defender uses frequently proposed privacy protecting approaches to make it harder for the adversary to reidentify the victims. Finally, through analysis of real-world traffic data, we show that attack strategies in different scenarios perform differently. Inserting more bogus traces or deleting more recorded samples performs better when the cost of defense measures is small, whereas doing nothing becomes the best strategy when that cost is very large. We also present the optimal defense strategy for each attack strategy. The main contributions of this paper are as follows.
We define an attack and defense game model to capture the strategy selection decision behavior of the adversary and defender, and we show the effectiveness of defense strategies.
We establish the attack and defense game based on real world traffic data. In particular, for different attack scenarios, we study both the complete information game (the defender knows the adversary's knowledge on whereabouts of victims) and the incomplete information game (the defender does not know the adversary's knowledge).
Through the Nash equilibriums in these games, we show that the defender can balance the data accuracy and victims’ location privacy to obtain the maximum utility when an adversary implements the optimal attack strategy.
The rest of the paper is organized as follows. In Section 2, we discuss related work. Section 3 presents the network model of VSNs and the location privacy issues in VSNs including the ability and goal of the adversary and defender. In Section 4, we define the attack and defense game model and present the attack in different scenarios and defense strategies. In Section 5, we present our main analysis results in the complete and incomplete information game for different attack strategies. We conclude the paper in Section 6.
2. Related Work
Several recent studies [4, 7] have analyzed the privacy risk of mobile traces and found that omitting identifiers from mobile traces does not guarantee anonymity, due to spatiotemporal correlation. Ma et al. [4] show that an adversary who has a relatively small number of drivers' location snapshots can infer the true owners of anonymous traces with high probability. Montjoye et al. [7] study fifteen months of human mobility data for one and a half million individuals and find that four spatiotemporal points are enough to uniquely identify 95% of the individuals. Therefore, there is a rising need for stronger privacy protection mechanisms.
In general, the existing solutions can be divided into two categories: reducing the resolution of the recorded data [8, 14] and introducing noise into the data. Hoh et al. [8] propose a disclosure control algorithm, the uncertainty-aware path cloaking algorithm, which selectively reveals GPS samples to limit the maximum time-to-confusion for all vehicles. Nergiz et al. [14] adapt the notion of k-anonymity to trajectories: they find a representative trajectory among k trajectories so that every trajectory is indistinguishable from the other k − 1 trajectories.
Approaches that introduce noise have also been extensively studied [10, 12]. Lu et al. [10] create a mix area in social spots to achieve provable location privacy in VANETs. Huang et al. [12] propose a solution called the silent period to provide users with location privacy in wireless networks. However, these techniques may not be suitable for privacy preservation in VSNs for two reasons. First, they rarely consider the effect of the adversary's knowledge and attack strategies on the performance of defense strategies. Second, they cannot meet the different data accuracy requirements of different applications and services [13]. These are the issues we address in this paper.
Game theory provides the mathematical frameworks needed for analyzing, modeling, and deciding on network security and privacy issues [15–17], so we adopt it to study the data privacy issues in VSNs. There are many works that apply game theory to security aspects of VANETs. Raya et al. [18] model the revocation problem as a finite dynamic game with mobile nodes as players, who can detect misbehavior with a certain probability. Reidt et al. [19] design a distributed, detection-error-tolerant revocation scheme called karmic suicide using a game-theoretic approach. In [20], zero-sum games, fuzzy games, and fictitious play are applied to model the interaction of the attacker and defender. In [10], the authors use game-theoretic techniques to prove the feasibility of their pseudonym changing strategy. Freudiger et al. [21] analyze the noncooperative behavior of mobile nodes with a game-theoretic model in which each player aims to maximize its location privacy at minimum cost. In this paper, we study a new aspect of privacy by evaluating the effect of the adversary's knowledge, its attack strategies, and the different accuracy requirements on the performance of defense strategies.
3. Preliminaries
In this section, we explain our network model, as well as our assumptions and the location privacy issues in VSNs. We conclude by sketching the problem this work aims at solving. In Table 1, we summarize the notations introduced throughout this paper.
List of symbols.
3.1. Network Model
In VSNs, vehicles and passengers act as sensors to capture events such as traffic accidents, traffic distribution, and road weather information [22]. VSNs can perceive the traffic distribution in a city efficiently and with high accuracy, and thus they have been envisioned to have great potential to revolutionize driving experiences and metropolitan-area traffic flow control. Figure 1 illustrates the network model of VSNs. Vehicles use Dedicated Short Range Communication (DSRC) [23] technology to transmit traffic-related information.

Network model of VSN.
We assume that a set of traces, each of which intermittently records the time and corresponding location of a mobile node, is used by various applications, such as traffic prediction, city planning, and mobile network evaluation. These traces are anonymous in that the true identity of a vehicle has been replaced by a random identifier, but the same true identity is always mapped to the same identifier. In the following, we elaborate on the privacy threat against these anonymous traces.
3.2. Threat Model
An adversary tries to identify the complete path histories of one or more victims (drivers or passengers) from the anonymous traces. We assume that the adversary can collect certain side information about one or more victims. Each piece of side information gives the location of a victim at a time instant, although the information may not be exact. In practice, the side information may be obtained in the following ways. First, nodes are open to observation in public spaces, so the adversary may obtain the side information directly by meeting the victim, by chance or through engineered encounters; this case is called the active attack scenario. Second, nodes may disclose information on their whereabouts either voluntarily or inadvertently [11]; this is called the passive attack scenario. For example, a casual conversation between Alice and Bob may reveal where Alice was around 8 am, or it may reveal another person's location.
We set the trace of node i to be
where
The ability of the attacker is determined by side information
3.3. Defense Measures
A defender tries to protect the privacy of victims, which is defined by the uncertainty of a node's trace. The trace uncertainty is referred to as the entropy of the probability distribution of a node's trace [9, 12]. Let
The larger the entropy
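As a small illustration (a sketch, not the paper's exact formulation), the trace entropy can be computed from a probability distribution over candidate traces; a uniform distribution over k candidates attains the maximum value log2(k):

```python
import math

def trace_entropy(probs):
    """Shannon entropy (in bits) of a node's trace distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A uniform distribution over k = 4 candidate traces attains log2(4) = 2 bits,
# the maximum uncertainty; a skewed distribution gives the adversary more
# confidence, hence lower entropy.
print(trace_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
print(trace_entropy([0.7, 0.1, 0.1, 0.1]))      # about 1.357
```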
Defense objectives for protecting trace privacy can be measured by k-anonymity. An anonymity set is denoted as
We refine this model with entropy. Suppose the defender wants to provide an anonymity set with a minimum size k; then the entropy of the distribution over the anonymity set is given by
where
Since privacy is a context-specific property and is socially and/or culturally defined [6, 25], the trace privacy needs of individual users may vary, and further different users may require different k-anonymity entropy
3.4. Problem Statement
Given the attacker's strategies and the defender's strategies, the problem is to find the relationship between the attack strategies and the performance of defense strategies, and the relationship between the data accuracy requirements and the performance of defense strategies. Then, based on these analysis results, we find the optimal defense strategies for different attack scenarios.
In order to address these problems, we must consider (i) how to model the strategy selection decision behavior of the adversary and defender, and (ii) that the defender may not know the adversary's knowledge.
4. Attack and Defense Game
In this section, we introduce the attack and defense game model to capture the strategy selection decision behavior of the adversary and defender. We first define the game model and the concept of Nash Equilibrium (NE) throughout the paper, and then we present different attack strategies and defense strategies.
4.1. Game Model
The game G is defined as a triplet
Players. The set of players
Strategy. The set of strategies in the games is
Payoff Function. When the defender knows the side information that the adversary has collected, we use complete information games. In a complete information game, the payoff function of the adversary is
Typically, the defender does not know all the side information that the adversary has collected. Hence, we follow the approach proposed by Harsanyi [27]: we introduce a new player, Nature, which assigns a type θ to the adversary according to a prior distribution p. The type θ can be interpreted as the side information that the adversary has collected. Then, the payoff functions are expressed as
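As a sketch of this Bayesian formulation (the types, prior, and payoff numbers below are illustrative assumptions, not values from the paper), the defender can rank its strategies by the expected payoff over the adversary's possible types:

```python
# Hypothetical sketch: Nature draws the adversary's type theta (the amount of
# side information it holds) from a prior p; the defender maximizes its
# expected payoff over that prior. All numbers here are illustrative.
prior = {"little_side_info": 0.6, "much_side_info": 0.4}

# u_d[type][defense_strategy]: illustrative defender payoffs.
u_d = {
    "little_side_info": {"D1": 0.8, "D2": 0.9, "do_nothing": 0.5},
    "much_side_info":   {"D1": 0.4, "D2": 0.3, "do_nothing": 0.1},
}

def expected_payoff(strategy):
    """Average the payoff of one defense strategy over adversary types."""
    return sum(prior[t] * u_d[t][strategy] for t in prior)

best = max(u_d["little_side_info"], key=expected_payoff)
print(best, expected_payoff(best))  # D2 0.66
```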
4.2. Equilibrium Concepts
In complete information games, Nash equilibrium (NE) can be defined as follows.
Definition 1.
A strategy profile
In other words, in a NE, none of the players can unilaterally change his strategy to increase his payoff. A player can also play each of his pure strategies with some probability using mixed strategies. A mixed strategy
In incomplete information games, we adopt the concept of Bayesian Nash equilibrium [21].
Definition 2.
A strategy profile
where
4.3. Attack Strategies
4.3.1. Attack Scenarios
According to the two types of adversaries, attacks can be classified into two scenarios: passive attacks and active attacks.
Scenario A: Passive Attack. In this setting, the adversary is given the complete (anonymized) traces. The adversary's goal is, given some pieces of side information about a victim, to identify in some optimal fashion the complete path history of the chosen victim. The key assumptions are (i) the adversary is passive that it does not actively go out to seek encounters with potential victims and (ii) the side information given to the adversary contains noise. If sampled times
Attack 1 (A1). The side information references only time instants that coincide with sampled times in the trace; we have
Attack 2 (A2). The side information references time instants between two consecutive sampled times in the set of traces; we have
A2 is the more general attack. To some extent, A1 can be considered a special case of A2; that is, for each k,
Scenario B: Active Attack. The adversary is active in this scenario: it obtains side information about victims by encountering them. The adversary can obtain traces in a real-time and gradual fashion; that is, as time progresses, the adversary is provided with the trace information together with the information acquired up to the current time instant. The goal here is to identify as many traces as possible. If
Attack 1 (B1). The adversary stays at one fixed location, that is, for any j and k,
Attack 2 (B2). The adversary moves to maximize the amount of useful side information, that is, there exists at least one pair of i and j such that
We assume that after encountering a victim, the adversary will not attempt to follow the victim, because the adversary's objective is to identify as many trace identities as possible.
4.3.2. Strategies for A1 and A2
As noted before, the side information often contains noise. The adversary thus needs to perform Bayesian inference or use the maximum likelihood estimator to make the best guess. The goal is, given R, to find the
The goal of the maximum likelihood estimator is to find the i that maximizes expression (7). Note that the denominator is a constant. In addition, without any knowledge about how the victim is chosen, we set the prior distribution of the victim to be uniform, as follows:
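The reduction from the posterior to the likelihood can be sketched as follows (the likelihood values are hypothetical placeholders; expressions (7) and (8) define them precisely):

```python
# Sketch: with a uniform prior over candidate traces, the maximum a
# posteriori guess reduces to the maximum likelihood trace.
# The likelihood values P(R | trace i) below are illustrative placeholders.
likelihoods = {"trace_1": 0.02, "trace_2": 0.15, "trace_3": 0.07}

total = sum(likelihoods.values())                  # the constant denominator
posterior = {i: l / total for i, l in likelihoods.items()}

# Because the prior is uniform and the denominator is constant, the argmax
# of the posterior equals the argmax of the likelihood.
guess = max(posterior, key=posterior.get)
print(guess)  # trace_2
```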
For A1 and A2, the expression (8) can be given in the following form.
Scenario A1. If the noise in the side information
where
Scenario A2. If node mobility obeys the Markov model, (8) can be given by
where
The expression (9) can be greatly simplified if the noise
Maximum likelihood estimation approach (MLE). This is the same as formulation (9), that is, the similarity value of trace i is given by
where the trace with the maximum similarity value is declared to be the victim's.
Minimum square approach (MSQ). When the
where the trace with the least similarity value is declared to be the victim's.
Basic approach (BAS). The adversary assumes that the noise is zero-mean and has a specific standard deviation (σ), but makes no assumption about its exact distribution. The adversary then computes the similarity value of trace i with the side information as follows:
where
Weighted exponential approach (EXP). In this approach, which is proposed and analyzed in [11], the adversary does not know the type of noise or its magnitude. Similar to BAS, the adversary maximizes the similarity value of trace i as follows:
where
The above formula can be easily modified for scenario A2. For convenience, the probability that the vehicle is on the cell l at time
where
MLE2
MSQ2
BAS2
EXP2
The four strategies have the same computational complexity, which is linear in the number of pieces of side information and the number of nodes. Note that we have assumed attack strategies that collect side information about only one victim. However, the strategies can easily be extended to the case in which the adversary collects side information about several victims. In particular, the MLE approach can be used directly without modification, while a properly chosen threshold can be used with the other attack strategies to remove traces from consideration if their similarity to the victim's trace falls below the threshold.
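As an illustrative sketch only (the paper's formulas above define the precise scores; here we assume one-dimensional locations and Gaussian noise with standard deviation σ), MSQ- and EXP-style similarity scores could be computed as:

```python
import math

# Illustrative sketch of two of the four scoring styles; locations are
# one-dimensional and the noise model (Gaussian, std dev sigma) is assumed.
def msq_score(trace_locs, side_locs):
    """MSQ-style: sum of squared distances (smaller = more similar)."""
    return sum((t - s) ** 2 for t, s in zip(trace_locs, side_locs))

def exp_score(trace_locs, side_locs, sigma=1.0):
    """EXP-style: product of exponentially weighted distances
    (larger = more similar)."""
    return math.prod(math.exp(-((t - s) ** 2) / (2 * sigma ** 2))
                     for t, s in zip(trace_locs, side_locs))

side = [10.0, 20.0, 30.0]        # noisy observations of the victim
trace_a = [10.1, 19.8, 30.3]     # close to the side information
trace_b = [14.0, 25.0, 26.0]     # far from it
assert msq_score(trace_a, side) < msq_score(trace_b, side)
assert exp_score(trace_a, side) > exp_score(trace_b, side)
```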
4.3.3. Strategies for B1 and B2
In the active attack scenario, the adversary observes the participants directly. The published traces are revealed in real time, synchronized with the information collected by the adversary. As the additional information is acquired without noise, the adversary does not need to use any inference strategy. Based on the idea of excluding unmatched traces [4], the attack algorithm can be described as follows.
As illustrated in Algorithm 1, the attack algorithm takes as input the traces that are published progressively. The algorithm first assumes that all the traces are candidate traces for each victim. A trace is a candidate trace of a victim if it appears at the same set of times and locations as when and where the adversary meets the victim. As time evolves, the adversary removes from each victim's set the candidate traces that do not agree with the observed information about that victim. When a victim's trace is identified, the identified trace is removed from the candidate sets of the other victims. Notice that the adversary may not identify a participant at the times they meet; the identification can occur later, when all but one of the candidate traces have been identified and removed. Hence, the adversary identifies a participant more efficiently when it tries to identify as many participants as possible.
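A minimal sketch of this exclusion procedure (the trace representation is a hypothetical simplification):

```python
# Sketch of the exclusion-based attack: every published trace starts as a
# candidate for each victim, and candidates that disagree with an
# observation are removed.
def eliminate(candidates, traces, victim, time, location):
    """Remove candidate traces of `victim` that were not at
    (time, location) when the adversary observed the victim there."""
    candidates[victim] = {
        t for t in candidates[victim] if traces[t].get(time) == location
    }
    if len(candidates[victim]) == 1:
        identified = next(iter(candidates[victim]))
        # An identified trace cannot belong to any other victim.
        for v in candidates:
            if v != victim:
                candidates[v].discard(identified)

# Hypothetical traces: trace id -> {time: cell}.
traces = {"t1": {1: "A", 2: "B"}, "t2": {1: "A", 2: "C"}, "t3": {1: "D"}}
candidates = {"alice": set(traces), "bob": set(traces)}

eliminate(candidates, traces, "alice", 1, "A")   # alice seen in cell A at time 1
eliminate(candidates, traces, "alice", 2, "B")   # alice seen in cell B at time 2
print(candidates["alice"])        # {'t1'}
print("t1" in candidates["bob"])  # False: t1 was removed from bob's set
```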
In scenario B1, r is fixed because the adversary always stays at one position, whereas r changes as the adversary moves in scenario B2. Experimentally, B2 performs better than B1, because mobile nodes typically obtain more side information than stationary nodes.
4.4. Defense Strategies
There are two types of defense strategies: anonymity techniques and cloaking techniques. Anonymity techniques, which hide the true identities of users, can be implemented in several ways; for example, one user may use one or more pseudonyms [28–31], or a group of users may share the same ID [32]. However, since in VSNs the true identity of each vehicle is simply replaced by a random identifier, we do not discuss anonymity techniques further. Cloaking techniques, such as introducing noise or reducing the recorded data, can also be used to protect users' privacy, but they affect the accuracy of the published information, so we must balance users' privacy against the data accuracy required by different applications.
In VSNs, the objective of the defender (i.e., the traffic control center or another authority) is to make it harder for the adversary to identify a trace among the anonymous published traces. The defender can insert bogus traces or delete the recorded data at some sampled times to achieve this objective. If
Defense 1 (D1). m bogus traces are inserted into
Defense 2 (D2). n sampled times are deleted in
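The two defenses can be sketched as follows (the trace format and the bogus-trace generator are hypothetical simplifications):

```python
import random

# Sketch of the two defenses: D1 inserts m bogus traces into the published
# set; D2 deletes the samples recorded at n chosen sampled times.
def defense_d1(traces, m, make_bogus, rng):
    """D1: append m plausible bogus traces to the published set."""
    return traces + [make_bogus(rng) for _ in range(m)]

def defense_d2(traces, times_to_delete):
    """D2: drop the samples recorded at the chosen sampled times."""
    return [{t: loc for t, loc in tr.items() if t not in times_to_delete}
            for tr in traces]

rng = random.Random(0)
real = [{1: "A", 2: "B"}, {1: "C", 2: "D"}]  # trace: {sampled time: cell}
make_bogus = lambda r: {1: r.choice("ABCD"), 2: r.choice("ABCD")}

published = defense_d1(real, m=3, make_bogus=make_bogus, rng=rng)
print(len(published))  # 5 traces: 2 real + 3 bogus

thinned = defense_d2(real, times_to_delete={2})
print(thinned)  # [{1: 'A'}, {1: 'C'}]
```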
Theorem 3.
D1 can provide nodes with a higher trace privacy level.
Proof.
Before the execution of D1, the trace privacy levels of nodes meet as follows:
After the execution of D1, the trace privacy levels of nodes meet as follows:
Since
Lemma 4.
If f is a concave function and X is a random variable,
Proof.
For a two-mass-point distribution, the inequality becomes
which follows directly from the definition of concave functions. Suppose that the lemma is true for distributions with
where the first inequality follows from the induction hypothesis and the second follows from the definition of concavity.
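Lemma 4 is the standard Jensen inequality for concave functions; written out for a discrete random variable (under the assumption that the truncated statement matches the standard form):

```latex
% Jensen's inequality: for concave f and X taking values x_1,\dots,x_n
% with probabilities p_1,\dots,p_n,
\mathbb{E}[f(X)] \;=\; \sum_{i=1}^{n} p_i\, f(x_i)
\;\le\; f\!\Big(\sum_{i=1}^{n} p_i\, x_i\Big) \;=\; f(\mathbb{E}[X]),
% with the two-mass-point case p_1 f(x_1) + p_2 f(x_2) \le f(p_1 x_1 + p_2 x_2)
% serving as the induction base.
```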
Theorem 5.
If the probability density function of l is concave over the interval
Proof.
Let
When there is no such
where
So, in both cases, the probability that the victim's trace is identified by the adversary will be reduced.
Notice that it is reasonable to assume that the probability density function of l is concave over the interval
However, these defense strategies will bring some loss in data accuracy. The more the inserted bogus traces and the deleted sampled times, the more unreal the data. Hence, we assume that the defense cost γ is proportional to the inserted bogus traces and the deleted sampled times. We take into account the application requirements involved in the defense cost in a parameter γ. Then for A1, γ can be expressed as,
where α is a system parameter that indicates the requirement of an application,
Similarly to A1, γ for A2 can be expressed as:
where β is a system parameter, N is the number of all the sampled times, and n is the number of the deleted sampled times in the N sampled times.
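Under one plausible reading of the (partially omitted) cost expressions above, namely that the cost grows linearly with the fraction of inserted or deleted data, the defense cost could be computed as:

```python
# Sketch only: the exact cost formulas are given above; here we assume the
# cost is linear in the fraction of inserted bogus traces (D1) and of
# deleted sampled times (D2).
def cost_d1(alpha, m, M):
    """alpha: application accuracy parameter; m bogus traces inserted
    among M real traces."""
    return alpha * m / M

def cost_d2(beta, n, N):
    """beta: application accuracy parameter; n of the N sampled times
    deleted."""
    return beta * n / N

print(cost_d1(alpha=2.0, m=50, M=1000))  # 0.1
print(cost_d2(beta=4.0, n=10, N=200))    # 0.2
```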
5. Analysis of Attack and Defense Game
In this section, we study both the passive and active attack scenarios. In the passive attack scenarios (A1 and A2), we consider games of complete and incomplete information. As the adversary can obtain the victims' positions accurately in the active attack scenarios (B1 and B2), the defender does not need to infer the side information the adversary has obtained. Therefore, we establish only the complete information game in scenarios B1 and B2.
The traffic data used in the experiments contains mobility traces of taxis in Beijing, China: GPS coordinates of approximately 28,000 taxis collected over a month [33]. The location updates are quite fine-grained: the average time interval between two consecutive location updates is less than 30 sec. In the experiments, the adversary tries to identify the trace of one participant (randomly picked from all the participants) by gathering side information. The noisy randomly sampled
5.1. Games for Scenario A1
The adversary has four attack strategies: MLE, MSQ, BAS, and EXP. The defender has two defense strategies: D1 and D2. In strategy D1, we set
5.1.1. Analysis of Complete Information Game
The strategic form of the complete information game in A1 is depicted as a matrix in Table 2. Each player chooses a strategy simultaneously and has common knowledge of the side information. We assume that the side information contains 18 pairs of
Complete information game for A1.
We observe that the Nash equilibria depend on the value of the defense cost γ in this game. When the defense cost is zero, that is,
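The pure-strategy Nash equilibria of such a strategic-form game can be found by best-response enumeration; the payoff numbers below are illustrative, not the measured values from Table 2:

```python
# Sketch: find the pure-strategy Nash equilibria of a bimatrix game by
# checking, for each strategy profile, that neither player can gain by
# unilaterally deviating.
def pure_nash(u_att, u_def):
    """u_att[i][j], u_def[i][j]: payoffs when the adversary plays row i
    and the defender plays column j."""
    rows, cols = len(u_att), len(u_att[0])
    eq = []
    for i in range(rows):
        for j in range(cols):
            if (u_att[i][j] >= max(u_att[r][j] for r in range(rows)) and
                    u_def[i][j] >= max(u_def[i][c] for c in range(cols))):
                eq.append((i, j))
    return eq

# Illustrative 2x2 game: neither player can gain by deviating at (1, 1).
u_att = [[3, 1],
         [2, 2]]
u_def = [[0, 2],
         [1, 3]]
print(pure_nash(u_att, u_def))  # [(1, 1)]
```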
5.1.2. Analysis of Incomplete Information Game
Generally, the defender does not know how much side information the adversary has obtained. To address this, Harsanyi [27] introduced a new player, Nature, which turns an incomplete information game into a game with complete but imperfect information. We assume that Nature chooses the probability that the side information contains 10 pairs of
Incomplete information game for A1.
When

Optimal defense strategies at different values of
5.1.3. Discussion
In the complete information game, the adversary prefers the strategy MSQ. When the cost of defense measures is zero or very low, we observe that the defense measures are more effective when more bogus traces are inserted or more sampled times are deleted. The method of deleting sampled times is more effective than the method of inserting bogus traces. When the defense cost is high, the defender prefers to do nothing.
In the incomplete information game, the adversary also prefers the strategy MSQ, while the optimal defense strategies depend on the values of
5.2. Games for Scenario A2
In scenario A2, the adversary has four attack strategies: MLE2, MSQ2, BAS2, and EXP2, and the defender has two defense strategies. In addition, we assume that
5.2.1. Analysis of Complete Information Game
The strategic form of the complete information game in A2 is depicted in Table 4. The defender knows the side information obtained by the adversary. The side information contains 18 pairs of
Complete information game for A2.
5.2.2. Analysis of Incomplete Information Game
Table 5 depicts the incomplete information game in scenario A2. MLE2 is a strictly dominated strategy for the adversary.
Incomplete information game for A2.
Figure 3 depicts the optimal defense strategies at different values of

Optimal defense strategies at different values of
5.2.3. Discussion
In the A2 complete information game, the adversary prefers the strategy MLE2. When the cost of defense measures is zero or very low, the defense measures are more effective when more bogus traces are inserted or more sampled times are deleted. Compared with A1, the method of deleting sampled times in A2 performs worse, but it is still better than the method of inserting bogus traces. When the defense cost is high, the defender again prefers to do nothing.
In the A2 incomplete information game, the adversary also prefers the strategy MLE2, while the optimal defense strategies depend on the values of
5.3. Games for Scenario B1 and B2
Because the goal of the adversary is to identify as many traces as possible, the adversary does not need to use any inference strategy. We assume that the attack strategies are B1 and B2. The defender has two defense strategies; we set
Complete information game for B1 and B2.
When

Optimal defense strategies at different values of α, β for different attack durations.
In the complete information game, the adversary prefers B2 if the attack algorithm runs for a short time, and B1 if it runs for a long time. This is because, when the time is long, an adversary staying at one position can also meet other victims with high probability. When
6. Conclusion
In this paper, we analyze data privacy aspects of VSNs using a game-theoretic model. We first quantify the attack power and defense objectives so that performance can be recorded and compared. Then, we define an attack and defense game model that captures the strategy selection behavior of the adversary and defender, and we show the effectiveness of the defense strategies. Finally, we establish and analyze the complete and incomplete information games for the passive and active attack scenarios based on real-world traffic data. The analysis shows that attack strategies in different scenarios perform differently: inserting more bogus traces or deleting more recorded samples performs better when the cost of defense measures is small, whereas doing nothing becomes the best strategy when that cost is very large. We also present the optimal defense strategy that provides the defender with the maximum utility when the adversary implements the optimal attack strategy. Our analysis results are therefore useful for designing appropriate privacy protection mechanisms.
Algorithm 1: Exclusion-based identification. For each node i met by the adversary, remove from that victim's candidate set every trace j that does not agree with the observation; once a candidate set shrinks to a single trace, the victim is identified and that trace is removed from the other candidate sets.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors would like to thank Xiang Lu from the Institute of Information Engineering, Chinese Academy of Science, for helping revise the typos and grammar mistakes throughout this paper. This study is supported in part by the China 973 under Grant no. 2011CB302902, the NSF China Major Program (60933011, 61202099), National High-Tech R&D Program of China (863) under Grant no. 2013AA011102, Scientific and Technological Pilot Project under Grant no. XDA06040100.
