Abstract
Mobile traces collected by vehicular sensor networks (VSNs) facilitate various business applications and services. However, the traces can be used to trace and identify drivers or passengers, which raises significant privacy concerns. Existing privacy protecting techniques may not be suitable, because they give inadequate consideration to the data accuracy requirements of different applications and to the adversary's knowledge and strategies. In this paper, we analyze data privacy issues in VSNs with a game-theoretic model in which a defender uses privacy protecting techniques against the attack strategies of an adversary. We study both passive and active attack scenarios, and in each scenario we consider the effect of different data accuracy requirements on the performance of defense measures. Through analysis of real-world traffic data, we show that inserting more bogus traces or deleting more recorded samples performs better when the cost of defense measures is small, whereas doing nothing becomes the best strategy when that cost is very large. In addition, we present the optimal defense strategy that provides the defender with the maximum utility when the adversary implements the optimal attack strategy.
1. Introduction
With the advances and wide adoption of wireless communication technologies, vehicles are now often equipped with wireless devices that allow them to communicate with each other (V2V) as well as with roadside infrastructure (V2I). V2V and V2I communications make driving safer and improve the driving experience. Such communication networks are called Vehicular Ad Hoc Networks (VANETs). With the increasing need for sensing and data acquisition in cities, VANETs have evolved into Vehicular Sensor Networks (VSNs) [1]. VSNs exploit vehicles and passengers as sensors to capture events such as traffic volume, road surface conditions, and chemical and radiation levels. The location traces in the resulting traffic-related data enable various new business applications and services, such as map drawing [2], traffic prediction [3], city planning, and mobile network analysis [4].
However, the places that a driver or passenger has visited in these location traces may reveal his or her sensitive information, such as traffic law violations, political affiliations, and medical conditions [5, 6]. Although vehicular mobility traces are often collected anonymously, an adversary can reidentify the true owner of a trace: the location information of drivers and passengers can be openly observed in public places and can also be disclosed voluntarily or inadvertently by the drivers and passengers themselves, for example, through a casual conversation or through published media such as news articles or web blogs [4]. An adversary with partial knowledge of the whereabouts of drivers or passengers (called victims) can infer the traces' true owners with high probability by exploiting vehicular mobility constraints and spatiotemporal correlation [4, 7].
To reduce spatiotemporal correlation, frequently proposed privacy protecting techniques reduce the resolution of the recorded data [8, 9] or introduce noise into the data [10–12]. However, these techniques may not be suitable for privacy preservation in VSNs because they give inadequate consideration to the adversary's knowledge and attack strategies. Moreover, they cannot meet the different data accuracy requirements of different applications and services [13]. To address these challenges, we must first analyze how the adversary's knowledge, its attack strategies, and the accuracy requirements affect the performance of defense strategies.
In this paper, we use a game-theoretic model to study the effect of the adversary's knowledge and strategies, as well as the data accuracy requirements, on the performance of defense measures. More specifically, we first present the location privacy issues in VSNs, including the abilities and goals of the adversary and the defender. Then, we define a game-theoretic model, the attack and defense game, which models the strategy selection behavior of the adversary and the defender. In this game, the adversary implements its attack strategies in both passive and active attack scenarios, while the defender uses frequently proposed privacy protecting approaches to make it harder for the adversary to reidentify the victims. Finally, through analysis of real-world traffic data, we show that attack strategies in different scenarios perform differently. Inserting more bogus traces or deleting more recorded samples performs better when the cost of defense measures is small, whereas doing nothing becomes the best strategy when that cost is very large. We also present the optimal defense strategy for each attack strategy. The main contributions of this paper are as follows.
We define an attack and defense game model to capture the strategy selection decision behavior of the adversary and defender, and we show the effectiveness of defense strategies.
We establish the attack and defense game based on real world traffic data. In particular, for different attack scenarios, we study both the complete information game (the defender knows the adversary's knowledge on whereabouts of victims) and the incomplete information game (the defender does not know the adversary's knowledge).
Through the Nash equilibriums in these games, we show that the defender can balance the data accuracy and victims’ location privacy to obtain the maximum utility when an adversary implements the optimal attack strategy.
The rest of the paper is organized as follows. In Section 2, we discuss related work. Section 3 presents the network model of VSNs and the location privacy issues in VSNs including the ability and goal of the adversary and defender. In Section 4, we define the attack and defense game model and present the attack in different scenarios and defense strategies. In Section 5, we present our main analysis results in the complete and incomplete information game for different attack strategies. We conclude the paper in Section 6.
2. Related Work
Several recent studies [4, 7] have analyzed the privacy risk of mobile traces and found that omitting identifiers from mobile traces does not guarantee anonymity, due to spatiotemporal correlation. Ma et al. [4] show that an adversary who has a relatively small number of drivers' location snapshots can infer the true owners of anonymous traces with high probability. Montjoye et al. [7] study fifteen months of human mobility data for one and a half million individuals and find that four spatiotemporal points are enough to uniquely identify 95% of the individuals. Therefore, there is a rising need for stronger privacy protection mechanisms.
In general, the existing solutions can be divided into two categories: reducing the resolution of the recorded data [8, 14] and introducing noise into the data. Hoh et al. [8] propose a disclosure control algorithm, the uncertainty-aware path cloaking algorithm, which selectively reveals GPS samples to limit the maximum time-to-confusion for all vehicles. Nergiz et al. [14] adapt the notion of k-anonymity to trajectories: they find a representative trajectory among k trajectories so that every trajectory is indistinguishable from the other k − 1 trajectories.
Approaches that introduce noise have also been extensively studied [10, 12]. Lu et al. [10] create a mix area in social spots to achieve provable location privacy in VANETs. Huang et al. [12] propose a solution called the silent period to provide users with location privacy in wireless networks. However, these techniques may not be suitable for privacy preservation in VSNs for two reasons. First, they rarely consider the effect of the adversary's knowledge and attack strategies on the performance of defense strategies. Second, they cannot meet the different data accuracy requirements of different applications and services [13]. These are the issues we address in this paper.
Game theory provides the mathematical frameworks needed for analyzing, modeling, and deciding on network security and privacy issues [15–17], so we adopt it to study the data privacy issues in VSNs. There are many works that apply game theory to security aspects of VANETs. Raya et al. [18] model the revocation problem as a finite dynamic game with mobile nodes as players, who can detect misbehavior with a certain probability. Reidt et al. [19] design a distributed, detection-error-tolerant revocation scheme called karmic suicide using a game-theoretic approach. In [20], zero-sum games, fuzzy games, and fictitious play are applied to model the interaction of the attacker and defender. In [10], the authors use game-theoretic techniques to prove the feasibility of their pseudonym changing strategy. Freudiger et al. [21] analyze the noncooperative behavior of mobile nodes with a game-theoretic model in which each player aims to maximize its location privacy at minimum cost. In this paper, we study a new aspect of privacy by evaluating the effect of the adversary's knowledge, its attack strategies, and the different accuracy requirements on the performance of defense strategies.
3. Preliminaries
In this section, we explain our network model, as well as our assumptions and the location privacy issues in VSNs. We conclude by sketching the problem this work aims at solving. In Table 1, we summarize the notations introduced throughout this paper.
List of symbols.
3.1. Network Model
In VSNs, vehicles and passengers act as sensors to capture events such as traffic accidents, traffic distribution, and road weather information [22]. VSNs can perceive the traffic distribution in a city efficiently and with high accuracy, and thus they have been envisioned to have great potential to revolutionize driving experiences and metropolitan-area traffic flow control. Figure 1 illustrates the network model of VSNs. Vehicles use Dedicated Short Range Communication (DSRC) [23] technology to transmit traffic-related information.

Network model of VSN.
We assume that a set of traces, each of which intermittently records the time and corresponding location of a mobile node, is used by various applications, such as traffic prediction, city planning, and mobile network evaluation. These traces are anonymous in that the true identity of a vehicle has been replaced by a random identifier, but the same true identity is always mapped to the same identifier. In the following, we elaborate on the privacy threat against these anonymous traces.
3.2. Threat Model
An adversary tries to identify the complete path histories of one or more victims (drivers or passengers) from the anonymous traces. We assume that the adversary can collect certain side information about one or more victims. Each piece of side information gives the location of a victim at a time instant, although the information may not be exact. In practice, the side information may be obtained in the following ways. First, nodes are open to observation in public spaces, so the adversary may obtain the side information directly by meeting the victim, by chance or through engineered encounters; this case is called the active attack scenario. Second, nodes may disclose information on their whereabouts either voluntarily or inadvertently [11]; this is called the passive attack scenario. For example, a casual conversation between Alice and Bob may reveal where Alice was around 8 am, or it may reveal another person's location.
We set the trace of node i to be
where
The ability of the attacker is determined by side information
3.3. Defense Measures
A defender tries to protect the privacy of victims, which is defined by the uncertainty of a node's trace. The trace uncertainty is referred to as the entropy of the probability distribution of a node's trace [9, 12]. Let
The larger the entropy
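As a small illustration (a sketch, not the paper's exact formulation), the trace entropy can be computed from a probability distribution over candidate traces; a uniform distribution over k candidates attains the maximum value log2(k):

```python
import math

def trace_entropy(probs):
    """Shannon entropy (in bits) of a node's trace distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A uniform distribution over k = 4 candidate traces attains log2(4) = 2 bits,
# the maximum uncertainty; a skewed distribution gives the adversary more
# confidence, hence lower entropy.
print(trace_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
print(trace_entropy([0.7, 0.1, 0.1, 0.1]))      # about 1.357
```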
Defense objectives for protecting trace privacy can be measured by k-anonymity. An anonymity set is denoted as
We refine this model with entropy. Suppose the defender wants to provide an anonymity set with a minimum size k; then the entropy of the distribution over the anonymity set is given by
where
Since privacy is a context-specific property and is socially and/or culturally defined [6, 25], the trace privacy needs of individual users may vary, and further different users may require different k-anonymity entropy
3.4. Problem Statement
Given the attacker's strategies and the defender's strategies, the problem is to find the relationship between the attack strategies and the performance of defense strategies, and the relationship between the data accuracy requirements and the performance of defense strategies. Then, based on these analysis results, we find the optimal defense strategies for different attack scenarios.
In order to address these problems, we must consider (i) how to model the strategy selection decision behavior of the adversary and defender, and (ii) that the defender may not know the adversary's knowledge.
4. Attack and Defense Game
In this section, we introduce the attack and defense game model to capture the strategy selection decision behavior of the adversary and defender. We first define the game model and the concept of Nash Equilibrium (NE) throughout the paper, and then we present different attack strategies and defense strategies.
4.1. Game Model
The game G is defined as a triplet
Players. The set of players
Strategy. The set of strategies in the games is
Payoff Function. When the defender knows the side information that the adversary has collected, we use complete information games. In a complete information game, the payoff function of the adversary is
Typically, the defender does not know all the side information that the adversary has collected. Hence, we follow the approach proposed by Harsanyi [27]: we introduce a new player, Nature, which assigns a type θ to the adversary according to a prior distribution p. The type θ can be interpreted as the side information that the adversary has collected. Then, the payoff functions are expressed as
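As a sketch of this Bayesian formulation (the types, prior, and payoff numbers below are illustrative assumptions, not values from the paper), the defender can rank its strategies by the expected payoff over the adversary's possible types:

```python
# Hypothetical sketch: Nature draws the adversary's type theta (the amount of
# side information it holds) from a prior p; the defender maximizes its
# expected payoff over that prior. All numbers here are illustrative.
prior = {"little_side_info": 0.6, "much_side_info": 0.4}

# u_d[type][defense_strategy]: illustrative defender payoffs.
u_d = {
    "little_side_info": {"D1": 0.8, "D2": 0.9, "do_nothing": 0.5},
    "much_side_info":   {"D1": 0.4, "D2": 0.3, "do_nothing": 0.1},
}

def expected_payoff(strategy):
    """Average the payoff of one defense strategy over adversary types."""
    return sum(prior[t] * u_d[t][strategy] for t in prior)

best = max(u_d["little_side_info"], key=expected_payoff)
print(best, expected_payoff(best))  # D2 0.66
```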
4.2. Equilibrium Concepts
In complete information games, Nash equilibrium (NE) can be defined as follows.
Definition 1.
A strategy profile
In other words, in a NE, none of the players can unilaterally change his strategy to increase his payoff. A player can also play each of his pure strategies with some probability using mixed strategies. A mixed strategy
In incomplete information games, we adopt the concept of Bayesian Nash equilibrium [21].
Definition 2.
A strategy profile
where
4.3. Attack Strategies
4.3.1. Attack Scenarios
According to the two types of adversaries, attacks can be classified into two scenarios: passive attacks and active attacks.
Scenario A: Passive Attack. In this setting, the adversary is given the complete (anonymized) traces. The adversary's goal is, given some pieces of side information about a victim, to identify in some optimal fashion the complete path history of the chosen victim. The key assumptions are (i) the adversary is passive that it does not actively go out to seek encounters with potential victims and (ii) the side information given to the adversary contains noise. If sampled times
Attack 1 (A1). The side information references only time instants that coincide with sampled times in the trace; we have
Attack 2 (A2). The side information references time instants between two consecutive sampled times in the set of traces; we have
A2 is the more general attack. To some extent, A1 can be considered a special case of A2; that is, for each k,
Scenario B: Active Attack. The adversary is active in this scenario: it obtains side information about victims by encountering them. The adversary can obtain traces in a real-time and gradual fashion; that is, as time progresses, the adversary is provided with the trace information together with the information acquired up to the current time instant. The goal here is to identify as many traces as possible. If
Attack 1 (B1). The adversary stays at one fixed location, that is, for any j and k,
Attack 2 (B2). The adversary moves to maximize the amount of useful side information, that is, there exists at least one pair of i and j such that
We assume that after encountering a victim, the adversary will not attempt to follow the victim, because the adversary's objective is to identify as many trace identities as possible.
4.3.2. Strategies for A1 and A2
As noted before, the side information often contains noise. The adversary thus needs to perform Bayesian inference or use the maximum likelihood estimator to make the best guess. The goal is, given R, to find the
The goal of the maximum likelihood estimator is to find the i that maximizes expression (7). Note that the denominator is a constant. In addition, without any knowledge about how the victim is chosen, we set the prior distribution of the victim to be uniform, as follows:
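The reduction from the posterior to the likelihood can be sketched as follows (the likelihood values are hypothetical placeholders; expressions (7) and (8) define them precisely):

```python
# Sketch: with a uniform prior over candidate traces, the maximum a
# posteriori guess reduces to the maximum likelihood trace.
# The likelihood values P(R | trace i) below are illustrative placeholders.
likelihoods = {"trace_1": 0.02, "trace_2": 0.15, "trace_3": 0.07}

total = sum(likelihoods.values())                  # the constant denominator
posterior = {i: l / total for i, l in likelihoods.items()}

# Because the prior is uniform and the denominator is constant, the argmax
# of the posterior equals the argmax of the likelihood.
guess = max(posterior, key=posterior.get)
print(guess)  # trace_2
```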
For A1 and A2, the expression (8) can be given in the following form.
Scenario A1. If the noise in the side information
where
Scenario A2. If node mobility obeys the Markov model, (8) can be given by
where
The expression (9) can be greatly simplified if the noise
Maximum likelihood estimation approach (MLE). This is the same as formulation (9), that is, the similarity value of trace i is given by
where the trace with the maximum similarity value is declared to be the victim's.
Minimum square approach (MSQ). When the
where the trace with the least similarity value is declared to be the victim's.
Basic approach (BAS). The adversary assumes that the noise is zero-mean and has a specific standard deviation (σ), but makes no assumption about its exact distribution. The adversary then computes the similarity value of trace i with the side information as follows:
where
Weighted exponential approach (EXP). In this approach, which is proposed and analyzed in [11], the adversary does not know the type of noise or its magnitude. Similar to BAS, the adversary maximizes the similarity value of trace i as follows:
where
The above formula can be easily modified for scenario A2. For convenience, the probability that the vehicle is on the cell l at time
where
MLE2
MSQ2
BAS2
EXP2
The four strategies have the same computational complexity, which is linear in the number of pieces of side information and the number of nodes. Note that we have assumed attack strategies that collect side information about only one victim. However, the strategies can easily be extended to the case in which the adversary collects side information about several victims. In particular, the MLE approach can be used directly without modification, while a properly chosen threshold can be used with the other attack strategies to remove traces from consideration if their similarity to the victim's trace falls below the threshold.
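As an illustrative sketch only (the paper's formulas above define the precise scores; here we assume one-dimensional locations and Gaussian noise with standard deviation σ), MSQ- and EXP-style similarity scores could be computed as:

```python
import math

# Illustrative sketch of two of the four scoring styles; locations are
# one-dimensional and the noise model (Gaussian, std dev sigma) is assumed.
def msq_score(trace_locs, side_locs):
    """MSQ-style: sum of squared distances (smaller = more similar)."""
    return sum((t - s) ** 2 for t, s in zip(trace_locs, side_locs))

def exp_score(trace_locs, side_locs, sigma=1.0):
    """EXP-style: product of exponentially weighted distances
    (larger = more similar)."""
    return math.prod(math.exp(-((t - s) ** 2) / (2 * sigma ** 2))
                     for t, s in zip(trace_locs, side_locs))

side = [10.0, 20.0, 30.0]        # noisy observations of the victim
trace_a = [10.1, 19.8, 30.3]     # close to the side information
trace_b = [14.0, 25.0, 26.0]     # far from it
assert msq_score(trace_a, side) < msq_score(trace_b, side)
assert exp_score(trace_a, side) > exp_score(trace_b, side)
```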
4.3.3. Strategies for B1 and B2
In the active attack scenario, the adversary observes the participants directly. The published traces are revealed in real time, synchronized with the information collected by the adversary. As the additional information is acquired without noise, the adversary does not need to use any inference strategy. Based on the idea of excluding unmatched traces [4], the attack algorithm can be described as follows.
As illustrated in Algorithm 1, the attack algorithm takes as input the traces that are published progressively. The algorithm first assumes that all the traces are candidate traces for each victim. A trace is a candidate trace of a victim if it appears at the same set of times and locations as when and where the adversary meets the victim. As time evolves, the adversary removes from each victim's set the candidate traces that do not agree with the observed information about that victim. When a victim's trace is identified, the identified trace is removed from the candidate sets of the other victims. Notice that the adversary may not identify a participant at the times they meet; the identification can occur later, when all but one of the candidate traces have been identified and removed. Hence, the adversary identifies a participant more efficiently when it tries to identify as many participants as possible.
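A minimal sketch of this exclusion procedure (the trace representation is a hypothetical simplification):

```python
# Sketch of the exclusion-based attack: every published trace starts as a
# candidate for each victim, and candidates that disagree with an
# observation are removed.
def eliminate(candidates, traces, victim, time, location):
    """Remove candidate traces of `victim` that were not at
    (time, location) when the adversary observed the victim there."""
    candidates[victim] = {
        t for t in candidates[victim] if traces[t].get(time) == location
    }
    if len(candidates[victim]) == 1:
        identified = next(iter(candidates[victim]))
        # An identified trace cannot belong to any other victim.
        for v in candidates:
            if v != victim:
                candidates[v].discard(identified)

# Hypothetical traces: trace id -> {time: cell}.
traces = {"t1": {1: "A", 2: "B"}, "t2": {1: "A", 2: "C"}, "t3": {1: "D"}}
candidates = {"alice": set(traces), "bob": set(traces)}

eliminate(candidates, traces, "alice", 1, "A")   # alice seen in cell A at time 1
eliminate(candidates, traces, "alice", 2, "B")   # alice seen in cell B at time 2
print(candidates["alice"])        # {'t1'}
print("t1" in candidates["bob"])  # False: t1 was removed from bob's set
```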
In scenario B1, r is fixed because the adversary always stays at one position, whereas r changes as the adversary moves in scenario B2. Experimentally, B2 performs better than B1, because mobile nodes typically obtain more side information than stationary nodes.
4.4. Defense Strategies
There are two types of defense strategies: anonymity techniques and cloaking techniques. Anonymity techniques, which hide the true identities of users, can be implemented in several ways; for example, one user may use one or more pseudonyms [28–31], or a group of users may share the same ID [32]. However, since in VSNs the true identity of each vehicle is simply replaced by a random identifier, we do not discuss anonymity techniques further. Cloaking techniques, such as introducing noise or reducing the recorded data, can also be used to protect users' privacy, but they affect the accuracy of the published information, so we must balance users' privacy against the data accuracy required by different applications.
In VSNs, the objective of the defender (i.e., the traffic control center or another authority) is to make it harder for the adversary to identify a trace among the anonymous published traces. The defender can insert bogus traces or delete the recorded data at some sampled times to achieve this objective. If
Defense 1 (D1). m bogus traces are inserted into
Defense 2 (D2). n sampled times are deleted in
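The two defenses can be sketched as follows (the trace format and the bogus-trace generator are hypothetical simplifications):

```python
import random

# Sketch of the two defenses: D1 inserts m bogus traces into the published
# set; D2 deletes the samples recorded at n chosen sampled times.
def defense_d1(traces, m, make_bogus, rng):
    """D1: append m plausible bogus traces to the published set."""
    return traces + [make_bogus(rng) for _ in range(m)]

def defense_d2(traces, times_to_delete):
    """D2: drop the samples recorded at the chosen sampled times."""
    return [{t: loc for t, loc in tr.items() if t not in times_to_delete}
            for tr in traces]

rng = random.Random(0)
real = [{1: "A", 2: "B"}, {1: "C", 2: "D"}]  # trace: {sampled time: cell}
make_bogus = lambda r: {1: r.choice("ABCD"), 2: r.choice("ABCD")}

published = defense_d1(real, m=3, make_bogus=make_bogus, rng=rng)
print(len(published))  # 5 traces: 2 real + 3 bogus

thinned = defense_d2(real, times_to_delete={2})
print(thinned)  # [{1: 'A'}, {1: 'C'}]
```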
Theorem 3.
D1 can provide nodes with a higher trace privacy level.
Proof.
Before the execution of D1, the trace privacy levels of nodes meet as follows:
After the execution of D1, the trace privacy levels of nodes meet as follows:
Since
Lemma 4.
If f is a concave function and X is a random variable,
Proof.
For a two-mass-point distribution, the inequality becomes
which follows directly from the definition of concave functions. Suppose that the lemma is true for distributions with
where the first inequality follows from the induction hypothesis and the second follows from the definition of concavity.
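Lemma 4 is the standard Jensen inequality for concave functions; written out for a discrete random variable (under the assumption that the truncated statement matches the standard form):

```latex
% Jensen's inequality: for concave f and X taking values x_1,\dots,x_n
% with probabilities p_1,\dots,p_n,
\mathbb{E}[f(X)] \;=\; \sum_{i=1}^{n} p_i\, f(x_i)
\;\le\; f\!\Big(\sum_{i=1}^{n} p_i\, x_i\Big) \;=\; f(\mathbb{E}[X]),
% with the two-mass-point case p_1 f(x_1) + p_2 f(x_2) \le f(p_1 x_1 + p_2 x_2)
% serving as the induction base.
```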
Theorem 5.
If the probability density function of l is concave over the interval
Proof.
Let
When there is no such
where
So, in both cases, the probability that the victim's trace is identified by the adversary will be reduced.
Notice that it is reasonable to assume that the probability density function of l is concave over the interval
However, these defense strategies will bring some loss in data accuracy. The more the inserted bogus traces and the deleted sampled times, the more unreal the data. Hence, we assume that the defense cost γ is proportional to the inserted bogus traces and the deleted sampled times. We take into account the application requirements involved in the defense cost in a parameter γ. Then for A1, γ can be expressed as,
where α is a system parameter that indicates the requirement of an application,
Similarly to A1, γ for A2 can be expressed as:
where β is a system parameter, N is the number of all the sampled times, and n is the number of the deleted sampled times in the N sampled times.
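Under one plausible reading of the (partially omitted) cost expressions above, namely that the cost grows linearly with the fraction of inserted or deleted data, the defense cost could be computed as:

```python
# Sketch only: the exact cost formulas are given above; here we assume the
# cost is linear in the fraction of inserted bogus traces (D1) and of
# deleted sampled times (D2).
def cost_d1(alpha, m, M):
    """alpha: application accuracy parameter; m bogus traces inserted
    among M real traces."""
    return alpha * m / M

def cost_d2(beta, n, N):
    """beta: application accuracy parameter; n of the N sampled times
    deleted."""
    return beta * n / N

print(cost_d1(alpha=2.0, m=50, M=1000))  # 0.1
print(cost_d2(beta=4.0, n=10, N=200))    # 0.2
```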
5. Analysis of Attack and Defense Game
In this section, we study both the passive and active attack scenarios. In the passive attack scenarios (A1 and A2), we consider games of complete and incomplete information. As the adversary can obtain the victims' positions accurately in the active attack scenarios (B1 and B2), the defender does not need to infer the side information the adversary has obtained. Therefore, we establish only the complete information game in scenarios B1 and B2.
The traffic data used in the experiments contains mobility traces of taxis in Beijing, China: GPS coordinates of approximately 28,000 taxis collected over a month [33]. The location updates are quite fine-grained: the average time interval between two consecutive location updates is less than 30 sec. In the experiments, the adversary tries to identify the trace of one participant (randomly picked from all the participants) by gathering side information. The noisy randomly sampled
5.1. Games for Scenario A1
The adversary has four attack strategies: MLE, MSQ, BAS, and EXP. The defender has two defense strategies: D1 and D2. In strategy D1, we set
5.1.1. Analysis of Complete Information Game
The strategic form of the complete information game in A1 is depicted as a matrix in Table 2. Each player chooses a strategy simultaneously and has common knowledge of the side information. We assume that the side information contains 18 pairs of
Complete information game for A1.
We observe that the Nash equilibria depend on the value of the defense cost γ in this game. When the defense cost is zero, that is,
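The pure-strategy Nash equilibria of such a strategic-form game can be found by best-response enumeration; the payoff numbers below are illustrative, not the measured values from Table 2:

```python
# Sketch: find the pure-strategy Nash equilibria of a bimatrix game by
# checking, for each strategy profile, that neither player can gain by
# unilaterally deviating.
def pure_nash(u_att, u_def):
    """u_att[i][j], u_def[i][j]: payoffs when the adversary plays row i
    and the defender plays column j."""
    rows, cols = len(u_att), len(u_att[0])
    eq = []
    for i in range(rows):
        for j in range(cols):
            if (u_att[i][j] >= max(u_att[r][j] for r in range(rows)) and
                    u_def[i][j] >= max(u_def[i][c] for c in range(cols))):
                eq.append((i, j))
    return eq

# Illustrative 2x2 game: neither player can gain by deviating at (1, 1).
u_att = [[3, 1],
         [2, 2]]
u_def = [[0, 2],
         [1, 3]]
print(pure_nash(u_att, u_def))  # [(1, 1)]
```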
5.1.2. Analysis of Incomplete Information Game
Generally, the defender does not know how much side information the adversary has obtained. To address this, Harsanyi [27] introduced a new player, Nature, which turns an incomplete information game into a game with complete but imperfect information. We assume that Nature chooses the probability that the side information contains 10 pairs of
Incomplete information game for A1.
When

Optimal defense strategies at different values of
5.1.3. Discussion
In the complete information game, the adversary prefers the strategy MSQ. When the cost of defense measures is zero or very low, we observe that the defense measures are more effective when more bogus traces are inserted or more sampled times are deleted. The method of deleting sampled times is more effective than the method of inserting bogus traces. When the defense cost is high, the defender prefers to do nothing.
In the incomplete information game, the adversary also prefers the strategy MSQ, while the optimal defense strategies depend on the values of
5.2. Games for Scenario A2
In scenario A2, the adversary has four attack strategies: MLE2, MSQ2, BAS2, and EXP2, and the defender has two defense strategies. In addition, we assume that
5.2.1. Analysis of Complete Information Game
The strategic form of the complete information game in A2 is depicted in Table 4. The defender knows the side information obtained by the adversary. The side information contains 18 pairs of
Complete information game for A2.
5.2.2. Analysis of Incomplete Information Game
Table 5 depicts the incomplete information game in scenario A2. MLE2 is a strictly dominated strategy for the adversary.
Incomplete information game for A2.
Figure 3 depicts the optimal defense strategies at different values of

Optimal defense strategies at different values of
5.2.3. Discussion
In the A2 complete information game, the adversary prefers the strategy MLE2. When the cost of defense measures is zero or very low, the defense measures are more effective when more bogus traces are inserted or more sampled times are deleted. Compared with A1, the method of deleting sampled times in A2 performs worse, but it is still better than the method of inserting bogus traces. When the defense cost is high, the defender again prefers to do nothing.
In the A2 incomplete information game, the adversary also prefers the strategy MLE2, while the optimal defense strategies depend on the values of
5.3. Games for Scenario B1 and B2
Because the goal of the adversary is to identify as many traces as possible, the adversary does not need to use any inference strategy. We assume that the attack strategies are B1 and B2. The defender has two defense strategies; we set
Complete information game for B1 and B2.
When

Optimal defense strategies at different values of α, β for different attack durations.
In the complete information game, the adversary prefers B2 if the attack algorithm runs for a short time, and B1 if it runs for a long time. This is because, when the time is long, an adversary staying at one position can also meet other victims with high probability. When
6. Conclusion
In this paper, we analyze data privacy aspects of VSNs using a game-theoretic model. We first quantify the attack power and defense objectives so that performance can be recorded and compared. Then, we define an attack and defense game model that captures the strategy selection behavior of the adversary and defender, and we show the effectiveness of the defense strategies. Finally, we establish and analyze the complete and incomplete information games for the passive and active attack scenarios based on real-world traffic data. The analysis shows that attack strategies in different scenarios perform differently: inserting more bogus traces or deleting more recorded samples performs better when the cost of defense measures is small, whereas doing nothing becomes the best strategy when that cost is very large. We also present the optimal defense strategy that provides the defender with the maximum utility when the adversary implements the optimal attack strategy. Our analysis results are therefore useful for designing appropriate privacy protection mechanisms.
Algorithm 1: Exclusion-based identification. For each node i met by the adversary, remove from that victim's candidate set every trace j that does not agree with the observation; once a candidate set shrinks to a single trace, the victim is identified and that trace is removed from the other candidate sets.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors would like to thank Xiang Lu from the Institute of Information Engineering, Chinese Academy of Science, for helping revise the typos and grammar mistakes throughout this paper. This study is supported in part by the China 973 under Grant no. 2011CB302902, the NSF China Major Program (60933011, 61202099), National High-Tech R&D Program of China (863) under Grant no. 2013AA011102, Scientific and Technological Pilot Project under Grant no. XDA06040100.
