Abstract
Due to the shared nature of wireless communication media, a powerful adversary can eavesdrop on the entire radio communication in the network and obtain the contextual communication statistics, for example, traffic volumes, transmitter locations, and so forth. Such information can reveal the location of the sink around which the data traffic exhibits distinctive patterns. To protect the sink-location privacy from a powerful adversary with a global view, we propose to achieve k-anonymity in the network so that at least k entities in the network are indistinguishable to the nodes around the sink with regard to communication statistics. Arranging the location of k entities is complex as it affects two conflicting goals: the routing energy cost and the achievable privacy level, and both goals are determined by a nonanalytic function. We model such a positioning problem as a nonlinearly constrained nonlinear optimization problem. To tackle it, we design a generic-algorithm-based quasi-optimal (GAQO) method that obtains quasi-optimal solutions at quadratic time. The obtained solutions closely approximate the optima with increasing privacy requirements. Furthermore, to solve k-anonymity sink-location problems more efficiently, we develop an artificial potential-based quasi-optimal (APQO) method that is of linear time complexity. Our extensive simulation results show that both algorithms can effectively find solutions hiding the sink among a large number of network nodes.
1. Introduction
With the increasing advances of sensing devices and wireless technology, wireless sensor networks (WSNs) have been interwoven into the fabric of our daily life. In particular, WSNs have been deployed to monitor personal health, track targets, and sense pollutants. Those sensor networks typically consist of many resource-constrained sensor nodes and one sink. Each sensor node monitors the underlying physical phenomenon and reports the measurements to the sink in a multihop manner.
In spite of their popularity, the viability and success of those sensor networks hinge on a variety of security and privacy threats. One of the most challenging threats is location privacy, since it cannot be addressed by traditional cryptographic mechanisms [1]. Due to the shared nature of wireless communication media, an attacker can easily eavesdrop on the radio communication either by purchasing her own sensor devices or by leveraging other radio devices capable of monitoring message transmission. Thus, no matter whether messages are encrypted or not, an adversary is able to identify contextual information: where the communication has occurred and who has participated in communication, without accessing the content of messages. For example, an adversary can identify the sender of a message by analyzing the angle of arrival [2], or he can determine the receiver in the similar fashion when the receiver relays a message [3].
Since an adversary can locate both the origin and destination of messages (i.e., sinks) purely by observing the contextual information, the WSN location privacy problem can be divided into two categories: source-location privacy and sink-location privacy. The source-location privacy problem is concerned with preventing attackers from discovering the locations of message sources, which may reveal sensitive position information of assets being monitored, for example, endangered animals. Much effort has been devoted to preserve source-location privacy against a wide variety of attackers, ranging from resource-constrained attackers [2] to powerful attackers that have a global view of network communications [1, 4].
In this study, we focus on preserving sink-location privacy against attackers with a global view. The sink node serves as the aggregating point for data collection and is crucial to assure the availability of a WSN. If the sink node is located and destroyed, the sensed data can no longer be relayed to a data center, rendering the entire WSN useless. Despite the great importance of sink node, the sink-location privacy problem has only been studied under the assumption of resource-constrained attackers [3, 5–8]. When a global adversary is involved, those strategies for resource-constrained attackers become inapplicable. Our work aims to fill in the absence in defending against powerful global adversaries.
To achieve the global view, an attacker can either deploy her own sensors [1, 4] or utilize powerful radio receivers with extremely sensitive antennas to pick up communications across the whole network [1]. As such, a global attacker can derive the location of sinks either by traffic-analysis attacks [5] or packet-tracing attacks [2, 3]. Traffic-analysis attacks utilize the fact that the closer a node is located towards the WSN sink, the higher the number of messages it needs to forward. Thus, moving towards a spot that exhibits a higher message volume can eventually lead the adversaries to find the sink. Packet-tracing attacks lead the adversary toward the travel direction of messages hop by hop till he reaches the sink.
Both traffic-analysis attacks and packet-tracing attacks require no access to the message content but message existence. Additionally, a global adversary can identify every node that has forwarded a message instantly, while most literature [4, 9] assumes that an adversary with a local view can only identify the sender when communication occurs within his observable range. We are unaware of any solutions that can defend against a global adversary, since it is virtually impossible to protect the network against a global eavesdropper [10]. Any local obfuscation created by fake messages cannot confuse a global adversary. For instance, fractional propagation [5] forks a fake message toward a random destination while the real message is forwarded towards the sink, which is likely to mislead an adversary with a local view. However, such an approach cannot deceive the adversary with a global view, since all real messages always arrive at the sink.
One naive defense strategy is to have each node send the same volume of messages as the sink (including both real and fake messages). However, such strategy imposes high energy consumption and is infeasible. To limit the energy conception while enhancing the privacy against a powerful adversary with a global view, we propose to achieve k-anonymity in the network so that at least k entities exhibit the same characteristics as the nodes located close to the sink. As such, they are indistinguishable even to the powerful attackers with regard to contextual communication information.
The concept of achieving k-anonymity [11] was originally proposed to protect personal identity while releasing person-specific data and has been studied extensively in the field of database and data mining. To our best knowledge, our work is the first attempt to apply this concept to preserving sink-location privacy in wireless sensor networks, and there are no other valid approaches dealing with the attacks of a global adversary. We summarize our contribution as follows.
We identify the absence of defense strategies to enhance sink-location privacy against global adversaries. To enhance sink-location privacy, we propose to achieve k-anonymity via an Euclidean minimum-spanning tree-based routing protocol, that is, create k designated nodes in the network. We show that positioning k designated nodes is complex as it affects two conflicting goals: the routing energy cost and the achievable privacy level, and both goals are determined by a non-analytic function. To strike a balance between those two goals, we formulate the problem of k-anonymity routing protocols as a nonlinearly constrained optimization problem. The nonlinearly constrained optimization problem is extremely challenging to solve. To tackle the problem, we design two quasi-optimal algorithms that can obtain the k-node locations closely approximating the optima, and our extensive simulations validate that both algorithms can effectively find solutions hiding the sink among a large number of network nodes.
The rest of the paper is organized as the following. In Section 2, we describe the network model, attack model, and formalize the problem of achieving k-anonymity as a nonlinear optimization problem. We present the routing algorithm for achieving k-anonymity in Section 3. In Section 4, we discuss two approximate algorithms that can obtain quasi-optimal solutions and show our validation effort. Finally, we discuss related work in Section 5 and provide concluding remarks in Section 6.
2. Problem Overview
A wide variety of WSNs have emerged as monitoring and controlling solutions for numerous applications. It is very hard, if even possible, to design a solution applicable to all types of WSNs and to address all attacks. In this section, we specify a popular type of WSNs, which were adopted by several work [12–16]. We formalize the problem below.
2.1. Network Model
We consider a network of wireless sensor nodes that is distributed throughout a bounded environment
2.1.1. Periodic Data Reporting
WSNs can be classified as event-driven or periodic. In an event-driven sensor network, only those sensors that have observed events will generate and deliver messages to sinks in a multihop manner while others remain silent. In a periodic network, each sensor will measure the underlying physical phenomena and will deliver its measurements periodically to sinks. We focus on periodic networks since in such networks, even aggregation cannot eliminate the data traffic accumulation towards the sink [9]. Further, we assume that no aggregation algorithms are applied to the networks.
2.1.2. Homogeneous Network with One Sink
We consider homogeneous sensor networks that consist of sinks and a large number of sensor nodes and are densely deployed in a square. Each sensor node is equipped with an omnidirectional antenna and transmits at the same transmission power level. Without loss of generality, we assume that one sink in the network collects data. We note that our scheme can be easily extended to a network with multiple sinks.
2.1.3. No ACK
We assume that the sensor networks do not rely on acknowledgement packets (ACKs) to achieve reliable communication, since the excessive number of ACKs transmitted by the sink will easily reveal its location. We assume that the sink only passively receives messages. Thus, the sink is hidden, and the adversary cannot pinpoint the location of the sink purely by relying on eavesdropping on ACKs.
2.1.4. End-to-End Data Encryption
We assume that messages are protected by an end-to-end encryption protocol using pairwise keys [17]. Due to the limitation of constrained resources, we do not consider the case where the messages are decrypted and re-encrypted at each hop. Therefore, a message exhibits the same cipher as it travels from the source to the sink.
2.2. Attack Model
We consider a powerful attacker who is able to eavesdrop on all communications across the whole network. The adversary does not actively interfere with regular communications in the network but passively eavesdrops on network communications. Her goal is to find the location of the sink and to compromise the sink via physical contacts. Additionally, according to Kerckhoffs’ Principal [18], we assume the adversary is aware of all protocols being used but does not know the established keys of the network and is unable to decrypt messages.
To find the sink physically, the adversary will perform a two-phase search: (1) the location-mining phase and (2) the visual searching phase. In the location-mining phase, the adversary eavesdrops on the network traffic and identifies a set of nodes that appear to be close to the sink. Given the information on nearby nodes, the adversary will find the sink physically in the visual searching phase.
2.2.1. Phase I: Location Mining
Let
Given
We consider a powerful attacker who is able to perform traffic-analysis attacks and traffic-tracing attacks. Particularly, he is able to obtain two traffic statistics
Consider the example depicted in Figure 1, where a tree-based routing protocol is used and a routing tree is formed with the sink node serving as the root of the tree. After one reporting period t, the adversary will conclude that

An example of a routing tree rooted at the sink. Each arrow points from a child to a parent.
2.2.2. Phase II: Visual Searching
Although
Continuing with the example depicted in Figure 1,
2.3. k-Anonymity
Our goal is to design a routing strategy that can enhance sink-location privacy. Essentially, the risk of breaching the sink-location privacy is caused by the observable asymmetric traffic pattern of the sensor networks. The message traffic volume is the largest at the nodes close to the sink, and the travel paths of messages always end there as well. The basic idea of our approach is to change the traffic pattern such that at least k nodes located at
The main design goal of the k-anonymity routing protocol is to enhance sink-location privacy, and it should also deliver messages without incurring high energy overhead. Therefore, we define a privacy measure and a network efficiency metric to evaluate a routing strategy.
The safety period The energy cost E is the average amount of energy consumed for transmitting one message from each sensor to the sink in one measurement period. Since for the routing strategy, the messages delivered to the sink are also transmitted to the nodes
An ideal routing protocol should provide a long safety period
Problem 1.
3. Routing Algorithm Description
3.1. Algorithm Model
In order to achieve k-anonymity, we propose an Euclidean minimum-spanning tree-based (EMST-based) routing algorithm to create at least k nodes whose traffic volumes are equally high. Consider a network deployed in a square Q, as depicted in Figure 2. The EMST-based routing algorithm partitions the square into k non-overlapping sub-regions

An illustration of the two-stage routing protocol: the intra-region routing and the inter-region routing.
Each message is forwarded in two stages, intraregion forwarding and interregion forwarding. During intra-region forwarding, messages originating from
Interestingly, as a result of constructing an EMST connecting k designated nodes, the number of nodes that exhibit similar traffic statistics as these k designated nodes is larger than k; that is,
3.2. Problem Elaboration
Before updating the problem definition according to two-stage routing, we define the length of the EMST as
According to the two-stage routing protocol, we elaborate the definition of the privacy and network efficiency metrics based on
3.2.1. Safety Period
Quantified by EMST (K)
In one reporting period, the number of messages transmitted by all nodes that are part of the EMST equals the total number of nodes in the network. Therefore,
3.2.2. Energy Cost E Quantified by Hop Counts
We define energy cost as the unit of hop counts. Assume the average hop size across the network is
The average total energy cost for each sensor node consists of intra-region communication
Accordingly, the routing optimization problem defined as Problem 1 can be precisely formulated as follows.
Problem 2.
3.3. Problem Reduction
Problem 2 defines a non-linear optimization problem that contains two variables: the locations of k designated nodes, that is, K, and the partition W. Solving such a nonlinear optimization problem is difficult. Thus, in this subsection we focus on reducing the problem to a simpler version.
We observe that the locations of k designated nodes will affect the inter-region communication energy cost
Next, we present a result showing that, for given locations K, the Voronoi partition is the optimal partition for Problem 2.
Lemma 1.
If
Proof.
We prove the lemma by contradiction. Without loss of generality, we examine the case
Combining the above four cases, we have

An illustration that the Voronoi partition minimizes
For the rest of the paper, we will use the following notation:
Additionally, to reflect the fact that
Problem 3.
4. Quasi-Optimal Solutions
Solving Problem 3 gives us the optimal solution of k-anonymity, that is, the positions of k designated nodes that minimize the total routing energy and guarantee the safety period requirement. However, solving Problem 3 is challenging. First, Problem 3 is related to the problem of finding a set of k points in a constrained planar region such that its Euclidean minimum spanning tree has the length of a given value. To the best of knowledge, such an problem has not been addressed in the literature so far, and it is unknown whether the problem is NP hard. Second, our Problem 3 seeks optimized locations for an energy cost function subject to an EMST constraint and thus creates more difficulties.
Popular methods for solving nonlinear optimization problems, such as the generalized reduced gradient [19], are inapplicable to solve Problem 3, because those methods leverage the first or second derivative of the objective function to search for the optimal solution and the derivative of
To facilitate discussion, we summarize the notation convention of optimal solutions to Problem 3 and its reduced subproblems in Table 1.
A summary of notations.
4.1. Minimizing
The objective function consists of two components:
Problem 4.
Problem 4 is still a nonlinear optimization problem with an objective function whose derivative is difficult to calculate. We choose to exploit the widely adopted genetic algorithms (GAs) to find the optimal solution. GA mimics Darwin's theory about evolution. It iteratively generates a set of solutions known as a population and selects a subset of solutions to form a new population based on each solution's “fitness.” The fitness level of a solution can be evaluated using the objective function of the optimization problem. “Fitter” solutions will be selected with higher probability while “weaker” solutions will still have chances to be selected. As a result, GA is likely to escape from local optima and evolves to the global optima with high probability. Thus, we call the solutions obtained by GA as optimal solutions.
We call our customized genetic algorithm that searches for optimal solutions of Problem 4 as

An illustration of the optimal locations
Remark 2.
From Figure 4, we observe that for each optimal layout the designated nodes are distributed almost uniformly across the network, and the network area Q is partitioned into regions with similar sizes. This observation can be intuitively explained by rewriting (12) as
Remark 3.
We depict

The routing efficiency measure: the intra-region energy

The privacy measure: a comparison of estimated
To estimate the relationship between
4.2. GA-Based Quasi-Optimal Algorithm
Analyzing Problem 4 utilizing GA provides important insights towards solving the original routing optimization problem defined in Problem 3. In this subsection, we introduce a GA-based quasi-optimal algorithm (GAQO) that can obtain an approximate optimal solution for Problem 3. In particular, the GAQO algorithm provides the quasi-optimal solution
Problem 5.
We will show that the quasi-optimal solutions for Problem 5 closely approximate the solutions for Problem 3 empirically. Intuitively, according to Remark 3, a slight change of
4.2.1. Approximation Evaluation Metric
To evaluate how close the solutions obtained by the GAQO algorithm approximates the optima, we define the approximation evaluation metric μ as the energy difference between
Lemma 4.
Proof.
(Second inequality.) By definition, for a given k,
(First inequality.) For a given k,
4.2.2. Algorithm Walk-Through
Searching optimum
(1) k = (2) (3) (4)
Step 1.
Call
Step 2.
For the given k, find an optimal layout
Step 3.
Shrink or expand
We note that the aforementioned GAQO algorithm, though not optimal, does approximate optimal solutions.
Example 5.
Here, we illustrate how the GAQO algorithm achieves k-anonymity for a given safety period

The red “⋄” points are the optimal designated node locations that minimize
4.2.3. Evaluation
To evaluate how close the solutions obtained by the GAQO algorithm approximate the optimal solution, we performed an empirical study. In particular, we used the same network setup as before and searched for the quasi-optimal solutions in a 2500-node network deployed in the

The algorithm approximation measure: the upper bound of the difference between the quasi-optimum (using the GAQO algorithm) and the global optimum (using
4.3. Artificial Potential-Based Quasi-Optimal Algorithm
The GAQO algorithm can obtain quasi-optimal solutions of the k-anonymity sink-location problem. However, our simulation study shows that the run time of
Artificial potential (AP) [20] (aka. artificial physics in some literature as opposed to natural physics) was originally developed for the purpose of obstacle avoidance. Later, it was used as a distributed control strategy to solve self-deployment problems of WSNs. The approach is simple enough to let each entity exert forces on other nearby entities and respond to forces from them; yet a uniform distribution will eventually emerge. Since the approach is largely independent of the number of entities, it scales well for large sets of entities. We take advantage of the linear time complexity of an AP-based method to solve the k-anonymity sink-location problem, since searching for optimal solutions of k designated nodes is equivalent to deploying nodes uniformly across the network (according to Remark 2).
We built our APQO algorithm on the AP-based self-deployment algorithm proposed by Ding et al. [21], whereby sensors are deployed into uniform lattices inside a bounded region. We start by assuming the k designated nodes can move to any position inside the network area and we denote
4.3.1. AP Definition
Two types of artificial potential functions are defined for every node i:
We define
The relationships between

In addition, we define the total potential as
To distribute k nodes approximately uniformly inside the network area is equivalent to finding
4.3.2. Algorithm Walk-Through
Overall, the APQO follows the similar framework as shown in Algorithm 1. For a given
We listed the pseudocode of
k; (1) (2) (3) (4) Error = (5) (6)
Step 1.
Initialize the locations of the k nodes
Step 2.
Obtain the gradients
Step 3.
Select the sensor nodes that are closest to
We use the following lemma to show that the
Lemma 6.
The AP-based algorithm is convergent; that is,
Proof.
Taking the derivative of V, we obtain
Let
4.3.3. Evaluation
Similar to the GAQO algorithm, we have defined an approximation evaluation metric

The algorithm approximation measure: the upper bound of the difference between the quasi optimum (using the APQO algorithm) and the global optimum (using
Performance Comparison
The length of

Comparison of

The quasi-optimal locations of k designated nodes which approximately minimize
Time Complexity Comparison
Since the majority of the run-time for the GAQO and APQO algorithms is contributed by executing
k-Anonymity Evaluation
We evaluated how effective the EMST-based routing protocol can change the traffic pattern around the sink. Let the node that is closest to the sink be

The number of nodes that exhibit the same traffic statistics as the nodes around the sink.
5. Related Work
Protecting the identity of traffic sources has been extensively studied in the context of general networks, where the usage of a series of intermediate mixes and onion routing [24] was proposed to cope with traffic analysis. The problems of tracking users’ paths in wireless networks with location-oriented services were studied by Gruteser and Grunwald [25] and Hoh and Gruteser [26], and they proposed a path perturbation algorithm to increase source location anonymity. Since sensor networks have constrained resources, those methods are not applicable there.
In the context of wireless sensor networks, both source-location privacy and sink-location privacy have attracted attention from the research community. Source location privacy focuses on protecting the message source, since such information can reveal sensitive position information of the target that is close to the message source. Preserving source-location privacy against a local adversary was first studied by Kamat et al. [2], where fake message injection and phantom routing are proposed to prevent a local eavesdropper from discovering the message source through hop-by-hop traces.
The problem of preserving source-location privacy under a global eavesdropper has been studied extensively [1, 4, 27, 28]. Mehta et al. [4] have proposed periodic collection and source simulation techniques to prevent the leakage of message source location, and Yang et al. [1] have introduced dummy traffic to hide the real message source. Ouyang et al. [27] have devised a set of privacy-preserving algorithms involving sending periodic maintainable messages to address a laptop-class attacker who has longer radio range and can eavesdrop on all communications in a sensor network. A notion of statistically strong source anonymity is proposed by Shao et al. [28], and a strategy called FitProbRate has been proposed to achieve statistically strong source anonymity with a reduced real event report latency.
In the areas of enhancing sink-location privacy, Deng et al. [9] have shown that traffic analysis can reveal the location of sinks and proposed several antitraffic analysis countermeasures to hide the direction of data flow and create fake sink locations that exhibit artificially high traffic. In their follow-up work [5], multiple parent routing, controlled random walk, random fake paths, and combinations of all three routing algorithms have been studied to generate randomness against traffic rate monitoring and traffic path direction attacks. Location privacy routing (LPR) [3] utilizes probabilistic routing and fake message injection to deceive an adversary from tracking the direction of traffic flow. Conner et al. [29] proposed the decoy sink protocol, whereby data are forwarded to a decoy sink for aggregation before they are relayed to the real sink. As a result, the traffic volume near the sink is reduced while decoy sinks exhibit high traffic volume, which makes traffic analysis attacks difficult. Liu and Xu [7] presented a zeroing-in attack that can be launched by resource constraint adversaries and proposed a random walk-based defense strategy. Gu et al. [6] proposed a privacy-preserving scheme which obfuscates the sink's location with dummy sink nodes and can help secure existing mobility control protocols against attacks. However, those strategies cannot cope with a global adversary.
To deal with global adversaries, Ngai [8] proposed randomized routing with hidden address (RRHA), whereby packets are routed from the source to the sink along a random path and the destination field is not included in the header of the packets. Such a routing protocol does provide sink anonymity, but the packet may not reach the sink at all. Additionally, Nezhad et al. [10] designed an anonymous routing protocol to preserve the sink-location privacy against a global adversary. However, their global adversaries are only capable of packet-tracing attacks not traffic-analysis attacks. In this paper, we focused on addressing the problem of enhancing sink-location privacy against a global adversary capable of both attacks, while assuring that messages will arrive at the sink.
Artificial potential was originally developed in Khatib [20] for the purpose of obstacle avoidance. Later, it was used as a distributed control strategy for a large number of entities to achieve certain geometric configurations, such as in coverage and connectivity problems of WSNs [21, 30] and formation and flocking problems of collective artificial agents [31]. Since the approach is largely independent of the size and number of entities, the results scale well to larger sets of entities. We take advantage of the linear time complexity of this method to solve a nonlinear optimization problem that defines the k-anonymity sink-location problem.
6. Concluding Remarks
Wireless sensor networks rely on the sink to collect the measurements across the entire network; thus it is essential to protect the location information of the sink. However, the traffic around the sink typically exhibits distinctive patterns, and an adversary with a global view can identify the location of the sink by measuring the traffic statistics of the entire network. In this study, we addressed such a threat, and we proposed an EMST-based two-phase routing algorithm that can achieve k-anonymity of the sink. In particular, the network is partitioned into k regions with each containing one designated node. Messages are first delivered to one designated node and then forwarded onto the EMST that interconnects all other designated nodes. The two-phase routing algorithms can effectively create many entities that exhibit the same traffic pattern as the nodes located close to the sink.
The positioning of k designated nodes affects two conflicting goals: the routing energy cost and the privacy level of the sink's location, and thus we formulated it as a nonlinear optimization problem. To tackle this problem, we first utilized a genetic algorithm to search for quasi-optimal solutions and developed a genetic algorithm-based quasi-optimal (GAQO) algorithm that can obtain solutions which closely approximate global optimal solutions. Further motivated by the observation that the quasi-optimal solution partitions the network into areas with similar sizes, we designed an artificial potential-based quasi-optimal (APQO) algorithm that can also obtain a quasi-optimal positioning of k nodes but which requires significantly reduced run-time. Our simulation results validated that both algorithms can effectively derive the positions of k designated nodes which meet the requirement of privacy at the minimum routing energy cost.
Footnotes
Acknowledgments
Thw authors thank Dr. Jianjun Hu for his feedback on the genetic algorithms. This work is partially supported by the National Science Foundation Grant CNS-0845671.
