Abstract
Providing source privacy is a critical security service for sensor networks. However, privacy preserving in sensor networks is a challenging task, particularly due to the limited resources of sensor nodes and the threat of node capture attack. On the other hand, existing works use either random walk path or fake packets injection, both incurring tremendous overhead. In this work, we propose a new approach, which separates the sensor nodes into groups. The source packet is randomly forwarded within and between the groups with elaborate design to ensure communication anonymity; furthermore, members of each group exchange encrypted traffic of constant packet length to make it difficult for the adversary to trace back. One salient feature of the proposed scheme is its flexibility of trading transmission for higher anonymity requirement. We analyze the ability of our proposed scheme to withstand different attacks and demonstrate its efficiency in terms of overhead and functionality when compared to existing works.
1. Introduction
Wireless Sensor Networks (WSNs) are explored to be used in many applications ranging from military strategy to civilian purposes. Although a sensor node has certain limitations like low processing capabilities and battery supplement, the smaller size and the cost of the sensor node helps to be unspectacular in the sensing area and is useful for tracking events without being recognized.
The sensor nodes usually adopt wireless communications mode when deployed in open and harsh environments, which makes data transmissions overheard by the vicinity sensor nodes or other wireless devices. Without precaution, the adversary can overhear packet transmission and trace back the packet to the source. This can lead to the leakage of source position and the time of event packet taken place. In view of a mission critical military application, any revelation of information such as time or event location can be beneficial to the adversary and costly to the network goal. Therefore, privacy of the monitored event holds great importance that requires providing privacy to source nodes.
Source privacy is usually compromised by the contextual information of a packet, but not by the actual content of the packet. Specifically, a packet consists of a payload and a header. The payload part can be encrypted with cryptography methods to prevent adversaries from learning the information carried in intercepted packets, while the header part has to be left in clear to provide multihop routing. Hence, attackers can learn packet original source and final destination from packet headers. A viable solution is allowing neighboring nodes to establish pairwise keys whereby to encrypt packet sources and destinations. However, this countermeasure may become invalid in the presence of compromised nodes. Therefore, source privacy cannot be addressed by encryption alone.
Traffic analysis [1] enables attackers to infer the network traffic pattern and real packet sources/destinations. If noticing that a few nodes often act as packet source, the adversary may think these nodes are important and launch targeted attacks on them. Providing communication anonymity has been regarded as an effective solution against traffic analysis [2–4] because the adversary can no longer distinguish true sources from intercepted packets.
Plenty of work is guided towards applying some form of simulating the source [5–9] or performing a random walk [5, 10] to guarantee source privacy. One primary drawback of these approaches is that a large amount of overhead is incurred to simulate a source or to redirect traffic randomly. Besides, recent works only consider eavesdropping attacks; however compromised nodes are not considered as part of the threat model where an adversary can easily capture any of sensor nodes. In this work, we consider a rather strong threat model in which the adversary can compromise nodes and is able to eavesdrop over the network communication. An example of such an adversary is a laptop class attacker who has more powerful capability to execute higher strength calculation. When a node is compromised, the adversary has access to all the cryptographic information along with data packets of past communications stored at the node.
This paper aims to protect source privacy against both passive attacks (global eavesdroppers) and active attackers (node compromise attack) as well as to guarantee fundamental security requirements such as confidentiality, data integrity, authentication, and nonrepudiation.
The contribution of the paper is as follows. We present an efficient and lightweight source privacy guaranteeing mechanism (ELSP), which applies Identity-Based Cryptography (IBC) method to complement other techniques. First, sensor nodes are organized into pairwise disjoint groups (or sets) to hide real packet sources among crowds of nodes. In order to conceal source identity, packet sources use pseudonyms instead of their real IDs so that any other nodes cannot ascertain the initiators of received packets. Our work differs from previous works in that pseudonyms generation does not introduce a central authority. Second, to provide strong communication anonymity, a random packet-forwarding strategy is presented. The source no longer sends packets along the shortest path to the destination. On the contrary, it randomly selects several forwarding nodes within each group to confuse the adversary. More importantly, members of each group exchange encrypted traffic of constant packet length to make the path untraceable. Each intermediate router (referred to as key node), only knowing its predecessor and successor, strips off one layer of encryption, and eventually, the receiver obtains the packet in plaintext. Finally, ELSP provides provably strong source anonymity against different attacks. One salient feature of ELSP is its flexibility of trading transmission for higher anonymity requirement. A detailed analysis of ELSP is provided, and a comprehensive comparison with existing schemes is presented to show its effectiveness and efficiency.
The rest of this paper is organized as follows. Section 2 reviews the existing work on source privacy, and Section 3 presents the models and design goals. Next, we describe the ELSP scheme in Section 4, followed by the detailed analysis and performance evaluation in Section 5. Finally, we have the conclusions in Section 6.
2. Related Work
Recently, source privacy in sensor networks has drawn widespread concern. Kamat et al. proposed a phantom routing protocol for flooding and single path routing [5]. They were the earliest researchers to study source privacy and present multiple techniques to guarantee the objective. One technique uses fake sources with nodes sending fake packets to mislead the adversary. The other technique is called phantom routing which takes a random walk before forwarding the packet towards the base station in order to increase the cost of the adversary to backtrack to the source. Although the schemes are robust, they have a large overhead involved and can not withstand the collaborative attacks.
Mehta et al. similarly propose two schemes called per iodic data aggregation and source simulation to overcome global eavesdropping attacks [6]. The source simulation scheme is similar to the fake sources technique proposed in [5]. In per iodic aggregation scheme, each node reports back to the base station per iodically, regardless of whether it detects an event or not. The drawback of per iodic collection scheme is the latency incurred as well as overhead, while in source simulation scheme, it is the overhead introduced.
Wang et al. [10] propose a parallel-routing protocol to maximize the time for adversary traceback to source. The packets from the same source are passed through different paths to the base station. Furthermore, a weighted random stride routing is presented that breaks the entire routing into rounds. Li et al. [11] adapt the conventional function of data mules to design a new protocol for securing source location privacy, namely, the Mules-Saving-Source (MSS) protocol, which provides a-angle anonymity. Although they are nice schemes, one of the prerequisites is that the sensor location should provided for determining the forwarding angle. Moreover, they be fail in protecting source privacy in case of a global eavesdropping adversary.
In [12], the authors propose a scalable hop-by-hop authentication scheme based on elliptic curve cryptography (ECC), which enables any intermediate node to transmit an unlimited number of messages without determining the degree of threshold suffering in the polynomial-based scheme [13, 14]. They further propose a scheme to provide source location privacy through routing to a randomly selected intermediate node and a network-mixing ring [15]. Although these two schemes can provide comparable source privacy, the first scheme brings another problem of handling the selection of the AS (ambiguity set), while the second scheme again involves taking a random walk.
Nezhad et al. devise a label-switching based technique to meet source and base station anonymity [16]. One of the restrictions is the requirement of global network information with which the base station constructs a routing tree. Each link of the routing path has a separate label when a node receives a packet, it changes the label of the packet to the upstream link and the packet gets propagated. Except for the base station demanding global knowledge of the network, each node has to perform exhaustive processing and reconstruction of the packets routing over them.
Shao et al. proposed FitProbRate [8], an exponentially distributed dummy traffic generation scheme, to maintain source anonymity. This work differs from other similar works with the dummy traffic generated at a dynamic rate decided by the Fitprob parameter. It is a great improvement over source simulation and fake sources but still has the drawback of having overhead due to dummy packet generation.
There is some other literature that considers privacy issue in WSNs. Chai et al. [17] provide the sink-location privacy against a powerful adversary with a global view, but it not considered the source privacy, which is the main focus of this paper. Chow et al. [18] focused on providing both location-monitoring services and source privacy. Di Pietro and Viejo [19] addressed the problem of querying by the base station to provide the MAX of sensor-stored readings and get a trade-off between accuracy of the result and overhead. Li and Hwang [20] propose a lightweight anonymous routing protocol in secure wireless ad hoc networks. Therefore, they are orthogonal to our paper.
All the works discussed by now just consider a passive adversary. Most consider a local eavesdropping adversary with a few providing solution about global eavesdropping adversary. In ELSP, we consider an eavesdropping adversary having node compromising capabilities. We present a packet-altering scheme, which has lesser overhead compared to existing schemes. Also, when compared to a label-switching scheme as [16], ELSP does not need every node on the path to perform packet transformation and does not need the base station to be aware of the network topology.
Our work is partly inspired by Mix Nets [2] and Crowds [4]. In particular, Crowds is used for intragroup communications; each packet travels a random path whose length follows a given geometric distribution. Since each packet may traverse a group through different number of nodes, each group in fact serves as a virtual mix through which packets are mixed.
3. Models and Design Goals
3.1. Network Model
We consider the sensor network comprised of homogeneous sensor nodes spread over a wide area. The sensor nodes are responsible for detecting events and reporting back to the base station. The base station is assumed to be secure and has unlimited resources when compared to the general sensor nodes. The occurrence of the events can be irregular and random in nature. ELSP can be used for many applications requesting source node privacy protection. A class of applications is used to track endangered animals or birds. The existences of such species need to be preserved from hunters or poachers as they have potential market value; meanwhile, they need to be studied. In a word, we consider the sensor network deployed in a wild for sensing endangered animals (e.g., South China tiger). It is a homogeneous network with small size sensor nodes dispersed over a vast area. Sensor nodes sense their environment for the presence of endangered species and report back to the base station. Note that multiple sensor nodes can detect the event simultaneously, and they will independently report it back to the base station.
The base station collects the packets and identifies the emergence of the tracked object and studies the hunting way and their living conditions. Given the tracked object being swoop and swift, we can have multiple nodes detecting the event in a specified time, and then not having any detection for a long time. We are based on the following consideration: if the animal appears somewhere, it did so to either prey or to have a rest, and this increases the possibility of the animal haunting to that location thereby needing source location privacy for the detecting packet. The event in some applications can be sporadic, but there are still other communications between the sensor nodes, resulting in the generation of packet traffic.
3.2. Attack Model
Since the high profits are brought from animal hunting, it is no wonder that the adversaries would equip themselves with advanced equipment, which means they have overwhelming technical advantages over general sensor nodes, such as sufficient energy resource, powerful computation capability, and large storage space. Therefore, the adversaries can overhear communication at much larger distances compared to a sensor node. They also possess the ability to compromise nodes. Specifically, the adversaries can operate the following two modes.
3.2.1. Global Eavesdropping Mode (External Attacks)
The adversary in this mode can carry out passive attacks, such as eavesdropping of the communications and can correlate the transmission of the packet over multiple hops. Although the adversary may not able to ascertain the contents of the packet, it has the capability to compare two packets. Note that the adversary will not interfere with the function of the network, such as destroying sensor nodes, altering the routing path, or modifying packets because such activities can be easily identified.
3.2.2. Stealth Mode (Internal Attacks)
The adversaries in this mode can compromise sensor nodes and get access to all the cryptographic information stored on them. They can further decode the packet and get information as available to any rightful sensor node. The adversary can compromise sensor nodes randomly from the network or geographically close to each other (e.g., neighboring nodes).
The goal of an attacker is to acquire the location and time of event occurrence, either by passive eavesdropping or active node compromise. In the worst case, the adversary may employ both.
3.3. Design Goals
The objective of this paper can be summarized as follows.
The adversary can not get the source information by analyzing the traffic pattern. The adversary can not get the source information, even though a few network nodes are compromised. Only the base station can distinguish the source location through the messages received. The recovery of the source information from the received message should be very efficient. The length of each message must be as short as possible to save the energy of sensor node.
3.4. Notations
For the sake of clarity and convenience for the readers, we list some major notations in Table 1 to be used throughout this paper.
List of used notations.
4. The Proposed ELSP Scheme
In this section, we first give the basic idea of ELSP, and then elaborate on its design.
4.1. Outline of ELSP
In ELSP, sensors are divided into pairwise disjoint sets called groups. We take source S and destination D as an example to present the basic idea of ELSP. Without loss of generality, we assume that S and D are in different groups.
To secretly send packets to D, node S randomly picks one node from every group which is called a key node through which every packet will be routed. ELSP adopts the idea of mix-nets [2]. Specifically, S packs the message for D by several layers of encryption. The packed message is then routed through the key nodes; each of them strips off one layer of encryption and then transmits to the next key node. By constructing the packet appropriately, we can guarantee that every key node knows neither other key nodes nor how many key nodes separate it between S and D. Source S is thus hidden from all the key nodes.
ELSP integrates several techniques to thwart both internal and external attackers. For example, to cover the transmission behavior, S first sends the packets to a randomly chosen group peer instead of directly sending it to the first key node. Then, after receiving a packet from its group peer, each node will send the packet to the first group with some probability, say ρ, or randomly choose another group peer to which the packet is sent. Finally, the packet will be sent to a randomly chosen proxy node in each group other than the first key node, which in turn forwards the packet to the first key node. We will show that this method assists in hiding the first key node from global eavesdropper. The packet will be forwarded in subsequent groups in a similar way. Furthermore, ELSP requires the members of every group to exchange encrypted packet of constant length to prevent attackers from tracing the packet. All these measures are attempted to make it difficult for the adversary to locate packet sources with tunable communication overhead.
An example is illustrated in Figure 1, where each group

A diagram of ELSP.
In the following section, we will give the design details of ELSP, including group formation, group traffic maintenance, key distribution, and packet forwarding.
4.2. Detailed Description of ELSP
4.2.1. Group Formation
It is quite common that sensor nodes are deployed in groups; that is, a group of sensors are deployed at a single deployment point, and the probability distribution function (e.g., a two-dimensional Gaussian distribution) of the final resident points of all the sensors in each batch (or group) are the same [21, 22]. We assume such a group-based deployment, and we model the deployment knowledge as follows.
We consider a sensor network with N sensor nodes. Before the network deployment, the network owner divides N sensor nodes into M pairwise disjoint groups denoted by
Sensor nodes in each group
4.2.2. Key Distribution and Agreement
As with other schemes, ELSP requires sensor nodes to establish appropriate cryptographic keys. In this work, we assume a key distribution scheme (e.g., [23]) based on Identity-Based Cryptography (IBC) [24]. However, ELSP can also rely on other suitable key distribution schemes by taking advantages of deployment knowledge of the deployed sensor nodes in a sensor network [25].
Since ELSP targets a single owner WSN, there exists a trusted authority (TA) to bootstrap the network. Before the network deployment, the TA selects a large prime q, a master secret key
Each sensor node, say V, is equipped with
In ELSP, any pair of sensor nodes, say U and V, can establish a shared key independently without communicating with each other. In particular, U and V compute
4.2.3. Group Traffic Maintenance
Generally, sensor node acts as a networking repeater to route packets to and from other nodes; it seems that WSN has natural source anonymity since the adversary is unable to distinguish whether a sensor node just forwarded a given packet or has initiated it. However, this argument holds only when the outgoing traffic rate of the node is not greater than its incoming traffic rate. Otherwise, the adversary can ascertain that the node has initiated some traffic, even if he cannot determine what packets originated from that node. For example, a sensor node intended to send more packets than others indicates that it probably detects animals' activities. ELSP thus must prevent this from happening.
In ELSP, in order to hide packet sources, we insert garbage data to keep the packet length constant all the time. Furthermore, any two nodes in group i, for
For example, nodes U and V are both in group i which exchange packets as follows:
Eavesdropper cannot ascertain whether sensor node U initiated a data packet, as he is unable to differentiate data from each other. ELSP also guarantees that even node V cannot ascertain whether DATA was just forwarded by U or actually originated from node U; this protects U from internal attackers if any. Intergroup traffic and intragroup traffic are of the same fixed length, which can prevent attackers from inferring any useful information from packet-length changes [1].
4.2.4. Packet Construction
Now we describe the construction of packets. Assume that source
Step 1.
Select one key node for each group. To complete this, node S has to rely on the normal routing protocol (e.g., min hop routing) to find the node IDs on the forwarding path between S and D. Fortunately, this can be done in the network initialization with the base station simply broadcasting a message to the whole network. The base station adds two fields to the header of this message, which are “node-in-route” (NIR) field and group ID (GID) field. Initially, these two fields are empty. Starting from the base station, whenever a node propagates the message to the next hop, the node ID and group ID of the upstream node are appended to the NIR and GID. Nodes included in NIR are excluded from the random pick at the next hop. This nonrepetitive propagation terminates until reaching every node. Finally, each node has a path node ID list (NIDL) and the corresponding group ID list (GIDL) between itself and the base station. Node S randomly picks one node from its NIDL for each group as the key node through which every packet will be routed.
Step 2.
Choose a unique random integer
Step 3.
Compute a shared key with each key node
We use the following approach to prevent each key node from knowing its distance from source S or destination D. Assume that destination D is in layer l,
Source S then derives
Suppose that S has another message for the same destination D; it can generate a new packet by just replacing the message part Ω. That is, the same set of key nodes can be used in multiple messages between S and D. However, S has to change the set of key nodes per iodically if it has many messages for D to prevent the predecessor attack [27]. Besides, different event packets from the same source should have different pseudonyms in case the two packets are correlated by the adversary.
4.2.5. Packet Forwarding and Processing
Packet
Once a node in
Then it attempts using
Furthermore,
The packet forwarding is terminated at the last key node
Since the same set of key nodes is used in delivering multiple messages from S to D, this can significantly reduce the computation overhead. Specifically, each
4.2.6. An Example
In order to have a better understanding with ELSP, we take the same example in Figure 1, where
Source S forms
Node
By virtue of pseudonym, each
5. Analysis and Performance Evaluation
In this section, we first give the communication and computation overhead of ELSP, and then its security about source anonymity. Finally, we conduct the simulation to demonstrate the efficiency of ELSP when compared with existing works.
5.1. Overhead Analysis
5.1.1. Communication Cost
In ELSP, a fixed packet size is used to prevent the attackers from inferring any useful information from packet-length changes. Each packet contains a constant length message part and α path objects of equal length. If the message is not long enough, it has to be padded to keep the fixed length. For convenience, we choose
The length of
Now we discuss T, the number of end-to-end packet transmissions cost by each message from S to D. Suppose for the sake of simplicity that no two consecutive packet forwarders are the same, and that no key node is selected as a proxy node. Let
5.1.2. Computation Cost
In ELSP, the most time-consuming task is undoubtedly the pairing
5.2. Security Analysis
Packets in ELSP use pseudonym instead of real source ID so that key nodes can not determine the initiators of received packets. Further, even the base station (a key node as well) cannot ascertain who sends it the message if the source node does not reveal its real identity in the message. This implies that the adversary can not directly determine packet sources. However, the adversary can assign a probability to each sensor node for being the source of a given packet. Therefore, we will investigate the resilience of ELSP against such probabilistic attacks.
We first use two entropy-based metrics to evaluate source anonymity. Then we describe the random forwarding approach of how to prevent internal attackers from determining key nodes. Finally, we evaluate the capability of ELSP providing source anonymity and show its efficiency compared with existing works.
5.2.1. Anonymity Metrics
Pfitzmann and Hansen [30] first defined anonymity as “the state of Indistinguishable within a set of all possible subjects, anonymity set.” Since then, anonymity set has become a popular metric to evaluate the anonymity in various anonymous communication systems. Generally, the greater the anonymity set is, the better anonymity achieved. However, it is unable to reflect the possibility that the adversary assigns different probabilities to each node as being the source of a given packet. This problem was solved by an entropy-based metric proposed in [31]. Briefly speaking, let
5.2.2. Efficacy of Random Forwarding
We denoted source S a key node by
We assume that a set of
Theorem 1.
Let one suppose that the first compromised node B in group i, for all
Proof.
Let
Let
Likewise, we have
At last, if B has no other information, all the noncompromised nodes in group i are equally likely to be the key node
Substituting (18), (19), and (20) into (17) and then (16), we finally obtain
Based on (21), we can further deduce that
when when
Note that all the other
5.2.3. Source Anonymity Measurement
It should be noted that besides the event packets, there are still other communications among the sensor nodes involving a large number of packets, which makes it infeasible for the adversary to precisely separate different communication sessions from eavesdropping [1]. This allows us to focus on the impact of internal attackers. For simplicity, we still consider the session from source S to destination D.
We first discuss the impact of consecutively compromised key nodes. Suppose that the adversary compromised C out of the overall N sensor nodes. Because of the layered encryption, the same information appears entirely different across packet layers so that only key nodes
We also consider an extreme case where there is at least one compromised node serving random packet forwarding in each group. For convenience, we assume that each group comprises the same number of n sensor nodes (i.e.,
Theorem 2.
Let the
Proof.
The proof is directly based on Theorem 1. Since B is in source group
Likewise, each noncompromised node except A in
Finally, we can obtain the source anonymity entropy using (14) and the source anonymity degree using (15).
5.3. Performance Evaluation
We first give numerical results to demonstrate the effectiveness of ELSP, and then make a comparison with existing schemes by simulations. Unless specified otherwise, we assume that each sensor node is compromised independently with probability 0.1, which is considered a severe situation. Due to space limitations, we omit the calculation details.
5.3.1. Numerical Results
Figure 2 illustrates the parameters α,

Impact of α, ρ, and
Figure 3 shows the impact of the total number of compromised nodes C on source anonymity, where

Impact of C on source anonymity.
The source privacy is also relevant to the group size n (cf. (24)). Figure 4 shows the impact of the group size n on source anonymity, where

Impact of n on source anonymity.
5.3.2. Comparison
In this subsection, we use both functional analysis and simulations to get a comprehensive comparison with DCARPS [16], random walk [5], and fake event packet generation [6, 8].
First, we compare the difference between DCARPS and ELSP from the technique implementation. In particular, DCARPS adopts label-switching approach, while ELSP uses the pseudonym and intermediate packet-altering scheme to conceal source traversal information. ELSP makes much less assumptions while considering a strong threat model as well as bearing the lightweight design in mind. DCARPS requires the base station to know the topology information of the network. ELSP only needs the information of intermediate nodes. In ELSP, only a small number of nodes are chosen to reconstruct the packet before forwarding, whereas in DCARPS, all the nodes involved in delivering the packet have to perform decryption and re-encryption operation with the new label. Further, in ELSP, this construction method allows the base station to verify the packet, while in DCARPS it is only used for routing. More importantly, node compromise attacks are not considered in DCARPS, which is the main attention we focus on ELSP.
Second, we conduct the simulation in NS2 to show the effectiveness of ELSP and compare it to random walk technique [5]. In Figure 5, we show the overhead (energy consumption) of random walk-based methods and ELSP. For fair comparison, sensor nodes in random walk-based methods are also organized into pairwise disjoint groups; the x-axis represents the longest path length of 50 hops between source and destination. Note that the overhead of random walk-based methods includes propagation over the random path and the journey towards the base station. The random walk lengths (denoted by H) are 10, 15, 20, and 25, respectively. The rabbling probability ρ is chosen as a stable status of 0.75 for ELSP, which is equivalent to

Overhead comparison between random walk-based schemes and ELSP.
In Figure 6, we set

Privacy comparison between random walk-based schemes and ELSP.
Finally, Figure 7 shows the overhead incurred in fake packet generation schemes [6, 8] compared to ELSP. We consider different numbers of fake packet-generating nodes (denoted by F), with the numbers ranging from 20% to 50% of all nodes. The x-axis represents the number of actual source nodes in the network as a percentage of all nodes. As shown in Figure 7, the larger the source nodes in the network are, the smaller the overhead incurred. We can conclude that any form of fake packet generation technique will have significantly higher overhead compared to ELSP. Even the conservative case of 20% of sensor nodes generating fake packets leads to higher overhead compared to ELSP. This is because the fake packet generation is an obfuscating technique which is only successful under the heavy load of fake packet generation. When a sensor node detects an event and generates the event reporting packet, there should be at least one more source node generating a fake event-reporting packet. On this occasion, we see that 50% of the traffic corresponds to fake traffic. Moreover, to increase security will incur more overhead in the form of fake packets. Consequently, the realization of such a system is dependent on the amount of fake packets generated which still provide very minimal source privacy and have more overhead than ELSP. In addition, the fake packet generation schemes do not consider node compromises.

Overhead comparison between fake packet generation schemes and ELSP.
6. Conclusions
Sensor node detecting the event is important, as it can reveal the event occurrence location and time occurrence; thus, needs to maintain source privacy. Previous works consider an eavesdropping adversary and do not provide countermeasures to an intrusive node compromise attack with global eavesdropping capabilities. In this paper, we presented the design and evaluation of a novel anonymous mechanism for WSNs. By utilizing the grouping, the self-generated pseudonym, and the Identity-Based Cryptography, the proposed protocol is demonstrated to achieve desired security objectives and efficiency. As future work, we will seek to analyze the security of our scheme under other adversary models.
Footnotes
Acknowledgments
The authors acknowledge support from the National Natural Science Foundation of China (Grant nos. 61173136, 61202462, 61173141, and 61232016), and the fund support of the key subject of Fujian province—Computer Application Technology.
