An Efficient and Lightweight Source Privacy Protecting Scheme for Sensor Networks Using Group Knowledge

Abstract

Providing source privacy is a critical security service for sensor networks. However, privacy preserving in sensor networks is a challenging task, particularly due to the limited resources of sensor nodes and the threat of node capture attack. On the other hand, existing works use either random walk path or fake packets injection, both incurring tremendous overhead. In this work, we propose a new approach, which separates the sensor nodes into groups. The source packet is randomly forwarded within and between the groups with elaborate design to ensure communication anonymity; furthermore, members of each group exchange encrypted traffic of constant packet length to make it difficult for the adversary to trace back. One salient feature of the proposed scheme is its flexibility of trading transmission for higher anonymity requirement. We analyze the ability of our proposed scheme to withstand different attacks and demonstrate its efficiency in terms of overhead and functionality when compared to existing works.

1. Introduction

Wireless Sensor Networks (WSNs) are explored to be used in many applications ranging from military strategy to civilian purposes. Although a sensor node has certain limitations like low processing capabilities and battery supplement, the smaller size and the cost of the sensor node helps to be unspectacular in the sensing area and is useful for tracking events without being recognized.

The sensor nodes usually adopt wireless communications mode when deployed in open and harsh environments, which makes data transmissions overheard by the vicinity sensor nodes or other wireless devices. Without precaution, the adversary can overhear packet transmission and trace back the packet to the source. This can lead to the leakage of source position and the time of event packet taken place. In view of a mission critical military application, any revelation of information such as time or event location can be beneficial to the adversary and costly to the network goal. Therefore, privacy of the monitored event holds great importance that requires providing privacy to source nodes.

Source privacy is usually compromised by the contextual information of a packet, but not by the actual content of the packet. Specifically, a packet consists of a payload and a header. The payload part can be encrypted with cryptography methods to prevent adversaries from learning the information carried in intercepted packets, while the header part has to be left in clear to provide multihop routing. Hence, attackers can learn packet original source and final destination from packet headers. A viable solution is allowing neighboring nodes to establish pairwise keys whereby to encrypt packet sources and destinations. However, this countermeasure may become invalid in the presence of compromised nodes. Therefore, source privacy cannot be addressed by encryption alone.

Traffic analysis [1] enables attackers to infer the network traffic pattern and real packet sources/destinations. If noticing that a few nodes often act as packet source, the adversary may think these nodes are important and launch targeted attacks on them. Providing communication anonymity has been regarded as an effective solution against traffic analysis [2–4] because the adversary can no longer distinguish true sources from intercepted packets.

Plenty of work is guided towards applying some form of simulating the source [5–9] or performing a random walk [5, 10] to guarantee source privacy. One primary drawback of these approaches is that a large amount of overhead is incurred to simulate a source or to redirect traffic randomly. Besides, recent works only consider eavesdropping attacks; however compromised nodes are not considered as part of the threat model where an adversary can easily capture any of sensor nodes. In this work, we consider a rather strong threat model in which the adversary can compromise nodes and is able to eavesdrop over the network communication. An example of such an adversary is a laptop class attacker who has more powerful capability to execute higher strength calculation. When a node is compromised, the adversary has access to all the cryptographic information along with data packets of past communications stored at the node.

This paper aims to protect source privacy against both passive attacks (global eavesdroppers) and active attackers (node compromise attack) as well as to guarantee fundamental security requirements such as confidentiality, data integrity, authentication, and nonrepudiation.

The contribution of the paper is as follows. We present an efficient and lightweight source privacy guaranteeing mechanism (ELSP), which applies Identity-Based Cryptography (IBC) method to complement other techniques. First, sensor nodes are organized into pairwise disjoint groups (or sets) to hide real packet sources among crowds of nodes. In order to conceal source identity, packet sources use pseudonyms instead of their real IDs so that any other nodes cannot ascertain the initiators of received packets. Our work differs from previous works in that pseudonyms generation does not introduce a central authority. Second, to provide strong communication anonymity, a random packet-forwarding strategy is presented. The source no longer sends packets along the shortest path to the destination. On the contrary, it randomly selects several forwarding nodes within each group to confuse the adversary. More importantly, members of each group exchange encrypted traffic of constant packet length to make the path untraceable. Each intermediate router (referred to as key node), only knowing its predecessor and successor, strips off one layer of encryption, and eventually, the receiver obtains the packet in plaintext. Finally, ELSP provides provably strong source anonymity against different attacks. One salient feature of ELSP is its flexibility of trading transmission for higher anonymity requirement. A detailed analysis of ELSP is provided, and a comprehensive comparison with existing schemes is presented to show its effectiveness and efficiency.

The rest of this paper is organized as follows. Section 2 reviews the existing work on source privacy, and Section 3 presents the models and design goals. Next, we describe the ELSP scheme in Section 4, followed by the detailed analysis and performance evaluation in Section 5. Finally, we have the conclusions in Section 6.

2. Related Work

Recently, source privacy in sensor networks has drawn widespread concern. Kamat et al. proposed a phantom routing protocol for flooding and single path routing [5]. They were the earliest researchers to study source privacy and present multiple techniques to guarantee the objective. One technique uses fake sources with nodes sending fake packets to mislead the adversary. The other technique is called phantom routing which takes a random walk before forwarding the packet towards the base station in order to increase the cost of the adversary to backtrack to the source. Although the schemes are robust, they have a large overhead involved and can not withstand the collaborative attacks.

Mehta et al. similarly propose two schemes called per iodic data aggregation and source simulation to overcome global eavesdropping attacks [6]. The source simulation scheme is similar to the fake sources technique proposed in [5]. In per iodic aggregation scheme, each node reports back to the base station per iodically, regardless of whether it detects an event or not. The drawback of per iodic collection scheme is the latency incurred as well as overhead, while in source simulation scheme, it is the overhead introduced.

Wang et al. [10] propose a parallel-routing protocol to maximize the time for adversary traceback to source. The packets from the same source are passed through different paths to the base station. Furthermore, a weighted random stride routing is presented that breaks the entire routing into rounds. Li et al. [11] adapt the conventional function of data mules to design a new protocol for securing source location privacy, namely, the Mules-Saving-Source (MSS) protocol, which provides a-angle anonymity. Although they are nice schemes, one of the prerequisites is that the sensor location should provided for determining the forwarding angle. Moreover, they be fail in protecting source privacy in case of a global eavesdropping adversary.

In [12], the authors propose a scalable hop-by-hop authentication scheme based on elliptic curve cryptography (ECC), which enables any intermediate node to transmit an unlimited number of messages without determining the degree of threshold suffering in the polynomial-based scheme [13, 14]. They further propose a scheme to provide source location privacy through routing to a randomly selected intermediate node and a network-mixing ring [15]. Although these two schemes can provide comparable source privacy, the first scheme brings another problem of handling the selection of the AS (ambiguity set), while the second scheme again involves taking a random walk.

Nezhad et al. devise a label-switching based technique to meet source and base station anonymity [16]. One of the restrictions is the requirement of global network information with which the base station constructs a routing tree. Each link of the routing path has a separate label when a node receives a packet, it changes the label of the packet to the upstream link and the packet gets propagated. Except for the base station demanding global knowledge of the network, each node has to perform exhaustive processing and reconstruction of the packets routing over them.

Shao et al. proposed FitProbRate [8], an exponentially distributed dummy traffic generation scheme, to maintain source anonymity. This work differs from other similar works with the dummy traffic generated at a dynamic rate decided by the Fitprob parameter. It is a great improvement over source simulation and fake sources but still has the drawback of having overhead due to dummy packet generation.

There is some other literature that considers privacy issue in WSNs. Chai et al. [17] provide the sink-location privacy against a powerful adversary with a global view, but it not considered the source privacy, which is the main focus of this paper. Chow et al. [18] focused on providing both location-monitoring services and source privacy. Di Pietro and Viejo [19] addressed the problem of querying by the base station to provide the MAX of sensor-stored readings and get a trade-off between accuracy of the result and overhead. Li and Hwang [20] propose a lightweight anonymous routing protocol in secure wireless ad hoc networks. Therefore, they are orthogonal to our paper.

All the works discussed by now just consider a passive adversary. Most consider a local eavesdropping adversary with a few providing solution about global eavesdropping adversary. In ELSP, we consider an eavesdropping adversary having node compromising capabilities. We present a packet-altering scheme, which has lesser overhead compared to existing schemes. Also, when compared to a label-switching scheme as [16], ELSP does not need every node on the path to perform packet transformation and does not need the base station to be aware of the network topology.

Our work is partly inspired by Mix Nets [2] and Crowds [4]. In particular, Crowds is used for intragroup communications; each packet travels a random path whose length follows a given geometric distribution. Since each packet may traverse a group through different number of nodes, each group in fact serves as a virtual mix through which packets are mixed.

3. Models and Design Goals

3.1. Network Model

We consider the sensor network comprised of homogeneous sensor nodes spread over a wide area. The sensor nodes are responsible for detecting events and reporting back to the base station. The base station is assumed to be secure and has unlimited resources when compared to the general sensor nodes. The occurrence of the events can be irregular and random in nature. ELSP can be used for many applications requesting source node privacy protection. A class of applications is used to track endangered animals or birds. The existences of such species need to be preserved from hunters or poachers as they have potential market value; meanwhile, they need to be studied. In a word, we consider the sensor network deployed in a wild for sensing endangered animals (e.g., South China tiger). It is a homogeneous network with small size sensor nodes dispersed over a vast area. Sensor nodes sense their environment for the presence of endangered species and report back to the base station. Note that multiple sensor nodes can detect the event simultaneously, and they will independently report it back to the base station.

The base station collects the packets and identifies the emergence of the tracked object and studies the hunting way and their living conditions. Given the tracked object being swoop and swift, we can have multiple nodes detecting the event in a specified time, and then not having any detection for a long time. We are based on the following consideration: if the animal appears somewhere, it did so to either prey or to have a rest, and this increases the possibility of the animal haunting to that location thereby needing source location privacy for the detecting packet. The event in some applications can be sporadic, but there are still other communications between the sensor nodes, resulting in the generation of packet traffic.

3.2. Attack Model

Since the high profits are brought from animal hunting, it is no wonder that the adversaries would equip themselves with advanced equipment, which means they have overwhelming technical advantages over general sensor nodes, such as sufficient energy resource, powerful computation capability, and large storage space. Therefore, the adversaries can overhear communication at much larger distances compared to a sensor node. They also possess the ability to compromise nodes. Specifically, the adversaries can operate the following two modes.

3.2.1. Global Eavesdropping Mode (External Attacks)

The adversary in this mode can carry out passive attacks, such as eavesdropping of the communications and can correlate the transmission of the packet over multiple hops. Although the adversary may not able to ascertain the contents of the packet, it has the capability to compare two packets. Note that the adversary will not interfere with the function of the network, such as destroying sensor nodes, altering the routing path, or modifying packets because such activities can be easily identified.

3.2.2. Stealth Mode (Internal Attacks)

The adversaries in this mode can compromise sensor nodes and get access to all the cryptographic information stored on them. They can further decode the packet and get information as available to any rightful sensor node. The adversary can compromise sensor nodes randomly from the network or geographically close to each other (e.g., neighboring nodes).

The goal of an attacker is to acquire the location and time of event occurrence, either by passive eavesdropping or active node compromise. In the worst case, the adversary may employ both.

3.3. Design Goals

The objective of this paper can be summarized as follows. (1)

The adversary can not get the source information by analyzing the traffic pattern.

(2)

The adversary can not get the source information, even though a few network nodes are compromised.

(3)

Only the base station can distinguish the source location through the messages received. The recovery of the source information from the received message should be very efficient.

(4)

The length of each message must be as short as possible to save the energy of sensor node.

3.4. Notations

For the sake of clarity and convenience for the readers, we list some major notations in Table 1 to be used throughout this paper.

Table 1

List of used notations.

Notation	Description
$S, D$	Source and destination (the basestation)
$g_{i}$	The ith group
$g_{i, j}$	The network ID of the jth node in group i
$G_{1}, G_{2}$	The additive group and multiplicative group of order q
$K_{V}$	The private key of node V
$K_{U V}$	The share key between nodes U and V
α	The total number of groups the packets pass through
${KN}_{i, z}$	The key node of group i, its group index is z
$P_{i, j}$	The proxy node of group i
$A_{S, i}$	The pseudonym of S for the ith group
$K_{i}$	The shared key between S and key node ${KN}_{i, z}$
$K_{D}$	The shared key between S and D
$Υ_{i}$	The ith path object
$Ω$	The payload of the packet
$Ψ_{i}$	The packet before entering the ith group

4. The Proposed ELSP Scheme

In this section, we first give the basic idea of ELSP, and then elaborate on its design.

4.1. Outline of ELSP

In ELSP, sensors are divided into pairwise disjoint sets called groups. We take source S and destination D as an example to present the basic idea of ELSP. Without loss of generality, we assume that S and D are in different groups.

To secretly send packets to D, node S randomly picks one node from every group which is called a key node through which every packet will be routed. ELSP adopts the idea of mix-nets [2]. Specifically, S packs the message for D by several layers of encryption. The packed message is then routed through the key nodes; each of them strips off one layer of encryption and then transmits to the next key node. By constructing the packet appropriately, we can guarantee that every key node knows neither other key nodes nor how many key nodes separate it between S and D. Source S is thus hidden from all the key nodes.

ELSP integrates several techniques to thwart both internal and external attackers. For example, to cover the transmission behavior, S first sends the packets to a randomly chosen group peer instead of directly sending it to the first key node. Then, after receiving a packet from its group peer, each node will send the packet to the first group with some probability, say ρ, or randomly choose another group peer to which the packet is sent. Finally, the packet will be sent to a randomly chosen proxy node in each group other than the first key node, which in turn forwards the packet to the first key node. We will show that this method assists in hiding the first key node from global eavesdropper. The packet will be forwarded in subsequent groups in a similar way. Furthermore, ELSP requires the members of every group to exchange encrypted packet of constant length to prevent attackers from tracing the packet. All these measures are attempted to make it difficult for the adversary to locate packet sources with tunable communication overhead.

An example is illustrated in Figure 1, where each group $g_{i} (1 \leq i \leq α)$ forms one layer. We denote by ${KN}_{i, j}$ the key node in each group; destination D is in group α and is actually ${KN}_{α, 3}$ . Moreover, nodes $p_{i, j}$ in each group are proxy nodes. As we can see, source S forms a packet which is forwarded through four random nodes and then passed to $P_{1,0}$ which, in turn, routes the packet to ${KN}_{1,1}$ . Only the key node can strip off one layer of encryption, and then the modified packet passes three random nodes to reach the next group. This process continues until the packet reaches D (the base station). ELSP can ensure that each key node cannot determine which packet layer it resides at, that is, its distance from the source or destination group. This process will gradually be clear when we come back to it in Section 4.2.6.

Figure 1

A diagram of ELSP.

In the following section, we will give the design details of ELSP, including group formation, group traffic maintenance, key distribution, and packet forwarding.

4.2. Detailed Description of ELSP

4.2.1. Group Formation

It is quite common that sensor nodes are deployed in groups; that is, a group of sensors are deployed at a single deployment point, and the probability distribution function (e.g., a two-dimensional Gaussian distribution) of the final resident points of all the sensors in each batch (or group) are the same [21, 22]. We assume such a group-based deployment, and we model the deployment knowledge as follows.

We consider a sensor network with N sensor nodes. Before the network deployment, the network owner divides N sensor nodes into M pairwise disjoint groups denoted by ${g_{0}, \dots, g_{M - 1}}$ . The relation between sensor nodes and groups should hold the following conditions: (1)

$g_{i} \cap g_{j} = ⌀, for all i, j \in {0, \dots, M - 1}, i \neq j, \sum_{i = 0}^{M - 1} | g_{i} | = N$ .

(2)

$V \in g$ (V denotes a sensor node, g denotes all groups set).

Sensor nodes in each group $g_{i}$ are indexed from 0 to $| g_{i} | - 1$ . Let $g_{i, j}$ denote the network ID of the jth node in group $g_{i}$ , where $0 \leq j \leq | g_{i} | - 1$ . Note that the groups may have different numbers of sensor nodes. Each sensor node is preloaded with the information regarding the affiliation it belongs to, the network ID of every other sensor node, and the index in that group.

4.2.2. Key Distribution and Agreement

As with other schemes, ELSP requires sensor nodes to establish appropriate cryptographic keys. In this work, we assume a key distribution scheme (e.g., [23]) based on Identity-Based Cryptography (IBC) [24]. However, ELSP can also rely on other suitable key distribution schemes by taking advantages of deployment knowledge of the deployed sensor nodes in a sensor network [25].

Since ELSP targets a single owner WSN, there exists a trusted authority (TA) to bootstrap the network. Before the network deployment, the TA selects a large prime q, a master secret key $κ \in z_{q}^{*}$ , an additive group $G_{1}$ of order q, a multiplicative group $G_{2}$ of order q, a bilinear map $\overset{\land}{e} : G_{1} \times G_{1} \to G_{2}$ , and a hash function $H : {0,1}^{*} \to G_{1}^{*}$ which maps arbitrary binary strings into nonzero points in $G_{1}$ . Improved Weil [24] and Tate [26] pairing are examples of the bilinear map $\overset{\land}{e}$ , and we refer the readers to [24, 26] for the detailed properties of $\overset{\land}{e}$ .

Each sensor node, say V, is equipped with $〈 q, G_{1}, G_{2}, \overset{\land}{e}, H 〉$ . Additionally, it has a unique public/private key pair, where the public key is the unique network ID, and the private key $K_{V} = κ H (V)$ is acquired from the TA. Note that the master secret κ cannot be linked from any public/private key pair like $V / K_{V}$ [23, 24, 26], so it is only known to the TA.

In ELSP, any pair of sensor nodes, say U and V, can establish a shared key independently without communicating with each other. In particular, U and V compute

\begin{matrix} K_{U V} = \overset{\land}{e} (K_{U}, H (V)), K_{V U} = \overset{\land}{e} (K_{V}, H (U)) . \end{matrix}

(1)

Here,

K_{U V} = K_{V U}

because of the symmetric and bilinear properties of

\overset{\land}{e}

[23, 24, 26]. Furthermore, each sensor node can create many pseudonyms and the corresponding private keys. For instance, V can choose a random integer

r_{V} \in z_{q}^{*}

to generate a pseudonym

r_{V} H (V) \in G_{1}

and the corresponding private key

r_{V} K_{V} = r_{V} κ H (V)

. Since

G_{1}

is a cyclic group of order q, multiplying

H (V)

r_{V}

can cloak

H (V)

and V [26]. In other words, it is unable to link to node V given only the pseudonym

r_{V} H (V)

4.2.3. Group Traffic Maintenance

Generally, sensor node acts as a networking repeater to route packets to and from other nodes; it seems that WSN has natural source anonymity since the adversary is unable to distinguish whether a sensor node just forwarded a given packet or has initiated it. However, this argument holds only when the outgoing traffic rate of the node is not greater than its incoming traffic rate. Otherwise, the adversary can ascertain that the node has initiated some traffic, even if he cannot determine what packets originated from that node. For example, a sensor node intended to send more packets than others indicates that it probably detects animals' activities. ELSP thus must prevent this from happening.

In ELSP, in order to hide packet sources, we insert garbage data to keep the packet length constant all the time. Furthermore, any two nodes in group i, for $0 \leq i \leq M - 1$ , exchange a traffic rate of λ packets/second, where λ is a public system parameter. To prevent attackers from distinguishing different packets, shared secret key should be established between any two nodes for encrypting the packet using (1).

For example, nodes U and V are both in group i which exchange packets as follows:

\begin{matrix} U ⟷ V : {t ∣ DATA}_{K_{U V}}, h_{K_{U V}} (prior - data), \end{matrix}

(2)

where

{\cdot}_{*}

denotes a fixed-length message encryption using the key on the subscript;

K_{U V}

is the shared key of U and V using (1); t is the timestamp for ensuring message freshness;

h_{K_{U V}} (\cdot)

is a one-way keyed hash function to guarantee message authenticity. On receiving the packet, node U (or V) first checks the message authentication code. If succeed, it then decrypts message using

K_{U V}

and adopts the method in Section 4.2.5 to process DATA.

Eavesdropper cannot ascertain whether sensor node U initiated a data packet, as he is unable to differentiate data from each other. ELSP also guarantees that even node V cannot ascertain whether DATA was just forwarded by U or actually originated from node U; this protects U from internal attackers if any. Intergroup traffic and intragroup traffic are of the same fixed length, which can prevent attackers from inferring any useful information from packet-length changes [1].

4.2.4. Packet Construction

Now we describe the construction of packets. Assume that source $S \in g_{0}$ determines to send information info to destination $D \in g_{α}$ . To construct a packet, S does the following procedure.

Step 1.

Select one key node for each group. To complete this, node S has to rely on the normal routing protocol (e.g., min hop routing) to find the node IDs on the forwarding path between S and D. Fortunately, this can be done in the network initialization with the base station simply broadcasting a message to the whole network. The base station adds two fields to the header of this message, which are “node-in-route” (NIR) field and group ID (GID) field. Initially, these two fields are empty. Starting from the base station, whenever a node propagates the message to the next hop, the node ID and group ID of the upstream node are appended to the NIR and GID. Nodes included in NIR are excluded from the random pick at the next hop. This nonrepetitive propagation terminates until reaching every node. Finally, each node has a path node ID list (NIDL) and the corresponding group ID list (GIDL) between itself and the base station. Node S randomly picks one node from its NIDL for each group as the key node through which every packet will be routed.

Step 2.

Choose a unique random integer $r_{i} \in z_{q}^{*}$ for each group i whereby to calculate a pseudonym $A_{S, i} = r_{i} H (S) \in G_{1}$ and the corresponding private key $r_{i} K_{S} = r_{i} H (S) = κ A_{S, i}$ . $A_{S, i}$ is used to cloak source S from key nodes at group i, as will be shown soon.

Step 3.

Compute a shared key with each key node ${KN}_{i, z}, i \in [1, α], as K_{i} = \overset{\land}{e}    (r_{i} K_{S}, H (K N_{i, z})) = \overset{\land}{e}    (r_{i} κ H (S), H (K N_{i, z}))$ , which is used to add (by S) or peel off (by ${KN}_{i, z}$ ) one layer of encryption.

We use the following approach to prevent each key node from knowing its distance from source S or destination D. Assume that destination D is in layer l, $l \in [1, α]$ . Source S computes a shared key with D as $K_{D} = \overset{\land}{e}    (r_{d} K_{S}, H (D))$ and then calculates

\begin{matrix} Ω = {\begin{cases} {info}_{K_{D}} & l = 1, \\ {{info}_{K_{D}}}_{K_{1}} & l = 2, \\ {{\dots {{info}_{k_{D}}}_{k_{l - 1}} \dots}_{k_{2}}}_{k_{1}}, & 2 < l \leq α . \end{cases} \end{matrix}

(3)

Source S then derives

\begin{matrix} Υ_{i} = {\begin{cases} PL D_{1} & i = 1, \\ {PL D_{2}}_{K_{1}} & i = 2, \\ {{\dots {{PL D_{i}}_{K_{i - 1}}}}_{K_{i - 2}} \dots}_{k_{1}}, & 3 \leq i \leq α, \end{cases} \end{matrix}

(4)

where

{PLD}_{i}

Mark ∣ A_{S, i} {∣ K N}_{i, z} ∣ x

. Here,

Υ_{i}

is called a path object which indicates the packet path. Mark is a predetermined string that explains the legitimacy of the

PL D_{i}

. Since

{\cdot}_{*}

is a fixed-length cipher, we have

| Υ_{1} | = | Υ_{2} | = \dots | Υ_{α} |

. Finally, S constructs the packet as

\begin{matrix} Ψ_{1} = 〈 Ω, Υ_{α}, Υ_{α - 1}, \dots, Υ_{2}, Υ_{1} 〉 . \end{matrix}

(5)

Suppose that S has another message for the same destination D; it can generate a new packet by just replacing the message part Ω. That is, the same set of key nodes can be used in multiple messages between S and D. However, S has to change the set of key nodes per iodically if it has many messages for D to prevent the predecessor attack [27]. Besides, different event packets from the same source should have different pseudonyms in case the two packets are correlated by the adversary.

4.2.5. Packet Forwarding and Processing

Packet $Ψ_{1}$ travels a random path in source group $g_{0}$ before entering $g_{1}$ . Specifically, source S randomly selects a node from $g_{0}$ and sends $Ψ_{1}$ to it in a standard packet length. When the chosen node receives $Ψ_{1}$ , it forwards $Ψ_{1}$ with probability $ρ \in [0,1]$ to a random node in $g_{0}$ and with probability $1 - ρ$ directly to $g_{1}$ . Here, ρ is a system parameter called the rabbling probability. In particular, the preloaded one-way hash function is denoted as $H (x)$ , where x is the hash seed. The mapping function is $f_{ρ} (y)$ with $y = H (x)$ . For the output range of the hash function, the mapping functions map to 1 with probability ρ and to 0 with probability $1 - ρ$ . The benefit of using two functions in the system is that it gives the base station a simple way to change the rabbling probability by just updating the mapping function via a broadcast. When a node receives a packet, it calculates $f_{ρ} (H (x))$ ; if the computation maps to 1, it forwards the packet to the next group. Otherwise, it forwards to another group peer. Such random-forwarding mechanism helps to prevent the attacker from identifying source S whose effectiveness has recently been discussed in [28]. Specifically, every packet forwarder (except S) cannot tell whether the group peer-sending $Ψ_{1}$ is the packet source or just a packet forwarder.

Once a node in $g_{0}$ , say F, decides to forward $Ψ_{1}$ to $g_{1}$ , F picks a random node $P_{1, j}$ in $g_{1}, j \in [0, | g_{1} | - 1]$ to which the following packet modified from $Ψ_{1}$ is sent:

\begin{array}{l} F ⟶ P_{1, j} : Ψ_{1} = 〈 Ω, Υ_{α}, Υ_{α - 1}, \dots, Υ_{2}, \\ Mark ∣ A_{S, 1} ∣ K N_{1, z} ∣ x 〉, \end{array}

(6)

it is possible that

P_{1, j}

itself happens to be

{KN}_{1, z}

in which case

j = z

. Note that

Ψ_{1}

should be sent in an encrypted packet to

P_{1, j}

which then directly forwards it to

{KN}_{1, z}

after verifying

Ψ_{1}

. Since eavesdroppers cannot distinguish this packet from others between

{KN}_{1, z}

and

P_{1, j}

, they cannot immediately determine that

{KN}_{1, z}

is the key node in

g_{1}

. If

P_{1, j}

is a compromised node, then the attackers confirm

{KN}_{1, z}

as the key node. But they still cannot ascertain whether

{KN}_{1, z}

is the packet destination or help them to identify the packet source. On receiving

Ψ_{1}

{KN}_{1, z}

assumes that it was originated from node

A_{S, i}

with which to compute a shared key as

K_{1, z} = \overset{\land}{e} (K_{K N_{1, z}}, A_{S, 1})

. There are four cases.

(i)

If ${KN}_{1, z}$ is a key node, then $K_{1, z} = \overset{\land}{e}  (K_{K N_{1, z}}, A_{S, 1}) = \overset{\land}{e}  (s K_{K N_{1, z}}, r_{1} H (S))$ which is equivalent to $K_{1}$ [23, 24, 26].

(ii)

If ${KN}_{1, z}$ is destination D, then $K_{1, z} = \overset{\land}{e}  (K_{D}, A_{S, 1}) = \overset{\land}{e}  (s H (D), r_{1} H (S))$ which is equivalent to $K_{D}$ [23, 24, 26].

(iii)

If the above conditions are all met, then we have $K_{1} = K_{D} = K_{1, z}$ .

(iv)

Otherwise, $K_{1, z}$ is neither equal to $K_{1}$ nor $K_{D}$ .

Then it attempts using $K_{1, z}$ to decrypt Ω in $Ψ_{1}$ ; if the decryption result has a predefined message format, then ${KN}_{1, z}$ realizes itself as the destination, or else, the decryption result can be ignored. Let ${msg}^{x : y}$ denote the output of decrypting msg using shared keys $K_{x}, K_{x + 1}, \dots, K_{y - 1}, K_{y}$ sequentially, which is executed by nodes $K N_{x, z}, K N_{x + 1, z}, \dots, K N_{y - 1, z}, K N_{y, z}$ sequentially. For example, $Ω^{1 : 1}$ and $Ω^{1 : 2}$ are separately the outputs of decrypting $Ω$ using the shared key $K_{1}$ by ${KN}_{1, z}$ , and using $K_{1}$ and $K_{2}$ by ${KN}_{1, z}$ and ${KN}_{2, z}$ sequentially. Based on (3), we have

\begin{matrix} Ω^{1 : 1} = {\begin{cases} random string & l = 1, K N_{1, z} \neq D, \\ info & l = 1, K N_{1, z} = D, \\ {info}_{K_{D}} & l = 2, \\ {{\dots {{info}_{K_{D}}}}_{K_{l - 1}} \dots}_{K_{2}} & 2 < l \leq α . \end{cases} \end{matrix}

(7)

Furthermore, ${KN}_{1, z}$ uses $K_{1, z}$ to decrypt $Υ_{2} = {Mark ∣ A_{S, 2} ∣ K N_{2, z} ∣ x}_{K_{1}}$ . Since $K_{1, z} = K_{1}$ , only ${KN}_{1, z}$ can make a successful decryption, which knows this after seeing Mark. Node ${KN}_{1, z}$ continues using $K_{1}$ to decrypt $Υ_{3}, Υ_{4}, \dots, Υ_{α}$ , and obtain $Υ_{3}^{1 : 1}, Υ_{4}^{1 : 1}, \dots, Υ_{α}^{1 : 1}$ . According to (4), it has $Υ_{i}^{1 : 1} = {\dots {PL D_{i}}_{K_{i - 2}} \dots}_{K_{2}}, for all i \in [3, α]$ . Finally, ${KN}_{1, z}$ forms a new packet

\begin{matrix} Ψ_{2} = 〈 Ω^{1 : 1}, R_{1}, Υ_{α}^{1 : 1}, \dots, Υ_{3}^{1 : 1}, Υ_{2}^{1 : 1} 〉, \end{matrix}

(8)

where

Υ_{2}^{1 : 1} = Mark ∣ A_{S, 2} ∣ K N_{2, z} ∣ x

, and

R_{1}

is a random garbage data of length

| Υ_{1} |

used to compensate for the deletion of

Υ_{1}

so that

Υ_{2}

and

Υ_{1}

are of the same format and length. Similar to

Ψ_{1}

, packet

Ψ_{2}

takes a random path (starting from

{KN}_{1, z}

) in

g_{1}

before reaching

g_{2}

. Generally, each key node

{KN}_{i, z}, i \in [1, α - 1]

generates packet

Ψ_{i + 1}

in which a random string

R_{i}

is added after the message part to keep a constant packet length.

R_{i} s

can not be differentiated from each other and will be treated as real path objects by subsequent key nodes. By doing so, each key node cannot identify at which layer of encryption it is.

The packet forwarding is terminated at the last key node ${KN}_{α, z}$ which receives the following packet:

\begin{matrix} Ψ_{α} = 〈 Ω^{1 : (α - 1)}, R_{α - 1}, \dots, R_{1}^{2 : (α - 1)}, Υ_{α}^{' 1 : (α - 1)} 〉, \end{matrix}

(9)

where

Υ_{α}^{' 1 : (α - 1)} = Mark ∣ A_{S, α} ∣ K N_{α, z} ∣ 0

, Here, x is replaced with zero of equal length by S. Then

{KN}_{α, z}

processes

Ψ_{α}

similarly as before.

{KN}_{α, z}

knows itself as the key node after decrypting

Υ_{α}^{' 1 : (α - 1)}

and finding Mark there. Then it uses

K_{α}

to decrypt Ω in

Ψ_{α}

. Since

K_{α} = K_{D}, {KN}_{α, z}

knows that it is the message destination because the decryption result info has a predefined message format.

Since the same set of key nodes is used in delivering multiple messages from S to D, this can significantly reduce the computation overhead. Specifically, each ${KN}_{i, z}$ knows whether it is the destination or a key node after processing the first message and can cache the corresponding source pseudonym $A_{S, i}$ .

4.2.6. An Example

In order to have a better understanding with ELSP, we take the same example in Figure 1, where $α = 3$ and the base station as the destination D which is actually the key node ${KN}_{3,2}$ in $g_{3}$ . For simplicity, we assume that all groups have the same size of 6.

Source S forms $Ψ_{1} = 〈 Ω, Υ_{3}, Υ_{2}, Υ_{1} 〉$ , where

\begin{matrix} Ω = {{info}_{K_{D}}}_{K_{1}}, \\ Υ_{3} = {{Mark ∣ A_{S, 3} ∣ K N_{3,2} ∣ x}_{K_{2}}}_{K_{1}}, \\ Υ_{2} = {Mark ∣ A_{S, 2} ∣ K N_{2,3} ∣ x}_{K_{1}}, \\ Υ_{1} = Mark ∣ A_{S, 1} ∣ K N_{1,1} ∣ x . \end{matrix}

(10)

Ψ_{1}

passes through four nodes before leaving group

g_{0}

. Node

{KN}_{1,1}

receives from its proxy node

P_{1, 0}

the packet

Ψ_{1} = 〈 Ω, Υ_{3}, Υ_{2}, Mark ∣ A_{S, 1} ∣ K N_{1,1} ∣ x 〉

and calculates a shared key

K_{1, 1}

based on the source pseudonym

A_{S, 1}

. Obviously, only

K_{1,1} = K_{1}

with which

{KN}_{1,1}

generates

Ψ_{2} = 〈 Ω^{1 : 1}, R_{1}, Υ_{3}^{1 : 1}, Υ_{2}^{1 : 1} 〉

, where

\begin{matrix} Ω^{1 : 1} = {info}_{K_{D}}, \\ Υ_{3}^{1 : 1} = {Mark ∣ A_{S, 3} ∣ K N_{3,2} ∣ x}_{K_{2}}, \\ Υ_{2}^{1 : 1} = Mark ∣ A_{S, 2} ∣ K N_{2,3} ∣ x . \end{matrix}

(11)

Node

{KN}_{1,1}

knows itself not the message destination after decrypting Ω with

K_{1,1}

and not finding Mark there. Because

K_{1,1} \neq K_{D}

, the decryption result is a random string which does not provide a predefined message format. So,

{KN}_{1,1}

uses mapping function

f_{ρ} (H (x))

to generate the next forwarding node for

Ψ_{2}

. As we can see from Figure 1,

Ψ_{2}

passes through three nodes before leaving

g_{1}

Node ${KN}_{2,3}$ receives from the proxy node $P_{2,2}$ a packet $Ψ_{2} = 〈 Ω^{1 : 1}, R_{1}, Υ_{3}^{1 : 1}, Mark ∣ A_{S, 2} ∣ K N_{2,3} ∣ x 〉$ and then calculates a shared key $K_{2,3}$ based on $A_{S, 2}$ . This time we have $K_{2,3} = K_{2}, {KN}_{2,3}$ then uses $K_{2,3}$ to generate $Ψ_{3} = 〈 Ω^{1 : 2}, R_{2}, R_{1}^{2 : 2}, Υ_{3}^{1 : 2} 〉$ , where $Υ_{3}^{1 : 2} = Mark ∣ A_{S, 3} ∣ K N_{3,2} ∣ x$ . Again, ${KN}_{2,3}$ knows that itself not the destination after decrypting $Υ_{3}^{1 : 1}$ with $Ω^{1 : 1}$ .

$Ψ_{3}$ goes by two nodes before leaving $g_{2}$ . Node ${KN}_{3,2}$ receives from its proxy node $P_{3,4}$ a packet $Ψ_{3} = 〈 Ω^{1 : 2}, R_{2}, R_{1}^{2 : 2}, Mark ∣ A_{3,2} ∣ K N_{3,2} ∣ 0 〉$ and then calculates $K_{3,2}$ to decrypts $Ω^{1 : 2}$ . Node ${KN}_{3,2}$ knows that it is the message destination, as the decryption result does provide a predefined message format so the packet forwarding process succeeded.

By virtue of pseudonym, each ${KN}_{i, z}$ considers that it receives the packet $Ψ_{i}$ from source $A_{S, i}$ . Besides, the addition of $R_{i}$ maintains constant packet length all the time. Hence, each ${KN}_{i, z}$ cannot identify which packet layer it resides in, and it could be at any layer $i \in [1, α - 1]$ with equal probability (D knows that it locates at the last layer α). This forces the attackers to consider all packets when tracking back, and it cannot discount any packet analysis requirement upon the packet length. Using garbage filler slightly increases the overhead, but it is still very less compared to existing random walk or fake packet generation schemes as shown in Section 5.

5. Analysis and Performance Evaluation

In this section, we first give the communication and computation overhead of ELSP, and then its security about source anonymity. Finally, we conduct the simulation to demonstrate the efficiency of ELSP when compared with existing works.

5.1. Overhead Analysis

5.1.1. Communication Cost

In ELSP, a fixed packet size is used to prevent the attackers from inferring any useful information from packet-length changes. Each packet contains a constant length message part and α path objects of equal length. If the message is not long enough, it has to be padded to keep the fixed length. For convenience, we choose $Ψ_{1} = 〈 Ω, Υ_{α}, Υ_{α - 1}, \dots, Υ_{2}, Υ_{1} 〉$ to analyze the packet overhead which is defined as the ratio of nonmessage part to the packet length len. Since $| Υ_{1} | = | Υ_{2} | = \dots = | Υ_{α} |$ , we only need to compute $| Υ_{1} |$ , where $| Υ_{1} | = Mark ∣ A_{S, 1} ∣ K N_{1, z} ∣ x$ .

The length of $A_{S, i}$ is an element in group $G_{1}$ and in fact a point on an elliptic curve over $F_{p}$ [23, 24, 26]. If the prime p is of 160 bits and other pairing parameters are properly selected, it can achieve a security level equivalent to that of 1024-bit RSA [26]. So we have $| A_{S, i} | = 160$ bits. Assume that each node ID is of $l_{ID}$ bits, the hash seed x is of $l_{x}$ bits, the packet overhead is as follows:

\begin{array}{l} packet overhead = \frac{α | Υ_{1} |}{len} \\ = \frac{α (| A_{S, i} | + | Mark | + | K N_{1, z} | + | x |)}{len} \\ = \frac{α (160 + | Mark | + l_{ID} + l_{x})}{len} \\ = \frac{α (160 + | Mark | + | K N_{1, z} | + | x |)}{| Ω | + α (160 + | Mark | + | K N_{1, z} | + | x |)} \\ = \frac{1}{ε + 1}, \end{array}

(12)

where

ε = | Ω | / (α (160 + | Mark | + l_{ID} + l_{x}))

. In ELSP, |Ω| is fixed and unchangeable. Since |Mark|,

l_{ID}

and

l_{x}

are also constant, the packet overhead is in direct ratio to α, the number of packet layers, or path objects.

Now we discuss T, the number of end-to-end packet transmissions cost by each message from S to D. Suppose for the sake of simplicity that no two consecutive packet forwarders are the same, and that no key node is selected as a proxy node. Let $L_{i}$ denote the number of times that packet $Ψ_{i} (for all i \in [1, α]$ ) is transmitted before entering group i. According to Section 4.2.5, $L_{i}$ satisfies the geometric distribution $Pr (L_{i} = k) = (1 - ρ) ρ^{k - 1}, for all k \geq 1$ , with mean $\bar{L_{i}} = 1 / (1 - ρ)$ . Each $Ψ_{i}$ also involves α intergroup transmissions, each for one proxy node in group i. According to the packet-forwarding process in Section 4.2.5, we can derive

\begin{matrix} T = \sum_{i = 1}^{α} (\bar{L_{i}} + 1) = \frac{α}{1 - ρ} + α . \end{matrix}

(13)

Note that T includes

α / (1 - ρ)

intragroup transmissions and α intergroup transmissions. The former are hidden by intragroup traffic rate of λ packets/second and can be dynamically adjusted as needed. So, each message only incurs α inter-group transmissions. Besides, given a certain value of λ, the larger T it is, the fewer concurrent sessions can be supported and vice versa.

5.1.2. Computation Cost

In ELSP, the most time-consuming task is undoubtedly the pairing $\overset{\land}{e}$ operation for shared-key establishment, which chooses Tate paring [26]. Zhang et al. [29] quantify the energy consumption of the Tate pairing. It assumes that the sensor CPU is a low-power 32-bit Intel PXA255 processor at 400 MHz. The computation of the Tate pairing roughly needs $33 / 400 \times 752 \approx 62.04$ ms, and the energy consumption is approximately 25.5 mJ. In fact, the pairing function in the protocol is executed relatively rarely. Specifically, each pair of nodes only needs to do $\overset{\land}{e}$ once on demand to establish a shared key whereby to encrypt and authenticate intergroup or intragroup traffic using efficient symmetric key ciphers, the remaining pairing operations can be precomputed and stored for all protocol instances. Moreover, if source S intends to transmit multiple messages to destination D, each key node only need execute $\overset{\land}{e}$ once to compute a shared key when transmitting the first message. All subsequent packet processing are implemented based on the shared keys using efficient symmetric-key ciphers. Therefore, the computation cost of ELSP is totally acceptable even on low-end sensor device.

5.2. Security Analysis

Packets in ELSP use pseudonym instead of real source ID so that key nodes can not determine the initiators of received packets. Further, even the base station (a key node as well) cannot ascertain who sends it the message if the source node does not reveal its real identity in the message. This implies that the adversary can not directly determine packet sources. However, the adversary can assign a probability to each sensor node for being the source of a given packet. Therefore, we will investigate the resilience of ELSP against such probabilistic attacks.

We first use two entropy-based metrics to evaluate source anonymity. Then we describe the random forwarding approach of how to prevent internal attackers from determining key nodes. Finally, we evaluate the capability of ELSP providing source anonymity and show its efficiency compared with existing works.

5.2.1. Anonymity Metrics

Pfitzmann and Hansen [30] first defined anonymity as “the state of Indistinguishable within a set of all possible subjects, anonymity set.” Since then, anonymity set has become a popular metric to evaluate the anonymity in various anonymous communication systems. Generally, the greater the anonymity set is, the better anonymity achieved. However, it is unable to reflect the possibility that the adversary assigns different probabilities to each node as being the source of a given packet. This problem was solved by an entropy-based metric proposed in [31]. Briefly speaking, let $Φ$ be a set of N nodes $(| Φ | = N)$ in an anonymous communication system; the anonymity entropy is defined as

\begin{matrix} Δ = - \sum_{X = Φ} P_{X} lo g_{2} (P_{X}) . \end{matrix}

(14)

Here,

P_{X}

represents the probability of the adversary assigned to node X being the source of a packet, and Δ denotes the uncertainty about which node is the source of a packet or the additional information that the adversary needs to determine the packet source. It follows that

0 \leq Δ \leq lo g_{2} (N)

[32]. The upper bound is achieved when each node

X \in Φ

is assigned an equal probability of

1 / N

to be the source as viewed from the adversary (the ideal case); the lower bound is attained when S is assigned a probability of one, while each node

X \in Φ ∖ {S}

is assigned a probability of zero. In addition, Diaz et al. [33] defined anonymity degree as

\begin{matrix} v = \frac{Δ}{lo g_{2} (N)}, \end{matrix}

(15)

which indicates the distance between the real anonymity entropy and the maximum anonymity entropy that a system can provide.

5.2.2. Efficacy of Random Forwarding

We denoted source S a key node by ${KN}_{0, z}$ . It is crucial in ELSP to prevent the attackers from identifying key nodes ${KN}_{i, z} (0 \leq i \leq α)$ . As discussed before, each packet $Ψ_{i + 1}$ takes a random path starting from key node ${KN}_{i, z}$ before entering group $i + 1$ ; this helps withstanding eavesdroppers (external attackers). However, there might be some sensor nodes compromised (internal attackers) in the transmission path. The countermeasure is to identify and isolate these compromised nodes via some effective solutions like reputation and trust-based mechanism as in [34] this can improve anonymous routing, but also increase the complexity of the system. In the following, we illustrate the efficacy of the random-forwarding mechanism against such internal attackers without other complements.

We assume that a set of $c \in [1, | g_{i} | - 1]$ compromised nodes in group $g_{i} (0 \leq i \leq α - 1)$ , among which at least one appears in the forwarding path of packet $Ψ_{i + 1}$ . The goal of internal attackers is to find out which noncompromised node in $g_{i}$ is ${KN}_{i, z}$ that initiated $Ψ_{i + 1}$ . Let B denote the first compromised node which received ${Ψ_{i}}_{+ 1}$ from a noncompromised node A. From the point of view of internal attackers, all the noncompromised nodes in $g_{i}$ other than A have equal probability to be ${KN}_{i, z}$ , but they are obviously less likely to be ${KN}_{i, z}$ than A. We need to analyze how confident the adversary can be that A is indeed ${KN}_{i, z}$ , or in other words, the probability that they assigned to A being ${KN}_{i, z}$ .

Theorem 1.

Let one suppose that the first compromised node B in group i, for all $i \in [0, α - 1]$ , received packet $Ψ_{i + 1}$ from node A; the probability that the adversary assigned to A as the key node ${K N}_{i, z}$ is $(n - (n - c - 1) ρ) / n$ , where $n = | g_{i} |$ and c is the number of compromised nodes in group i.

Proof.

Let $\vec{A B}$ be the event that B received packet $Ψ_{i + 1}$ from node A. The probability of A being ${KN}_{i, z}$ is

\begin{matrix} Pr (A = K N_{i, z} ∣ \vec{A B}) = \frac{Pr (A = K N_{i, z}, \vec{A B})}{Pr (\vec{A B})}, \end{matrix}

(16)

where

\begin{array}{l} Pr (\vec{A B}) = Pr (A = K N_{i, z}, \vec{A B}) + Pr (A \neq K N_{i, z}, \vec{A B}) \\ = Pr (A = K N_{i, z}) \cdot Pr (\vec{A B} ∣ A = K N_{i, z}) \\ + Pr (A \neq K N_{i, z}) \cdot Pr (\vec{A B} ∣ A \neq K N_{i, z}) . \end{array}

(17)

Let $Pos (B) = k$ be the event that B locates in the kth position of the path. Then $p_{k}^{'} = Pr (\vec{A B}, Pos (B) = k ∣ A = K N_{i, z})$ represents the probability that A is indeed ${KN}_{i, z}$ , and $Ψ_{i + 1}$ went through $k - 1$ noncompromised nodes before A. There are three cases: (i)

$k = 1$ : it shows that A selected B as the first packet forwarder; this occurs with probability $p_{k}^{'} = 1 / n$ because each packet forwarder is chosen randomly and uniformly from group $i (| g_{i} | = n$ ).

(ii)

$k = 2$ : it shows that A selected itself as the first packet forwarder and then B as the second one, which occurs with probability $p_{2}^{'} = ρ / n^{2}$ .

(iii)

$k > 2$ : it shows that packet $Ψ_{i + 1}$ passed $k - 2$ noncompromised nodes, and the last one selected A as the ( $k - 1$ )th packet forwarder who chose B as the kth packet forwarder, which occurs with $p_{k}^{'} = ρ^{k - 1} \cdot ((n - c) / n)^{k - 2} \cdot (1 / n^{2}) = (ρ / n^{2}) (((n - c) ρ) / n)^{k - 2}$ . It follows that

\begin{array}{l} Pr (\vec{A B} ∣ A = K N_{i, z}) = \frac{1}{n} + \frac{ρ}{n^{2}} \sum_{j = 0}^{\infty} {(\frac{(n - c) ρ}{n})}^{j} \\ = \frac{n - (n - c - 1) ρ}{n^{2} - n (n - c) ρ} . \end{array}

(18)

Likewise, we have

\begin{array}{l} Pr (\vec{A B} ∣ A \neq K N_{i, z}) = \sum_{k = 1}^{\infty} Pr (\vec{A B}, Pos (B) = k ∣ A \neq K N_{i, z}) \\ = 0 + \sum_{k = 2}^{\infty} ρ^{k - 1} \cdot {(\frac{n - c}{n})}^{k - 2} \cdot \frac{1}{n^{2}} \\ = \frac{ρ}{n^{2} - n (n - c) ρ} . \end{array}

(19)

At last, if B has no other information, all the noncompromised nodes in group i are equally likely to be the key node ${KN}_{i, z}$ , so we have

\begin{matrix} Pr (A = K N_{i, z}) = \frac{1}{n - c}, Pr (A \neq K N_{i, z}) = \frac{n - c - 1}{n - c} . \end{matrix}

(20)

Substituting (18), (19), and (20) into (17) and then (16), we finally obtain

\begin{matrix} Pr (A = K N_{i, z} ∣ \vec{A B}) = \frac{n - (n - c - 1) ρ}{n} . \end{matrix}

(21)

Based on (21), we can further deduce that (i)

when $n = c + 1$ , it indicates that all the nodes in $g_{i}$ other than A are compromised, $Pr (A = K N_{i, z} ∣ \vec{A B}) = 1$ . Thus, the adversary can make sure that A is indeed ${KN}_{i, z}$ ;

(ii)

when $n > c + 1$ , the larger ρ, the smaller $Pr (A = K N_{i, z} ∣ \vec{A B})$ , the better the key node is concealed from the internal attackers, and the larger the communication overhead T.

Note that all the other $n - c - 1$ noncompromised sensor nodes have equal probability of being ${KN}_{i, z}$ . Consequently, the probability distribution of each node in $g_{i}$ being ${KN}_{i, z}$ as viewed by the adversary is given by

\begin{matrix} Pr (X = K N_{i, z}) = {\begin{cases} \frac{n - (n - c - 1) ρ}{n} & X = A, \\ \frac{ρ}{n} & X \in g_{i}, X \neq A, \\ 0 & else . \end{cases} \end{matrix}

(22)

5.2.3. Source Anonymity Measurement

It should be noted that besides the event packets, there are still other communications among the sensor nodes involving a large number of packets, which makes it infeasible for the adversary to precisely separate different communication sessions from eavesdropping [1]. This allows us to focus on the impact of internal attackers. For simplicity, we still consider the session from source S to destination D.

We first discuss the impact of consecutively compromised key nodes. Suppose that the adversary compromised C out of the overall N sensor nodes. Because of the layered encryption, the same information appears entirely different across packet layers so that only key nodes ${KN}_{i, z} (0 \leq i \leq α - 1)$ that strip off one layer of encryption can correlate messages across two adjacent layers. Note that the last key node ${KN}_{α, z}$ does not further forward; the packet, therefore, is excluded here. We call a chain of $e (1 \leq e \leq α - 1)$ compromised key nodes participating in the same packet delivery as an e-chain. The e-chain can integrate the same message across $e + 1$ consecutive packet layers. For example, when $e = 1$ , no consecutive key nodes were compromised, and when $e = α - 1$ , the adversary can trace the packet from $g_{1}$ to $g_{α}$ . It is possible that there exist multiple disjoint chains, but the adversary cannot link them together, so we only need to consider the longest e-chain, referred to the $e^{*}$ -chain, which exposes the maximum information to the adversary. The $e^{*}$ -chain can begin with any of the $α - e^{*}$ key nodes, and the probability of it starting from ${KN}_{i, z} (1 \leq i \leq α - 1)$ is $1 / (α - e^{*})$ . In other words, there is at least one compromised node in $g_{i - 1}$ serving random packet forwarding; $g_{i - 1}$ is the source group $g_{0}$ with probability $1 / (α - e^{*})$ .

We also consider an extreme case where there is at least one compromised node serving random packet forwarding in each group. For convenience, we assume that each group comprises the same number of n sensor nodes (i.e., $N = M n$ ) and the adversary knows α. There might be no $e^{*}$ -chain; in this case, the anonymity of S will be definitely better than what we will demonstrate below. Theorem 2 gives the principle about the source anonymity.

Theorem 2.

Let the $e^{*}$ -chain beginning with ${K N}_{i, z} (i \in [1, α])$ and the first compromised node B in group $g_{i - 1}$ received packet $Ψ_{i}$ from node A, the probability that the adversary assigned to A as the packet source is $(n - (n - c - 1) ρ) / (n (α - e^{*}))$ , where c is the number of compromised nodes in $g_{i - 1}$ .

Proof.

The proof is directly based on Theorem 1. Since B is in source group $g_{0}$ with the probability of 1/( $a - e$ ), then its predecessor A is in the random-forwarding path being source S with the probability of

\begin{array}{l} Pr (A = S ∣ \vec{A B}) = Pr (A = S ∣ \vec{A B}, B \in g_{0}) \\ = \frac{n - (n - c - 1) ρ}{n} \cdot \frac{1}{(α - e^{*})} \\ = \frac{n - (n - c - 1) ρ}{(α - e^{*}) n} . \end{array}

(23)

Likewise, each noncompromised node except A in $g_{i - 1}$ has the equal probability of $ρ / ((α - e^{*}) n)$ being the source. Let us suppose that there are C compromised nodes among all the N sensor nodes in the network (excluding source S and destination D) the residue ( $M - 1) n - (C - c$ ) noncompromised nodes not in $g_{i - 1}$ equally hold the probability of $(α - e^{*} - 1) / (α - e^{*})$ being the source. In a word, the probability distribution of each node being the source from the adversary's point of view is given by

\begin{matrix} Pr (X = S)  =  {\begin{cases} \frac{n - (n - c - 1) ρ}{(α - e^{*}) n} & X = A, \\ \frac{ρ}{(α - e^{*})} & X \in g_{i - 1}, X \neq A, \\ \frac{α - β - 1}{(α - e^{*}) (M n - n - C + c)} & else . \end{cases} \end{matrix}

(24)

Finally, we can obtain the source anonymity entropy using (14) and the source anonymity degree using (15).

5.3. Performance Evaluation

We first give numerical results to demonstrate the effectiveness of ELSP, and then make a comparison with existing schemes by simulations. Unless specified otherwise, we assume that each sensor node is compromised independently with probability 0.1, which is considered a severe situation. Due to space limitations, we omit the calculation details.

5.3.1. Numerical Results

Figure 2 illustrates the parameters α, $e^{*}$ , and ρ impact on the source anonymity degree, where the network size $N = 400$ , the number of groups $M = 20$ , and $C = 40$ nodes are compromised. We can see that the larger α and ρ are, the higher the source anonymity is. In addition, given parameters α and ρ, increasing $e^{*}$ will decrease the source anonymity. The reason is straight: the larger $e^{*}$ is, the more key nodes are compromised by the adversary, which in turn, increases the probability of message disclosed. Note that with $ρ > 0$ , it is impossible for the adversary to ascertain the packet source even when the first $α - 1$ key nodes are all compromised. These results verify the efficiency of random-forwarding technique in improving source anonymity.

Figure 2

Impact of α, ρ, and $e^{*}$ on source anonymity.

Figure 3 shows the impact of the total number of compromised nodes C on source anonymity, where $N = 400, M = 20, α = 5, ρ = 0.7$ . It is obvious that the more compromised the nodes are, the lower the source anonymity is; this coincides with the intuition. For example, when $C = 200$ , namely, half of the sensor nodes are compromised; the source privacy are still higher than 0.9. Based on this figure, it can be concluded that ELSP is resilient to node compromise attack.

Figure 3

Impact of C on source anonymity.

The source privacy is also relevant to the group size n (cf. (24)). Figure 4 shows the impact of the group size n on source anonymity, where $N = 1000$ and $C = 100$ . As we can see, source anonymity increases as n increases; meanwhile, it increases the total intragroup traffic (communication overhead) and vice versa. In fact, various objects may have different security levels; in order to provide differentiated source anonymity requirement, one viable solution is to hide more important nodes with higher privacy requirements in relatively large groups while letting other groups be of small sizes. Therefore, ELSP has a nice property of flexibility of source anonymity that can be guaranteed by varying the group size n.

Figure 4

Impact of n on source anonymity.

5.3.2. Comparison

In this subsection, we use both functional analysis and simulations to get a comprehensive comparison with DCARPS [16], random walk [5], and fake event packet generation [6, 8].

First, we compare the difference between DCARPS and ELSP from the technique implementation. In particular, DCARPS adopts label-switching approach, while ELSP uses the pseudonym and intermediate packet-altering scheme to conceal source traversal information. ELSP makes much less assumptions while considering a strong threat model as well as bearing the lightweight design in mind. DCARPS requires the base station to know the topology information of the network. ELSP only needs the information of intermediate nodes. In ELSP, only a small number of nodes are chosen to reconstruct the packet before forwarding, whereas in DCARPS, all the nodes involved in delivering the packet have to perform decryption and re-encryption operation with the new label. Further, in ELSP, this construction method allows the base station to verify the packet, while in DCARPS it is only used for routing. More importantly, node compromise attacks are not considered in DCARPS, which is the main attention we focus on ELSP.

Second, we conduct the simulation in NS2 to show the effectiveness of ELSP and compare it to random walk technique [5]. In Figure 5, we show the overhead (energy consumption) of random walk-based methods and ELSP. For fair comparison, sensor nodes in random walk-based methods are also organized into pairwise disjoint groups; the x-axis represents the longest path length of 50 hops between source and destination. Note that the overhead of random walk-based methods includes propagation over the random path and the journey towards the base station. The random walk lengths (denoted by H) are 10, 15, 20, and 25, respectively. The rabbling probability ρ is chosen as a stable status of 0.75 for ELSP, which is equivalent to $1 / (1 - 0.5) = 4$ hops of intragroup random forwarding (cf. Section 5.1.1). As shown in Figure 5, the overhead of ELSP slightly increases with the path lengths, while the overhead involved in random walk-based methods is higher than ELSP.

Figure 5

Overhead comparison between random walk-based schemes and ELSP.

In Figure 6, we set $N = 1000$ , $C = 100$ , $n = 10$ for random walk-based methods and ELSP. We also define privacy in random walk based methods as the length of distance the random walk takes the source information away from the actual source when compared to the path length (say S-D distance). The source privacy achieved for ELSP and random walk based scheme with the same path lengths and random walk lengths are shown in Figure 5. As we can see, the source anonymity degree or privacy of ELSP is constant irrespective of the path lengths, while the random walk based methods' privacy decreases with the increases of path length. This is because we consider a rather strong threat model, where the adversary can either eavesdrop over a larger coverage area or compromise sensor nodes. Thus, once the attackers obtain data packets on compromised node, they can traceback to the source with higher probability. Conversely, in ELSP, each packet is modified en route by selected nodes to make it difficult for the adversary to trace back to the source.

Figure 6

Privacy comparison between random walk-based schemes and ELSP.

Finally, Figure 7 shows the overhead incurred in fake packet generation schemes [6, 8] compared to ELSP. We consider different numbers of fake packet-generating nodes (denoted by F), with the numbers ranging from 20% to 50% of all nodes. The x-axis represents the number of actual source nodes in the network as a percentage of all nodes. As shown in Figure 7, the larger the source nodes in the network are, the smaller the overhead incurred. We can conclude that any form of fake packet generation technique will have significantly higher overhead compared to ELSP. Even the conservative case of 20% of sensor nodes generating fake packets leads to higher overhead compared to ELSP. This is because the fake packet generation is an obfuscating technique which is only successful under the heavy load of fake packet generation. When a sensor node detects an event and generates the event reporting packet, there should be at least one more source node generating a fake event-reporting packet. On this occasion, we see that 50% of the traffic corresponds to fake traffic. Moreover, to increase security will incur more overhead in the form of fake packets. Consequently, the realization of such a system is dependent on the amount of fake packets generated which still provide very minimal source privacy and have more overhead than ELSP. In addition, the fake packet generation schemes do not consider node compromises.

Figure 7

Overhead comparison between fake packet generation schemes and ELSP.

6. Conclusions

Sensor node detecting the event is important, as it can reveal the event occurrence location and time occurrence; thus, needs to maintain source privacy. Previous works consider an eavesdropping adversary and do not provide countermeasures to an intrusive node compromise attack with global eavesdropping capabilities. In this paper, we presented the design and evaluation of a novel anonymous mechanism for WSNs. By utilizing the grouping, the self-generated pseudonym, and the Identity-Based Cryptography, the proposed protocol is demonstrated to achieve desired security objectives and efficiency. As future work, we will seek to analyze the security of our scheme under other adversary models.

Footnotes

Acknowledgments

The authors acknowledge support from the National Natural Science Foundation of China (Grant nos. 61173136, 61202462, 61173141, and 61232016), and the fund support of the key subject of Fujian province—Computer Application Technology.

References

Raymond

J.-F.

Traffic analysis: protocols, attacks, design issues, and open problemsin

Proceedings of the International Workshop on Design Issues in Anonymity and Unobservability

2000

Berkeley, Calif, USA

10 29

Chaum

D. L.

Untraceable electronic mail, return addresses, and digital pseudonyms

Communications of the ACM 1981 24 2 84 88

2-s2.0-0019532104

10.1145/358549.358563

Pfitzmann

Waidner

Networks without user observability

Computers and Security 1987 6 2 158 166

2-s2.0-0023331864

Reiter

Rubin

Crowds: anonymity for web transactions

ACM Transactions on Information and System Security (ITSSEC) 1998 1 1 66 92

Kamat

Zhang

Trappe

Ozturk

Enhancing source-location privacy in sensor network routing

Proceedings of the 25th IEEE International Conference on Distributed Computing Systems

June 2005

Columbus, Ohio, USA

599 608

2-s2.0-27944449345

Mehta

Liu

Wright

Protecting location privacy in sensor networks against a global eavesdropper

IEEE Transactions on Mobile Computing 2012 11 2 320 336

Ouyang

Liu

Ford

Makedon

Source location privacy against laptop-class attacks in sensor networks

Proceedings of the 4th International Conference on Security and Privacy in Communication Networks (SecureComm '08)

September 2008

Istanbul, Turkey

1 10

2-s2.0-70249120775

10.1145/1460877.1460884

Shao

Yang

Zhu

Cao

Towards statistically strong source anonymity for sensor networks

Proceedings of the 27th IEEE Communications Society Conference on Computer Communications (INFOCOM '08)

April 2008

Phoenix, Ariz, USA

466 474

2-s2.0-51349086896

10.1109/INFOCOM.2007.19

Yang

Shao

Zhu

Urgaonkar

Cao

Towards event source unobservability with minimum network traffic in sensor networks

Proceedings of the 1st ACM Conference on Wireless Network Security (WiSec '08)

April 2008

Alexandria, Va, USA

77 88

2-s2.0-56749163645

10.1145/1352533.1352547

10.

Wang

Sheng

Privacy-aware routing in sensor networks

Computer Networks 2009 53 9 1512 1529

2-s2.0-67349099926

10.1016/j.comnet.2009.02.002

11.

Raj

Liu

Wright

Das

S. K.

Using data mules to preserve source location privacy in wireless sensor networks

7129

Proceedings of the 13th International Conference on Distributed Computing and Networking (ICDCN '12)

2012

Hong Kong, Hong Kong

309 324 Lecture Notes in Computer Science

10.1007/978-3-642-25959-3_23

12.

Ren

Providing hop-by-hop authentication and source privacy in wireless sensor networks

IEEE Conference on Computer Communications (INFOCOM '12)

2012

Orlando, Fla, USA

3071 3075

10.1109/INFCOM.2012.6195761

13.

Zhang

Subramanian

Wang

Lightweight and compromise-resilient message authentication in sensor networks

Proceedings of the 27th IEEE Communications Society Conference on Computer Communications (INFOCOM '08)

April 2008

Phoenix, Ariz, USA

2092 2100

2-s2.0-51349136300

10.1109/INFOCOM.2007.200

14.

Albrecht

Gentry

Halevi

Katz

Attacking cryptographic schemes based on perturbation polynomials

Cryptology ePrint Archive, Report 2009/098, 2009, http://eprint.iacr.org/

15.

Ren

Quantitative measurement and design of source-location privacy schemes for wireless sensor networks

IEEE Transactions on Parallel and Distributed Systems 2012 23 7 1302 1311

16.

Nezhad

A. A.

Miri

Makrakis

Location privacy and anonymity preserving routing for wireless sensor networks

Computer Networks 2008 52 18 3433 3452

2-s2.0-55649095270

10.1016/j.comnet.2008.09.005

17.

Chai

Lin

Enhancing sink-location privacy in wireless sensor networks through k-anonymity

International Journal of Distributed Sensor Networks 2012 2012 16

648058

10.1155/2012/648058

18.

Chow

C. Y.

Mokbel

M. F.

A privacy-preserving location monitoring system for wireless sensor networks

IEEE Transactions on Mobile Computing 2011 10 1 94 107

19.

Di Pietro

Viejo

Location privacy and resilience in wireless sensor networks querying

Computer Communications 2011 34 3 515 523

2-s2.0-78751642439

10.1016/j.comcom.2010.05.014

20.

Hwang

A lightweight anonymous routing protocol without public key en/decryptions for wireless Ad Hoc networks

Information Sciences 2011 181 23 5333 5347

21.

Huyen

N. T. T.

Nguyen

T.-D.

Huh

E.-N.

A beneficial analysis of deployment knowledge for key distribution in wireless sensor networks

Security and Communication Networks 2012 5 5 485 495

22.

Zhou

Sun

Wang

An efficient and scalable pairwise key pre-distribution scheme for sensor networks using deployment knowledge

Computer Communications 2009 32 1 124 133

2-s2.0-57349158635

10.1016/j.comcom.2008.09.035

23.

Xiong

Wang

An efficient key management scheme for wireless sensor networks

International Journal of Distributed Sensor Networks 2012 2012 14

406254

10.1155/2012/406254

24.

Boneh

Franklin

Identity-based encryption from the weil pairing

SIAM Journal on Computing 2003 32 3 586 615

2-s2.0-0037623983

10.1137/S0097539701398521

25.

Das

A. K.

ECPKS: an improved location-aware key management scheme in static sensor networks

International Journal of Network Security 2008 7 3 358 369

26.

Barreto

Kim

Bynn

Scott

Efficient algorithms for pairing-based cryptosystems

Proceedings of the 22nd Annual International Cryptology Conference on Advances in Cryptology (CRYPTO '02)

2002

Santa Barbara, Calif, USA

354 368

27.

Wright

M. K.

Adler

Levine

B. N.

Shields

The predecessor attack: an analysis of a threat to anonymous communications systems

ACM Transactions on Information and System Security 2004 7 4 489 522

2-s2.0-14544273666

10.1145/1042031.1042032

28.

Danezis

Diaz

Kasper

Troncoso

The wisdom of crowds: attacks and optimal constructions

Proceedings of 14th European Symposium on Research in Computer Security (ESORICS '09)

2009

Saint-Malo, France

406 423

29.

Zhang

Liu

Lou

Fang

Location-based compromise-tolerant security mechanisms for wireless sensor networks

IEEE Journal on Selected Areas in Communications 2006 24 2 247 260

2-s2.0-33144476837

10.1109/JSAC.2005.861382

30.

Pfitzmann

Hansen

Anonymity, unobservability, and pseudonymity: a consolidated proposal for terminology

Draft v0.25, 2005

31.

Serjantov

Danezis

Towards an information theoretic metric for anonymity

Proceedings of the 2nd International Conference on Privacy Enhancing Technologies (PET '02)

2002

Heidelberg, Germany

41 53

32.

Cover

T. M.

Thomas

J. A.

Elements of Information Theory 2006 2nd

London, UK

Wiley

33.

Diaz

Seys

Claessens

Preneel

Towards measuring anonymity

Proceedings of the 2nd International Conference on Privacy Enhancing Technologies (PET '02)

Heidelberg, Germany

54 68

34.

Crosby

G. V.

Hester

Pissinou

Location-aware trust-based detection and isolation of compromised nodes in wireless sensor networks

International Journal of Network Security 2011 12 2 107 117