Mutual Defense Scheme for Secure Data Aggregation in Wireless Sensor Networks

Abstract

As a resource-saving technique, data aggregation has been intensively studied in wireless sensor networks (WSNs). However, existing methods of secure data aggregation in WSNs either (1) cause high communication overhead or (2) cannot defend against compromised member nodes and compromised aggregator nodes at the same time. In this paper, we propose a mutual defense scheme for secure data aggregation that consists of two components: a secure sort-group-filter data aggregation algorithm (SSGF) and a lightweight TDMA-based monitoring mechanism. SSGF defends against false readings injected by compromised member nodes, and the monitoring mechanism defends against aggregation results altered by compromised aggregator nodes. In addition, we present a secure data packet transmission scheme. Because the readings sensed by neighboring nodes exhibit temporal and spatial correlation, we introduce a constraint parameter, called the maximum tolerant difference (MTD), and give quantitative criteria for evaluating abnormal readings. Both theoretical analysis and comparative experiments show the feasibility and efficiency of the proposed method.

1. Introduction

Data aggregation is considered one of the resource-saving strategies in wireless sensor networks (WSNs): it saves energy and reduces contention at the medium access control layer. However, WSNs are often deployed in unattended areas, which makes them vulnerable to attacks. Once some nodes are compromised, attackers can launch a wide variety of attacks, such as injecting, altering, and selectively forwarding data. In this paper, we focus on secure data aggregation, especially on preventing the stealthy attack. In a stealthy attack, a compromised node injects bogus raw readings or forges aggregation values so that the base station accepts false aggregation results without the user detecting the attack. In critical applications, using incorrect or maliciously modified aggregation results can have disastrous consequences.

Because secure data aggregation is important in WSNs, many innovative methods have been proposed [1, 2]. Existing secure data aggregation methods fall into two main categories: cryptography-based schemes [3–13] and monitoring-based schemes [14–21]. Methods in the first category rely purely on cryptographic techniques to provide end-to-end security, that is, to ensure the confidentiality, authentication, and integrity of data. However, these approaches cannot defend against all attacks; in particular, they cannot prevent compromised nodes, which hold valid keys, from sending false raw readings. Methods in the second category, monitoring-based methods, have been proposed as an attractive complement to cryptography in securing WSNs. They monitor the behavior of nodes and then filter out untrusted nodes and bogus readings. However, most of these solutions either (1) cause high communication overhead or (2) consider only the member nodes or only the aggregator nodes as possibly compromised.

To overcome these drawbacks, this paper proposes a lightweight mutual defense scheme for secure data aggregation in WSNs. It consists of a secure sort-group-filter (SSGF) data aggregation algorithm and a TDMA-based listening mechanism, which defend against compromised sensor nodes injecting false readings and compromised aggregators sending incorrect aggregation results, respectively. It also provides integrity, freshness, and authentication services through a secure data packet transmission scheme. Because the readings sensed by neighboring nodes exhibit temporal and spatial correlation, we define a constraint parameter, the maximum tolerant difference (MTD), and give quantitative criteria for evaluating abnormal readings. We also analyze and prove the feasibility of our method and derive the worst-case aggregation results that compromised nodes can produce. Extensive experiments show that the proposed mechanism can effectively and efficiently defend against and detect compromised nodes. Compared with existing methods, our method achieves higher accuracy of the aggregation results with lower monitoring overhead.

The rest of this paper is organized as follows. Section 2 gives a summary of related work. Section 3 provides network model, attack model, and assumptions. Section 4 gives the details of our mutual defense scheme for secure data aggregation. The analysis of aggregation results affected by compromised nodes is introduced in Section 5 and simulation results are provided in Section 6. Finally, Section 7 concludes this paper.

2. Related Work

The secure data aggregation problem in WSNs has been investigated extensively during the past few years. Several surveys of these works are presented in [1, 2]. Wagner has studied the security of aggregation and proposed a mathematical framework for formally evaluating secure aggregation [22].

In [3, 4], a secure information aggregation (SIA) scheme was proposed to prevent users from accepting incorrect aggregation results. By constructing efficient random sampling mechanisms and interactive proofs, SIA lets the home server verify the correctness of the aggregated data. However, because the aggregator must both construct a Merkle hash tree-based commitment and handle the verification requests from the home server, its computation and communication overhead is very high.

In [5–9], several homomorphic encryption-based end-to-end secure data aggregation schemes were proposed; their advantage is that ciphertexts can be aggregated directly. In [10], a pattern-based end-to-end secure data aggregation scheme was presented. In [11], a watermark-based end-to-end secure data aggregation approach was proposed. In [12], a secure data aggregation scheme was proposed to provide end-to-end data confidentiality; it achieves data privacy through a secure channel. In [13], a signature-based data security technique was proposed to protect sensitive data during aggregation; it exploits the additive property of complex numbers. However, these approaches can defend only against external attacks and cannot prevent compromised nodes from injecting bogus raw readings.

In [23], to defend against the falsified subaggregate attack, in which a compromised node relays a false subaggregate to the parent node, a verification algorithm was presented. The base station can use it to determine if the computed aggregate includes any false contribution. However, it would fail to compute the aggregate in the presence of the attack. To address this problem, an attack-resilient computation algorithm was designed in [24]. In [25], a random sample consensus paradigm-based technique, called RANBAR, was designed to filter out outlier elements from a sample before an aggregation procedure. However, RANBAR does not consider the situation where the aggregator nodes are compromised.

In [14], a trust management scheme was presented to identify the trustworthiness of sensor nodes. Because this method collects multiple redundant readings and cross-checks them for consistency, its communication overhead is also high.

In [15], a secure aggregation tree (SAT) was proposed to detect and prevent cheating. In SAT, every child node must gather all messages that its sibling nodes send to the parent node, so the communication overhead is relatively high. Moreover, SAT does not consider that leaf nodes may also be compromised. In [16], a secure and reliable data aggregation protocol, called SELDA, was proposed. In SELDA, each sensor node updates trust levels for its environment by monitoring the actions of its neighboring nodes with a Beta distribution function. However, SELDA ignores the case in which aggregator nodes are compromised. In [17], a reputation-based secure data aggregation (RSDA) scheme was proposed. In RSDA, each node in a cell must compare its readings with those of its neighbors and perform redundant operations to monitor the cell representative, so the communication and computation overhead is high. In [18], an RSDA-based representative aggregation tree (RAT) scheme was presented to reduce the data transmission overhead. However, this scheme only mentions adopting a monitoring mechanism to prevent the injection of bogus information and forged aggregation values. In [19], a solution was proposed to detect false readings during data aggregation and identify the attacking nodes. Its main idea is that each node is monitored by its children and judged by majority vote. However, it requires dedicated external nodes to monitor the internal nodes, which consumes many additional nodes. In [20], a monitoring-based secure data aggregation method was proposed to prevent on-off attacks. In [21], a two-level hierarchical monitoring mechanism was designed to ensure the integrity and accuracy of the aggregation result. At the first level, a principal supervisor node PSUP_L1 monitors the behavior of the clusterhead, whereas at the second level, the remaining nodes in the cluster monitor the behavior of both PSUP_L1 and the clusterhead. However, since every node participates in the aggregation function and gathers data through passive listening, this scheme incurs very high monitoring overhead.

3. System Model

We consider a static WSN with one sink node and N sensor nodes. The sink node is powerful and assumed to be secure. We assume that the network is densely deployed and that the readings sensed by nodes exhibit temporal and spatial correlation. This is reasonable because nodes in the same area sense similar physical phenomena at a given time, as in applications that monitor the temperature, humidity, and lighting of an area.

As in other works in the literature, we consider a cluster-based network architecture for data aggregation; for instance, the network can be organized into clusters by a secure clustering algorithm such as the protocol proposed in [26], or divided into grids as in [17, 18]. However, such a pre-established architecture is not suitable for data aggregation in some event detection applications, such as intruder detection, because it is very likely that some nodes in a cluster detect an event while others do not. For such applications, it is important to organize the collaboration of sensor nodes dynamically so that reports are generated once events are detected; we do not consider this kind of application in this paper. Figure 1 presents the logical architecture of the WSN considered. The clusterhead nodes, also called aggregator nodes in this paper, aggregate the readings sent by their member nodes. They form a tree through which aggregated readings are forwarded over multiple hops via other clusterhead nodes. In this paper, we consider only the average aggregation operation. We assume that effective key-based mechanisms, such as [27–29], are adopted, and that secure communication between member nodes and their clusterhead nodes is based on symmetric keys. Each cluster has a group key, which the clusterhead node uses to send aggregation results to the sink node or to the next-hop clusterhead node.

Figure 1

The cluster-based logical architecture for data aggregation.

We assume that both clusterhead nodes and their member nodes may be compromised by attackers. When an attacker compromises a node, he or she obtains its cryptographic keys and gains complete control of it. Hence, the attacker may use the compromised node to launch a variety of active or passive attacks. In this paper, however, we focus on a passive attack, in which a compromised node follows the normal network protocols and does not launch attacks, such as jamming or DoS attacks, that block the normal operation of the network. Using the compromised keys, it can inject forged or maliciously modified readings that deviate from the normal readings. The attacker's goal is to produce incorrect aggregation results without being detected. Note that we do not consider attacks by colluding clusterhead nodes, in which multiple compromised clusterhead nodes work together to modify messages: when a colluding clusterhead node receives a message generated by its distant colleagues, it modifies the message to avoid detection. Dealing with this attack is beyond the scope of this paper and is left for future work.

We classify the nodes in a cluster from two points of view: that of the cluster and that of an individual node. From the cluster's point of view, nodes are either invalid cluster nodes or valid cluster nodes. Invalid cluster nodes are those that have been excluded from the cluster; all other nodes are valid cluster nodes. The number of valid cluster nodes decreases when a majority of the valid cluster nodes mark some node as invalid. From an individual node's point of view, nodes are either invalid cooperative nodes or valid cooperative nodes. For a given node x in a cluster, its invalid cooperative nodes are those that x itself has marked as malicious, and its valid cooperative nodes are the rest. The number of valid cooperative nodes of x decreases when x marks some node as invalid.

We assume that, in any cluster, the number of compromised nodes is less than the number of well-behaving nodes. We also classify compromised nodes as invalid compromised nodes and valid compromised nodes: invalid compromised nodes are compromised nodes that have been excluded from the network, and valid compromised nodes are those that have not. The notations summary lists the major notations used in this paper and their meanings.

4. The Mutual Defense Scheme for Secure Data Aggregation

Our mutual defense scheme for secure data aggregation has two aspects: clusterhead nodes defend against their member nodes, and member nodes listen to (monitor) their clusterhead nodes. It is based on a constraint parameter, the MTD, which represents the maximum tolerant difference among the valid readings in a cluster. As mentioned earlier, the readings exhibit temporal and spatial correlation, so the MTD can be predefined for a specific application. We denote the MTD by $Δ^{*}$. Its value is determined by the specific application and the size of the cluster. For example, in temperature-sensing applications, all sensor nodes in a cluster obtain very similar temperature readings, so the MTD can be set to the tolerated measurement error within the cluster. Note that the MTD is a system parameter that attackers cannot modify.

4.1. Solution Outline

To defend against compromised member nodes injecting bogus raw readings, clusterhead nodes aggregate the collected data using the secure sort-group-filter (SSGF) aggregation algorithm, which is presented in detail in Section 4.3. At the same time, each clusterhead node updates the normal and abnormal information of each of its member nodes according to the aggregation result, the MTD, and the data received from that member node. The clusterhead nodes then send their aggregation results to their next-hop clusterhead nodes or to the sink node.

To defend against clusterhead nodes sending forged aggregation results, we design a listening mechanism based on the TDMA scheme through which member nodes monitor their clusterhead nodes; this mechanism conserves node energy effectively. Based on its own reading and the MTD, each member node updates the normal and abnormal information of its clusterhead node.

4.2. Secure Data Packet Transmission Scheme

In this subsection, we introduce the packet formats used in the data packet transmission phase. They provide integrity, freshness, and authentication services.

The data packet sent from a member node u to its clusterhead node v is described as the following format:

\begin{matrix} ID_{u},\ ID_{v},\ prd\text{-}rnd,\ payload1,\ MAC_{K_{(u, v)}}(data1), \\ payload1 = E_{K_{(u, v)}}(d_{u}), \\ data1 = ID_{u} \parallel ID_{v} \parallel prd\text{-}rnd \parallel payload1, \end{matrix}

(1)

where $prd\text{-}rnd$ is a number constructed from the period number and the round number that provides the freshness service, $K_{(u, v)}$ is the key shared by u and v, $d_{u}$ is the reading of u, $E_{K}(DATA)$ denotes the encryption of $DATA$ under key K, and $MAC_{K}(DATA)$ denotes the message authentication code (MAC) of $DATA$ computed with key K, which provides the integrity and authentication services.
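The member-to-clusterhead format in (1) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: HMAC-SHA256 stands in for $MAC_{K}$, a SHA-256 counter-mode keystream is a hypothetical placeholder for $E_{K}$ (a deployed node would use a vetted cipher), and the helper names, 16-bit IDs, and 16-bit period/round encoding are illustrative choices.

```python
import hashlib
import hmac
import struct


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Hypothetical placeholder for E_K: XOR with a SHA-256 counter-mode keystream.
    # Illustration only; it is NOT a vetted cipher.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + struct.pack(">I", counter)).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def prd_rnd(period: int, rnd: int) -> bytes:
    # Freshness field built from the period number and the round number.
    return struct.pack(">HH", period, rnd)


def build_member_packet(id_u: int, id_v: int, fresh: bytes, d_u: float, k_uv: bytes) -> dict:
    payload1 = keystream_xor(k_uv, fresh, struct.pack(">d", d_u))  # E_{K(u,v)}(d_u)
    data1 = struct.pack(">HH", id_u, id_v) + fresh + payload1       # ID_u || ID_v || prd-rnd || payload1
    tag = hmac.new(k_uv, data1, hashlib.sha256).digest()            # MAC_{K(u,v)}(data1)
    return {"id_u": id_u, "id_v": id_v, "prd_rnd": fresh, "payload1": payload1, "mac": tag}


def open_member_packet(pkt: dict, k_uv: bytes, expected_fresh: bytes):
    """Return d_u if the packet is fresh and authentic, otherwise None."""
    if pkt["prd_rnd"] != expected_fresh:  # stale or replayed round
        return None
    data1 = struct.pack(">HH", pkt["id_u"], pkt["id_v"]) + pkt["prd_rnd"] + pkt["payload1"]
    expected = hmac.new(k_uv, data1, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, pkt["mac"]):
        return None
    plain = keystream_xor(k_uv, pkt["prd_rnd"], pkt["payload1"])  # XOR is its own inverse
    return struct.unpack(">d", plain)[0]


k = b"key-shared-by-u-and-v"
pkt = build_member_packet(4, 1, prd_rnd(2, 7), 10.5, k)
print(open_member_packet(pkt, k, prd_rnd(2, 7)))  # 10.5
```

Checking the freshness field before the MAC rejects replays from earlier rounds cheaply; the MAC covers the IDs, the freshness field, and the ciphertext, so altering any of them is detected.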

The clusterhead node v sends the aggregation packet to its next-hop node $N_{v}$, which may be a clusterhead node or the sink node, in the following format:

\begin{array}{l} ID_{v},\ ID_{N_{v}},\ pos_{v},\ prd\text{-}rnd,\ payload2, \\ MAC_{GK_{v}}(data2),\ MAC_{K_{(v, N_{v})}}(data3), \end{array}

(2)

\begin{array}{l} payload2 = E_{GK_{v}}(aggR_{v}), \end{array}

(3)

\begin{array}{l} data2 = pos_{v} \parallel prd\text{-}rnd \parallel payload2, \end{array}

(4)

\begin{array}{l} data3 = ID_{v} \parallel ID_{N_{v}} \parallel data2 \parallel MAC_{GK_{v}}(data2), \end{array}

(5)

where $pos_{v}$ is the position information of clusterhead node v, $GK_{v}$ is the group key shared between the sink node and the cluster to which v belongs, $K_{(v, N_{v})}$ is the key shared by nodes v and $N_{v}$, and $aggR_{v}$ is the aggregation result of v. The first and second MACs in (2) provide end-to-end and hop-to-hop integrity and authentication, respectively. Note that $data2$ and the first MAC in (2) remain unchanged while the aggregation report is relayed between route nodes.
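A similar sketch for formats (2) to (5) shows why $data2$ and the end-to-end MAC survive relaying while the hop-to-hop MAC is rebuilt at every hop. HMAC-SHA256 again stands in for both MACs, $payload2$ is treated as opaque bytes already encrypted under $GK_{v}$, and the assumption that each relay rewrites the two ID fields with its own ID and its next hop is ours; all function names are hypothetical.

```python
import hashlib
import hmac
import struct


def mac(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def hop_mac(key: bytes, src: int, dst: int, data2: bytes, e2e: bytes) -> bytes:
    # MAC over data3 = ID_src || ID_dst || data2 || MAC_{GK_v}(data2)
    return mac(key, struct.pack(">HH", src, dst) + data2 + e2e)


def build_agg_packet(id_v, id_next, pos_v, fresh, payload2, gk_v, k_hop):
    data2 = pos_v + fresh + payload2  # payload2 = E_{GK_v}(aggR_v), treated as opaque bytes
    e2e = mac(gk_v, data2)            # end-to-end MAC, verified only by the sink
    return {"src": id_v, "dst": id_next, "data2": data2, "e2e": e2e,
            "hop": hop_mac(k_hop, id_v, id_next, data2, e2e)}


def relay(pkt, k_in, my_id, next_id, k_out):
    if not hmac.compare_digest(hop_mac(k_in, pkt["src"], pkt["dst"], pkt["data2"], pkt["e2e"]), pkt["hop"]):
        raise ValueError("hop-to-hop MAC check failed")
    # data2 and the end-to-end MAC pass through unchanged; only the hop fields are rebuilt.
    return {"src": my_id, "dst": next_id, "data2": pkt["data2"], "e2e": pkt["e2e"],
            "hop": hop_mac(k_out, my_id, next_id, pkt["data2"], pkt["e2e"])}


def sink_accepts(pkt, k_hop, gk_v) -> bool:
    hop_ok = hmac.compare_digest(hop_mac(k_hop, pkt["src"], pkt["dst"], pkt["data2"], pkt["e2e"]), pkt["hop"])
    return hop_ok and hmac.compare_digest(mac(gk_v, pkt["data2"]), pkt["e2e"])


gk, k_v_w, k_w_sink = b"group-key-of-v", b"key-v-w", b"key-w-sink"
pkt = build_agg_packet(1, 2, b"pos:3,4", struct.pack(">HH", 2, 7), b"\x8a\x10", gk, k_v_w)
pkt = relay(pkt, k_v_w, 2, 0, k_w_sink)  # clusterhead 2 forwards to the sink (ID 0)
print(sink_accepts(pkt, k_w_sink, gk))   # True
```

A relay that alters $data2$ can recompute the hop-to-hop MAC with its own link key, but without $GK_{v}$ it cannot recompute the end-to-end MAC, so the sink still rejects the packet.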

4.3. SSGF: Secure Sort-Group-Filter Aggregation

This subsection describes SSGF, which consists of five steps: sorting, grouping, filtering, aggregating, and updating.

4.3.1. Sorting Phase

In this step, a clusterhead node first sorts the data sent by its valid cooperative nodes. Assume that, after sorting, clusterhead node v obtains the ascending data sequence $S_{v} = \{d_{(1)}, d_{(2)}, \dots, d_{(m_{v})}\}$, where $m_{v}$ is the number of valid cooperative nodes of v. The clusterhead node then computes $Δ_{i, i + 1} = d_{(i + 1)} - d_{(i)}$ $(1 \leq i \leq m_{v} - 1)$ and obtains the difference sequence $Δ_{S_{v}} = \{Δ_{1,2}, Δ_{2,3}, \dots, Δ_{m_{v} - 1, m_{v}}\}$.

For instance, consider a cluster with clusterhead v in which v receives the data set {11.7, 10.5, 9.6, 10.1, 7.2, 11.5, 9.4, 11.0, 11.9, 11.1} from its valid cooperative nodes. The sorted ascending sequence $S_{v}$ is then {7.2, 9.4, 9.6, 10.1, 10.5, 11.0, 11.1, 11.5, 11.7, 11.9}, and the difference sequence $Δ_{S_{v}}$ is {2.2, 0.2, 0.5, 0.4, 0.5, 0.1, 0.4, 0.2, 0.2}. For convenience in the subsequent examples, we assume that the ten sensor nodes $s_{1}$ to $s_{10}$ send the corresponding data in $S_{v}$; that is, $s_{1}$ sends 7.2, $s_{2}$ sends 9.4, and so on.
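The sorting phase amounts to a sort followed by pairwise differences. The following Python sketch (the function name is hypothetical) reproduces $S_{v}$ and $Δ_{S_{v}}$ for the example above; the differences are rounded only to hide binary floating-point noise such as 2.2000000000000002.

```python
def sort_phase(readings):
    """Sorting phase of SSGF: ascending sequence S_v and gaps Δ_{i,i+1}."""
    s_v = sorted(readings)
    # Round to suppress binary floating-point noise in the displayed gaps.
    diffs = [round(b - a, 6) for a, b in zip(s_v, s_v[1:])]
    return s_v, diffs


received = [11.7, 10.5, 9.6, 10.1, 7.2, 11.5, 9.4, 11.0, 11.9, 11.1]
s_v, delta_s_v = sort_phase(received)
print(s_v)        # [7.2, 9.4, 9.6, 10.1, 10.5, 11.0, 11.1, 11.5, 11.7, 11.9]
print(delta_s_v)  # [2.2, 0.2, 0.5, 0.4, 0.5, 0.1, 0.4, 0.2, 0.2]
```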

4.3.2. Grouping Phase

In this step, the clusterhead node groups the data sequence $S_{v}$ based on the difference sequence $Δ_{S_{v}}$ and the MTD $Δ^{*}$: whenever $Δ_{i, i + 1} > Δ^{*}$ $(1 \leq i \leq m_{v} - 1)$, it splits $S_{v}$ into two parts at index i, that is, between $d_{(i)}$ and $d_{(i + 1)}$. Normally, if the cluster contains more normal nodes than compromised nodes, this step yields exactly one group with the maximum number of data items. Assume that this group is $g_{t \max} = \{d_{(k)}, d_{(k + 1)}, \dots, d_{(*)}\}$.

Extending the sorting-phase example, assume that the MTD $Δ^{*}$ is 2.0 and that the normal readings actually lie in [10.0, 12.0]; that is, $s_{1}$, $s_{2}$, and $s_{3}$ send fake readings. Since $Δ_{1,2} = 2.2 > Δ^{*}$, we split $S_{v}$ into two parts at index 1 and obtain the group with the maximum number of data items, $g_{t \max} = \{9.4, 9.6, 10.1, 10.5, 11.0, 11.1, 11.5, 11.7, 11.9\}$.
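The grouping rule can be sketched as a single pass over the sorted sequence that opens a new group at every gap larger than $Δ^{*}$ and then keeps the largest group. The helper below is a hypothetical illustration (it assumes at least one reading), checked against the running example with $Δ^{*} = 2.0$.

```python
def group_phase(s_v, mtd):
    """Grouping phase: split the sorted sequence wherever a gap exceeds the MTD
    and return the group with the most items (g_tmax)."""
    groups = [[s_v[0]]]
    for prev, cur in zip(s_v, s_v[1:]):
        if cur - prev > mtd:
            groups.append([cur])  # gap Δ_{i,i+1} > Δ*: start a new group
        else:
            groups[-1].append(cur)
    # max() keeps the first of equally large groups; the text expects a unique
    # largest group when normal nodes outnumber compromised ones.
    return max(groups, key=len)


s_v = [7.2, 9.4, 9.6, 10.1, 10.5, 11.0, 11.1, 11.5, 11.7, 11.9]
print(group_phase(s_v, 2.0))  # [9.4, 9.6, 10.1, 10.5, 11.0, 11.1, 11.5, 11.7, 11.9]
```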

4.3.3. Filtering Phase

In this step, based on the MTD $Δ^{*}$, we filter out both extreme values $d_{(k)}$ and $d_{(*)}$ if $d_{(*)} - d_{(k)} > Δ^{*}$. We repeat this process until the difference between the remaining maximum and minimum values is not larger than $Δ^{*}$, say $d_{(h)} - d_{(l)} \leq Δ^{*}$, and then obtain the group $ag_{g} = \{d_{(l)}, d_{(l + 1)}, \dots, d_{(h)}\}$ for aggregation.

For the above example, $d_{(10)} - d_{(2)} = 11.9 - 9.4 = 2.5 > 2.0$, so we filter out the extreme values 9.4 and 11.9. In the next iteration, $d_{(9)} - d_{(3)} = 11.7 - 9.6 = 2.1 > 2.0$, so we also filter out $d_{(3)}$ and $d_{(9)}$. Since $d_{(8)} - d_{(4)} = 11.5 - 10.1 = 1.4 \leq 2.0$, the filtering phase ends with the group $ag_{g} = \{10.1, 10.5, 11.0, 11.1, 11.5\}$.
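The filtering loop trims both ends of $g_{t \max}$ at once until the range fits within $Δ^{*}$. A minimal Python sketch, with a hypothetical function name, applied to the running example:

```python
def filter_phase(g_tmax, mtd):
    """Filtering phase: drop the current minimum and maximum together until
    max - min <= MTD; the survivors form ag_g."""
    lo, hi = 0, len(g_tmax) - 1
    # Grouping leaves no adjacent gap larger than the MTD, so the loop stops
    # before lo and hi cross and ag_g is never empty.
    while lo < hi and g_tmax[hi] - g_tmax[lo] > mtd:
        lo += 1
        hi -= 1
    return g_tmax[lo:hi + 1]


g_tmax = [9.4, 9.6, 10.1, 10.5, 11.0, 11.1, 11.5, 11.7, 11.9]
print(filter_phase(g_tmax, 2.0))  # [10.1, 10.5, 11.0, 11.1, 11.5]
```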

4.3.4. Aggregating Phase

In this step, the clusterhead node v applies the aggregation function to the group $ag_{g}$ and obtains the aggregation result $aggR_{v}$.

Continuing our example with the average aggregation function, we obtain $aggR_{v} = 10.84$.
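Written out, the aggregating-phase arithmetic for the running example is:

```latex
aggR_{v} = \frac{10.1 + 10.5 + 11.0 + 11.1 + 11.5}{5} = \frac{54.2}{5} = 10.84
```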

4.3.5. Updating Phase

In this step, based on the result $aggR_{v}$ and $Δ^{*}$, v updates the normal information $cnt_{normal}$ and the abnormal information $cnt_{abnormal}$ of each of its valid cooperative nodes u using formulae (6) to (8):

\begin{array}{l} α = \frac{| d_{u} - aggR_{v} |}{Δ^{*}}, \end{array}

(6)

\begin{array}{l} cnt_{normal} = \begin{cases} cnt_{normal} + 1, & d_{u} \in [d_{(l)}, d_{(h)}]; \\ cnt_{normal}, & \text{otherwise}, \end{cases} \end{array}

(7)

\begin{array}{l} cnt_{abnormal} = \begin{cases} cnt_{abnormal}, & d_{u} \in [d_{(l)}, d_{(h)}]; \\ cnt_{abnormal} + 1, & d_{u} \in [d_{(k)}, d_{(l)}) \cup (d_{(h)}, d_{(*)}],\ α \leq 1; \\ cnt_{abnormal} + λ^{α}, & \text{otherwise}. \end{cases} \end{array}

(8)

In formula (8), λ is a punishment base with $λ > 1$. Formula (8) shows that the larger the absolute difference between the data sent by a node and the aggregation result $aggR_{v}$, the more severe the punishment the node receives. If, during a detection period, a node's $cnt_{abnormal}$ satisfies $cnt_{abnormal} > cnt_{abthr}$, clusterhead node v marks that member node as an invalid compromised node (the direct case) and broadcasts an alarm message to notify its member nodes of the abnormal node. Note that in the direct case the number of valid cooperative nodes of v decreases. $cnt_{abthr}$ is a predefined detection threshold, $cnt_{abthr} = p \times R$, where R is the number of rounds in a detection period and $p$ $(0 < p < 1)$ is a user-defined adjusting factor. One transmission of the aggregation result is regarded as one round; that is, each round consists of both SSGF and the TDMA-based listening defense mechanism. For example, if $p = 0.2$ and $R = 100$, then $cnt_{abthr} = 20$: if a node's $cnt_{abnormal}$ exceeds 20 during a detection period of 100 rounds, the clusterhead node deems it a compromised node. Conversely, if a node's $cnt_{abnormal}$ satisfies $cnt_{abnormal} \leq cnt_{abthr}$ when a detection period ends, the clusterhead node resets that node's $cnt_{abnormal}$ to zero.

The value of $cnt_{abthr}$ affects the detection ratio $r_{g}$ and the false positive ratio $r_{b}$ in a detection period. To obtain a low $r_{b}$, we can estimate $cnt_{abthr}$ from the expected ratio $r_{m}$ of valid compromised nodes to normal nodes in a cluster; $r_{m}$ is also the probability, per round, that a normal node's reading is viewed as abnormal because of attacks by valid compromised nodes. Hence, for a given $r_{b}$, we can estimate $cnt_{abthr}$ from $r_{m}^{cnt_{abthr}} = r_{b}$, that is, $cnt_{abthr} = \ln r_{b} / \ln r_{m}$. For example, if $r_{m} = 2/3$ and $r_{b} = 0.03\%$, then $cnt_{abthr} = \ln(0.0003) / \ln(2/3) \approx 20$. Clearly, the higher $cnt_{abthr}$ is, the lower $r_{b}$ will be. However, a higher $cnt_{abthr}$ is not always better: when $cnt_{abthr}$ is very high, $r_{g}$ may be very low within a detection period. Therefore, $cnt_{abthr}$ should be chosen to trade off $r_{g}$ against $r_{b}$.
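Both ways of setting the detection threshold described above reduce to one line each. The Python helpers below are hypothetical; rounding $\ln r_{b} / \ln r_{m}$ to the nearest integer is our assumption, chosen because it reproduces the value 20 in the example ($(2/3)^{20} \approx 3.0 \times 10^{-4}$, so taking the ceiling would give 21).

```python
import math


def threshold_from_period(p: float, rounds: int) -> float:
    # cnt_abthr = p * R, with 0 < p < 1 chosen by the user
    return p * rounds


def threshold_from_false_positive(r_m: float, r_b: float) -> int:
    # Solve r_m ** cnt_abthr = r_b, i.e. cnt_abthr = ln(r_b) / ln(r_m),
    # rounded to the nearest whole number of rounds (an assumption).
    return round(math.log(r_b) / math.log(r_m))


print(threshold_from_period(0.2, 100))               # 20.0
print(threshold_from_false_positive(2 / 3, 0.0003))  # 20
```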

Note that, to defend against the bad-mouthing attack, in which a compromised clusterhead node libels a normal node as an invalid compromised node, a member node that receives an alarm message from its clusterhead node only marks the corresponding node as a suspicious compromised node. Node y views node x as an invalid compromised node in only two cases: the direct case mentioned above, or the indirect case, in which y receives alarm messages about x from a majority of the valid cluster nodes in the cluster. Note that in the indirect case the number of valid cluster nodes decreases.
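The two cases in which a node accepts another node as an invalid compromised node can be sketched as a small decision function. The function and its arguments are hypothetical, and we read "majority" as strictly more than half of the valid cluster nodes.

```python
def view_of_node_x(direct_case: bool, alarm_senders: set, valid_cluster_nodes: set) -> str:
    """How node y classifies node x (hypothetical helper).

    direct_case: y itself observed cnt_abnormal(x) > cnt_abthr.
    alarm_senders: IDs of the nodes from which y received an alarm about x.
    """
    if direct_case:
        return "invalid"                                # direct case
    supporters = alarm_senders & valid_cluster_nodes    # only valid cluster nodes count
    if len(supporters) > len(valid_cluster_nodes) / 2:
        return "invalid"                                # indirect case: majority agrees
    if alarm_senders:
        return "suspicious"                             # e.g. a lone clusterhead alarm
    return "valid"


cluster = set(range(1, 11))
print(view_of_node_x(False, {1}, cluster))              # suspicious
print(view_of_node_x(False, {1, 2, 3, 4, 5, 6}, cluster))  # invalid
```

A single alarm from a compromised clusterhead therefore only raises suspicion; it cannot by itself remove a normal node.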

Continuing the example from the aggregating phase, the clusterhead node updates $cnt_{normal}$ and $cnt_{abnormal}$ for each of its valid cooperative nodes. For each of the nodes $s_{4}$ to $s_{8}$, whose data are in $ag_{g}$, $cnt_{normal}$ is increased by one. For each of the nodes $s_{2}$, $s_{3}$, $s_{9}$, and $s_{10}$, whose data were filtered out during the filtering phase (each with $α \leq 1$), $cnt_{abnormal}$ is increased by one. Thus, increasing $cnt_{abnormal}$ for the malicious nodes $s_{2}$ and $s_{3}$ may also increase $cnt_{abnormal}$ for the normal nodes $s_{9}$ and $s_{10}$. However, as we prove in Section 5.1, the probability of deeming a normal node a compromised node is relatively small. Node $s_{1}$ sent 7.2, which was filtered out during the grouping phase, so $α = |7.2 - 10.84| / 2.0 = 1.82$ and its $cnt_{abnormal}$ is increased by $λ^{α} = λ^{1.82}$. If $λ = 1.5$, its $cnt_{abnormal}$ is increased by about 2.09. This shows that the larger $α$ $(α > 1)$ is, the more severe the punishment a node with an abnormal reading receives.
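The counter updates in formulae (6) to (8) can be sketched in Python and checked against the example above. The function is a hypothetical illustration that takes the bounds of $ag_{g}$ and $g_{t \max}$ as inputs and stores each node's $(cnt_{normal}, cnt_{abnormal})$ pair in a dictionary.

```python
def update_member_counters(readings, ag_bounds, g_bounds, agg_r, mtd, lam, counters):
    d_l, d_h = ag_bounds     # min and max of ag_g
    d_k, d_star = g_bounds   # min and max of g_tmax
    for node, d_u in readings.items():
        normal, abnormal = counters.get(node, (0, 0.0))
        alpha = abs(d_u - agg_r) / mtd             # formula (6)
        if d_l <= d_u <= d_h:
            normal += 1                            # formula (7): reading kept in ag_g
        elif d_k <= d_u <= d_star and alpha <= 1:
            abnormal += 1                          # formula (8): filtered but close
        else:
            abnormal += lam ** alpha               # formula (8): punished by λ^α
        counters[node] = (normal, abnormal)
    return counters


readings = {f"s{i}": d for i, d in enumerate(
    [7.2, 9.4, 9.6, 10.1, 10.5, 11.0, 11.1, 11.5, 11.7, 11.9], start=1)}
counters = update_member_counters(readings, (10.1, 11.5), (9.4, 11.9), 10.84, 2.0, 1.5, {})
print(round(counters["s1"][1], 2))  # 2.09
```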

4.4. TDMA-Based Listening Defense Mechanism

The TDMA-based listening mechanism is motivated by the need to reduce the monitoring overhead of each node. Since energy is a scarce resource in WSNs, a node that stays in the listening state all the time consumes a significant amount of energy. The TDMA-based method reduces the energy consumed by listening.

A TDMA-based mechanism has two phases: assigning slots and sending messages in the assigned slots. In the TDMA-based listening defense mechanism, clusterhead node v first assigns slots to its valid cooperative nodes and to itself. Then each node sends its message to its clusterhead node in its assigned slot. A valid cooperative node enters hibernation after sending its data message and wakes up in the slot in which its clusterhead node sends the aggregation result.
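The slot plan can be sketched as follows. The ordering (member slots first, the clusterhead's aggregation-result slot last) and all names are assumptions for illustration; the point is that each member is awake in only two slots per round and receives a single packet, the aggregation result.

```python
def assign_slots(clusterhead, members):
    """Hypothetical slot plan: one uplink slot per valid cooperative node,
    followed by one slot in which the clusterhead sends aggR_v."""
    slots = {m: i for i, m in enumerate(members)}
    slots[clusterhead] = len(members)
    return slots


def awake_slots(node, clusterhead, slots):
    if node == clusterhead:
        return sorted(slots.values())  # receives every member slot, then transmits
    # A member transmits in its own slot, hibernates, then wakes to overhear aggR_v.
    return [slots[node], slots[clusterhead]]


members = [f"s{i}" for i in range(1, 11)]
slots = assign_slots("v", members)
print(awake_slots("s3", "v", slots))       # [2, 10]
print(len(awake_slots("v", "v", slots)))   # 11
```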

Based on its sensed reading $d_{u}$ and the aggregation result $agg'$ that it overhears from its clusterhead node v, each valid cooperative node u of v updates the normal information $cnt_{normal}$ and the abnormal information $cnt_{abnormal}$ of v using formulae (9) to (11), where $m_{v}$ is the number of valid cooperative nodes of v in the current detection period:

\begin{matrix} β = \frac{| d_{u} - agg' |}{Δ^{*}}, \end{matrix}

(9)

\begin{matrix} cnt_{normal} = \begin{cases} cnt_{normal} + 1, & β \leq \frac{m_{v} - 1}{m_{v}}; \\ cnt_{normal}, & \text{otherwise}, \end{cases} \end{matrix}

(10)

\begin{matrix} cnt_{abnormal} = \begin{cases} cnt_{abnormal} + λ^{β}, & β > \frac{m_{v} - 1}{m_{v}}; \\ cnt_{abnormal}, & \text{otherwise}. \end{cases} \end{matrix}

(11)
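The member-side update in formulae (9) to (11) translates directly; the helper below is a hypothetical sketch using $m_{v} = 10$, $Δ^{*} = 2.0$, and $λ = 1.5$ from the running example, plus a forged result of 13.0 chosen only for illustration.

```python
def update_clusterhead_counters(d_u, agg_heard, mtd, m_v, lam, normal, abnormal):
    beta = abs(d_u - agg_heard) / mtd      # formula (9)
    bound = (m_v - 1) / m_v                # threshold from Theorem 1
    if beta <= bound:
        return normal + 1, abnormal        # formula (10)
    return normal, abnormal + lam ** beta  # formula (11)


# Member s5 (reading 10.5) overhears its clusterhead with m_v = 10, Δ* = 2.0, λ = 1.5.
print(update_clusterhead_counters(10.5, 10.84, 2.0, 10, 1.5, 0, 0.0))  # (1, 0.0)
n, a = update_clusterhead_counters(10.5, 13.0, 2.0, 10, 1.5, 0, 0.0)   # forged result
```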

Theorem 1.

For a normal member node, the upper bound of the estimated β is $β \leq (m_{v} - 1) / m_{v}$.

Proof.

Without loss of generality, consider a normal cooperative node u with reading $d_{u}$. Since the reading farthest from the average is always the maximum or the minimum reading, we may assume that $d_{u}$ is the maximum or minimum value among all readings in its cluster in the current round. Note that, in a realistic scenario, the readings of all normal cooperative nodes are never exactly the same, so we do not consider the degenerate case in which they are.

If $d_{u}$ is the maximum value and each of the other $m_{v} - 1$ nodes reports the minimum possible value $d_{u} - Δ^{*}$, then node u can estimate the minimum aggregation result as $agg_{est}(\min) = [d_{u} + (m_{v} - 1)(d_{u} - Δ^{*})] / m_{v}$, which simplifies to formula (12):

\begin{matrix} agg_{est}(\min) = d_{u} - \frac{m_{v} - 1}{m_{v}} Δ^{*}. \end{matrix}

(12)

Similarly, if $d_{u}$ is the minimum value and each of the other nodes reports the maximum possible value $d_{u} + Δ^{*}$, then the maximum aggregation result $agg_{est}(\max)$ is given by formula (13):

\begin{matrix} agg_{est}(\max) = d_{u} + \frac{m_{v} - 1}{m_{v}} Δ^{*}. \end{matrix}

(13)

Hence, substituting formulae (12) and (13) into formula (9) gives

\begin{matrix} β(\max) = \frac{| d_{u} - agg_{est}(\min) |}{Δ^{*}} = \frac{m_{v} - 1}{m_{v}}, \end{matrix}

(14)

\begin{matrix} β(\max) = \frac{| d_{u} - agg_{est}(\max) |}{Δ^{*}} = \frac{m_{v} - 1}{m_{v}}. \end{matrix}

(15)

That is, we can obtain $β \leq (m_{v} - 1) / m_{v}$ .
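Theorem 1 can also be sanity-checked numerically. The sketch below (hypothetical names) evaluates the extreme configuration used in the proof and then samples random clusters whose readings all lie within an interval of width $Δ^{*}$, assuming the clusterhead reports the plain average of the $m_{v}$ readings. It is a check under those assumptions, not a substitute for the proof.

```python
import random


def extreme_beta(m_v, mtd, d_u=10.0):
    """The configuration used in the proof: d_u at one end, all others at the other end."""
    readings = [d_u] + [d_u - mtd] * (m_v - 1)
    return abs(d_u - sum(readings) / m_v) / mtd


def worst_beta(m_v, mtd, trials=20000, seed=7):
    """Largest β = |d_u - avg| / Δ* over random clusters whose readings all lie
    within an interval of width Δ*."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        base = rng.uniform(0.0, 50.0)
        readings = [base + rng.uniform(0.0, mtd) for _ in range(m_v)]
        avg = sum(readings) / m_v
        worst = max(worst, max(abs(d - avg) for d in readings) / mtd)
    return worst


m_v, mtd = 10, 2.0
print(extreme_beta(m_v, mtd))                       # approximately 0.9 = (m_v - 1) / m_v
print(worst_beta(m_v, mtd) <= (m_v - 1) / m_v)      # True
```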

Therefore, if $β \leq (m_{v} - 1) / m_{v}$, node u considers the result $agg'$ sent by its clusterhead node v to be normal; otherwise, it considers it abnormal. If u's $cnt_{abnormal}$ for its clusterhead node satisfies $cnt_{abnormal} > cnt_{abthr}$, u marks v as an invalid compromised node and broadcasts an alarm message to its neighbors. Let $m_{v}^{CL}$ be the number of valid cluster nodes in the cluster with v as clusterhead. When the number of alarm messages about v is above $⌈ (m_{v}^{CL} + 1) / 2 ⌉$, a new clusterhead node is reselected from the alarming nodes of that cluster. The new clusterhead node then notifies the sink node of the abnormal node by sending an alarm message that contains the $prd\text{-}rnd$ field in (1) and signing information from each member node, computed with the secret key that member shares with the sink node. Conversely, if u's $cnt_{abnormal}$ for v satisfies $cnt_{abnormal} \leq cnt_{abthr}$ at the end of a detection period, u resets v's $cnt_{abnormal}$ to zero.
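The reselection rule can be sketched as follows. The quorum follows the text; counting distinct alarm senders and picking the smallest alarming node ID as the new clusterhead are hypothetical choices, since the text only states that the new clusterhead is chosen from the alarming nodes.

```python
import math


def alarm_quorum(m_cl: int) -> int:
    # ⌈(m_v^CL + 1) / 2⌉ from the text
    return math.ceil((m_cl + 1) / 2)


def handle_alarms(alarm_senders: set, m_cl: int):
    """Return the new clusterhead if the number of distinct alarm senders is
    above the quorum, else None. min() is a hypothetical tie-break."""
    if len(alarm_senders) > alarm_quorum(m_cl):
        return min(alarm_senders)
    return None


print(alarm_quorum(10))                           # 6
print(handle_alarms({1, 2, 3, 4, 5, 6}, 10))      # None: 6 alarms is not above 6
print(handle_alarms({2, 3, 4, 5, 6, 7, 8}, 10))   # 2
```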

Note that our method can still restrict a compromised clusterhead node from sending incorrect aggregation results even if the compromised clusterhead repeatedly excludes normal nodes from the cluster until compromised nodes make up more than half of its valid cooperative nodes. On the one hand, if the number of normal nodes excluded by the clusterhead node is above $⌈ (m_{v}^{CL} + 1) / 2 ⌉$, a new clusterhead node is selected and an alarm message is sent to the sink node, as described above, so the attack fails. On the other hand, if the number of excluded normal nodes is below $⌈ (m_{v}^{CL} + 1) / 2 ⌉$ and the compromised clusterhead sends incorrect aggregation results, the normal nodes in the cluster detect the abnormal results with our defense method, because the MTD is a constant parameter. The larger the absolute difference between a normal node's sensed reading and the incorrect aggregation result, the more severe the punishment the clusterhead receives, and hence the sooner normal nodes broadcast alarm messages. In other words, if the attacker sends incorrect aggregation results freely, the number of alarm messages generated within the same cluster eventually exceeds $⌈ (m_{v}^{CL} + 1) / 2 ⌉$, and the compromised clusterhead node is then excluded from the network.

Because each member node only needs to listen during the time slots of its own clusterhead node, this mechanism saves considerable energy compared with other listening mechanisms, as shown in Table 1.

Table 1

Comparison of different listening mechanisms.

Listening mechanism	Communication overhead per member node	Criteria for abnormal readings evaluation
Wu et al. [15]	$m_{v}^{CL} - 1$ packets	No
Alzaid et al. [17]	$m_{v}^{CL} - 1$ packets	No
Qiu et al. [18]	$m_{v}^{CL} - 1$ packets	No
Boonsongsrikul et al. [19]	1 packet	No
Dong and Li [20]	$m_{v}^{CL}$ packets (monitoring nodes only)	No
Labraoui et al. [21]	$m_{v}^{CL} - 1$ packets	No
Our paper	1 packet	Yes

5. Analysis for Aggregation Results under Attack

Without loss of generality, consider a cluster with one clusterhead node v and $m_{v}^{CL}$ ($m_{v}^{CL} \geq 3$) valid cluster nodes, and let $m_{v}$ be the number of valid cooperative nodes of v. Note that $m_{v}^{CL} \geq m_{v}$, because v may mark some node(s) as invalid without excluding them from the cluster. We assume that there are n normal member nodes in the cluster, with $n \in [\lceil (m_{v}^{CL} + 1)/2 \rceil, m_{v}^{CL}]$. The set of readings in this cluster is denoted by $allSenData(m_{v}^{CL}) = \{d_{1}, d_{2}, \dots, d_{m_{v}^{CL}}\}$. For the average aggregation operation, if there is no attack, the ideal aggregation result is

$$agg_{ideal} = \frac{1}{m_{v}^{CL}} \sum_{i=1}^{m_{v}^{CL}} d_{i}. \tag{16}$$

Assume that, after sorting the readings sensed by the normal member nodes, we obtain the ascending sequence $normalSenData(n) = \{d_{(1)}, d_{(2)}, \dots, d_{(n)}\}$.

5.1. Only Member Nodes Compromised

Consider a node w in the set W of valid compromised member nodes, with reading $d_{w}$. Node w first modifies $d_{w}$ into $d_{w}^{*}$ and then sends $d_{w}^{*}$ to its clusterhead node. To avoid detection, it must at least keep $d_{w}^{*} \in [d_{w} - \Delta^{*}, d_{w} + \Delta^{*}]$. Let $d_{w}^{*} = d_{w} + \Delta_{w}$, where $\Delta_{w}$ is the modification applied by compromised node w. Then, if a cluster is attacked by compromised member nodes that have not yet been detected, the aggregation result is

$$agg_{bad} = \frac{1}{m_{v}^{CL}} \left[ \sum_{i=1}^{m_{v}^{CL}} d_{i} + \sum_{w \in W} \Delta_{w} \right]. \tag{17}$$

Combining this with (16), (17) becomes

$$agg_{bad} = agg_{ideal} + \frac{1}{m_{v}^{CL}} \sum_{w \in W} \Delta_{w}. \tag{18}$$
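Equation (18) can be checked numerically. The sketch below uses Python's exact `fractions.Fraction` arithmetic with made-up readings and modifications (purely illustrative, not data from the paper) for a cluster with $m_{v}^{CL} = 5$ in which two nodes play the role of W; the function names are hypothetical.

```python
from fractions import Fraction


def agg_ideal(readings):
    """Eq. (16): plain average of all m_v^CL true readings."""
    return Fraction(sum(readings), len(readings))


def agg_bad(readings, deltas):
    """Eq. (17): compromised nodes add deltas[w] to their true readings."""
    return Fraction(sum(readings) + sum(deltas.values()), len(readings))


# Illustrative values only: indices 3 and 4 act as the compromised set W.
readings = [Fraction(10), Fraction(11), Fraction(12), Fraction(11), Fraction(10)]
deltas = {3: Fraction(-1), 4: Fraction(-2)}   # same sign, as (18) suggests

shift = Fraction(sum(deltas.values()), len(readings))   # (1/m) * sum of deltas
print(agg_ideal(readings), agg_bad(readings, deltas), shift)  # 54/5 51/5 -3/5
```

Using exact fractions instead of floats keeps the comparison with (18) free of rounding noise.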

From (18), to push the aggregation result as far as possible from the ideal aggregation result, the compromised nodes need all $\Delta_{w}$ to have the same sign (either $\Delta_{w} > 0$ for all $w \in W$ or $\Delta_{w} < 0$ for all $w \in W$), and each $|\Delta_{w}|$ should be as large as possible. We now derive the theoretical upper bound of $|\Delta_{w}|$.

Theorem 2.

The modification $|\Delta_{w}|$ that node w can apply while possibly escaping detection is bounded by $|\Delta_{w}| \leq 2\Delta^{*}$.

Proof.

As mentioned earlier, the minimum and maximum values sensed by normal member nodes are $d_{(1)}$ and $d_{(n)}$, respectively. Based on $\Delta^{*}$, we obtain the range of readings in one round, as shown in Figure 2.

From Figure 2, $d_{w} \in [d_{(n)} - \Delta^{*}, d_{(1)} + \Delta^{*}]$. To avoid detection by its clusterhead node, the modified reading must also satisfy $d_{w}^{*} \in [d_{(n)} - \Delta^{*}, d_{(1)} + \Delta^{*}]$.

In the worst case, $d_{w} = d_{(n)} - \Delta^{*}$ or $d_{w} = d_{(1)} + \Delta^{*}$, that is, $d_{w}$ lies at one end of this interval and $d_{w}^{*}$ is moved to the other end, so $|\Delta_{w}|$ reaches its maximum, the width of the interval:

$$|\Delta_{w}(\max)| = 2\Delta^{*} - (d_{(n)} - d_{(1)}). \tag{19}$$

Therefore, in the worst case $d_{(1)} = d_{(n)}$, $|\Delta_{w}(\max)|$ attains its extreme value $2\Delta^{*}$.

Figure 2

The maximum $Δ_{w}$ for single modification.
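The interval argument behind (19) can be sketched directly: a reading escapes the check only if it lies within $\Delta^{*}$ of every normal reading, so the largest possible move is the width of that interval. The helper names below are hypothetical and the readings are illustrative.

```python
def tolerated_interval(normal_sorted, delta_star):
    """[d_(n) - Δ*, d_(1) + Δ*]: values within Δ* of every normal reading."""
    return normal_sorted[-1] - delta_star, normal_sorted[0] + delta_star


def max_single_modification(normal_sorted, delta_star):
    """Eq. (19): width of the tolerated interval, 2Δ* - (d_(n) - d_(1))."""
    lo, hi = tolerated_interval(normal_sorted, delta_star)
    return max(0.0, hi - lo)   # clamp in case normal readings spread over 2Δ*


print(max_single_modification([10.0, 10.5, 11.0], 2.0))  # 3.0 = 4.0 - 1.0
print(max_single_modification([11.0, 11.0, 11.0], 2.0))  # 4.0 = 2Δ*, Theorem 2
```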

Based on (18) and (19), we derive the theoretical upper bound of $|agg_{bad} - agg_{ideal}|$.

Theorem 3.

If the number of compromised member nodes in the cluster headed by v is at most $\lfloor (m_{v}^{CL} - 1)/2 \rfloor$, the upper bound of $|agg_{bad} - agg_{ideal}|$ that the compromised member nodes can cause while possibly escaping detection is $\Delta^{*}$.

Proof.

As in Theorem 2, consider the worst case in which each of the $m_{v}^{CL} - n$ compromised member nodes applies the maximum deviation given by (19), all with the same sign. Then (18) becomes

$$|agg_{bad} - agg_{ideal}| = \frac{m_{v}^{CL} - n}{m_{v}^{CL}} \left[ 2\Delta^{*} - (d_{(n)} - d_{(1)}) \right]. \tag{20}$$

If $d_{(1)} = d_{(n)}$ and the number of compromised member nodes reaches its maximum $\lfloor (m_{v}^{CL} - 1)/2 \rfloor$, that is, $n = \lceil (m_{v}^{CL} + 1)/2 \rceil$, then (20) becomes

$$|agg_{bad} - agg_{ideal}|_{\max} = \begin{cases} \left(1 - \frac{1}{m_{v}^{CL}}\right)\Delta^{*}, & m_{v}^{CL} \text{ odd}; \\ \left(1 - \frac{2}{m_{v}^{CL}}\right)\Delta^{*}, & m_{v}^{CL} \text{ even}. \end{cases} \tag{21}$$

From (21), $|agg_{bad} - agg_{ideal}|_{\max} < \Delta^{*}$ for every finite $m_{v}^{CL}$, and $|agg_{bad} - agg_{ideal}|_{\max} \to \Delta^{*}$ as $m_{v}^{CL} \to +\infty$.
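The worst case in (20) and (21) can be tabulated with exact arithmetic. This Python sketch assumes $d_{(1)} = d_{(n)}$ and the maximum number $\lfloor (m_{v}^{CL} - 1)/2 \rfloor$ of compromised members, and returns the coefficient of $\Delta^{*}$; the function name is hypothetical.

```python
from fractions import Fraction


def member_attack_coefficient(m_cl: int) -> Fraction:
    """Coefficient c with |agg_bad - agg_ideal|_max = c * Δ*, eqs. (20)-(21)."""
    n = (m_cl + 2) // 2            # ceil((m_cl + 1) / 2) normal members
    compromised = m_cl - n         # equals floor((m_cl - 1) / 2)
    return Fraction(compromised, m_cl) * 2   # (m - n)/m * 2Δ* when d_(1) = d_(n)


for m in (5, 6, 51, 1000):
    print(m, member_attack_coefficient(m))
```

The loop prints 4/5, 2/3, 50/51, and 499/500, matching the odd and even branches of (21) and creeping toward, but never reaching, 1.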

However, as mentioned earlier, because each member node communicates with its clusterhead node using a symmetric key mechanism, the compromised member node w cannot decrypt the data sent by normal member nodes. Thus, when it stealthily modifies its sensed reading $d_{w}$, it obtains, from the above analysis, $d_{w}^{*} = d_{w} + \Delta_{w}$ with $|\Delta_{w}| \in [0, 2\Delta^{*}]$, without knowing where the normal readings lie. Owing to this randomness in the modification result, if $d_{w}^{*} \notin [d_{(l)}, d_{(h)}]$, the clusterhead of node w can detect the abnormal modification.

According to the SSGF, under the influence of node w, the reading $d_{i}$ sent by a normal node i may also be deemed abnormal if $d_{i} \in [d_{(k)}, d_{(l)}) \cup (d_{(h)}, d_{(*)}]$. However, the probability of deeming a normal node compromised is relatively small, and we derive the theoretical maximum of the resulting abnormal count.

Theorem 4.

On average, the abnormal count $cnt_{i}$ that compromised member nodes can induce on a normal node i is bounded above by $cnt_{abthr}$.

Proof.

Without loss of generality, consider a node w in the set W of valid compromised member nodes, and let its abnormal count be $cnt_{w} = C_{w}$. Then the average $cnt_{abnormal}$ that w induces on a normal node i is $(1/n) C_{w}$, and the average $cnt_{abnormal}$ that all of W induces on normal node i is $cnt_{i} = (1/n) \sum_{w \in W} C_{w}$. Because the nodes in W are still valid, $C_{w} \leq cnt_{abthr}$ for all $w \in W$; since $|W| = m_{v}^{CL} - n$, we have $cnt_{i} \leq ((m_{v}^{CL} - n)/n)\, cnt_{abthr}$.

In the worst case, the number of compromised member nodes is $\lfloor (m_{v}^{CL} - 1)/2 \rfloor$, that is, $n = \lceil (m_{v}^{CL} + 1)/2 \rceil$, and $cnt_{i}$ reaches its average maximum value $cnt_{i}(\max)$:

$$cnt_{i}(\max) = \begin{cases} \frac{m_{v}^{CL} - 2}{m_{v}^{CL} + 2}\, cnt_{abthr}, & m_{v}^{CL} \text{ even}; \\ \frac{m_{v}^{CL} - 1}{m_{v}^{CL} + 1}\, cnt_{abthr}, & m_{v}^{CL} \text{ odd}. \end{cases} \tag{22}$$

Both coefficients are smaller than 1, so $cnt_{i}(\max) < cnt_{abthr}$.
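The bound in (22) follows from $cnt_{i} \leq ((m_{v}^{CL} - n)/n)\, cnt_{abthr}$ evaluated at the smallest admissible n. A minimal Python check (hypothetical function name) reproduces the two parity cases:

```python
from fractions import Fraction


def cnt_i_max(m_cl: int, cnt_abthr: int) -> Fraction:
    """Average worst-case abnormal count of a normal node, eq. (22)."""
    n = (m_cl + 2) // 2                 # n = ceil((m_cl + 1) / 2)
    return Fraction(m_cl - n, n) * cnt_abthr


print(cnt_i_max(6, 20), cnt_i_max(5, 20))   # 10 40/3 for cnt_abthr = 20
```

With $cnt_{abthr} = 20$, as in the simulations of Section 6, a normal node accumulates on average at most 10 (even $m_{v}^{CL} = 6$) or 40/3 (odd $m_{v}^{CL} = 5$) abnormal marks under this model.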

5.2. Clusterhead Node Compromised

For the compromised clusterhead node v to be judged malicious in its cluster, the number of alarm nodes must be at least $\lceil (m_{v}^{CL} + 1)/2 \rceil$. Hence, to make its aggregation result $agg'$ deviate from the ideal aggregation result $agg'_{ideal} = (1/n) \sum_{i=1}^{n} d_{(i)}$ without being detected, the compromised clusterhead node can afford to let at most $\lceil (m_{v}^{CL} + 1)/2 \rceil - 1$ member nodes detect its abnormal behavior.

Note that although the compromised clusterhead node v may mark normal, uncompromised nodes as invalid, doing so brings v no benefit, for two reasons. First, the normal nodes it removes will report alarm messages against v if they are framed by it. Second, since $m_{v}^{CL} \geq m_{v}$, if v sends forged aggregation values, then according to (11), the smaller $m_{v}$ is, the more easily the normal valid cooperative nodes of v detect the abnormal aggregation values. Hence, in this subsection we analyze, from the viewpoint of the compromised clusterhead node v, the case $m_{v} = m_{v}^{CL}$.

If $m_{v}^{CL}$ is even, the compromised clusterhead node can afford $m_{v}^{CL}/2$ detecting members, so any $agg'$ for which it may escape detection as a malicious node must satisfy (23):

$$agg' - d_{(m_{v}^{CL}/2 + 1)} \leq \frac{m_{v}^{CL} - 1}{m_{v}^{CL}}\Delta^{*}, \qquad d_{(n - m_{v}^{CL}/2)} - agg' \leq \frac{m_{v}^{CL} - 1}{m_{v}^{CL}}\Delta^{*}. \tag{23}$$

Because $n - m_{v}^{CL}/2 \leq m_{v}^{CL}/2 + 1$, (23) yields

$$agg' \in \left[ d_{(n - m_{v}^{CL}/2)} - \frac{m_{v}^{CL} - 1}{m_{v}^{CL}}\Delta^{*},\; d_{(m_{v}^{CL}/2 + 1)} + \frac{m_{v}^{CL} - 1}{m_{v}^{CL}}\Delta^{*} \right]. \tag{24}$$

Similarly, if $m_{v}^{CL}$ is odd, we can obtain

$$agg' \in \left[ d_{(n - (m_{v}^{CL} + 1)/2 + 1)} - \frac{m_{v}^{CL} - 1}{m_{v}^{CL}}\Delta^{*},\; d_{((m_{v}^{CL} + 1)/2)} + \frac{m_{v}^{CL} - 1}{m_{v}^{CL}}\Delta^{*} \right]. \tag{25}$$

From (24) and (25), the range of $agg'$ is widest when $n = \lceil (m_{v}^{CL} + 1)/2 \rceil$, in which case the two end indices reduce to the extremes $d_{(1)}$ and $d_{(n)}$:

$$agg' \in \left[ d_{(1)} - \frac{m_{v}^{CL} - 1}{m_{v}^{CL}}\Delta^{*},\; d_{(n)} + \frac{m_{v}^{CL} - 1}{m_{v}^{CL}}\Delta^{*} \right]. \tag{26}$$
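The index bookkeeping in (23)–(26) is easy to get wrong, so here is a hedged Python sketch that computes the interval of $agg'$ a compromised clusterhead can report while letting at most $\lceil (m_{v}^{CL} + 1)/2 \rceil - 1$ normal members flag it. It assumes, as in (23), that a member flags $agg'$ when it differs from the member's reading by more than $\frac{m_{v}^{CL} - 1}{m_{v}^{CL}}\Delta^{*}$ with $m_{v} = m_{v}^{CL}$; the function name and the readings are illustrative.

```python
from fractions import Fraction


def undetected_agg_range(normal_sorted, m_cl, delta_star):
    """Interval of agg' from (24)/(25); it reduces to (26) when n is minimal."""
    n = len(normal_sorted)
    k = (m_cl + 2) // 2 - 1                  # alarms v can afford
    if not k + 1 <= n <= m_cl:
        raise ValueError("need ceil((m_cl + 1) / 2) <= n <= m_cl")
    tol = Fraction(m_cl - 1, m_cl) * delta_star
    # The text uses 1-based order statistics d_(i); Python lists are 0-based.
    lower = normal_sorted[(n - k) - 1] - tol   # d_(n - k) - tol
    upper = normal_sorted[(k + 1) - 1] + tol   # d_(k + 1) + tol
    return lower, upper


# m_cl = 5 with the minimum n = 3 normal members, so the range is (26).
print(undetected_agg_range([10, 11, 12], 5, 2))
```

For these readings the sketch returns the endpoints 42/5 and 68/5, i.e., $agg' \in [8.4, 13.6]$, exactly $d_{(1)}$ and $d_{(n)}$ widened by $\frac{4}{5}\Delta^{*}$.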

Based on this range of $agg'$, we derive the theoretical upper bound of $|agg' - agg'_{ideal}|$.

Theorem 5.

The upper bound of $|agg' - agg'_{ideal}|$ that a compromised clusterhead node v can cause while possibly escaping detection as a malicious node is $\left| 2 - 1/m_{v}^{CL} - 1/\lceil (m_{v}^{CL} + 1)/2 \rceil \right| \Delta^{*}$.

Proof.

The ideal aggregation result can be written as (27) or (28), where $\Delta_{(i,j)} = d_{(i)} - d_{(j)}$:

$$agg'_{ideal} = d_{(1)} + \frac{1}{n} \sum_{j=2}^{n} \Delta_{(j,1)}, \tag{27}$$

$$agg'_{ideal} = d_{(n)} - \frac{1}{n} \sum_{j=1}^{n-1} \Delta_{(n,j)}. \tag{28}$$

Consider the worst case $n = \lceil (m_{v}^{CL} + 1)/2 \rceil$ in which one normal member node senses the minimum value $d_{\min}$ (respectively, the maximum value $d_{\max}$) and all other normal member nodes sense the same value $d_{\min} + \Delta^{*}$ (respectively, $d_{\max} - \Delta^{*}$). Taking $agg'$ at the lower (respectively, upper) end of (26), we obtain

$$\begin{aligned}
|agg' - agg'_{ideal}| &\leq \left| d_{(1)} - \frac{m_{v}^{CL} - 1}{m_{v}^{CL}}\Delta^{*} - d_{(1)} - \frac{1}{n}\sum_{j=2}^{n}\Delta_{(j,1)} \right| \\
&= \left| \frac{m_{v}^{CL} - 1}{m_{v}^{CL}}\Delta^{*} + \frac{1}{n}\sum_{j=2}^{n}\Delta_{(j,1)} \right| \\
&\leq \left| \frac{m_{v}^{CL} - 1}{m_{v}^{CL}}\Delta^{*} + \frac{n - 1}{n}\Delta^{*} \right| = \left| 2 - \frac{1}{m_{v}^{CL}} - \frac{1}{n} \right|\Delta^{*} \\
&= \left| 2 - \frac{1}{m_{v}^{CL}} - \frac{1}{\lceil (m_{v}^{CL} + 1)/2 \rceil} \right|\Delta^{*}, \\
|agg' - agg'_{ideal}| &\leq \left| d_{(n)} + \frac{m_{v}^{CL} - 1}{m_{v}^{CL}}\Delta^{*} - d_{(n)} + \frac{1}{n}\sum_{j=1}^{n-1}\Delta_{(n,j)} \right| \\
&= \left| \frac{m_{v}^{CL} - 1}{m_{v}^{CL}}\Delta^{*} + \frac{1}{n}\sum_{j=1}^{n-1}\Delta_{(n,j)} \right| \\
&\leq \left| \frac{m_{v}^{CL} - 1}{m_{v}^{CL}}\Delta^{*} + \frac{n - 1}{n}\Delta^{*} \right| = \left| 2 - \frac{1}{m_{v}^{CL}} - \frac{1}{n} \right|\Delta^{*} \\
&= \left| 2 - \frac{1}{m_{v}^{CL}} - \frac{1}{\lceil (m_{v}^{CL} + 1)/2 \rceil} \right|\Delta^{*}.
\end{aligned} \tag{29}$$

However, the upper bound in Theorem 5 is hardly reachable in practice: it requires that, in every round, one node senses the minimum (or maximum) value while all other normal nodes sense exactly $d_{\min} + \Delta^{*}$ (or $d_{\max} - \Delta^{*}$), which is practically impossible.
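The worst-case configuration in the proof of Theorem 5 can be instantiated numerically to confirm that the bound is attained. This Python sketch (hypothetical names, illustrative values) places one normal reading at $d_{\min}$ and the other $n - 1$ at $d_{\min} + \Delta^{*}$, and lets the clusterhead report the lower end of (26):

```python
from fractions import Fraction


def theorem5_coefficient(m_cl: int) -> Fraction:
    """2 - 1/m_cl - 1/ceil((m_cl + 1) / 2), the coefficient of Δ*."""
    return 2 - Fraction(1, m_cl) - Fraction(1, (m_cl + 2) // 2)


def worst_case_gap(m_cl: int, d_min, delta_star):
    """agg'_ideal minus the lowest undetected agg' in the worst case."""
    n = (m_cl + 2) // 2                                # minimal number of normal nodes
    normal = [d_min] + [d_min + delta_star] * (n - 1)  # worst-case readings
    ideal = Fraction(sum(normal), n)                   # agg'_ideal
    reported = d_min - Fraction(m_cl - 1, m_cl) * delta_star  # lower end of (26)
    return ideal - reported


print(worst_case_gap(5, 10, 2), theorem5_coefficient(5) * 2)   # 44/15 44/15
```

The gap equals $\frac{n-1}{n}\Delta^{*} + \frac{m_{v}^{CL}-1}{m_{v}^{CL}}\Delta^{*}$, which is exactly the right-hand side of (29).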

6. Simulation Results

We present the simulation results of SSGF, including the detection ratio, the false positive ratio, the accuracy of the aggregation results, the communication overhead for monitoring, and the aggregation results under a slow poisoning attack. For comparison with SSGF, we also implemented SELDA [16]; RSDA+, which is based on RSDA [17] and in which clusterhead nodes monitor their member nodes; and SAT+, which is based on SAT [15] and in which clusterhead nodes monitor their member nodes considering only temporal correlation, as in [15]. All experiments were run on the Castalia simulator [30], a simulator for WSNs and other networks of low-power embedded devices that is built on the OMNeT++ [31] platform.

We consider a WSN with 50 member nodes and one clusterhead node. The member nodes are placed randomly in a 40-by-40 area with uniformly distributed coordinates, and the clusterhead node lies at the centre of the deployment area. For each member node u, a random value $d_{u}$ drawn uniformly from [10.0, 12.0] simulates its real-time reading; hence $\Delta^{*} = 2.0$. Note that, in real scenarios, a node's readings may shift, for example to [16.0, 18.0], as the monitored environment varies over time; such a shift does not affect the evaluated results. Unless otherwise stated, each compromised member node w sends $d_{w} - f^{*} \times \Delta^{*}$, where $f^{*}$ is a random value drawn uniformly from $[0, f]$ for a control parameter f. The higher the value of f, the larger the distortion of the readings sent by compromised nodes is likely to be. The punishment base is $\lambda = 2.0$, and a detection period contains 100 rounds. We repeated each experiment 50 times and report the average of the corresponding results.
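The reading model of this setup can be sketched as follows. This is a hedged Python sketch, not the Castalia configuration used in the experiments: it relies on Python's standard `random` module, and it draws $f^{*}$ from $[0, f]$, which is what the [8.0, 12.0] and [6.0, 12.0] ranges reported for $f = 1$ and $f = 2$ in Section 6.1 imply. The function names are hypothetical.

```python
import random

DELTA_STAR = 2.0   # width of the honest reading interval [10.0, 12.0]


def deploy(rng, num_members=50, side=40.0):
    """Uniform random member positions; the clusterhead sits at the centre."""
    members = [(rng.uniform(0, side), rng.uniform(0, side))
               for _ in range(num_members)]
    return (side / 2, side / 2), members


def honest_reading(rng):
    return rng.uniform(10.0, 12.0)


def compromised_reading(rng, f):
    """Sends d_w - f* x Δ*; f* ~ U[0, f] is an assumption (see lead-in)."""
    d_w = honest_reading(rng)
    return d_w - rng.uniform(0.0, f) * DELTA_STAR


rng = random.Random(42)
head, members = deploy(rng)
compromised = set(rng.sample(range(len(members)), 20))   # cn = 20, i.e., 40%
round_readings = [compromised_reading(rng, 1.0) if i in compromised
                  else honest_reading(rng) for i in range(len(members))]
```

Seeding a dedicated `random.Random` instance keeps one simulated round reproducible, which is convenient when averaging over repeated runs.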

6.1. The Detection Ratio and False Positive Ratio in a Detection Period

In this subsection, we present the detection ratio and the false positive ratio in one detection period for different numbers of compromised member nodes $cn$ and different values of f, with $cnt_{abthr} = 20$. The results are shown in Figures 3–7.

Figure 3

The detection ratio when $c n = 15$ , $f = 1$ , and $c n t_{abthr} = 20$ .

Figure 4

The detection ratio when $c n = 20$ , $f = 1$ , and $c n t_{abthr} = 20$ .

Figure 5

The detection ratio when $c n = 15$ , $f = 2$ , and $c n t_{abthr} = 20$ .

Figure 6

The detection ratio when $c n = 20$ , $f = 2$ , and $c n t_{abthr} = 20$ .

Figure 7

The false positive ratio when $c n = 20$ , $f = 1$ , and $c n t_{abthr} = 20$ .

Figures 3 and 4 show the detection ratio when $f = 1$ and the fraction of compromised nodes is 30% and 40%, respectively. In both cases, by the end of a detection period, the detection ratio of our method gradually reaches 100%, while the detection ratios of both SELDA and RSDA+ remain zero. That is, when compromised nodes send maliciously modified readings in [8.0, 12.0], neither SELDA nor RSDA+ detects any compromised node, whereas our method detects all of them. This is because both SELDA and RSDA+ treat any reading in $[d_{mid} - \Delta^{*}, d_{mid} + \Delta^{*}]$ as normal, where $d_{mid}$ is the median of the readings received at the clusterhead node; hence, the clusterhead node cannot detect a maliciously modified reading that falls in that range. We can also see that, as the number of compromised member nodes increases, more rounds are needed to detect all compromised nodes.
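The acceptance rule attributed above to SELDA and RSDA+ can be sketched in a few lines of Python. This is a simplified illustration of that single rule, not a reimplementation of either method; for contrast only, it also applies the tolerated interval $[d_{(n)} - \Delta^{*}, d_{(1)} + \Delta^{*}]$ from the proof of Theorem 2, computed over the honest readings. Names and readings are illustrative.

```python
import statistics


def median_window_accepts(readings, x, delta_star=2.0):
    """Accept x if it lies in [d_mid - Δ*, d_mid + Δ*], d_mid = median(readings)."""
    d_mid = statistics.median(readings)
    return d_mid - delta_star <= x <= d_mid + delta_star


def within_all_normals(normal_readings, x, delta_star=2.0):
    """Accept x only if it is within Δ* of every honest reading."""
    return max(normal_readings) - delta_star <= x <= min(normal_readings) + delta_star


honest = [10.0, 10.5, 11.0, 11.5, 12.0]
print(median_window_accepts(honest, 9.0), within_all_normals(honest, 9.0))  # True False
```

A reading of 9.0 passes the median window [9.0, 13.0] even though it is more than $\Delta^{*}$ below the largest honest reading, which is the gap the paragraph above describes.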

Figures 5 and 6 show the results for another attack behavior, in which compromised nodes send readings between 6.0 and 12.0. In both cases (30% and 40% compromised nodes), our method detects all compromised nodes faster than SELDA and RSDA+. This is because both SELDA and RSDA+ apply the same punishment to every abnormal reading: if a reading sent by a node is detected as abnormal, its $cnt_{abnormal}$ is increased by one. Our method applies this strategy to small distortions but additionally uses the punishment base $\lambda$ to punish a node severely when its reading satisfies $\alpha > 1$. This also forces compromised nodes to send more genuine readings in order to avoid detection.

Figure 7 shows the false positive ratio when $cn = 20$ and $f = 1$; in the other three cases, the false positive ratio remains zero. With $cnt_{abthr} = 20$, our method keeps the false positive ratio below 0.3% even when 40% of the nodes are compromised and the distortion of the readings is very small. We also repeated similar experiments with $cnt_{abthr} = 30$ and observed that the false positive ratio remained zero while the detection ratio reached 99%, as shown in Figure 8.

Figure 8

The detection ratio when $c n = 20$ , $f = 1$ , and $c n t_{abthr} = 30$ .

In conclusion, our method outperforms SELDA and RSDA+ in detection speed and/or detection ratio across the tested cases. At the same time, the false positive ratio can be kept at zero by setting an appropriate $cnt_{abthr}$ while maintaining a high detection ratio.

6.2. Aggregation Results

In this subsection, we present the aggregation results of the above experiments. The results are shown in Figures 9, 10, 11, and 12, in which “all” means that the clusterhead node aggregates all the received data and “good” is the result of aggregating only the data from normal member nodes.

Figure 9

Aggregation results when $c n = 15$ , $f = 1$ , and $c n t_{abthr} = 20$ .

Figure 10

Aggregation results when $c n = 20$ , $f = 1$ , and $c n t_{abthr} = 20$ .

Figure 11

Aggregation results when $c n = 15$ , $f = 2$ , and $c n t_{abthr} = 20$ .

Figure 12

Aggregation results when $c n = 20$ , $f = 2$ , and $c n t_{abthr} = 20$ .

Figures 9–12 show that both our method and the comparison methods are more accurate than the “all” case. However, the results of our scheme gradually converge to the “good” case as the rounds proceed in all four cases in Figures 9–12, whereas the two comparison methods (SELDA and RSDA+) converge only when the distortion of the data sent by compromised nodes is high, as shown in Figures 11 and 12. Moreover, our method converges to the “good” case faster than SELDA and RSDA+. This is because our method gradually detects and filters out all compromised nodes in all four cases, while SELDA and RSDA+ do so only in the two cases in Figures 11 and 12, and more slowly, as described in Section 6.1. In conclusion, our method generally outperforms SELDA and RSDA+ in terms of the accuracy of the aggregation results.

We have also compared SSGF with the resilient aggregation methods suggested by Wagner [22] (i.e., trimming and median) and with RANBAR [25]. The idea of trimming is that, when bogus readings may be present, the highest 5% and the lowest 5% of the readings are ignored (5% trimming) and the average of the remaining readings is used as the estimate of the real average. Hence, it works well only if the proportion of compromised readings stays below 5%. In our experiments, 30% and 40% of the nodes in the cluster are compromised, respectively, so some bogus readings are treated as valid during aggregation, and the distortion of the trimming results is relatively high. The median is defined as the middle element(s) of the sorted readings; in fact, it is the extreme case of trimming (i.e., 49% trimming). Although the median method excludes all the compromised readings, it also excludes the majority of the real readings, so the accuracy of its results also declines. The idea of RANBAR is to construct a consensus set for aggregation by filtering out readings that do not fit a distribution model randomly established from the raw readings. In our experiments, the size of the initial set is 1, the maximum permitted number of iterations is 15, and the error tolerance is 0.1. Because the consensus set is constructed randomly and is considered valid if it contains the majority of readings, some real readings may be left out of it while some bogus readings may be included, which again reduces accuracy. In conclusion, SSGF also generally outperforms trimming, median, and RANBAR.
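Wagner's trimming and median estimators, as described above, are short enough to sketch. The Python below illustrates those two estimators only (RANBAR is not reproduced); the 5% trimming fraction follows the text, and the readings are made up to show why 5% trimming breaks down once more than 5% of the readings are bogus.

```python
import statistics


def trimmed_mean(readings, frac=0.05):
    """Drop the lowest and highest `frac` share of readings, average the rest."""
    xs = sorted(readings)
    k = int(len(xs) * frac)            # readings dropped at each end
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


honest = [11.0] * 17
one_bogus = honest + [11.0, 11.0, 1000.0]   # 1 of 20 readings bogus (5%)
three_bogus = honest + [1000.0] * 3          # 3 of 20 readings bogus (15%)

print(trimmed_mean(one_bogus))               # 11.0: the outlier is trimmed away
print(trimmed_mean(three_bogus) > 11.0)      # True: two bogus readings survive
print(statistics.median(three_bogus))        # 11.0
```

With 20 readings, 5% trimming removes exactly one value from each end, so a single outlier is discarded but three are not.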

6.3. The Communication Overhead for Monitoring

In this subsection, we will present the experimental results of the communication overhead for monitoring. The results are shown in Figures 13 and 14, in which “cmp- $f = 1$ ” and “cmp- $f = 2$ ” are the results from the SELDA and RSDA+ methods when $f = 1$ and $f = 2$ , respectively.

Figure 13

The communication overhead with different $c n t_{abthr}$ and $c n = 15$ .

Figure 14

The communication overhead with different $c n t_{abthr}$ and $c n = 20$ .

Figures 13 and 14 show the results with 30% and 40% compromised nodes, respectively. With our monitoring mechanism, the per-node communication overhead for monitoring is clearly lower than that of SELDA and RSDA+ in all cases. This is because, in our method, each member node only needs to listen to the aggregation results sent by its clusterhead node, whereas in SELDA and RSDA+ each node needs to monitor the other nodes in the same cluster.

6.4. Aggregation Results for a Slow Poisoning Attack

In this subsection, we compare our method with SAT+ in terms of the accuracy of the aggregation results under a slow poisoning attack, in which compromised nodes slowly change the readings they send to the clusterhead node. The reading $d_{w}(r)$ sent by a compromised node w in round r depends on the reading $d_{w}(r-1)$ it sent in round $r-1$; specifically, $d_{w}(r) = d_{w}(r-1) - 0.1 \times f^{*} \times \Delta^{*}$. That is, this case considers only the temporal correlation of sensory data, as in [15].
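The slow poisoning model can be sketched as a short generator. In this hedged Python sketch, $f^{*}$ is drawn from $[0, f]$ (an assumption consistent with the reading ranges reported in Section 6.1), and the function name is hypothetical. It illustrates why a purely temporal check struggles: each per-round change is at most $0.1 f \Delta^{*}$, yet the reading drifts steadily downward.

```python
import random


def slow_poisoning_trace(d0, rounds, f, rng, delta_star=2.0):
    """d_w(r) = d_w(r - 1) - 0.1 * f* * Δ*, with f* ~ U[0, f] (assumed)."""
    trace = [d0]
    for _ in range(rounds):
        f_star = rng.uniform(0.0, f)
        trace.append(trace[-1] - 0.1 * f_star * delta_star)
    return trace


trace = slow_poisoning_trace(11.0, 100, 1.0, random.Random(7))
steps = [a - b for a, b in zip(trace, trace[1:])]
# Each step is tiny, but the cumulative drift over 100 rounds is large.
print(max(steps) <= 0.2 + 1e-9, trace[0] - trace[-1])
```

Comparing each reading only with the node's own previous reading sees steps of at most 0.2 here; comparing it with neighbors' current readings, as spatial correlation allows, exposes the accumulated drift.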

Figure 15 compares our method with SAT+ when 40% of the nodes are compromised, $f = 1$, and $cnt_{abthr} = 20$. The results of our scheme gradually approach the “good” case after 20 rounds, while the results of SAT+ drift increasingly away from it. This is because our method considers both temporal and spatial correlation, whereas SAT+ considers only the former. Hence, our method can defend against this kind of poisoning attack, whereas SAT+ cannot.

Figure 15

Aggregation results for a slow poisoning attack when $c n = 20$ , $f = 1$ , and $c n t_{abthr} = 20$ .

7. Conclusions

To defend against compromised member nodes and aggregator nodes simultaneously during data aggregation in WSNs with low communication overhead, we proposed a mutual defense scheme for secure data aggregation. It contains SSGF, a secure aggregation and defense mechanism that lets clusterhead nodes defend against member nodes injecting forged readings, and a TDMA-based listening mechanism that lets member nodes defend against clusterhead nodes generating incorrect aggregation results. It also provides integrity, freshness, and authentication via a secure data packet transmission scheme. Considering that the readings sent by neighbor nodes exhibit temporal and spatial correlation, we defined the maximum tolerant difference (MTD) constraint parameter and, based on it, gave quantitative criteria for evaluating abnormal readings. Moreover, we analyzed and proved the worst-case aggregation results that compromised nodes can produce. Extensive simulation results indicated the feasibility and efficiency of our scheme: compared with existing methods, it achieves higher accuracy of the aggregation results with lower communication overhead for monitoring.

There are several directions worth studying in the future. First, this paper does not consider colluding attacks launched by compromised clusterhead nodes; in practice, multiple compromised clusterhead nodes may collude to modify messages, which poses interesting challenges to our approach. Second, this paper covers only the average aggregation operation, and the scheme is not suitable for event detection applications in which events occur sporadically and can be detected by only a small number of nodes at a time. Extending the study to other aggregation operations, such as minimum, maximum, and count, for broader applications is an interesting research direction.


Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant no. 60873199. The authors are grateful to the anonymous reviewers for their insightful comments.

References

1. Ozdemir, Xiao. Secure data aggregation in wireless sensor networks: a comprehensive overview. Computer Networks, vol. 53, no. 12, pp. 2022–2037, 2009. doi:10.1016/j.comnet.2009.02.023

2. Alzaid, Foo, Nieto J. M. G., Park D. G. A taxonomy of secure data aggregation in wireless sensor networks. International Journal of Communication Networks and Distributed Systems, vol. 8, no. 1-2, pp. 101–148, 2012. doi:10.1504/IJCNDS.2012.044325

3. Przydatek, Song, Perrig. SIA: secure information aggregation in sensor networks. In Proceedings of the 1st International Conference on Embedded Networked Sensor Systems (SenSys ′03), November 2003, pp. 255–265.

4. Chan, Perrig, Przydatek, Song. SIA: secure information aggregation in sensor networks. Journal of Computer Security, vol. 15, no. 1, pp. 69–102, 2007.

5. Girao, Westhoff, Schneider. CDA: concealed data aggregation for reverse multicast traffic in wireless sensor networks. In Proceedings of the IEEE International Conference on Communications (ICC ′05), Seoul, Republic of Korea, May 2005, pp. 3044–3049.

6. Westhoff, Girao, Acharya. Concealed data aggregation for reverse multicast traffic in sensor networks: encryption, key distribution, and routing adaptation. IEEE Transactions on Mobile Computing, vol. 5, no. 10, pp. 1417–1431, 2006. doi:10.1109/TMC.2006.144

7. Castelluccia, Mykletun, Tsudik. Efficient aggregation of encrypted data in wireless sensor networks. In Proceedings of the 2nd Annual International Conference on Mobile and Ubiquitous Systems-Networking and Services (MobiQuitous ′05), July 2005, pp. 109–117. doi:10.1109/MOBIQUITOUS.2005.25

8. Castelluccia, Chan A. C.-F., Mykletun, Tsudik. Efficient and provably secure aggregation of encrypted data in wireless sensor networks. ACM Transactions on Sensor Networks, vol. 5, no. 3, pp. 1–36, 2009. doi:10.1145/1525856.1525858

9. Zhou, Yang. An efficient secure data aggregation based on homomorphic primitives in wireless sensor networks. International Journal of Distributed Sensor Networks, vol. 2014, Article ID 962925, 11 pages, 2014. doi:10.1155/2014/962925

10. Çam, Özdemir, Nair, Muthuavinashiappan, Ozgur Sanli. Energy-efficient secure pattern based data aggregation for wireless sensor networks. Computer Communications, vol. 29, no. 4, pp. 446–455, 2006. doi:10.1016/j.comcom.2004.12.029

11. Zhang, Liu, Das S. K. Secure data aggregation in wireless sensor networks: a watermark based authentication supportive approach. Pervasive and Mobile Computing, vol. 4, no. 5, pp. 658–680, 2008. doi:10.1016/j.pmcj.2008.05.005

12. Wang, Qin, Liu. An energy-efficient and scalable secure data aggregation for wireless sensor networks. International Journal of Distributed Sensor Networks, vol. 2013, Article ID 843485, 11 pages, 2013. doi:10.1155/2013/843485

13. Yoon, Jang, Kim, Chang. A signature-based data security technique for energy-efficient data aggregation in wireless sensor networks. International Journal of Distributed Sensor Networks, vol. 2014, Article ID 272537, 10 pages, 2014. doi:10.1155/2014/272537

14. Hur, Lee, Hong S. M., Yoon. Trust management for resilient wireless sensor networks. In Proceedings of the 8th International Conference on Information Security and Cryptology (ICISC ’05), Seoul, Republic of Korea, December 2006, pp. 56–68.

15. Dreef, Sun, Xiao. Secure data aggregation without persistent cryptographic operations in wireless sensor networks. Ad Hoc Networks, vol. 5, no. 1, pp. 100–111, 2007. doi:10.1016/j.adhoc.2006.05.009

16. Ozdemir. Secure and reliable data aggregation for wireless sensor networks. In Proceedings of the 4th International Conference on Ubiquitous Computing Systems (UCS ’07), Tokyo, Japan, November 2007, pp. 102–109.

17. Alzaid, Foo, Nieto J. G. RSDA: reputation-based secure data aggregation in wireless sensor networks. In Proceedings of the 9th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT ′08), Dunedin, New Zealand, December 2008, pp. 419–424. doi:10.1109/PDCAT.2008.52

18. Qiu, Zheng, Chen. Building representative-based data aggregation tree in wireless sensor networks. Mathematical Problems in Engineering, vol. 2010, Article ID 732892, 11 pages, 2010. doi:10.1155/2010/732892

19. Boonsongsrikul, Lhee K.-S., Hong. Securing data aggregation against false data injection in wireless sensor networks. In Proceedings of the 12th International Conference on Advanced Communication Technology: ICT for Green Growth and Sustainable Development (ICACT ′10), Seoul, Republic of Korea, February 2010, pp. 29–34.

20. Dong. Secure data aggregation approach based on monitoring in wireless sensor networks. China Communications, vol. 9, no. 6, pp. 14–27, 2012.

21. Labraoui, Gueroui, Aliouat, Petit. Reactive and adaptive monitoring to secure aggregation in wireless sensor networks. Telecommunication Systems, vol. 54, no. 1, pp. 3–17, 2013. doi:10.1007/s11235-013-9712-3

22. Wagner. Resilient aggregation in sensor networks. In Proceedings of the ACM Workshop on Security of Ad Hoc and Sensor Networks (SASN ′04), Washington, DC, USA, October 2004, pp. 78–87.

23. Roy, Conti, Setia, Jajodia. Secure data aggregation in wireless sensor networks. IEEE Transactions on Information Forensics and Security, vol. 7, no. 3, pp. 1040–1052, 2012.

24. Roy, Conti, Setia, Jajodia. Secure data aggregation in wireless sensor networks: filtering out the attacker's impact. IEEE Transactions on Information Forensics and Security, vol. 9, no. 4, pp. 681–694, 2014.

25. Buttyán, Schaffer, Vajda. RANBAR: RANSAC-based resilient aggregation in sensor networks. In Proceedings of the 4th ACM Workshop on Security of Ad Hoc and Sensor Networks (SASN ′06), Alexandria, Va, USA, October 2006, pp. 83–90. doi:10.1145/1180345.1180356

26. Sun, Peng, Ning, Wang. Secure distributed cluster formation in wireless sensor networks. In Proceedings of the 22nd Annual Computer Security Applications Conference (ACSAC ′06), Washington, DC, USA, December 2006, pp. 131–140. doi:10.1109/ACSAC.2006.46

27. Zhu, Setia, Jajodia. LEAP: efficient security mechanisms for large-scale distributed sensor networks. In Proceedings of the 10th ACM Conference on Computer and Communications Security (CCS ′03), Washington, DC, USA, October 2003, pp. 62–72.

28. Chadha, Yonghe, Das S. K. Group key distribution via local collaboration in wireless sensor networks. In Proceedings of the 2nd Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (SECON ′05), Santa Clara, Calif, USA, September 2005, pp. 46–54. doi:10.1109/SAHCN.2005.1556863

29. Perrig, Szewczyk, Tygar J. D., Wen, Culler D. E. SPINS: security protocols for sensor networks. Wireless Networks, vol. 8, no. 5, pp. 521–534, 2002. doi:10.1023/A:1016598314198

30. Castalia Home: Castalia [EB/OL], 2013, http://castalia.research.nicta.com.au/index.php/en

31. OMNeT++ Community. OMNeT++ [EB/OL], 2013, http://www.omnetpp.org