An Efficient Data Aggregation Protocol Concentrated on Data Integrity in Wireless Sensor Networks

Abstract

Wireless sensor networks consist of a large number of sensor nodes with strictly limited computation capability, storage, communication resources, and battery power. Because they are deployed in remote and hostile environments and are therefore vulnerable to physical attacks, sensor networks face many practical challenges. Data confidentiality, data integrity, source authentication, and availability are all major security concerns. In this paper, we focus on the problem of preserving data integrity and propose an Efficient Integrity-Preserving Data Aggregation Protocol (EIPDAP) that guarantees the integrity of the aggregation result in sensor networks. In EIPDAP, the base station can verify the integrity of the aggregation result immediately after receiving it together with the corresponding authentication information. By contrast, most existing protocols need an additional phase to check integrity, which consumes considerable energy and causes network delay. Compared with other related schemes, EIPDAP reduces the communication overhead per node to $O (Δ)$ , where $Δ$ is the degree of the aggregation tree for the network. To the best of our knowledge, EIPDAP achieves the lowest known upper bound for the integrity-preserving data aggregation problem.

1. Introduction

Wireless sensor networks (WSNs) have many security-critical applications such as real-time traffic monitoring, wildfire tracking, and military surveillance. In a sensor network, thousands of low-cost sensor nodes collectively monitor an area within a certain range and report their own data to the base station, which distributes a data query. However, having every node report individually would incur high communication overhead that sensor nodes cannot afford. Data aggregation mechanisms [1, 2] have been proposed to reduce power consumption. Because data aggregation introduces security threats, many secure data aggregation protocols [3, 4] have emerged in recent years; these are shown to be secure and considerably improve resource utilization.

Although data confidentiality guarantees that only legitimate parties obtain the plain data, without leakage to adversaries, it does not protect data from being altered [5–7]. In this paper, we focus on the problem of preserving data integrity through aggregation in sensor networks. Message authentication codes (MACs) are used in [8] to protect data integrity, but they cause other problems, such as high communication overhead. We present a provably secure integrity-preserving aggregation protocol for sensor networks, based on the elliptic curve discrete logarithm problem, for general networks with hierarchical aggregator topologies, assuming that adversaries are able to corrupt a (small) fraction of sensors. With the increasing computation capacity of sensor nodes, public key cryptography such as elliptic curve cryptosystems (ECC) has become suitable for constrained environments such as WSNs. In [9], the authors propose secure data aggregation schemes that use ECC to obtain data confidentiality and integrity, exploiting its smaller key size, faster computations, and reductions in processing power, storage space, and bandwidth. TinyECC, proposed by Liu and Ning [10], provides ECC-based operations that can be flexibly configured and integrated into WSN applications.

An adversary can perform a variety of attacks. For example, a denial-of-service (DoS) attack can totally block the communication between sensor nodes and the base station. However, we do not consider this attack because it is detectable by the querier and solutions can be implemented to remedy the situation. In a stealthy attack [4], the attacker's goal is to make the base station accept false aggregation results, which differ significantly from the true results determined by the measured values, without being detected by the base station. Our goal is to prevent this kind of attack even when a high-level aggregator is corrupted.

A number of protocols [11–13] have been proposed that address how the base station can obtain a good approximation of the aggregation result and how data integrity can be preserved when a fraction of sensor nodes are compromised. One common sensor feature is the disproportionately high cost of transmitting information compared with performing local computation. For example, a Berkeley mote spends approximately the same amount of energy to compute 800 instructions as it does to send a single bit of data. It thus becomes essential to reduce the number of bits forwarded by intermediate nodes in order to extend the entire network's lifetime [14]. All the above schemes verify the integrity of the aggregation result in an additional phase, which consumes considerable energy and causes network delay.

In this paper we propose EIPDAP, in which the base station can verify the integrity of the aggregation result immediately after receiving it together with the corresponding authentication information. This significantly reduces the energy consumption and communication delay that would be incurred if verification were done through another query-and-forward phase.

The rest of the paper is organized as follows: Section 2 surveys other approaches to integrity-preserving aggregation in sensor networks, Section 3 discusses the problem we are trying to solve in more detail, Section 4 describes the new scheme that is the centerpiece of our work, and Section 5 analyzes the security properties and performance of our scheme.

2. Related Work

There have been a number of works on preserving integrity in aggregation protocols for sensor networks. Many protocols have been proposed for the single-aggregator model [4, 13, 15]. However, the aggregator in these schemes suffers from significantly high congestion, and communication is reduced only on the link between the aggregator and the base station. This model therefore does not scale to large multihop sensor deployments.

Another significant work is introduced in [11]. The main idea of this approach is that each node sends its value, complement, and commitment up the aggregation tree; a commitment is then passed down the tree so that each node can verify that its value was added into the SUM aggregation and that the complement of its data value was added into the COMPLEMENT aggregation. However, the scheme requires three phases. The delayed aggregation strategy used in the second phase increases communication from $O (1)$ to $O (\log n)$ and computation from $O (1)$ to $O (q \log n)$ , where n is the number of nodes in the network and q is the number of forests in the commitment tree. The result-checking phase costs $O (Δ \log^{2} n)$ congestion. Frikken and Dougherty [12] improve Chan's approach by reducing the maximum communication to $O (Δ \log n)$ .

A secure hop-by-hop data aggregation protocol for sensor networks, SDAP, is proposed in [13]. The authors argue that high-level nodes deserve more attention, since these nodes represent a large portion of the final result delivered to the base station and the consequences are more catastrophic if they are compromised. Hence, SDAP dynamically partitions the topology tree into multiple logical groups of similar sizes using a probabilistic approach, following the divide-and-conquer principle. In this way, fewer nodes are located under a high-level sensor node in a logical subtree, which reduces the potential security threat posed by a compromised high-level node. SDAP introduces probability and attestation into result checking; the communication required per node is $O (\log (n / n_{g}))$ . Because SDAP attests only a subset of the nodes, the attestation algorithm cannot find all compromised nodes. Adding attestation paths increases the detection probability, but it also increases the communication cost.

Aggregate message authentication codes, introduced by Katz and Lindell (CT-RSA 2008) [8], provided a new perspective on preserving integrity. In their construction, the aggregate MAC is simply the XOR of all the MACs, so its size is the same as that of an ordinary MAC. After receiving all the data and the final aggregate MAC, the base station uses the secret keys shared with each node to compute a new aggregate MAC from these data and compares it with the received aggregate MAC. Although this construction remarkably reduces the communication overhead seen in former protocols [11–13] and is easy to perform, it suffers from “mix-and-match” attacks [16], in which the adversary can easily forge several types of aggregate combinations.

In [9], the authors proposed a new algorithm using homomorphic encryption and additive digital signatures to achieve confidentiality, integrity, and availability for in-network aggregation in WSNs. However, the protocol cannot resist stealthy attacks. We discuss a concrete attack on the protocol of Albath and Madria [9] in the appendix.

Besides, there have been several protocols designed for preserving the confidentiality of the aggregation results [17–19]. This issue is orthogonal to our work and is not considered in this paper.

3. Problem Model

This section contains the definitions of basic problems and includes discussion on the nodes' setup, the security infrastructure, and the attack model.

3.1. Network Assumptions

We assume a query-based sensor network with a large number of sensors and a powerful base station whose transmission range covers the whole wireless sensor network, so that it can broadcast messages to all nodes directly. Before the aggregation process, the sensors form a tree topology with the base station at the root.

We further assume that the base station would broadcast an authenticated query before collecting data. If there is no aggregation tree, then an aggregation tree should be formed as the query has been sent to all nodes. Our protocol takes the structure of the aggregation tree as given. One method for constructing an aggregation tree is described in TaG [20].

Each node senses an integer value r in the range $(0, v]$ for some application-based value v (we rule out “0” to protect $θ_{i}$ , as explained later). The goal is to return the SUM result together with two tags proving that the SUM result has not been forged, even in the presence of malicious nodes. Due to resource constraints, all readings need to be aggregated by aggregators while being transmitted over a multihop path.

3.2. Security Infrastructure

We assume that each node i has a unique identifier $s_{i}$ and private keys $r_{i}, l_{i} \in Z_{p}$ , and that it shares a private key ${s k}_{i}$ and a private point $θ_{i}$ of the cyclic elliptic curve group $E (Z_{p})$ with the base station. ECC domain parameters, including the generator point $G \in E (Z_{p})$ , are preloaded in all nodes. In each node i we set two parameters $α_{i}$ and $β_{i}$ , which will be used later:

\begin{matrix} α_{i} = r_{i} G, \\ β_{i} = r_{i} α_{i} . \end{matrix}

(1)
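To make the setup in (1) concrete, the following minimal sketch computes $α_{i}$ and $β_{i}$ on a textbook toy curve. The curve $y^{2} = x^{3} + 2 x + 2$ over $Z_{17}$ with generator $(5, 1)$ of order 19, the helper names `point_add` and `scalar_mul`, and the value of `r_i` are assumptions chosen only for illustration; they are far too small for any real deployment and are not taken from the paper.

```python
# Toy parameters (assumed for illustration, NOT from the paper):
# curve y^2 = x^3 + 2x + 2 over Z_17, generator G = (5, 1) of prime order 19.
P_MOD, A = 17, 2
G = (5, 1)
Q = 19

def point_add(p1, p2):
    """Add two curve points; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p2 is the inverse of p1
    if p1 == p2:
        slope = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (slope * slope - x1 - x2) % P_MOD
    y3 = (slope * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

def scalar_mul(k, point):
    """Double-and-add computation of k * point, with k reduced modulo Q."""
    result, addend = None, point
    k %= Q
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result

r_i = 7                           # hypothetical private key r_i of node i
alpha_i = scalar_mul(r_i, G)      # alpha_i = r_i * G
beta_i = scalar_mul(r_i, alpha_i) # beta_i = r_i * alpha_i
```

Under these assumptions `beta_i` equals $r_{i}^{2} G$ , which is the only property of (1) the sketch is meant to show.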

3.3. Attack Model and Security Goals

We consider a setting with a polynomially bounded adversary that can physically access the sensors and read their internal values. The adversary is also restricted to corrupting a (small) fraction of nodes, including the aggregators.

Once the adversary compromises a sensor node, it can obtain all the node's secret keys. An adversary can modify, forge, or discard messages, or simply transmit false aggregate results, and its goal is to forge a valid aggregate result that will be accepted by the base station. The higher the level at which a false aggregate result is injected, the more catastrophic the consequences.

In this setting, we focus on stealthy attacks [4], in which the attacker's goal is to make the base station accept false aggregate results without being detected by the base station. Our security goal is to prevent stealthy attacks. In particular, we want to guarantee that once the aggregate result has been accepted by the base station, it is indeed the real result aggregated by honest nodes.

Definition 1 (integrity-preserving aggregation algorithm).

An aggregation algorithm is integrity-preserving if, by tampering with the aggregation process, an adversary is unable to induce the base station to accept any forged aggregate result.

If a sensor node is compromised, the adversary can obtain all its confidential information (e.g., cryptographic keys) and send false data without being detected. In this paper we therefore focus on the situation where an aggregator is compromised and examine whether it can forge a valid aggregate result.

We do not address the denial-of-service attack, in which the adversary prevents the querier from getting any aggregation result at all, because nodes that fail to respond to queries clearly indicate that something is wrong and solutions can be implemented to remedy the situation.

4. Our Work

In this section, we present a new approach, especially aiming to preserve integrity of the aggregation result. We first give an overview of this approach and then present the details.

4.1. Overview

The design of our algorithm is based on the elliptic curve discrete logarithm problem. The overall algorithm consists of three main phases: query dissemination, aggregation-commit, and result-checking.

In the query dissemination phase, the base station broadcasts the query to the network. If no aggregation tree is present in the network, one is formed as the query is sent to all nodes; an aggregation tree is a directed spanning tree over the network topology with the base station at the root. Then the path-keys and edge-keys for each node, encrypted with the secret key shared between the base station and that node, are sent to the corresponding node. Path-keys and edge-keys are calculated by the base station according to the network topology. The calculation of the path-keys is detailed in Section 4.2.

In the aggregation-commit phase, each sensor node collects raw data and computes two corresponding tags before sending them to its parent node in the aggregation tree. After receiving the messages from all of its child nodes, an aggregator performs modular addition over the three items and forwards the result to the higher-level aggregator; this continues until the result reaches the base station.

In the result-checking phase, the base station verifies the integrity of the SUM aggregation with the two aggregation tags. Unlike Chan's approach [11] and Keith's approach [12], ours does not require any dissemination from the root down to the leaf nodes in this phase; such dissemination causes $O (Δ \log^{2} n)$ congestion in Chan's approach and $O (Δ \log n)$ congestion in Keith's approach, together with the associated energy cost.

4.2. The SUM Approach

4.2.1. Query Dissemination Phase

Before aggregation, the base station broadcasts an authenticated query to the network. The query request message contains a nonce N to prevent replay attacks [1]. If there is no aggregation tree, one with the base station at the root is formed as the query is sent to all nodes, and the tree information is then reported back to the base station. After the base station receives the tree information, it calculates the keys for each node: for each aggregator or sensor node i, the base station generates a key $bs$ and calculates the edge key with a one-way hash function F, where

\begin{matrix} k_{i - j} = F_{bs} (s_{i}, s_{j}, N), \end{matrix}

(2)

Here node i is the parent of node j, and $s_{i}$ and $s_{j}$ are the unique identifiers of node i and node j.
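The edge-key derivation in (2) can be sketched as follows. The paper only states that F is a one-way hash function keyed with $bs$ ; instantiating it with HMAC-SHA256, encoding identifiers as 4-byte integers, and reducing the digest modulo a stand-in prime `Q` are all assumptions made for this example, as are the key and nonce values.

```python
import hashlib
import hmac

# Hypothetical stand-in for the group order q (a Mersenne prime, illustration only).
Q = 2**61 - 1

def edge_key(bs_key: bytes, parent_id: int, child_id: int, nonce: bytes) -> int:
    """Sketch of k_{i-j} = F_bs(s_i, s_j, N), with F assumed to be HMAC-SHA256."""
    msg = parent_id.to_bytes(4, "big") + child_id.to_bytes(4, "big") + nonce
    digest = hmac.new(bs_key, msg, hashlib.sha256).digest()
    # Map the digest into [1, Q-1] so the key is invertible modulo Q.
    return int.from_bytes(digest, "big") % (Q - 1) + 1

bs_key = b"base-station-secret"  # assumed value of the key bs
nonce = b"query-0001"            # assumed value of the nonce N
k_3_1 = edge_key(bs_key, 3, 1, nonce)  # edge between parent 3 and child 1
k_3_2 = edge_key(bs_key, 3, 2, nonce)  # edge between parent 3 and child 2
```

Because the nonce enters the derivation, a new query yields fresh edge keys, which is the property the text relies on against replay.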

For each sensor node i, base station also calculates two path-keys $k_{i, 1}$ and $k_{i, 2}$ as follows:

\begin{matrix} k_{i, 1} = \frac{θ}{k_{path}}, \end{matrix}

(3)

\begin{matrix} k_{i, 2} = \frac{l}{k_{path}}, \end{matrix}

(4)

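where θ is a point in $E (Z_{p})$ and l is an integer; both are chosen by the base station to enable data aggregation and prevent stealthy/replay attacks. Division by $k_{path}$ denotes multiplication by the inverse of $k_{path}$ modulo the order of the elliptic curve group.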

If the path from base station to sensor node i is 1-2-3-i, then $k_{path} = k_{1 - 2} k_{2 - 3} k_{3 - i}$ .
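The path-key computation in (3) and (4) can be sketched as below. As a loudly stated simplification, the point θ is represented here by a scalar (its multiple of G) modulo a stand-in prime `Q`, so the example shows only the modular arithmetic; the function names and all numeric values are hypothetical.

```python
# Hypothetical stand-in for the group order q (illustration only).
Q = 2**61 - 1

def path_product(edge_keys):
    """k_path: product of the edge keys along the path, modulo Q."""
    k = 1
    for key in edge_keys:
        k = k * key % Q
    return k

def path_keys(theta, l, edge_keys):
    """Return (k_{i,1}, k_{i,2}) = (theta / k_path, l / k_path) modulo Q.

    theta is a scalar stand-in for the point θ (an assumption of this sketch).
    """
    inv = pow(path_product(edge_keys), -1, Q)  # modular inverse (Python 3.8+)
    return theta * inv % Q, l * inv % Q

edge_keys = [11, 13, 17]             # assumed k_{1-2}, k_{2-3}, k_{3-i}
theta, l = 123456789, 987654321      # assumed base-station secrets
k_i1, k_i2 = path_keys(theta, l, edge_keys)
```

Multiplying either path-key by $k_{path}$ recovers the corresponding secret, which is what the aggregation steps exploit edge by edge.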

Finally, the base station, whose transmission range covers the whole wireless sensor network, broadcasts directly to each node i its path-keys and edge-keys encrypted with the secret key ${s k}_{i}$ .

4.2.2. Data Aggregation Phase

By the end of the query dissemination phase, each node has identified its parent and the base station has an overall view of the aggregation tree.

In Figure 1, take the paths BS-10-8-3-1 and BS-10-8-3-2 as an example. As sensor nodes, nodes 1, 2, and 3 each have a message to be passed to their parents. The message has the following format:

\begin{matrix} 〈 x_{i}^{#}, α_{i}^{#}, β_{i}^{#} 〉, \end{matrix}

(5)

where $x_{i}^{#}$ is the SUM aggregation over all sensor nodes in the subtree rooted at node i, and $α_{i}^{#}$ and $β_{i}^{#}$ are the first and second tags, respectively.

Figure 1

Aggregation phase. Nodes 1, 2, 3, 4, 5, 6, and 7 are sensor nodes, and nodes 8, 9, and 10 are aggregators, while node 3 works as both sensor node and aggregator. Without loss of generality, we assume that every intermediate node is able to sense raw data and perform aggregation as node 3 does.

For nodes 1 and 2:

\begin{matrix} x_{1}^{#} = x_{1}, x_{2}^{#} = x_{2}, \\ α_{1}^{#} = x_{1} k_{1,1} + x_{1} β_{1} k_{1, 2}, \\ α_{2}^{#} = x_{2} k_{2, 1} + x_{2} β_{2} k_{2, 2}, \\ β_{1}^{#} = x_{1} β_{1} + θ_{1}, β_{2}^{#} = x_{2} β_{2} + θ_{2} . \end{matrix}

(6)

For aggregator/sensor node 3 with data $x_{3}$ , it first computes $α_{3}^{'}$ and $β_{3}^{'}$ as a sensor node:

\begin{matrix} x_{3}^{'} = x_{3}, α_{3}^{'} = x_{3} k_{3,1} + x_{3} β_{3} k_{3,2}, \\ β_{3}^{'} = x_{3} β_{3} + θ_{3} . \end{matrix}

(7)

Then nodes 1 and 2 send their data and tags to node 3. After receiving all messages from its subtree, node 3 works as an aggregator to perform aggregation:

\begin{matrix} x_{3}^{#} = x_{1}^{#} + x_{2}^{#} + x_{3}^{'}, \\ α_{3}^{#} = k_{3 - 1} α_{1}^{#} + k_{3 - 2} α_{2}^{#} + α_{3}^{'}, \\ β_{3}^{#} = β_{1}^{#} + β_{2}^{#} + β_{3}^{'}, \end{matrix}

(8)

and sends $〈 x_{3}^{#}, α_{3}^{#}, β_{3}^{#} 〉$ to node 8.

Aggregators 8 and 10 perform corresponding tasks:

\begin{matrix} x_{8}^{#} = x_{3}^{#} + x_{4}^{#} + x_{5}^{#}, x_{10}^{#} = x_{8}^{#} + x_{9}^{#}, \\ α_{8}^{#} = k_{8 - 3} α_{3}^{#} + k_{8 - 4} α_{4}^{#} + k_{8 - 5} α_{5}^{#}, \\ α_{10}^{#} = k_{10 - 8} α_{8}^{#} + k_{10 - 9} α_{9}^{#}, \\ β_{8}^{#} = β_{3}^{#} + β_{4}^{#} + β_{5}^{#}, β_{10}^{#} = β_{8}^{#} + β_{9}^{#}, \end{matrix}

(9)

where node 8 sends $〈 x_{8}^{#}, α_{8}^{#}, β_{8}^{#} 〉$ to node 10, and node 10 sends $〈 x_{10}^{#}, α_{10}^{#}, β_{10}^{#} 〉$ to the base station.
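The aggregation rule used in (8) and (9) can be written as one small function. This is a sketch under a stated simplification: tags are represented as integers modulo a stand-in prime `Q` rather than as elliptic curve points, the SUM is kept as a plain integer, and the function name `aggregate` and the sample numbers are hypothetical.

```python
# Hypothetical stand-in for the group order q (illustration only).
Q = 2**61 - 1

def aggregate(own, children):
    """Combine an aggregator's own message with its children's messages.

    own:      (x', alpha', beta') if the node also senses data, else None.
    children: list of (edge_key, (x, alpha, beta)) pairs, one per child.
    Tags are integers modulo Q here (scalar stand-ins for curve points).
    """
    x, alpha, beta = own if own is not None else (0, 0, 0)
    for edge_key, (cx, c_alpha, c_beta) in children:
        x += cx                                   # SUM is added directly
        alpha = (alpha + edge_key * c_alpha) % Q  # first tag scaled by edge key
        beta = (beta + c_beta) % Q                # second tag added unscaled
    return x, alpha, beta

# Node 3 with its own message and children 1 and 2 (toy numbers).
msg_3 = aggregate((3, 10, 20), [(2, (1, 5, 7)), (4, (2, 6, 8))])
# A pure aggregator such as node 8 passes own=None.
msg_8 = aggregate(None, [(5, msg_3)])
```

Only the first tag is multiplied by the edge key; this is what cancels one factor of $k_{path}$ per hop on the way to the base station.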

4.2.3. Result-Checking Phase

The purpose of the result-checking phase is to enable the base station to verify that the integrity of the SUM $x_{10}^{#}$ has not been violated. The verification is performed as follows.

The base station checks whether

\begin{matrix} β_{10}^{#} - l^{- 1} k_{bs- 10} α_{10}^{#} = \sum_{}^{} θ_{i} - l^{- 1} θ x_{10}^{#}, \end{matrix}

(10)

where $〈 x_{10}^{#}, α_{10}^{#}, β_{10}^{#} 〉$ is sent by node 10 to the base station; θ, l, $k_{bs- 10}$ , and $θ_{i}$ (i from 1 to 7) are known only to the base station; and $l^{- 1}$ is the inverse of l modulo q, the order of the elliptic curve group $E (Z_{p})$ .

Since

\begin{array}{l} β_{10}^{#} = β_{8}^{#} + β_{9}^{#} \\ = β_{3}^{#} + β_{4}^{#} + β_{5}^{#} + β_{6}^{#} + β_{7}^{#} \\ = β_{1}^{#} + β_{2}^{#} + β_{3}^{'} + β_{4}^{#} + β_{5}^{#} + β_{6}^{#} + β_{7}^{#} \\ = x_{1} β_{1} + θ_{1} + x_{2} β_{2} + θ_{2} + x_{3} β_{3} + θ_{3} + x_{4} β_{4} \\ + θ_{4} + x_{5} β_{5} + θ_{5} + x_{6} β_{6} + θ_{6} + x_{7} β_{7} + θ_{7} \\ = (x_{1} β_{1} + x_{2} β_{2} + x_{3} β_{3} + x_{4} β_{4} + x_{5} β_{5} + x_{6} β_{6} + x_{7} β_{7}) \\ + (θ_{1} + θ_{2} + θ_{3} + θ_{4} + θ_{5} + θ_{6} + θ_{7}), \\ l^{- 1} k_{bs- 10} α_{10}^{#} \\ = l^{- 1} k_{bs- 10} (k_{10 - 8} α_{8}^{#} + k_{10 - 9} α_{9}^{#}) \\ = l^{- 1} k_{bs- 10} \\ \times (k_{10 - 8} (k_{8 - 3} α_{3}^{#} + k_{8 - 4} α_{4}^{#} + k_{8 - 5} α_{5}^{#}) \\ + k_{10 - 9} (k_{9 - 6} α_{6}^{#} + k_{9 - 7} α_{7}^{#})) \\ = l^{- 1} k_{bs- 10} \\ \times (k_{10 - 8} (k_{8 - 3} (α_{3}^{'} + k_{3 - 1} α_{1}^{#} + k_{3 - 2} α_{2}^{#}) \\ + k_{8 - 4} α_{4}^{#} + k_{8 - 5} α_{5}^{#}) \\ + k_{10 - 9} (k_{9 - 6} α_{6}^{#} + k_{9 - 7} α_{7}^{#})) \\ = l^{- 1} k_{bs- 10} k_{10 - 8} k_{8 - 3} \\ \times (k_{3 - 1} \frac{x_{1} θ + l x_{1} β_{1}}{k_{bs- 10} k_{10 - 8} k_{8 - 3} k_{3 - 1}} \\ + k_{3 - 2} \frac{x_{2} θ + l x_{2} β_{2}}{k_{bs- 10} k_{10 - 8} k_{8 - 3} k_{3 - 2}} \\ + \frac{x_{3} θ + l x_{3} β_{3}}{k_{bs- 10} k_{10 - 8} k_{8 - 3}}) \\ + l^{- 1} k_{bs- 10} k_{10 - 8} \\ \times (k_{8 - 4} \frac{x_{4} θ + l x_{4} β_{4}}{k_{bs- 10} k_{10 - 8} k_{8 - 4}} \\ + k_{8 - 5} \frac{x_{5} θ + l x_{5} β_{5}}{k_{bs- 10} k_{10 - 8} k_{8 - 5}}) \\ + l^{- 1} k_{bs- 10} k_{10 - 9} \\ \times (k_{9 - 6} \frac{x_{6} θ + l x_{6} β_{6}}{k_{bs- 10} k_{10 - 9} k_{9 - 6}} \\ + k_{9 - 7} \frac{x_{7} θ + l x_{7} β_{7}}{k_{bs- 10} k_{10 - 9} k_{9 - 7}}) \\ = (x_{1} β_{1} + x_{2} β_{2} + x_{3} β_{3} + x_{4} β_{4} + x_{5} β_{5} + x_{6} β_{6} + x_{7} β_{7}) \\ + l^{- 1} θ (x_{1} + x_{2} + x_{3} + x_{4} + x_{5} + x_{6} + x_{7}) . \end{array}

(11)

Subtracting the second expression from the first cancels the $x_{i} β_{i}$ terms and yields the check in (10). The base station therefore accepts the SUM aggregation $x_{10}^{#}$ when the check holds; the two tags verify only if all tags are generated by honest nodes and aggregated correctly along the path.

Again, note that $l^{- 1}$ is the inverse of l modulo q, the order of the elliptic curve group $E (Z_{p})$ , and that θ, l, and $k_{bs- 10}$ are known only to the base station.
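The three phases can be exercised end to end on the Figure 1 topology. The sketch below is not the authors' implementation: as a loudly stated simplification, every elliptic curve point is represented by its scalar multiple of G modulo a stand-in prime `Q`, so it checks only the algebra of the verification equation and offers no security. All names (`PARENT`, `message`, `verify`) and all random values are hypothetical.

```python
import random

Q = 2**61 - 1  # hypothetical stand-in for the group order q
# Figure 1 topology: child -> parent; nodes 1..7 sense data.
PARENT = {1: 3, 2: 3, 3: 8, 4: 8, 5: 8, 6: 9, 7: 9, 8: 10, 9: 10, 10: "bs"}
SENSORS = [1, 2, 3, 4, 5, 6, 7]

rng = random.Random(2024)  # fixed seed so the example is reproducible

def rand_scalar():
    return rng.randrange(1, Q)

# Base-station secrets (points are scalar stand-ins, an assumption of this sketch).
theta, l = rand_scalar(), rand_scalar()
edge = {(p, c): rand_scalar() for c, p in PARENT.items()}  # edge keys k_{p-c}
theta_i = {i: rand_scalar() for i in SENSORS}              # per-node theta_i
beta = {i: rand_scalar() for i in SENSORS}                 # per-node beta_i
readings = {i: rng.randrange(1, 101) for i in SENSORS}     # values in (0, v], v = 100

def k_path(node):
    """Product of edge keys on the path from the base station down to node."""
    k = 1
    while node != "bs":
        k = k * edge[(PARENT[node], node)] % Q
        node = PARENT[node]
    return k

def sensor_message(i):
    """Tags of a sensing node: x*k_{i,1} + x*beta_i*k_{i,2} and x*beta_i + theta_i."""
    inv = pow(k_path(i), -1, Q)
    k1, k2 = theta * inv % Q, l * inv % Q
    x = readings[i]
    return x, (x * k1 + x * beta[i] * k2) % Q, (x * beta[i] + theta_i[i]) % Q

def message(node):
    """Recursively aggregate the subtree rooted at node."""
    x, a, b = sensor_message(node) if node in SENSORS else (0, 0, 0)
    for child, parent in PARENT.items():
        if parent == node:
            cx, ca, cb = message(child)
            x, a, b = x + cx, (a + edge[(node, child)] * ca) % Q, (b + cb) % Q
    return x, a, b

def verify(x, a, b):
    """Base-station check: beta - l^-1 k_{bs-10} alpha == sum(theta_i) - l^-1 theta x."""
    l_inv = pow(l, -1, Q)
    lhs = (b - l_inv * edge[("bs", 10)] * a) % Q
    rhs = (sum(theta_i.values()) - l_inv * theta * x) % Q
    return lhs == rhs

x10, a10, b10 = message(10)
accepted = verify(x10, a10, b10)
rejected = verify(x10 + 1, a10, b10)  # altered SUM with unchanged tags
```

In this toy model the honest message is accepted and a SUM altered by one, with the tags left unchanged, is rejected.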

5. Analysis

This section discusses the security and congestion complexity of EIPDAP.

5.1. Overview

Once a node has been compromised, it is under the full control of the adversary, which can record and inject messages at will. We also assume that the adversary can corrupt only a (small) fraction of nodes, including the aggregators. We do not consider denial-of-service attacks. The security proof of EIPDAP follows.

Definition 2 (sensor node inconsistency).

Let $〈 x_{t}^{#}, α_{t}^{#}, β_{t}^{#} 〉$ be a message sent by sensor node t. There is an inconsistency at node t if either (1) $α_{t}^{#} \neq x_{t} k_{t, 1} + x_{t} β_{t} k_{t, 2}$ or (2) $β_{t}^{#} \neq x_{t} β_{t} + θ_{t}$ .

Definition 3 (sensor node forgery).

An adversary eavesdropping on sensor node i successfully forges a new message $〈 x_{i}^{*}, α_{i}^{*}, β_{i}^{*} 〉$ if (1) $x_{i}^{*} \neq x_{i}^{#}$ , (2) $α_{i}^{*} = x_{i}^{*} k_{i, 1} + x_{i}^{*} β_{i} k_{i, 2}$ , and (3) $β_{i}^{*} = x_{i}^{*} β_{i} + θ_{i}$ .

Once a sensor node is compromised, the adversary can obtain all its confidential information (e.g., cryptographic keys) and send false data without being detected; we do not address this kind of forgery here.

Lemma 4.

Let the final SUM aggregation received by the base station be $x_{final}^{#}$ ; then $S_{L} + μ \leq x_{final}^{#} \leq S_{L} + μ v$ , where $S_{L}$ is the sum of the data values of all the legitimate nodes and μ is the total number of corrupted nodes.

Proof.

Each corrupted node can report any integer reading in the range $(0, v]$ , that is, at least 1 and at most v, so the bound follows directly and we omit a detailed proof.
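As a quick numerical illustration of the bound in Lemma 4, take hypothetical values chosen only for this example: $S_{L} = 120$ , $μ = 3$ , and $v = 50$ .

```latex
\begin{matrix} S_{L} + μ = 120 + 3 = 123, \\ S_{L} + μ v = 120 + 3 \cdot 50 = 270, \\ 123 \leq x_{final}^{#} \leq 270 . \end{matrix}
```

With these assumed numbers, three corrupted nodes can shift the accepted SUM anywhere within this interval but not outside it.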

Lemma 5.

If the elliptic curve discrete logarithm problem is hard, then no eavesdropping probabilistic polynomial-time adversary can forge a valid message as an honest sensor node, except with negligible probability.

Proof.

Let $〈 x_{i}^{#}, α_{i}^{#}, β_{i}^{#} 〉$ be an internal message sent by sensor node i to its parent.

Suppose an adversary A is eavesdropping on node i. In order to forge a valid message $〈 x_{i}^{*}, α_{i}^{*}, β_{i}^{*} 〉$ for $x_{i}^{*}$ , A can easily compute a valid $α_{i}^{*} = x_{i}^{*} x_{i}^{# - 1} α_{i}^{#}$ .

As $β_{i}^{#} = x_{i} β_{i} + θ_{i} = x_{i} y G$ for some integer y, a valid $β_{i}^{*}$ should be computed as

\begin{array}{l} β_{i}^{*} = x_{i}^{*} β_{i} + θ_{i} = x_{i}^{*} y^{*} G \\ = x_{i}^{*} y^{*} x_{i}^{- 1} y^{- 1} β_{i}^{#} . \end{array}

(12)

Because $θ_{i}$ and $β_{i}$ are secret, the adversary cannot compute $β_{i}^{*}$ directly as $x_{i}^{*} β_{i} + θ_{i}$ and must instead compute $x_{i}^{*} y^{*} x_{i}^{- 1} y^{- 1} β_{i}^{#}$ . A has $x_{i}^{*}$ , $x_{i}$ , $β_{i}^{#}$ , and G; the factors it lacks are y and $y^{*}$ .

Calculating y from $β_{i}^{#} = x_{i} y G$ and $y^{*}$ from $β_{i}^{*} = x_{i}^{*} y^{*} G$ is an instance of the ECDLP, which means that calculating $β_{i}^{*}$ as $x_{i}^{*} y^{*} x_{i}^{- 1} y^{- 1} β_{i}^{#}$ is hard.

Another concern arises when the adversary keeps eavesdropping on sensor node i and records the messages sent by i. Assume that the adversary has

\begin{matrix} \{ 〈 x_{i # j}^{#}, α_{i # j}^{#}, β_{i # j}^{#} 〉 ∣ j \in [1, n] \}, \end{matrix}

(13)

where $〈 x_{i # j}^{#}, α_{i # j}^{#}, β_{i # j}^{#} 〉$ represents the message $〈 x_{i}^{#}, α_{i}^{#}, β_{i}^{#} 〉$ that node i sends to its parent in the jth query, and n is the number of queries. Note that in each query, every sensor node i chooses a new secret key $l_{i}$ .

For all $j \in [1, n]$ , the adversary has

\begin{matrix} β_{i # j}^{#} = β_{i # j} x_{i # j}^{#} + θ_{i} . \end{matrix}

(14)

Because the number of unknowns exceeds the number of equations by one, the adversary cannot solve the equations in (14) to obtain $β_{i # j}$ or $θ_{i}$ .

In conclusion, the probability of an adversary successfully forging a new message $〈 x_{i}^{*}, α_{i}^{*}, β_{i}^{*} 〉$ when eavesdropping on sensor node i is negligible, completing the proof.
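The asymmetry between the two tags in the proof of Lemma 5 can be illustrated numerically. In the sketch below all points are replaced by scalars modulo a stand-in prime `Q`, an assumption under which discrete logarithms are trivial, so it demonstrates only the algebra: rescaling the first tag produces the honest value for a new reading, while the same rescaling of the second tag fails because of the additive offset $θ_{i}$ . All names and numbers are hypothetical.

```python
# Hypothetical stand-in for the group order q (illustration only).
Q = 2**61 - 1

def rescale(tag, x_old, x_new):
    """Compute x_new * x_old^{-1} * tag modulo Q."""
    return x_new * pow(x_old, -1, Q) * tag % Q

# Node secrets, unknown to an eavesdropper (toy scalar stand-ins).
k1, k2 = 1234567, 7654321
beta_i, theta_i = 1111111, 2222222

x_old, x_new = 20, 35  # observed reading and the reading to be forged

# Tags the eavesdropper observes for x_old.
alpha_old = (x_old * k1 + x_old * beta_i * k2) % Q
beta_old = (x_old * beta_i + theta_i) % Q

# Tags an honest node would produce for x_new.
alpha_honest = (x_new * k1 + x_new * beta_i * k2) % Q
beta_honest = (x_new * beta_i + theta_i) % Q

alpha_forged = rescale(alpha_old, x_old, x_new)  # linear in x: matches
beta_naive = rescale(beta_old, x_old, x_new)     # offset theta_i breaks scaling
```

The first tag is linear in the reading, so `alpha_forged` equals `alpha_honest`; the second tag is not, so `beta_naive` differs from `beta_honest`.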

Definition 6 (aggregator inconsistency).

Let $〈 x_{t}^{#}, α_{t}^{#}, β_{t}^{#} 〉$ be an internal message aggregated by node t with two children u and v. Let $〈 x_{u}^{#}, α_{u}^{#}, β_{u}^{#} 〉$ and $〈 x_{v}^{#}, α_{v}^{#}, β_{v}^{#} 〉$ be the two messages from u and v. There is an inconsistency at node t if (1) $x_{t}^{#} \neq x_{t}^{'} + x_{u}^{#} + x_{v}^{#}$ or (2) $α_{t}^{#} \neq α_{t}^{'} + k_{t - u} α_{u}^{#} + k_{t - v} α_{v}^{#}$ or (3) $β_{t}^{#} \neq β_{t}^{'} + β_{u}^{#} + β_{v}^{#}$ .

Definition 7 (compromised aggregator forgery).

An adversary that has compromised an aggregator j successfully forges a new aggregate result $〈 x_{j}^{*}, α_{j}^{*}, β_{j}^{*} 〉$ if (1) $x_{j}^{*} \neq x_{j}^{#}$ , (2) $α_{j}^{*} = α_{j}^{'} + k_{j - 1} α_{j 1}^{#} + k_{j - 2} α_{j 2}^{#} + \dots + k_{j - l} α_{j l}^{#}$ ( $α_{j}^{'} = 0$ if j does not sense data), and (3) $β_{j}^{*} = β_{j}^{'} + β_{j 1}^{#} + β_{j 2}^{#} + \dots + β_{j l}^{#}$ ( $β_{j}^{'} = 0$ if j does not sense data), assuming aggregator j has l children $j 1, j 2, \dots, j l$ .

Lemma 8.

If the elliptic curve discrete logarithm problem is hard, then no probabilistic polynomial-time adversary can forge a valid aggregate result, except with negligible probability, even when a high-level aggregator is compromised.

Proof.

We assume that aggregator 10 has been compromised, so that an adversary is in complete control of node 10 and has obtained all of its secret keys. The adversary now attempts to forge the SUM aggregation and the two corresponding tags after eavesdropping on several aggregations and recording

\begin{matrix} \{ 〈 x_{10 # i}^{#}, α_{10 # i}^{#}, β_{10 # i}^{#} 〉 ∣ i \in [1, n] \}, \end{matrix}

(15)

where $〈 x_{10 # i}^{#}, α_{10 # i}^{#}, β_{10 # i}^{#} 〉$ represents the message $〈 x_{10}^{#}, α_{10}^{#}, β_{10}^{#} 〉$ that node 10 sends to the base station in the ith query, and n is the number of queries. Note that in each query, every sensor node j chooses a new secret key $l_{j}$ .

For all $i \in [1, n]$ , the adversary has

\begin{array}{l} β_{10 # i}^{#} - l^{- 1} k_{bs- 10} α_{10 # i}^{#} \\ = (θ_{1} + θ_{2} + θ_{3} + θ_{4} + θ_{5} + θ_{6} + θ_{7}) \\ - l^{- 1} θ x_{10 # i}^{#} . \end{array}

(16)

Now the adversary tries to forge a new message $〈 x_{10}^{*}, α_{10}^{*}, β_{10}^{*} 〉$ that satisfies (10). Since node 10 is compromised, the adversary has $x_{8}^{#}, x_{9}^{#}, x_{10}^{#}, k_{10 - 8}, k_{10 - 9}, α_{8}^{#}, α_{9}^{#}, β_{8}^{#}, β_{9}^{#}$ in each query and knowledge of the elliptic curve group $E (Z_{p})$ .

Case 1. Intuitively, the adversary tries to obtain l, $k_{bs- 10}$ , θ, and $\sum_{}^{} θ_{i}$ (i from 1 to 7). However, this requires a more powerful adversary than the one we consider here.

Case 2. The adversary tries to compute $l^{- 1} θ$ by multiplying $k_{i, 1}$ (which equals $θ / k_{path}$ ) by the inverse of $k_{i, 2}$ (which equals $l / k_{path}$ ). However, all path-keys and the temporal key are encrypted before being forwarded to the nodes, so the adversary cannot compute $l^{- 1} θ$ when node 10 works only as an aggregator.

Case 3. Aggregator 10 also senses data, so the adversary can compute $l^{- 1} θ$ as in Case 2, as well as $l^{- 1} k_{bs- 10}$ , which is the inverse of $l / k_{bs- 10}$ modulo q. Now the adversary has

\begin{array}{l} β_{10}^{#} - α_{10}^{# #} \\ = θ_{1} + θ_{2} + θ_{3} + θ_{4} + θ_{5} + θ_{6} + θ_{7} - x_{10}^{# #} \\ ⟹ β_{10}^{# #} \\ = y_{1} (θ_{1} + θ_{2} + θ_{3} + θ_{4} + θ_{5} + θ_{6} + θ_{7}) = y_{2} G \end{array}

(17)

for some integer $y_{2}$ . As in Lemma 5, since $\sum_{}^{} θ_{i}$ (i from 1 to 7) is kept secret from the adversary and computing $y_{2}$ from $β_{10}^{# #} = y_{2} G$ is an instance of the ECDLP, $β_{10}^{# #}$ cannot be forged either.

In all cases, the adversary can forge a new message $〈 x_{10}^{*}, α_{10}^{*}, β_{10}^{*} 〉$ only with negligible probability, completing the proof.

Theorem 9.

EIPDAP is integrity-preserving.

Proof.

From Lemmas 5 and 8, we know that EIPDAP is secure against sensor node forgery in the presence of an eavesdropper and aggregator forgery when an aggregator is compromised. Thus, EIPDAP is integrity-preserving, completing the proof.

5.2. Congestion Complexity

The computational and memory costs are likely to be insignificant compared with communication [3, 14]. More computation certainly consumes more energy, but a Berkeley mote spends approximately the same amount of energy to compute 800 instructions as it does to send a single bit of data [13, 20].

No sub-exponential algorithm is known for solving the elliptic curve discrete logarithm problem (ECDLP), so ECC can use smaller parameters than other systems such as RSA and DSA at an equivalent level of security. Because of its smaller key size, faster computations, and reductions in processing power, storage space, and bandwidth, ECC is well suited to WSNs. Although elliptic curve cryptography incurs higher computational overhead than symmetric-key cryptography, our protocol is designed mainly to save energy by reducing communication.

In the query dissemination phase, the base station collects aggregation tree information and broadcasts edge keys and path keys directly to the corresponding nodes. Collecting aggregation tree information costs each edge $O (1)$ congestion, and broadcasting keys causes no congestion for sensor nodes and aggregators. In the aggregation phase, each node forwards one message, so the edge congestion in the aggregation tree is $O (1)$ . In the result-checking phase, all operations are done at the base station, so there is no congestion in the aggregation tree. Congestion complexity comparisons with Chan's scheme, Keith's scheme, and SDAP are shown in Tables 1, 2, and 3.

Table 1

Comparison of edge congestion in the aggregation tree; n is the number of nodes, $n_{g}$ is the group size, and Δ is the degree of the aggregation tree.

	Query dissemination	Data aggregation	Result-checking
Chan's scheme	$O (1)$	$O (\log n)$	$O (\log^{2} n)$
Keith's scheme	$O (1)$	$O (\log n)$	$O (\log n)$
SDAP	$O (1)$	$O (Δ \log (n / n_{g}))$	$O (Δ \log (n / n_{g}))$
EIPDAP	$O (1)$	$O (1)$	0

Table 2

Comparison of node congestion in the aggregation tree; Δ is the degree of the aggregation tree.

	Query dissemination	Data aggregation	Result-checking
Chan's scheme	$O (Δ)$	$O (Δ \log n)$	$O (Δ \log^{2} n)$
Keith's scheme	$O (Δ)$	$O (Δ \log n)$	$O (Δ \log n)$
SDAP	$O (Δ)$	$O (Δ \log (n / n_{g}))$	$O (Δ \log (n / n_{g}))$
EIPDAP	$O (Δ)$	$O (Δ)$	0

Table 3

Comparison of total congestion over the aggregation tree.

	Query dissemination	Data aggregation	Result-checking
Chan's scheme	$O (n)$	$O (n \log n)$	$O (n \log^{2} n)$
Keith's scheme	$O (n)$	$O (n \log n)$	$O (n \log n)$
SDAP	$O (n)$	$O (n)$	$O (n \log n)$
EIPDAP	$O (n)$	$O (n)$	0

From this comparison, we conclude that EIPDAP has the lowest congestion among the compared schemes and is therefore more energy efficient, which makes it better suited to power-limited sensor networks.

6. Conclusion and Future Work

Protecting hierarchical data aggregation from loss of integrity is a challenging problem in sensor networks. In this paper, we focus on preserving data integrity and propose a novel approach that guarantees the integrity of the aggregation result during aggregation in sensor networks. The main algorithm is based on modular addition operations using ECC.
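The algebraic property that makes modular addition compatible with ECC is that scalar multiplication of a base point is additive: $(a + b) G = a G + b G$. The sketch below demonstrates only this property and is not the EIPDAP construction itself. It assumes a textbook toy curve $y^{2} = x^{3} + 2 x + 2$ over $\mathbb{F}_{17}$ with base point $G = (5, 1)$ of order 19, which is far too small to be secure; the names `add`, `mul`, and `readings` are hypothetical.

```python
# Toy parameters (assumption, insecure): y^2 = x^3 + 2x + 2 over F_17, G of order 19.
P_MOD, A = 17, 2
G, ORDER = (5, 1), 19
INF = None  # point at infinity (group identity)

def add(p, q):
    """Add two points on the toy curve."""
    if p is INF:
        return q
    if q is INF:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF  # q is the inverse of p
    if p == q:
        slope = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (slope * slope - x1 - x2) % P_MOD
    y3 = (slope * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

def mul(k, p):
    """Double-and-add scalar multiplication k * p."""
    result = INF
    while k:
        if k & 1:
            result = add(result, p)
        p = add(p, p)
        k >>= 1
    return result

# Hypothetical sensor readings, aggregated by addition modulo the group order.
readings = [3, 7, 12]
aggregate = sum(readings) % ORDER

# Adding the individual points gives the same point as multiplying G by the aggregate.
point_sum = INF
for r in readings:
    point_sum = add(point_sum, mul(r, G))
print(point_sum == mul(aggregate, G))
```

Recovering a scalar from its point requires solving the ECDLP, which is the hardness assumption the integrity argument relies on; a real deployment would use a standardized curve of at least 160 bits rather than this toy one.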

EIPDAP can verify the integrity of the aggregation result immediately after receiving the result and the corresponding authentication information. This significantly reduces the energy consumption and communication delay that would be incurred if verification required another query-and-forward phase.

Compared with other related schemes, our scheme reduces the communication required per node to $O (Δ)$ , where $Δ$ is the degree of the aggregation tree for the network. To the best of our knowledge, this is the lowest upper bound achieved for the integrity-preserving data aggregation problem. Based on the hardness of the elliptic curve discrete logarithm problem, we prove that EIPDAP is integrity-preserving.

In future work, we will first elaborate the details of EIPDAP. Second, we will investigate reducing the number of secret keys shared between the sensor nodes and the base station, as well as the number of keys broadcast to all nodes. Third, building on the proposed algorithm, we may address other security requirements, such as data confidentiality, source authentication, and availability.

We anticipate that our work provides a new perspective on preserving the integrity of hierarchical aggregation and encourages other researchers to consider this approach.

Acknowledgments

The authors would like to thank the anonymous reviewers and their coworkers for their valuable comments and useful suggestions.

References

1. Yick, Mukherjee, and Ghosal, "Wireless sensor network survey," Computer Networks, vol. 52, no. 12, pp. 2292-2330, 2008. doi:10.1016/j.comnet.2008.04.002

2. Akkaya, Demirbas, and R. S. Aygun, "The impact of data aggregation on the performance of wireless sensor networks," Wireless Communications and Mobile Computing, vol. 8, no. 2, pp. 171-193, 2008. doi:10.1002/wcm.454

3. Evans, "Secure aggregation for wireless networks," in Proceedings of the Workshop on Security and Assurance in Ad Hoc Networks, Orlando, Fla, USA, January 2003.

4. Przydatek, Song, and Perrig, "SIA: secure information aggregation in sensor networks," in Proceedings of the 1st International Conference on Embedded Networked Sensor Systems (SenSys ′03), pp. 255-265, November 2003.

5. Liu, Nguyen, Nahrstedt, and Abdelzaher, "PDA: privacy-preserving data aggregation in wireless sensor networks," in Proceedings of the 26th IEEE International Conference on Computer Communications (INFOCOM ′07), pp. 2045-2053, 2007.

6. Feng, Wang, Zhang, and Ruan, "Confidentiality protection for distributed sensor data aggregation," in Proceedings of the 27th IEEE Communications Society Conference on Computer Communications (INFOCOM ′07), pp. 475-483, April 2007. doi:10.1109/INFOCOM.2007.20

7. Shi, Zhang, Liu, and Zhang, "PriSense: privacy-preserving data aggregation in people-centric urban sensing systems," in Proceedings of the 29th IEEE International Conference on Computer Communications (INFOCOM ′10), pp. 758-766, IEEE, March 2010. doi:10.1109/INFCOM.2010.5462147

8. Katz and A. Y. Lindell, "Aggregate message authentication codes," in Topics in Cryptology: Proceedings of the Cryptographers' Track at the RSA Conference (CT-RSA ′08), Lecture Notes in Computer Science, pp. 155-169, Springer, 2008.

9. Albath and Madria, "Secure hierarchical data aggregation in wireless sensor networks," in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC ′09), pp. 1-6, April 2009. doi:10.1109/WCNC.2009.4917960

10. Liu and Ning, "TinyECC: a configurable library for elliptic curve cryptography in wireless sensor networks," in Proceedings of the International Conference on Information Processing in Sensor Networks (IPSN ′08), pp. 245-256, April 2008. doi:10.1109/IPSN.2008.47

11. Chan, Perrig, and Song, "Secure hierarchical in-network aggregation in sensor networks," in Proceedings of the 13th ACM Conference on Computer and Communications Security (CCS ′06), pp. 278-287, ACM Press, Alexandria, Va, USA, November 2006. doi:10.1145/1180405.1180440

12. K. B. Frikken and J. A. Dougherty, "An efficient integrity-preserving scheme for hierarchical sensor aggregation," in Proceedings of the 1st ACM Conference on Wireless Network Security (WiSec ′08), pp. 68-76, ACM Press, Alexandria, Va, USA, April 2008. doi:10.1145/1352533.1352546

13. Yang, Wang, Zhu, and Cao, "SDAP: a Secure hop-by-hop Data Aggregation Protocol for sensor networks," in Proceedings of the 7th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MOBIHOC ′06), pp. 356-367, ACM Press, Florence, Italy, May 2006.

14. Castelluccia, A. C. F. Chan, Mykletun, and Tsudik, "Efficient and provably secure aggregation of encrypted data in wireless sensor networks," ACM Transactions on Sensor Networks, vol. 5, no. 3, pp. 1-36, 2009. doi:10.1145/1525856.1525858

15. Deng, Y. S. Han, and P. K. Varshney, "A witness-based approach for data fusion assurance in wireless sensor networks," in Proceedings of the Global Telecommunications Conference (GLOBECOM ′03), pp. 1435-1439, IEEE, December 2003.

16. Eikemeier, Fischlin, J.-F. Götzmann, Lehmann, Schröder, and Wagner, "History-free aggregate message authentication codes," in Proceedings of the 7th International Conference on Security and Cryptography for Networks, vol. 6280 of Lecture Notes in Computer Science, pp. 309-328, Amalfi, Italy, 2010.

17. Girao, Westhoff, and Schneider, "CDA: concealed data aggregation in wireless sensor networks," in Proceedings of the ACM Workshop on Wireless Security (WiSe ′04), ACM Press, Philadelphia, Pa, USA, 2004.

18. Castelluccia, Mykletun, and Tsudik, "Efficient aggregation of encrypted data in wireless sensor networks," in Proceedings of the 2nd Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous ′05), pp. 109-117, IEEE Computer Society, San Diego, Calif, USA, July 2005. doi:10.1109/MOBIQUITOUS.2005.25

19. Çam, Özdemir, Nair, Muthuavinashiappan, and Ozgur Sanli, "Energy-efficient secure pattern based data aggregation for wireless sensor networks," Computer Communications, vol. 29, no. 4, pp. 446-455, 2006. doi:10.1016/j.comcom.2004.12.029

20. Madden, M. J. Franklin, J. M. Hellerstein, and Hong, "TAG: a tiny aggregation service for ad-hoc sensor networks," SIGOPS Operating Systems Review, vol. 36, pp. 131-146, 2002.