Abstract
Wireless sensor networks consist of a great number of sensor nodes with strictly limited computation capability, storage, communication resources, and battery power. Because they are deployed in remote and hostile environments and hence are vulnerable to physical attacks, sensor networks face many practical challenges. Data confidentiality, data integrity, source authentication, and availability are all major security concerns. In this paper, we focus on the very problem of preserving data integrity and propose an Efficient Integrity-Preserving Data Aggregation Protocol (EIPDAP) to guarantee the integrity of aggregation result through aggregation in sensor networks. In EIPDAP, base station can immediately verify the integrity of aggregation result after receiving the aggregation result and corresponding authentication information. However, to check integrity, most existing protocols need an additional phase which will consume a lot of energy and cause network delay. Compared with other related schemes, EIPDAP reduces the communication overhead per node to
1. Introduction
Wireless sensor networks (WSNs) have many security-critical applications such as real-time traffic monitoring, wildfire tracking, or military surveillance. In a sensor network, thousands of low-cost sensor nodes collectively monitor an area within a certain range and report their own data to the base station which distributes a data query. However, this would incur high communication overhead which cannot be afforded by sensor nodes. Data aggregation [1, 2] mechanisms are proposed to reduce the power consumption. Data aggregation poses security threat; many secure data aggregation protocols [3, 4] have been emerging over these years, which prove to be secure and considerably improve the resource utilization.
Although data confidentiality could guarantee that legal parties obtain plain data without being leaked out to adversaries, it does not protect data from being altered [5–7]. In this paper, we focus on the problem of preserving data integrity through aggregation in sensor networks. Message authentication codes (MACs) are used in [8] to protect data integrity, while causing other problems, such as high communication overhead. In this paper, we present a provably secure sensor network integrity-preserving aggregation protocol based on the elliptic curve discrete logarithm problem for general networks with hierarchical aggregator topologies, assuming that adversaries are able to corrupt a (small) fraction of sensors. With the increasing of sensor node's computation capacity, public key cryptography, such as elliptic curve cryptosystems (ECC), is suitable for constrained environments such as WSN. In [9], authors propose secure data aggregation schemes using ECC to obtain data confidentiality and integrity in the data aggregation because of their smaller key size, faster computations, and reductions in processing power, storage space, and bandwidth. TinyECC is proposed by Liu and Ning [10] which provides ECC-based operations that can be flexibly configured and integrated into WSN applications.
An adversary can perform a variety of attacks. For example, a denial-of-service (DoS) attack can totally block the communication between sensor nodes and the base station. However, this attack is not concerned because it is detectable by the querier and solutions can be implemented to remedy this situation. In stealthy attack [4], the attacker's goal is to make the base station accept false aggregation results, which are significantly different from the true results determined by the measure values, while not being detected by the base station. Our goal is to prevent this kind of attack even when high-level aggregator is corrupted.
A number of protocols [11–13] have been proposed which focus on the problem that how can the base station obtain a good approximation of the aggregation result and how to obtain data integrity when a fraction of sensor nodes are compromised. One common sensor feature is the disproportionately high cost of transmitting information, as compared to performing local computation. For example, a Berkeley mote spends approximately the same amount of energy to compute 800 instructions as it does in sending a single bit of data. It thus becomes essential to reduce the number of bits forwarded by intermediate nodes, in order to extend the entire network's lifetime [14]. All the above schemes need to verify the integrity of aggregation result in an additional phase which consumes a lot of energy and causes network delay.
In this paper we propose EIPDAP, which can immediately verify the integrity of aggregation result after receiving the aggregation result and corresponding authentication information, hence significantly reducing energy consumption and communication delay which will be caused if the verification process is done through another query-and-forward phase.
The rest of the paper is organized as follows: in Section 2 we describe a survey of other approaches to integrity-preserving aggregation in sensor networks, in Section 3 more details about the problem we are trying to solve are discussed, in Section 4 we describe a new scheme that is, the centerpiece of our work, and in Section 5 the security properties and performance of our scheme are analyzed.
2. Related Work
There has been a number of works on preserving integrity in aggregation protocols for sensor networks. Many protocols have been proposed for the single-aggregator model [4, 13, 15]. But the aggregator in these schemes suffers from significantly high congestion and only reduces communications on the link between the aggregator and the base station. So this model is not scalable to large multihop sensor deployments.
Another significant work is introduced in [11]. The main idea of this approach is that each node sends its value, complement, and commitment up the aggregation tree and then a commitment would pass down the tree for a node to verify that if its value was added into the SUM aggregation and the complement of its data value was added into the COMPLEMENT aggregation. However, the scheme requires three phases. The delay aggregation strategy used in the second phase increases communication from
A secure hop-by-hop data aggregation protocol SDAP for sensor networks is proposed in [13]. The authors believe that we should be more concerned about high-level nodes, since these nodes represent a large portion of the final result delivered to base station and there would be more catastrophic consequences if they are compromised. Hence, SDAP dynamically partitions the topology tree into multiple logical groups of similar sizes using a probabilistic approach, following the divide-and-conquer principle. In this way, fewer nodes are located under a high-level sensor node in a logical subtree resulting in reduced potential security threat by a compromised high-level node. SDAP introduces probability and attestation to the data result-checking; the communication required per node is
Aggregate message authentication codes introduced by Katz and Lindell (CT-RSA 2008) [8] provided a new perspective of preserving integrity. In their construction, aggregating MAC simply computes the XOR of all the MACs into one value, the size of which is the same as an ordinary MAC. After receiving all the data and the final aggregate MAC, the base station uses secret keys shared with each node to compute a new aggregate MAC from these data and compares it with the received aggregate MAC. Although it remarkably reduces communication overhead we have seen in former protocols [11–13] and is easy to perform, it suffers from the “mix-and-match” attacks [16] in which the adversary can easily forge several types of aggregate combinations.
In [9], the authors proposed a new algorithm using homomorphic encryption and additive digital signatures to achieve confidentiality, integrity, and availability for in-network aggregation in WSN. However, the protocol cannot resist stealthy attack. We discuss concrete attack on the protocol due to Albath and Madria [9] in the appendix.
Besides, there have been several protocols designed for preserving the confidentiality of the aggregation results [17–19]. This issue is orthogonal to our work and is not considered in this paper.
3. Problem Model
This section contains the definitions of basic problems and includes discussion on the nodes' setup, the security infrastructure, and the attack model.
3.1. Network Assumptions
We assume a query-based sensor network with a large number of sensors and a powerful base station with transmission ranges covering the whole wireless sensor network can broadcast messages to all nodes directly. Before aggregation process, sensors will form a tree topology where base station locates at the root.
We further assume that the base station would broadcast an authenticated query before collecting data. If there is no aggregation tree, then an aggregation tree should be formed as the query has been sent to all nodes. Our protocol takes the structure of the aggregation tree as given. One method for constructing an aggregation tree is described in TaG [20].
Each node is sensing an integer value r that is in the range
3.2. Security Infrastructure
We assume that each node i has a unique identifier
3.3. Attack Model and Security Goals
We consider a setting with a polynomially bounded adversary, which can physically access the sensors and read their interval values. The adversary is also restricted to corrupt a (small) fraction of nodes including the aggregators.
Once the adversary compromises a sensor node, it can obtain all the node's secret keys. An adversary can modify, forge, or discard messages or simply transmit false aggregate results, and its goal is to forge valid aggregate result to be accepted by the base station. The higher false aggregate result level is, more catastrophic consequence will be caused.
In this setting, we focus on stealthy attacks [4] where the attacker's goal is to make the base station accept false aggregate results while not being detected by base station. And our security goal is to prevent stealthy attacks. In particular, we want to guarantee that once the aggregate result has been accepted by the base station, it is indeed the real result aggregated by honest nodes.
Definition 1 (integrity-preserving aggregation algorithm).
An aggregation algorithm is integrity-preserving if, by tampering with the aggregation process, an adversary is unable to induce the base station to accept any forged aggregate result.
Since if a sensor node is compromised, the adversary can obtain all its confidential information (e.g., cryptographic keys) and send false data without being detected. In this paper we will focus on the situation where an aggregator is compromised and see whether it can forge a valid aggregate result.
In this paper, however, we do not address the denial-of-service attack where the adversary prevents the querier from getting any aggregation result at all; because nodes' not responding queries clearly indicate that something is wrong and solutions can be implemented to remedy this situation.
4. Our Work
In this section, we present a new approach, especially aiming to preserve integrity of the aggregation result. We first give an overview of this approach and then present the details.
4.1. Overview
The design of our algorithm is based on the elliptic curve discrete logarithm problem. The overall algorithm consists of three main phases: query dissemination, aggregation-commit, and result-checking.
In query dissemination phase, the base station broadcasts the query to the network. An aggregation tree, or a directed spanning tree over the network topology with the base station at the root, is formed as the query sent to all the nodes, if one is not already present in the network. Then the path-keys and edge-key for each node encrypted with the secret key shared between base station and node are sent to the corresponding node. Path-key and edge-key are calculated by the base station according to the network topology. We show the detail of the calculation of the path-key in Section 4.2.
In aggregation-commit phase, each sensor node collects raw data and computes two corresponding tags before sending them to their own parent node in the aggregation tree. After receiving all the messages from all child nodes, aggregator performs modulo addition operations over the three items and forwards the result to high-level aggregators until the base station.
In the result-checking phase, the base station verifies the integrity of the SUM aggregation with two aggregation tags. Compared with Chan's and Keith's approach, ours does not require any dissemination from top root node down to the leaf nodes which causes congestion
4.2. The SUM Approach
4.2.1. Query Dissemination Phase
Before aggregation, the base station broadcasts an authenticated query to the network. The query request message contains a nonce N to prevent replay attack [1]. If there is no aggregation tree, an aggregation tree with the base station at the root will be formed as the query has been sent to all nodes. Then the tree information will be reported back to the base station. After the base station receives the tree information, it calculates the path-key for each node: for each aggregator or sensor node i, base station generates a key
For each sensor node i, base station also calculates two path-keys
If the path from base station to sensor node i is 1-2-3-i, then
Finally the base station with transmission ranges covering the whole wireless sensor network directly broadcasts to node i the path-keys and edge-keys encrypted with the secret key
4.2.2. Data Aggregation Phase
In the query dissemination phase, each node has already identified their parents and the base station has the overall view of the aggregation tree.
In Figure 1, take paths BS-10-8-3-1 and BS-10-8-3-2, for instance, as sensor node, nodes 3, 1, and 2 each has a message, that is, to be passed to their parents. And the message has the following format:

Aggregation phase. The nodes 1, 2, 3, 4, 5, 6, and 7 are sensor nodes, and the nodes 8, 9, and 10 are aggregators while node 3 works as both sensor node and aggregator. Without losing generality, we assume that every intermediate node is able to sense raw data and performs aggregation like node 3 does.
For nodes 1 and 2:
For aggregator/sensor node 3 with data
Aggregators 8 and 10 perform corresponding tasks:
4.2.3. Result-Checking Phase
The purpose of result-checking phase is to enable base station to verify that the integrity of SUM
Base station checks if
Since
Again, note that
5. Analysis
This section discusses the security and congestion complexity of EIPDAP.
5.1. Overview
Once a node has been compromised, it is under the full control of the adversary which can record and inject messages as will. We also assume that the adversary can only corrupt a (small) fraction of nodes including the aggregators. Also, we do not concern denial-of-service attack. The following is the proof for security of EIPDAP.
Definition 2 (sensor node inconsistency).
Let
Definition 3 (sensor node forgery).
An adversary eavesdropping on sensor node i successfully forges a new message
Since once a sensor node is compromised, the adversary can obtain all its confidential information (e.g., cryptographic keys) and send false data without being detected; however, we do not address this kind of forgery here.
Lemma 4.
Let the final SUM aggregation received by the base station be
Proof .
As the conclusion is obvious here, so we do not prove in detail.
Lemma 5.
If elliptic curve discrete logarithm problem is hard, then it is not possible to forge a valid message as an honest sensor node for all eavesdropping probabilistic, polynomial adversaries.
Proof.
Let
Say adversary is eavesdropping on node i. In order to forge a valid message
As
Calculating y from
Another concern rises when adversary keeps eavesdropping on sensor node i and records the messages sent by i. Assume that adversary has
For all
In conclusion, the probability of an adversary successfully forging a new message
Definition 6 (aggregator inconsistency).
Let
Definition 7 (compromised aggregator forgery).
An adversary which compromised a aggregator j successfully forges a new aggregate result
Lemma 8.
If elliptic curve discrete logarithm problem is hard, then it is not possible to forge a valid aggregate result for all probabilistic, polynomial adversaries even when a high-level aggregator is compromised.
Proof.
We assume that aggregator 10 has been compromised where an adversary is in complete control of node 10, obtaining all secret keys of node 10. Now an adversary attempts to forge SUM aggregation and two corresponding tags after eavesdropping several aggregations and records
For all
Case 1. Intuitively, adversary tries to obtain l,
Case 2. Adversary tries to compute
Case 3. Aggregator 10 also senses data. So adversary can compute
In all cases, adversary can only forge a new message
Theorem 9.
EIPDAP is integrity-preserving.
Proof.
From Lemmas 5 and 8, we know that EIPDAP is secure against sensor node forgery in the presence of an eavesdropper and aggregator forgery when an aggregator is compromised. Thus, EIPDAP is integrity-preserving, completing the proof.
5.2. Congestion Complexity
The computational and memory costs are likely to be insignificant compared to communication [3, 14]. Higher computation surely causes more energy, but a Berkeley mote spends approximately the same amount of energy to compute 800 instructions as it does in sending a single bit of data [13, 20] in WSN.
Unlike general hard problems, there is no sub-exponential algorithm is known to solve the elliptic curve discrete logarithm problem (ECDLP), meaning that smaller parameters can be used in ECC than in other systems like RSA and DSA but with equivalent level of security. Because of their smaller key size, faster computations and reductions in processing power, storage space, and bandwidth, ECC is ideal for WSN. Although the use of elliptic curve cryptography incurs higher computational overhead than symmetric-key cryptography, our protocol is mainly designed to save energy.
In query dissemination phase, the base station collects aggregation tree information and broadcasts edge keys and path keys directly to the corresponding nodes. Collecting aggregation tree information costs each edge
Edge congestion in the aggregation tree comparison, n is the number of the nodes, and
Node congestion in the aggregation tree comparison, Δ is the degree of the aggregation tree.
Aggregation tree congestion comparison.
By the comparison, we can conclude that EIPDAP has the minimum congestion and is much more energy efficient. Therefore it is much more suitable for power limited sensor networks.
6. Conclusion and Future Work
Protecting hierarchical data aggregation from losing integrity is a challenging problem in sensor networks. In this paper, we focus on the very problem of preserving data integrity and propose a novel approach to guarantee the integrity of aggregation result through aggregation in sensor networks. The main algorithm is based on performing modulo addition operation using ECC.
EIPDAP can immediately verify the integrity of aggregation result after receiving the aggregation result and corresponding authentication information, hence significantly reducing energy consumption and communication delay which will be caused if the verification phrase is done through another query-and-forward phase.
Compared with the other related schemes, our scheme reduces the communication required per node to
In the future, we will first further enrich EIPDAP in detail. Second, we will focus on the possibility of reducing the number of secret keys shared between sensor nodes and base station or the keys broadcast to all nodes. Third, based on the proposed algorithm, we may consider meeting other security requirements, like data confidentiality, source authentication, and availability.
We anticipate that our work provides new perspective on preserving integrity of hierarchical aggregation and encourages other researchers to consider this approach.
Footnotes
Appendix
Acknowledgments
The authors would like to thank the anonymous reviewers and their coworkers for their valuable comments and useful suggestions.
