Abstract
Due to the characteristics of resource-constrained and battery-powered sensors in wireless sensor networks (WSNs), energy consumption is always a major concern. Data aggregation is an essential technique to reduce the communication overhead and energy consumption. Since many applications require data privacy, we need to take security into consideration. In this paper, we propose an energy-efficient, secure, highly accurate, and scalable scheme for data aggregation (EESSDA). The main idea of EESSDA is that secure data aggregation is achieved by establishing secure channel and slicing technology. The EESSDA scheme does not need encryption and decryption operations during the data aggregation, which saves energy and obtain high accuracy of aggregation results. Meanwhile, in EESSDA scheme, the advanced deployment of shared information between nodes is not required, making the networks with good scalability. Our analysis and simulations show that EESSDA is of lower communication overhead, more efficiency and accuracy, and better privacy preservation and scalability than existing schemes.
1. Introduction
Wireless sensor networks (WSNs) are composed of a large number of sensor nodes to cooperatively monitor physical or environmental conditions, such as temperature, humidity, or noise, at different locations. WSNs have become increasingly popular in many military and civilian applications [1–3], for example, in the military field, identifying and locating targets for potential attacks through WSNs and in civilian field, tracking a patient's blood pressure, blood sugar, heart rate, and so forth. via wearable sensors to monitor the patient's health.
Sensor nodes are usually constrained in energy, communication, storage, and computation capability, especially the ones powered by batteries which cannot be replaced optionally. Therefore, it is requisite for WSNs to save energy and increase network lifetime. In [4], a node consumes approximately the same amount of energy to compute 800 instructions as it does in sending a single bit of data. Hence, reducing the amount of traffic is a crucial way to save energy. WSNs usually generate large amounts of raw data in which there exists high redundancy. So, it is important to develop efficient data processing technique to reduce redundant data and the amount of transmission. Data aggregation [5–10] is an efficient method to eliminate data redundancy and save energy. However, data are transmitted by multihop and wireless in WSNs, which makes the transmission of data be captured and eavesdropped easily by a malicious attacker. In many applications, WSNs encounter some serious security problems, so the scheme of data aggregation not only optimizes raw data and reduces the amount of transmission for network, but also keeps the network at a high level of security.
Generally, security requirements of the privacy-preserving data aggregation scheme can be satisfied using encryption technology. Privacy-preserving data aggregation scheme is classified into two types: hop-by-hop and end-to-end. In hop-by-hop fashion, aggregator nodes must decrypt all sensor data they receive, aggregate the sensor data according to the corresponding aggregation function, and encrypt the aggregation result before sending it to next hop node [11–14]. End-to-end privacy-preserving data aggregation scheme performs data aggregation through homomorphic encryption technology. One intermediate (cluster head) node receives the ciphertexts from leaf (cluster) nodes and then aggregates them with its own encrypted sensor data; the result will finally be sent to a next node [15, 16]. Obviously, the above privacy-preserving data aggregation scheme will cause great latency and energy consumption because of the decryption/encryption process. In [17], a new privacy-preserving data aggregation protocol was proposed. Sink shares a random number (key) with each sensor node. Then each sensor node simply adds its data up with the random number and gets a pseudodata which will be aggregated along the aggregation tree to the Sink. Knowing all the shared numbers, Sink can get the real aggregation results with subtraction. However, a portion of the sensor nodes may not participate in the data aggregation due to collisions, which is hard to be tracked by Sink. In that case, Sink still subtracts all the shared numbers from a pseudoaggregation result, which might yield the aggregation results with large deviations.
In this paper, we propose a secure, energy-efficient, scalable, and highly accurate scheme for data aggregation (EESSDA). In EESSDA, a secure channel is established between each sensor and its neighbor (i.e., the two sensors share a common random number) for transmitting message without encrypting private data. Considering that the leaf nodes' data will be disclosed to intermediate nodes, in EESSDA, a technology similar to SMART in [11] (each node slices its private data randomly into J pieces, one kept for itself and the remaining encrypted and sent to its neighboring nodes) is adopted. Different from SMART, our method can overcome the above mentioned disclosure just using the leaf nodes' data and will consume much less amount of traffic because only the leaf nodes need to decompose their data into slices, and intermediate nodes only need to send one message for data aggregation. In conclusion, EESSDA requires no encryption/decryption operations and reduces the amount of traffic. Therefore, EESSDA has high accuracy of the aggregation results, since it involves less collision and no case like Sink subtracting the random number of the failed node appeared in [17]. In addition, because EESSDA does not require deploying shared common information between nodes in advance, the network has good scalability, which is essential for the cheap sensor with easy loss. Theoretical analysis and simulation results demonstrate that EESSDA exhibits an excellent performance in terms of security, energy efficiency, accuracy, and scalability.
Our contributions in this paper are as follows. (1) Privacy: EESSDA provides end-to-end data confidentiality by the use of secure channel and “slicing and assembling” technology on leaf nodes. (2) Energy efficiency: EESSDA does not require the encryption/decryption in processing of data aggregation, which economizes on energy consumption and latency. On the other hand, only the leaf nodes need to process “slicing and assembling,” so EESSDA greatly reduces the amount of traffic and consumption of energy. (3) Accuracy: EESSDA reduces the amount of traffic and the latency of time and does not need encryption/decryption, which can improve the accuracy of the aggregation because data packets have less chance to collide. (4) Scalability: in EESSDA scheme, each node only needs to predistribute k keys randomly drawn from the key-pool, making the network have good scalability and more suitable for the dynamic network.
The rest of the paper is organized as follows. In Section 2, we overview some related works on secure data aggregation. Section 3 introduces the network model and design goals. In Section 4, we give the detailed descriptions of our scheme EESSDA and analysis of its scalability. Section 5 evaluates and simulates the proposed schemes of EESSDA. Finally, we summarize our conclusions in Section 6.
2. Related Work
In typical WSNs, sensor nodes are usually resource-constrained and battery-limited. There has been extensive work on data aggregation schemes to increase the lifetime of WSNs by reducing the amount of traffic and resource consumption. However, these aggregation schemes have been designed without security in mind. In practice, WSNs may be deployed in an untrusted environment in many applications, such as battlefield, where an adversary may compromise nodes and reveal sensitive information. Hence, privacy-preserving is a key technology to extend the application of WSNs. The secure data aggregation is becoming a new hot research topic in WSNs [11–16, 18].
Several secure data aggregation schemes have been proposed based on hop-by-hop encryption mechanism. In [12], SDAP was proposed based on the principle of divide-and-conquer and commitment-and-attest. First, SDAP dynamically partitions the topology tree into multiple logical groups (subtrees) of similar sizes using a probabilistic approach. A commitment-based hop-by-hop aggregation is performed in each group to generate a group of aggregation results, which is the criteria for the base station to determine whether the group is suspicious. In [11], the authors proposed two privacy-preserving data aggregation schemes, CPDA and SMART, for additive aggregation. The CPDA scheme leverages algebraic properties of polynomials to calculate the desired aggregate value. The SMART scheme builds on slicing techniques and the associative property of addition. In [13], the iPDA scheme was proposed to improve the integrity of the data based on the SMART scheme. In iPDA, data privacy is achieved through data slicing and assembling technique; and data integrity is achieved through redundancy by constructing two disjoint aggregation trees to collect data of interests. However, the iPDA has high communication overhead and low aggregation accuracy due to the slicing technology and each sensor node has to send its data to both aggregation trees. EEHA [14] preserves data privacy like SMART scheme, in which the nodes are divided into leaf nodes and intermediate nodes. In EEHA, only leaf nodes utilize slicing and assembling technology to preserve data privacy, and intermediate nodes only aggregate their private data, data pieces received from leaf nodes and data from child nodes into a new aggregated data to protect the privacy of its private data. Hence, compared with SMART scheme, EEHA scheme has less communication and higher data accuracy.
The schemes in [15, 16] utilize privacy homomorphic encryption to allow aggregation of encrypted data. In CDA [15], each sensor node splits its data into d parts
Besides, in KIPDA [18] scheme, the authors proposed a noncryptographic method which obfuscates data by hiding them among a set of camouflage values, enabling k-indistinguishability for data aggregation. KIPDA defines a message set consisting of the actual data and camouflage values for MIN/MAX aggregation. The message set is an array of values, where the actual data and camouflage values are assigned cleverly to specific positions in the array according to predefined policies that guarantee 100% accuracy of the aggregation, while the attacker cannot distinguish between the actual data and camouflage values. Because the data are not encrypted, it is easily and efficiently aggregated with minimal in-network processing delay, but the level of privacy is relatively low.
3. System Model
3.1. Network Model
WSNs are composed of a large number of resource-constrained sensors, equipped with nonrechargeable batteries. We use the tree structure to organize sensors to perform the task of data aggregation, as shown in Figure 1. There are three types of nodes in the sensor network: the Sink, intermediate nodes, and leaf nodes. The Sink is the node where aggregation result is destined. The intermediate nodes serve as aggregator nodes, which are responsible for forwarding queries, aggregating the received data and its own data, and sending the aggregation to their parent nodes. The leaf nodes utilize the “slicing and assembling” technique to protect data privacy by decomposing their private data into pieces, sending the pieces to neighboring nodes, and assembling their piece and the pieces they received to get new results which will be sent to their parent. Typical aggregation functions include SUM, AVERAGE, COUNT, MAX, and MIN. We focus on additive aggregation functions because all the typical aggregation functions can be reduced to the additive aggregation function SUM [17].

Network model for data aggregation.
3.2. Attack Model and Design Goals
3.2.1. Attack Model
A malicious attacker can launch a variety of attacks to undermine the data security. We consider the following two cases. (1) Eavesdropping attack: Because WSNs transmit message through wireless communication, the attacker can overhear the transmission to obtain private information. Eavesdropping attack is the most common and easiest form of attack on data privacy, which is the focus of this paper. We assume the attacker can eavesdrop on the entire network. (2) Compromising sensor nodes: After compromising one or more sensor nodes, an adversary can obtain its data and keys and perform the following attacks. Firstly, an adversary use the keys obtained from compromised nodes to decrypt the ciphertext of private data sent by other node(s). Secondly, an adversary utilizes several colluding compromised nodes to collect and infer private data of other node(s).
3.2.2. Design Goals
The main goal of secure data aggregation scheme is to maintain data privacy for each node in the WSNs. Meanwhile, the scheme must consider the performance of efficiency, accuracy, and scalability. Therefore, a desired secure data aggregation should meet the following criteria.
Privacy: to broaden the area of WSNs' applications, data aggregation must guarantee the privacy of data. Each node should only know its own data. However, the wireless link is vulnerable to eavesdropping by attackers to reveal private data. Furthermore, some compromised nodes may collude to uncover the private data of other nodes. A good secure data aggregation scheme should be robust to such attacks. Efficiency: the goal of data aggregation is to reduce the amount of messages transmitted within the WSNs by using in-network processing to eliminate redundant data and thus reducing resource and energy usage. To protect the privacy of data, additional overhead is unavoidable in secure data aggregation. However, a good private data aggregation scheme should keep the overhead as little as possible. Accuracy: data aggregation results may be used to make critical decisions in the WSNs, so the accuracy of the aggregation results must be guaranteed during the process of data aggregation. Therefore, accuracy should be a crucial criterion to estimate the performance of secure data aggregation schemes. Scalability: the cheap sensor nodes are prone to fail, which makes WSNs dynamic in network topology. When some nodes fail or new nodes are deployed, it is very necessary for the secure data aggregation scheme to continue to be implemented correctly. A good secure data aggregation scheme needs to have easy scalability.
3.3. Key Setup for Security Channel
Neighboring nodes establish a secure channel with encryption technology. In this paper, key management adopts a random key distribution mechanism proposed in [19]. The key distribution consists of three phases. (1) Key predistribution: a large key-pool of K keys and their corresponding identities are generated. Each node within the WSNs randomly selected k keys from the key-pool. These k keys form a key ring for a sensor node. (2) Shared-key discovery: each sensor node finds out which neighbors share a common key with itself by exchanging discovery messages. If two neighboring nodes share a common key then there is a secure link between them. (3) Path-key establishment: if two neighboring nodes do not share a common key, their secure link is established by two or more multihop.
In the random key distribution mechanism, the probability that any pair of neighboring nodes possess at least one common key is
4. Energy-Efficient and Scalable Secure Data Aggregation (EESSDA) Scheme
In this section, we present the detail of our proposed secure data aggregation scheme which is energy-efficient, scalable, and highly accurate. The EESSDA scheme consists of five steps: (1) aggregation tree construction; (2) secure channel establishment; (3) slicing; (4) assembling and mixing; and (5) aggregation. Because of the dynamic nature of WSNs, this section also describes how to deploy new nodes or handle failed nodes.
4.1. Secure Data Aggregation
The scheme consists of five steps, whose detailed procedures are listed as follows.
Step 1 (aggregation tree construction).
A common technique for data aggregation is constructing an aggregation tree. There are various methods for building an aggregation tree. We construct the aggregation tree using the method described in TAG [10]. The network is organized as a tree rooted at the Sink node, and each sensor node has a shortest routing path to the Sink. Meanwhile, all parent-child nodes at least share a common key by setting conditions of path selection during constructing aggregation tree, as shown in Figure 2.

Aggregation tree construction.
Step 2 (secure channel establishment).
Aggregation tree is composed of intermediate nodes and leaf nodes.
Intermediate nodes: each intermediate node establishes a secure channel with its parent or child node; that is, every pair of parent-child nodes shares a common secret random number. For example, node Leaf nodes: each leaf node establishes a secure channel with its parent node. In addition, each leaf node establishes secure channel with its neighbors or nodes within h-hop which at least share a common key with it.
After the establishment of secure channel, sensor node transmits data (including sensing data, aggregate results, slice) through secure channel. The node adds data up with the random number (secure channel) and then send the pseudodata to the destination node. The destination node gets the real data after subtracting the random number. For example,
Step 3 (slicing).
Because the leaf nodes contain only its own data, each leaf node ensures the confidentiality of its data by slicing data into pieces before sending data to its parent node. We adopt the slicing technique similar to that of the SMART [11]. Each leaf node
Figure 3 describes the slicing step, where one of the J pieces is kept at node

Slicing.
Step 4 (assembling and mixing).
First, all nodes wait for certain time
Step 5 (aggregation).
After a leaf node mixes up the received slices to get a new result, it sends the new result to its parent through secure channel. The intermediate nodes receive new results
For example,
(1) Receiving Child nodes: Leaf nodes slices: (2) Aggregation (3) Transmission
4.2. Aggregation Algorithm
The pseudocode of EESSDA for every node is described in Algorithm 2.
(1) Construct an aggregation tree on top of TAG; (2) Ensure that all parent-child nodes share a common key; (3) Set waiting time Δt; (4) Establish secure channel; (5) foreach leaf node (6) perform slicing operation and wait; (7) perform mixing operation and get new result (8) send (9) end for (10) foreach intermediate node (11) if receives slice from leaf node then (12) perform mixing operation (13) end if (14) if receives all child nodes data or time elapsed then (15) perform aggregate operation and send aggregation result (16) end if (17) end for
We propose an energy-efficient and scalable secure data aggregation algorithm, described in Algorithm 2. It basically is composed of three phases. The first phase (lines 1–4) is the predeployment stage, including construction of aggregation tree (line 1) and establishment of secure channel (lines 2–4). The second phase (lines 5–9) is slicing-mixing operation, in which we enumerate all leaf nodes by one loop. Each leaf node slices its primitive data (line 6), mixes (line 7) all the received slices (include itself slice), and sends mixing result
4.3. Scalability
Because of the dynamic nature of WSNs, the network may need to deploy new nodes. In existing aggregation or query schemes, when a new node is deployed, the network needs to distribute some shared information between the new node and Sink/parent node/root node of subtree in advance. The network expansion is very difficult. On the other hand, when there are some failed nodes in WSNs, aggregation scheme needs to ensure that the network still performs aggregation correctly. So our proposed scheme EESSDA has good scalability, which can be described in detail as follows.
Deploying new nodes. When it is deployed into the network, a new node Failed nodes. When the parent of node
5. Simulation and Performance Analysis
In this section, we evaluate the performance of EESSDA through theoretical analysis and simulation study, including communication overhead, computation overhead, energy efficiency, accuracy, and privacy-preservation. Based on the simulator of WSNs in [20], we use C# and MATLAB to implement a simulator in order to simulate executing EESSDA and SMART schemes. We implemented these two schemes using a real world data set from Intel Lab Data [21] to compare their performance of communication overhead, computation overhead, energy efficiency, accuracy, privacy-preservation and so on.
5.1. Simulation Setting
The simulation runs on a PC with Core i3-3220CPU, 4G memory, and Win 7 OS. We assume networks with 400 sensor nodes. These nodes are randomly deployed over a 400 × 400 m2 area. The transmission range of a sensor node is 50 m and data rate is 1 Mbps. According to [18], as far as TelosB Mote is concerned, the energy used to transmit and receive 1 bit of data are
5.2. Communication Overhead and Energy Consumption
5.2.1. Communication Overhead
The communication overhead of EESSDA consists of two parts:
In EESSDA, the behavior of leaf nodes is not similar with that of the intermediate nodes. (1) Intermediate node: it receives all data form its child nodes or leaf nodes and then performs an aggregation operation with itself data to get a new aggregation result and sends the result to its parent. We suppose the reading at node is in range
In our experiments, we implemented EESSDA and SMART on the same already constructed aggregation tree. In SMART, each node needs to send J messages for secure data aggregation (

Comparison of EESSDA and SMART (

Communication overhead with respect to

Communication overhead with respect to
5.2.2. Energy Consumption
Energy consumption involves two aspects: communication and computation. Computation involves encryption/decryption operations and modular arithmetic operations. Encryption/decryption is much more energy consuming than modular arithmetic. Therefore, we only consider the cost of encryption/decryption computation. Figure 7 shows that EESSDA saves 45% energy compared with SMART. In EESSDA, encryption/decryption computations only occur during the secure channel establishment step. Each secure channel needs to perform encryption and decryption computation once; that is,

Energy consumption comparison for EESSDA and SMART

Energy consumption with respect to

Energy consumption with respect to
5.3. Privacy
To preserve the privacy of data during data aggregation, the primitive data produced by the sensor nodes must not be disclosed to the neighbor nodes or attackers. To address privacy, SMART adopts the “slicing and assembling” and encryption technique, in which nodes divide their primitive data into several pieces, send encrypted data pieces to neighbor nodes, and aggregate the received data from its child nodes or neighbor nodes, and then routes the aggregated result to the Sink. Our EESSDA scheme utilizes “secure channel” to ensure that the data will not be disclosed to other nodes or attackers. In EESSDA, the schemes used to ensure data privacy are different for leaf nodes and intermediate nodes. For leaf nodes, we utilize the “slicing and assembling” technique mentioned above. For intermediate nodes, it aggregates the received data and its own data to conceal the primitive data. So the data produced by the intermediate nodes are not disclosed to their parent node. First, we evaluate the privacy of secure channel. And then we analyze the privacy of leaf nodes and intermediate nodes, respectively.
Secure channel: a secure channel is a common random number which is shared by two nodes. There are two situations under which random number is revealed. (1) A compromised neighbor node holds a communication key and is able to decrypt the random number. From [11], we can see that the probability that the node has the communication key by its key ring is Leaf node: leaf node Intermediate node: the intermediate node where
Figure 10 compares the privacy preservation performance for EESSDA and SMART

Privacy preservation comparisons for EESSDA and SMART

Privacy preservation with respect to
5.4. Aggregation Accuracy
In ideal situations when there is no data loss in the network, the scheme should get 100% accurate aggregation results. However, due to collisions over wireless channels, data processing delays and then data may get lost or delayed. So the aggregation accuracy may be lower than it is in the ideal situation. We define the accuracy metric as the ratio between the aggregation result and real result of all individual sensor nodes. This paper focuses on additive aggregation function, we thus have
Figure 12 shows the accuracy of EESSDA and SMART

Accuracy comparison of EESSDA and SMART

Accuracy of EESSDA with respect to J.
6. Conclusions
Providing efficient and privacy-preserving, data aggregation is a challenging problem in WSNs. We propose EESSDA scheme for secure data aggregation in WSNs. Different from general data aggregation that preserves data privacy by encryption technology, EESSDA achieves data privacy through secure channel. Because EESSDA does not need encryption/decryption operations during the data aggregation, it saves much energy for encryption/decryption operations, reduces the time of data processing, and consequently leads to improvement of aggregation accuracy. In addition, EESSDA can ensure that the network aggregates correctly when deploying new nodes or having few failed nodes. So EESSDA scheme has good scalability. We compare the performance of EESSDA and SMART, simulation results show that EESSDA scheme decreases 20% or more communication overhead and 40% energy consumption compared with SMART. And our scheme provides higher aggregation accuracy and scalability than SMART scheme.
Footnotes
Acknowledgments
This work is supported by the National Natural Science Foundation of China (61373015, 61370050), the Research Fund for the Doctoral Program of High Education of China (no. 20103218110017), a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), and the Fundamental Research Funds for the Central Universities, NUAA (nos. NP2013307 and NZ2013306).
