Abstract
Data aggregation is an important method to reduce the energy consumption in wireless sensor networks (WSNs); however, it suffers from the security problems of data privacy and integrity. Existing solutions either have large communication and computation overheads or only produce inaccurate results. This paper proposes a novel secure data aggregation scheme based on homomorphic primitives in WSNs (abbreviated as SDA-HP). The scheme adopts a symmetric-key homomorphic encryption to protect data privacy and combines it with homomorphic MAC synchronically to check the aggregation data integrity. It compares the scheme with the previously known methods such as SIES, iPDA, and iCPDA in terms of the data privacy protection efficiency, integrity performance, computation overhead, communication overhead, and data aggregation accuracy. Simulation results and performance analysis show that our SDA-HP requires less communication and computation overheads than previously known methods and can effectively preserve data privacy, check data integrity, and achieve high data transmission efficiency and accurate data aggregation rate while consuming less energy to prolong network lifetime. To the best of our knowledge, this is the first work that provides both integrity and privacy based on homomorphic primitives.
1. Introduction
Currently, wireless sensor networks (WSNs) have many popular applications, such as real-time accident reporting, environment monitoring, and military investigation. In WSNs, sensors are deployed to gather different kinds of data within a certain range and send them to the base station (BS). Sensors are restricted by energy consumption due to battery supply and computational capacity; therefore, energy saving technologies must be considered. Data aggregation [1] is one of the important approaches to facilitate the utilization of WSNs. However, WSNs are often deployed in an open and hostile environment; the inherent characteristics of WSNs and data aggregation algorithms make WSNs data aggregation face many security and performance challenges, such as data integrity; privacy protection, and how to enhance the security and performance becomes the key issues for practical applications.
In recent years, some schemes [2–5] have been proposed focusing on guaranteeing the data privacy during data aggregation phase, but these do not protect the integrity of aggregation data sent to the BS. A compromised aggregator may arbitrarily forge aggregation data and let the BS accept them. Other schemes [6–8] are proposed to guarantee the data integrity, but these lead to the leakage of data privacy due to decryption at aggregator. In this paper, we aim to bridge this gap in secure data aggregation and focus on preserving data privacy and integrity simultaneously through data aggregation phase in WSNs.
Many innovative secure data aggregation schemes have been proposed, and a survey of these works is presented in the literature [1]. These solutions fall into two main categories: hop-by-hop and end-to-end secure data aggregation. The hop-by-hop schemes are vulnerable to attacks because the data will be decrypted on aggregators and often have to enhance the security by using the expensive encryption and decryption algorithms; thus, the communication and computation overheads are large. The end-to-end schemes seem more secure; the data are transparent to the aggregators which can aggregate and transfer the encrypted data without decrypting them, and the end-to-end privacy is achieved by using homomorphic encryption (HE). HE allows the ciphertext to be aggregated directly, then the decrypted aggregation result matches the result of aggregation operations performed on plaintext. HE has been widely used for data aggregation in WSNs [9, 10]. However, the existing HE schemes suffer from the data integrity issue.
Since the incoming packets have been aggregated by aggregators, data integrity cannot be checked by the conventional MACs method. Therefore, we propose a novel integrity protection scheme for data aggregation based on homomorphic MAC (HM). HM is an energy-efficient symmetric approach, and it is first designed to check the integrity of network coded data [11]. However, it cannot be used directly in WSNs due to its complicated set of parameters and steps. Therefore, we remove an unnecessary step and some parameters in the original scheme and then adapt it to protect the integrity of data aggregation for energy constraint WSNs, and combine it with the symmetric-key HE [12] to protect data privacy. Then, we originally propose a novel secure data aggregation scheme based on homomorphic primitives (SDA-HP). This scheme relies on the combination of a revised version of the HM and the new symmetric-key HE [12] synchronically for WSNs. To the best of our knowledge, our proposed method is the first work that addresses data aggregation supporting both integrity and confidentiality based on homomorphic primitives. The proposed scheme can significantly improve the transmission efficiency and energy efficiency, shorten the communication delay, and achieve accurate data aggregation.
The rest of this paper is organized as follows. Section 2 presents the existing approaches to privacy protection and integrity in WSNs. Section 3 discusses the background and assumptions about the problem we are trying to solve. Section 4 presents a new secure data aggregation scheme based on homomorphic primitives. Section 5 analyzes the security performance and experimental results to prove the effectiveness and efficiency of our scheme. Section 6 gives conclusions and some future work.
2. Related Work
To solve both integrity and privacy issues for data aggregation phase in WSNs, Ozdemir and Çam [13] propose an authentication protocol to integrate false data detection with data aggregation and confidentiality; the monitoring nodes of every aggregator also perform data aggregation and compute the MACs for data verification; then, the sensors between two consecutive aggregators verify the integrity of the encrypted data. However, it has some flaws: (i) topological constraints that require at least T nodes on the path between any two consecutive data aggregators, (ii) nonneighboring sensors which need to spend longer time to establish pairwise keys, which may incur attacks for adversary, and (iii) compromised nodes which are likely to obtain group key through group key establishment process, which causes the data privacy leakage.
He et al. present two new data aggregation schemes, named iPDA [14] and iCPDA [15], which piggyback on SMART [2] and CPDA [2] schemes respectively. iPDA achieves privacy protection through data slicing and assembling technique as SMART and achieves integrity through redundancy by constructing disjoint aggregation trees. In iCPDA protocol, cluster members can detect data pollution attacks through monitoring the cluster leaders, so iCPDA spends a little more message overhead to achieve data integrity. However, both schemes need much more communication and computation overheads.
Chan et al. [16] present the first provably secure hierarchical data aggregation scheme based on aggregation-commit-verify approach, which forces the adversary to commit to its choice of aggregation results and then allows the sensors to verify whether their aggregation contributions are correct or not. Although this secure aggregation scheme can be used for arbitrary topologies and multiple malicious nodes, the communication and computation overheads are still very large. Then, Frikken and Dougherty [17] improve Chan's scheme by reducing the maximum communication per node from
In order to mitigate the drawbacks of the hop-by-hop schemes, some end-to-end protocols are proposed. Chen et al. [18] propose a recoverable concealed data aggregation scheme for data integrity in WSNs, named RCDA. It integrates the aggregate signature scheme to ensure data integrity and authenticity and can recover all sensing data even these data which have been aggregated; however, it is not practical for large scale network deployment due to its high costs. Castelluccia et al. [19] present a simple and provably secure scheme based on an extension of the one-time pad encryption technique. This scheme allows efficient additive aggregation of encrypted data, and only one modular addition over modulo n is necessary for aggregation. The privacy and integrity of the scheme are based on the indistinguishability of a pseudorandom function, but its aggregation authentication scheme is only against outsider attacks. Papadopoulos et al. [20] present a scheme, named SIES, that provides both integrity and confidentiality through combination of homomorphic encryption and secret sharing. It can cover numerous aggregates and return exact results. Although this scheme only introduces a small amount of bandwidth consumption, the data transmission efficiency is low due to the oversize space of secret keys.
Garofalakis et al. [21] and Roy et al. [22] propose some lightweight secure data aggregation schemes which return approximate aggregation results securely based on synopsis diffusion approach. It can tolerate the malicious activities in the compromised nodes that contribute false subaggregate values. Though secure approximate aggregation is an interesting domain, it is orthogonal to our work.
3. Background and Assumptions
Section 3.1 describes in-network aggregation techniques. Section 3.2 provides some useful homomorphic primitive tools.
3.1. In-Network Aggregation
3.1.1. System Architecture
We assume a large number of sensors to form a query-based WSN, where a multihop routing protocol can be applied. Sensors will be organized as a tree topology where BS locates at the root, as shown in Figure 1. Each sensor acts as either a leaf sensor (L) or an aggregator (A), or both. Aggregation tree is constructed by using the TAG [23] protocol, and the BS is assumed to broadcast an authenticated query before aggregation. Each node collects environmental readings and an aggregator performs sum aggregation due to resource constrains. Without loss of generality, we focus on additive aggregation. It is not a too restrictive assumption, in respect of that it serves as the base of many other statistics aggregations, such as count, average, and variance.

Aggregation tree.
3.1.2. Attack Model
We assume there are some polynomially bounded adversary that can perform attacks to break the privacy and integrity of aggregation results.
In this paper, we focus on two types of attacks:
eavesdropping attacks: overhearing the sensors transmission data over its neighboring wireless links, the privacy of system can be compromised;
stealthy attacks: it is an intelligent attack to disinform a sensor network in a manner that to escape from attack discovery; that is, the BS accepts false sensors aggregation data.
Our goal is to propose a secure data aggregation scheme, which is robust against eavesdropping and stealthy attacks, efficient in keeping the additional overhead as small as possible, and effective in achieving accurate aggregation results.
3.2. Homomorphic Primitive Tools
3.2.1. Homomorphic Encryption
HE allows direct addition and multiplication of ciphertexts. Let
If
3.2.2. Homomorphic MAC
Homomorphic MAC [11] should satisfy the following properties.
Homomorphism: given (message, tag) pairs
Security against the chosen message attack: it is infeasible for the adversary to create a valid tag for an aggregated message even under a chosen message attack.
The modified homomorphic MAC includes three polynomial-time algorithms.
Sign algorithm:
Aggregation algorithm:
Verify algorithm:
Definition 1.
Let 𝒯 be a homomorphic MAC scheme and let Adv(
4. Secure Data Aggregation Based on Homomorphic Primitives
This section describes the details of SDA-HP scheme. The sequence flow diagram of SDA-HP consists of tree formatting phase, key generation phase, sign-encrypt phase, aggregation phase, and decrypt-verify phase, as shown in Figure 2.

Sequence flow diagram of SDA-HP.
4.1. Tree Formatting Phase
An aggregation tree is constructed according to the standard aggregation protocol TAG [23]; the formation method of the aggregation tree is illustrated as follows.
Step 1.
BS is appointed to be the root, which broadcasts a message requesting sensors to generate an aggregation tree. In that message, it contains its own ID and its level information
Step 2.
When receiving the message from BS, any sensor not in the aggregation tree should assign its own level
Step 3.
Each node of the aggregation tree rebroadcasts the message containing its own ID and level. When it receives the message, if any node has already been in the tree, it will reject the message, otherwise, the node also assigns its level
Step 4.
During the aggregation phase, each sensor listens for messages from its children and then computes a partial state record which is the aggregate of children values with its own sensor readings. Eventually, the partial state record is transmitted upward along the tree, until the complete aggregated result converges at the root.
4.2. Key Generation Phase
This scheme uses
4.3. Sign-Encrypt Phase
The encryption function is of the form:
To formally present our scheme, message
To generate and verify tags, there is one global MAC key that consists of
Sign-encrypt
Randomly generate
Consider
4.4. Aggregation Phase
Aggregate
4.5. Decrypt-Verify Phase
Decrypt-verify
Suppose
//Multiply each component with the corresponding
Use CRT to find
If
4.6. An Example of SDA-H
Let
Suppose there are two plaintexts in
Step 1.
Sign-encrypting
Decompose
Decompose
Step 2.
Aggregating
Step 3.
Decrypt verifying:
Using CRT, the two components of the result m are
For computing homomorphic MAC, as we know from the above:
Hence; the result m is accepted.
Actually, we may set
In SDA-HP, for the convenience of performance analysis, we let

Data format of
5. Simulation and Analysis
In this section, we evaluate performances of SDA-HP scheme in terms of security properties, power analysis, and aggregation accuracy. The simulation is conducted in TOSSIM system operated by TinyOS 2.0. Table 1 shows the parameters setting in the simulation, and Figure 4 depicts the topology of nodes, where the BS coordinate is (200,200).
Simulation parameters.

Nodes distribution.
5.1. Privacy Protection
Secure to ciphertext-only attacks: SDA-HP is secure as long as factoring integer composite of large primes is difficult. In addition, test for encrypted zero in SDA-HP is unlikely according to the literature [12]. It is hard for an adversary to breach all encryption keys or find the plaintext
We denote a as the probability that one key is broken and

In the scheme, we proposed to limit d to a value in the range of 2–4 to balance the security protection and energy consumption.
The scheme SDA-HP uses homomorphic encryption to achieve end-to-end data confidentiality, so addition and scalar multiplication of the ciphertexts are just component-wise vector addition and scalar multiplication of the corresponding d-tuples. Even if the aggregation data is disclosed, the adversary can only get the aggregation result but not sensor data.
5.2. Integrity Performance
Definition 2.
Assuming F and G are secure,
Theorem 3.
Assuming F and G are secure, the homomorphic MAC scheme is secure for any fixed b and n. Also, for all homomorphic MAC scheme 𝒯 adversaries 𝒜, there are the F adversary
Proof.
We use three games to prove the theorem. Supposing
Game 1 is the same to Attack Game 1 [11] applied to the scheme 𝒯. So there is
In Game 2, if we use a truly random string instead of the output of G in the scheme 𝒯, that is, the challenger computers
In Game 3, if we use a truly random string instead of F in the scheme 𝒯, that is, the challenger computers
The challenger in Game 3 works as follows:
The adversary submits MAC queries
For
send
Then, the adversary 𝒜 outputs
If
set
else for
Let
Let
All the above equations prove the theorem.
5.3. Computation Overhead
We evaluate the computation overhead of SDA-HP for the Micaz Mote with an Atmega 128 CPU and compare it with the known algorithms such as SIES, iPDA and iCPDA. Both SDA-HP and SIES adopt symmetric-key homomorphic encryption, and both iPDA and iCPDA use simple hop-by-hop encryption with RC5. Therefore, the computational overhead can be determined by the following equation:
where Enc and Dec are the energy consumptions for one encryption and decryption process of 10 bits value, and Cal represents the energy consumption of one computational operation.
In SIES, it only needs some basic procedures, which involve three HMAC operations, one modular addition and one modular multiplication for every sensor.
In SDA-HP, we suggest that the keys and the parameters generation phase can be performed by the manufacturer before the WSN is deployed; furthermore, the pseudorandom generator is performed using AES regularly, so the energy consumption for the preconfiguration and pseudorandom function are not considered here. The computational overhead for an encryption and integrity protection at the sensor nodes depends on the choice of d. We illustrate the number of major operations with different d for every sensor in Table 2. Apparently, the computation and communication overheads increase with the increasing of d. Considering the security performance and the data overhead, we propose to use a moderate value of d, which is
Computation effort for SDA-HP.
However, iPDA and iCPDA involve generating an excessive number of sketches and performing some hop-by-hop RC5 encryptions and decryptions. iPDA requires in total six encryptions and decryptions and seven computational operations. iCPDA includes two encryptions and decryptions and seventeen computational operations for each leaf sensor.
To determine energy consumptions and computational time of encryption operations for each sensor, we use the cost functions for common operations listed by Groat et al. [27], which are given in Tables 3 and 4. Table 3 depicts the costs of the transmission and reception of 1 bit of data and computes for 1 CPU clock cycle. Table 4 shows time spent to encrypt and decrypt 10 bits of data on the TelosB and MICAz architectures with RC5, RC4, and IDEA.
Energy consumption of common operations [27].
Cost to encrypt/decrypt 10 bits of data [27].
In Figure 6, we assess the computational time for each leaf sensor. In order to control the relative error within 10%, we ran every experiment over 25 times and reported the average cost value. The computational time in SIES and SDA-HP is in the order of few milliseconds, which outperforms iPDA and iCPDA by approximately three orders of magnitude, since the computation-expensive RC5 encryptions and decryptions are performed in iPDA and iCPDA during sending or receiving data slice. Furthermore, the computational cost of SIES, iPDA and iCPDA is independent of d. Therefore, the comparison result for computation overhead is SIES < SDA-HP < iPDA

Computational time comparison.
5.4. Communication Overhead
We compare the communication overhead among SDA-HP, SIES, iPDA, and iCPDA. The above four schemes adopt TAG to construct the aggregation tree. The data packet size for SDA-HP, iPDA, and iCPDA is 30 bytes, and the packet size for SIES is 32 bytes. In order to make the comparison fairly, we evaluate the communication overhead through the sum of sending bytes of all sensors during one aggregation tree construction and aggregation phase for these four schemes. In order to reduce the relative error, we run every experiment over 25 times and record the average value. Figure 7 shows that the communication overhead for the four schemes is independent of epoch duration, and the comparison result is SDA-HP < SIES < iCPDA

Communication overhead comparison.
The theoretical analysis is as the following.
SDA-HP uses the homomorphic encryption to provide data confidentiality preserving and homomorphic MAC to check the aggregation data integrity. The number of data packets sent from every sensor is the same as the TAG, including one “hello” packet and one aggregation packet. Accordingly, the number of packets sent from all sensors is
SIES adopts the homomorphic encryption to protect data confidentiality and attaches the secret sharing in message to provide data integrity. The number of data packets sent from every sensor is the same as SDA-HP. Hence the number of packets sent from all sensors is also
iPDA is built on data slicing techniques to achieve both data confidentiality and integrity. The number of data slicing
iCPDA adopts secure multiparty computation method to protect data privacy and uses the sensor surveillance method to achieve integrity. Figure 8 shows the data packet interaction phase. Each member sensor needs to send three packets, and every cluster leader should send four packets, moreover, in order to check integrity, the cluster leaders also need to send the aggregation result of sublevel leaders and themselves to the neighbor monitors; hence, the number of packets sent from leaders amounts to six. Accordingly, the number of packets sent from all sensors is

The message interaction in cluster of iCPDA.
5.5. Data Accuracy
In WSNs, message may be delayed and even dropped due to the processing time and collisions over wireless channels, so the aggregation accuracy is one of important performance metrics. Here, we define the accuracy metric as the ratio of the actual aggregation result collected by BS to the sum of the data sent by the all individual sensors.
Figure 9 shows the accuracy for SDA-HP, SIES, iPDA

Accuracy comparison for SDA-HP, SIES, iPDA
6. Conclusion
Protecting data privacy and integrity during data aggregation at the same time is challenging in hierarchical WSNs. We originally present SDA-HP, a novel and efficient secure data aggregation scheme based on the homomorphic encryption and a revised version of the homomorphic MAC. Both techniques are lightweight, and they require very few energy consumptions. The proposed scheme can achieve high data transmission efficiency and accurate data aggregation. These features make SDA-HP very suitable to be applied for resource-constrained WSNs. Subsequently, we prove the security performance and give an example to illustrate how SDA-HP can be adopted to handle data aggregation, while protecting data privacy and integrity. Finally, simulations are used to compare our scheme with several known schemes in terms of the computation overhead, communication overhead, and accuracy. The numerical results show that SDA-HP is superior to other approaches in terms of security, complexity, and accuracy.
At present, SDA-HP is applied to the secure aggregation scheme for SUM queries only. Further research will be to design a secure data aggregation scheme which can cover a wide range of exact aggregate queries.
Footnotes
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is supported by the National Research Project of China under Grant no. 2012A138, National Basic Research Program (973 Program) of China under Grant nos. 2011CB302903, the National Natural Science Foundation of China under Grant nos. 61272084, 61302158, and 61300240, the Key Project of the Natural Science Research of Jiangsu University under Grant no. 11KJA520002, the Key Project of the Natural Science Research of Anhui University under Grant no. KJ2011ZD06, the Research Innovation Program for College Graduates of Jiangsu Province under Grant no. CXZZ12_0477, and the Natural Science Foundation of Chuzhou University under Grant no. 2012kj003Z.
