Abstract
Data aggregation techniques have been widely used in wireless sensor networks (WSNs) to address the energy constraints of sensor nodes. They can conserve a significant amount of energy by reducing data packet transmission costs. However, many data aggregation applications require privacy and integrity protection of the real data while it is transmitted from the sensing nodes to a sink node. The existing schemes that support both privacy and integrity, namely iCPDA and iPDA, suffer from high communication cost, high computation cost, and data propagation delay. To resolve these problems, we propose a signature-based data security technique for protecting sensitive data aggregation in WSNs. To support privacy-preserving data aggregation and integrity checking, our technique makes use of the additive property of complex numbers. Of the two parts of a complex number, the real part is used to hide the sampled data of a sensor node from its neighboring nodes and adversaries, whereas the imaginary part is used for data integrity checking at both data aggregators and the sink node. Through a performance analysis, we show that our privacy-preserving data aggregation scheme outperforms the existing schemes by up to 50% in terms of communication and computation overheads and by up to 3 times in terms of integrity checking and data propagation delay.
1. Introduction
Wireless sensor networks (WSNs) have been widely studied in ubiquitous computing environments. WSNs can be applied to various types of applications, such as environment management and military monitoring [1–4]. However, the sensor nodes that form WSNs have resource constraints such as limited power, slow processors, and little memory. For these reasons, it is essential to improve the energy efficiency of sensor nodes (or of the WSN) in order to enhance the quality of application service [5–10]. The first issue of WSNs is therefore to reduce energy consumption. Because communication consumes the most energy, it is important to reduce communication overhead. To reduce communication cost, transmitting only the required, partially processed data is more effective than sending a large amount of raw data. In general, sending raw data wastes the energy of sensor nodes in two ways: duplicated messages are sent to the same node, called implosion, and neighboring nodes receive duplicated messages when two nodes share the same observing region, called overlapping. In recent years, data aggregation has been actively used to combine data coming from many sensor nodes. An extension of this approach is in-network aggregation, which aggregates data progressively as they pass through the network [11–14]. In-network data aggregation can reduce the number of data transmissions and the number of nodes involved in gathering data from a WSN.
The second issue of WSNs is how to protect sensitive measurements from adversaries, which makes data privacy an important concern [15]. In many scenarios, the confidentiality of transported data is critical. For instance, sensors might measure patients’ health information such as heartbeat and blood pressure. In addition, a future application might measure household details such as power and water usage in order to compute average trends and make local recommendations. Since sensitive data is transported wirelessly among sensor nodes, it is prone to interception and eavesdropping. It is mandatory to maintain the data privacy of sensor nodes even from other trusted participating sensor nodes of the WSN. Consequently, even if private data are overheard and decrypted by adversaries, it must remain infeasible to recover the sensitive information of a sensor node [16–18].
The last issue of WSNs is data integrity [19–21]. In communication, data integrity is defined as maintaining the consistency and correctness of messages (that is, messages are not modified by adversaries). In other words, it ensures that the data received at the data-collecting node, that is, the sink node, has not been altered in transit either by an adversary or by noise. Data pollution due to noise is unintentional and can be handled by existing mechanisms such as cyclic redundancy checking (CRC). Hence, integrity checking against unintentional data pollution is out of the scope of this research. On the other hand, mechanisms like CRC cannot cope with intentional data pollution because an adversary can recompute a valid CRC after modifying the data. Since the data aggregation result is used for making critical decisions, it must be verified before it is accepted. For this reason, it is necessary to design a data protocol for WSNs that ensures that the aggregated result has not been polluted (manipulated by an adversary) on the way to the sink node.
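The limitation of CRC described above can be illustrated with a short sketch. It assumes a hypothetical packet layout (a payload followed by a 4-byte CRC-32 trailer) and uses Python's standard `zlib.crc32`; the helper names `frame` and `crc_ok` are illustrative and are not part of any scheme discussed in this paper.

```python
import zlib

def frame(payload: bytes) -> bytes:
    """Append a CRC-32 trailer to the payload (hypothetical packet layout)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def crc_ok(packet: bytes) -> bool:
    """Recompute the CRC-32 over the payload and compare it with the trailer."""
    payload, trailer = packet[:-4], packet[-4:]
    return zlib.crc32(payload) == int.from_bytes(trailer, "big")

original = frame(b"reading=30")

# Unintentional pollution: noise flips one payload bit, the trailer is unchanged.
noisy = bytes([original[0] ^ 0x01]) + original[1:]

# Intentional pollution: the adversary changes the payload and recomputes the CRC.
forged = frame(b"reading=300")

print(crc_ok(original), crc_ok(noisy), crc_ok(forged))
```

The noisy packet fails the check, while the forged packet passes it, because a CRC involves no secret and anyone can recompute it over a modified payload.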
Since data privacy and integrity protection consume a significant amount of the precious resources (i.e., limited power) of sensor nodes, they shorten the lifetime of the WSN. Therefore, it is necessary to devise a lightweight technique that achieves data privacy and integrity protection efficiently. However, the existing work consumes many sensor node resources because it generates unnecessary messages in the network. For this reason, in this paper, we propose a resource-efficient data security technique that can aggregate sensitive data while protecting data integrity in WSNs. Our technique protects the sensed data against leakage by using the algebraic properties of complex numbers. It not only ensures that no trend in the sensitive data of a sensor node is revealed to any other nodes or adversaries but also aggregates and hides data for privacy during transmission to the data sink. Of the two parts of a complex number, the real part is used to hide the sampled data of a sensor node from its neighboring nodes and adversaries, whereas the imaginary part is used for data integrity. Before transmitting data to a parent node, every sensor node transforms its sampled data into complex number form. The real part is generated by combining the sampled data with a unique private seed, and the imaginary part is generated by appending an imaginary unit to the modified sampled data. Thus, our technique prevents the recovery of sensitive information even if private data are overheard and decrypted by adversaries or other trusted participants. For strong data security, our technique can be built on top of existing secure communication protocols such as [22]. Moreover, because it is a general approach, our technique can be applied to any type of WSN regardless of network topology.
The rest of the paper is organized as follows. In Section 2, we present some related work. Section 3 describes our integrity-protecting sensitive data aggregation technique. Simulation results are shown in Section 4. Along with some future research directions, we finally conclude our work in Section 5.
2. Related Work
In this section, we present related work on privacy-preserving data aggregation schemes. Figure 1 illustrates the classification of the privacy-preserving data aggregation techniques for WSNs. These techniques are broadly divided into two categories: homogeneous techniques and heterogeneous ones. They are categorized based on the type of nodes in the WSN, particularly the type of data aggregating nodes (aggregators). The aggregators can be either special (more powerful) nodes or regular sensor nodes. Moreover, the techniques are further divided into five groups: perturbation in homogeneous networks, shuffling, privacy homomorphism, perturbation in heterogeneous networks, and hybrid. First, the perturbation technique is also known as data customization. In this technique, every sensor node uses an encryption key and/or seeds (private or public) generated by randomization techniques [23, 24] in order to hide the sampled data before transmitting them to a parent node. The perturbation techniques for homogeneous networks include iCPDA [21], Conti et al.'s scheme [25], DADPP [26], PHA [27], and HP2S [28, 29], while the perturbation techniques for heterogeneous networks include Sheng and Li's scheme [30]. Second, in the shuffling technique, every sensor node slices its data into a fixed number (J) of data pieces and sends a data piece to the selected

Classification of the privacy-preserving data aggregation techniques for WSNs.
In the previous section, we addressed three important considerations for WSNs: energy consumption, data privacy, and data integrity. However, iPDA and iCPDA are the only works that support both privacy preservation and data integrity for WSNs; we provide a detailed explanation of iPDA and iCPDA in Section 2.1.
2.1. Privacy Preserving Data Aggregation Scheme with Data Integrity
He et al. proposed iPDA [19] and iCPDA [21] schemes for WSNs to support privacy-preserving data aggregation as well as data integrity. In the iPDA scheme, they protect data integrity by designing two node-disjoint aggregation trees rooted at the query server where each node belongs to a single aggregation tree. In this technique, first, every sensor node slices its private data randomly into L pieces and
In iCPDA, three rounds of interaction are required. First, each node sends a seed to the other cluster members. Next, each node hides its sensory data using the received seeds and sends the hidden sensory data to each cluster member. Then, each node adds its own hidden data to the received hidden data and sends the result to its cluster head, which calculates the aggregation result via matrix inversion and multiplication. To enforce data integrity, cluster members check the aggregated data transmitted by the cluster head. iCPDA has some disadvantages. First, its communication overhead increases quadratically with the cluster size. Second, its computational overhead increases quickly with the cluster size because a larger cluster introduces a larger matrix, whereas a smaller cluster size lowers privacy-preserving efficacy.
Both iPDA and iCPDA support only very weak data integrity checking: if a node changes its sampled value from 30 to 300 and uses the value 300 in the aggregation process, neither method can detect the misbehavior. Hence, in this paper, we propose a new, efficient (in terms of communication overhead and data propagation delay), and general (in terms of supported network topologies) scheme that supports data privacy and achieves integrity assurance in data aggregation for WSNs. Our scheme is based on the algebraic properties of complex numbers. It not only ensures that no trend in the sensitive data of a sensor node is revealed to any other nodes or adversaries but also provides integrity checking of the aggregated value of sensor data.
3. Integrity-Protecting Sensitive Data Aggregation Technique
To overcome the previously mentioned shortcomings of iPDA and iCPDA, in this section, we propose a new energy-efficient data aggregation scheme for preserving data privacy in WSNs. Our scheme exploits the additive property of complex numbers to aggregate the sensed data in WSNs. Like iCPDA and iPDA, we focus only on the additive aggregation function (SUM). This is because other aggregation functions, such as average, count, variance, and standard deviation, can be obtained by using the additive aggregation function [34]. In our scheme, out of two parts of a complex number (
The proposed privacy and integrity preserving technique is performed through five steps. In the first step, we assign a special type of positive integer
Real ID of 8 sensor nodes with signature.
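Signature assignment and superimposition can be sketched as follows. The sketch assumes that each signature is a distinct power of two (a one-hot bit pattern), that superimposing means bitwise OR, and that the sink recovers the contributing nodes with a bitwise AND against its signature table. None of these details is confirmed by the text above, and the names `make_signatures`, `superimpose`, and `contributors` are hypothetical.

```python
def make_signatures(node_ids):
    """Assign each node a distinct one-hot integer signature (assumption)."""
    return {node_id: 1 << bit for bit, node_id in enumerate(node_ids)}

def superimpose(signatures):
    """Combine signatures with bitwise OR, as an aggregator might."""
    combined = 0
    for signature in signatures:
        combined |= signature
    return combined

def contributors(superimposed, table):
    """Recover the node IDs whose signature bit is set in the superimposed value."""
    return [node_id for node_id, sig in table.items() if superimposed & sig]

table = make_signatures(range(1, 9))            # 8 sensor nodes, IDs 1..8
ssig = superimpose([table[2], table[5], table[7]])
print(format(ssig, "08b"), contributors(ssig, table))
```

With one-hot signatures, the superimposed value costs one bit per node and identifies the contributing nodes without ambiguity. Other signature encodings would trade length against the risk of false positives.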
When the network receives an SQL-like query for the SUM aggregation function, in the second step, the sampled sensitive data ds of each sensor node is first concealed in a by combining it with a unique seed (sr), which is a private real number. The seeds can be selected from an integer range (i.e., the space between a lower bound and an upper bound). By increasing the size of the range, we can further increase the level of data privacy. Hence, our approach provides strong data privacy. To support data integrity, an integer value b (the difference between the previous sensed value and the current sensed value of the sensor node) with i is appended to a by using the genCpxNum() function to form a complex number
Customized data creation for each node.
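The data customization step can be sketched as follows. This is a minimal sketch under stated assumptions: the masking is taken to be additive (`a = ds + sr`), the sign convention for `b` is taken to be the current reading minus the previous one, and the seed range is arbitrary. The function `gen_cpx_num` only mirrors the name genCpxNum() used in the text and is not the authors' implementation; `pick_seed` is a hypothetical helper.

```python
import random

def pick_seed(lower: int, upper: int, rng: random.Random) -> int:
    """Draw a private seed sr from the integer range [lower, upper]."""
    return rng.randint(lower, upper)

def gen_cpx_num(ds: int, prev_ds: int, sr: int) -> complex:
    """Form the customized reading a + bi for one sensor node."""
    a = ds + sr        # real part: reading masked by the seed (assumed additive)
    b = ds - prev_ds   # imaginary part: change since the previous reading (assumed sign)
    return complex(a, b)

seed = pick_seed(1000, 9999, random.Random(0))  # example range, chosen arbitrarily
z = gen_cpx_num(ds=30, prev_ds=28, sr=1000)     # fixed seed to keep the example readable
print(z)
```

A neighbor that overhears `z` sees only the masked real part and the change in the reading, not the reading itself. A wider seed range makes the masked value harder to guess.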
In the third step, the parent sensor node (i.e., data aggregator) decrypts the received data by using respective pairwise symmetric keys of its child sensor nodes. For each child node, the parent node computes the difference value (
In the fourth step, when the sink node receives all intermediate result sets
for all sensor nodes { ID = ID = Signature(
for all sensor nodes { sense ds; a = mask(ds, sr); // sr is a unique private seed transmit(
for every intermediate aggregators { for all received customized data { Drc( If ( {reject Else { SSig = Superimpose( transmit(
for all receive( SUM2 = add (
fetch_Nodes_IDs(); Node_IDs = SuperSig && SSig; SUM2 = disjoin (SUM2R, SUM2IM); SUM1R = Compute (sum of real seeds of contributed nodes); SUM = SUM If (SUM2IM = checking */ {return SUM;} Else {reject SUM;}
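The aggregator-side check and the sink-side recovery of SUM can be sketched as follows. This is a minimal sketch under assumptions not confirmed by the text: the real part is `ds + sr` with a seed that stays fixed between rounds, the imaginary part is the current reading minus the previous one, and an aggregator remembers the previous real part received from each child, so the difference of consecutive real parts should equal the imaginary part. The function names `aggregate` and `recover_sum` are hypothetical, and the sink's own final comparison is not reproduced here.

```python
def aggregate(received, prev_real):
    """Add the customized readings that pass the per-child integrity check.

    received:  {node_id: complex} customized readings of the current round
    prev_real: {node_id: number} real parts remembered from the previous round
    """
    total = 0j
    accepted = []
    for node_id, z in received.items():
        if z.real - prev_real[node_id] != z.imag:
            continue                      # inconsistent with its own history: reject
        total += z                        # additive property of complex numbers
        accepted.append(node_id)
    return total, accepted

def recover_sum(total, accepted, seeds):
    """Sink side: remove the seeds of the contributing nodes from the real part."""
    return total.real - sum(seeds[node_id] for node_id in accepted)

seeds = {1: 1000, 2: 2000, 3: 3000}
prev_real = {1: 1028, 2: 2040, 3: 3050}          # previous readings 28, 40, 50
honest = {1: complex(1030, 2), 2: complex(2041, 1), 3: complex(3047, -3)}
tampered = dict(honest)
tampered[2] = complex(2300, 1)                   # real part altered in transit

total, accepted = aggregate(honest, prev_real)
print(recover_sum(total, accepted, seeds))       # readings 30 + 41 + 47
total_t, accepted_t = aggregate(tampered, prev_real)
print(accepted_t, recover_sum(total_t, accepted_t, seeds))
```

In this sketch the altered reading of node 2 is dropped at the aggregator, and the sink still recovers the sum of the remaining readings because it subtracts only the seeds of the nodes that actually contributed.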

Superimposing signatures and addition of customized sensor readings in a multihop WSN (
4. Performance Evaluation
In this section, we present simulation results for our scheme and compare it with the iPDA and iCPDA schemes in terms of communication overhead and integrity checking. For this, we use the TOSSIM [38] simulator running on the TinyOS [39] operating system with the GCC compiler. We consider 100 sensor nodes distributed randomly in a 100 m × 100 m area. As in directed diffusion [40], we use a receiving power dissipation of 395 mW and a transmitting power dissipation of 660 mW. Moreover, MATLAB 7.6.0.324 (R2008a) is used to measure the execution time required for data customization and data aggregation.
4.1. Data Aggregation
Figure 3 shows the communication overhead in terms of the number of messages generated in a WSN as the number of sensor nodes varies. As expected, the number of messages in iPDA, iCPDA, and our scheme increases as the number of sensor nodes increases. This is because every sensor node in the WSN is capable of sensing data, so more source nodes naturally produce more messages in all three schemes. However, our scheme outperforms iPDA and iCPDA because the existing schemes generate unnecessary messages in the network. In our scheme, each sensor node can customize its data by itself and does not need to generate extra messages for data privacy and integrity checking. In contrast, iPDA and iCPDA generate six messages and four messages, respectively, for privacy preservation and integrity checking. Because many messages are exchanged among the nodes, the existing schemes cause many data collisions, and the number of messages generated in the network increases drastically as the number of sensor nodes becomes larger. As a result, iPDA and iCPDA consume much more energy for successful data transmission than our scheme.
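The per-node message counts quoted above give a rough first-order estimate of the message totals. The sketch below assumes that every node generates exactly the stated number of messages per round (six for iPDA, four for iCPDA, and one for the proposed scheme) and ignores collisions and retransmissions, so it is an illustration rather than a reproduction of the simulated results. The name `total_messages` is hypothetical.

```python
# Messages each node generates per aggregation round, as stated in the text.
MESSAGES_PER_NODE = {"iPDA": 6, "iCPDA": 4, "proposed": 1}

def total_messages(num_nodes: int) -> dict:
    """First-order message count per round, ignoring collisions and retransmissions."""
    return {scheme: per_node * num_nodes
            for scheme, per_node in MESSAGES_PER_NODE.items()}

for n in (25, 50, 100):
    print(n, total_messages(n))
```

Under these assumptions the totals grow linearly for all three schemes, and the gap between them widens in proportion to the number of nodes.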

Energy consumption.
The messages generated in the WSN are finally consumed by the sink node. This involves message transmission and message reception, both of which require a significant amount of energy. Figure 4 shows the communication overhead in terms of the energy dissipated by iPDA, iCPDA, and our scheme as the number of sensor nodes in the WSN varies. As expected, the energy dissipated by all three schemes increases as the number of sensor nodes increases, because every message generated in the network requires some energy to reach the sink node. However, the power consumption of our scheme is always lower than that of iPDA and iCPDA. The reason is that iPDA and iCPDA generate many unnecessary messages in the WSN while achieving integrity protection and privacy preservation in data aggregation, and every sensor node must stay active longer to communicate all of these messages. In our scheme, by contrast, every sensor node can achieve both integrity protection and privacy preservation by comparing the current complex number with the previous one. Hence, the energy consumption of our scheme is 80% and 60% lower than that of iPDA and iCPDA, respectively.

Energy consumption by the iPDA, iCPDA, and our schemes.
Table 3 shows the computation overhead of data aggregation. The result shows that iCPDA has the worst computation overhead for privacy-preserving data aggregation. The reason is that iCPDA uses a time-consuming encryption method with two seeds to achieve data privacy. In contrast, our scheme is about two times and 83 times faster than iPDA and iCPDA, respectively. This shows that our scheme significantly reduces the resource (CPU time) usage for private data aggregation. This is because our scheme reduces the number of communication messages by using the additive property of complex numbers.
Computational overhead for data customization and aggregation.
4.2. Data Integrity
Figure 5 shows the data propagation delay in terms of the average time required for the sampled data of sensor nodes to reach the sink node, including data privacy and integrity checking. During this process, a sensor node in iPDA and iCPDA has to communicate (i.e., transmit and receive) at least six and four messages, respectively. Hence, sensor nodes in both iPDA and iCPDA need more active time to perform all communications than in our scheme, which results in very high data propagation delay in the existing work. Consequently, duty cycling, which is the percentage of time that an entity spends in an active state as a fraction of the total time [41], also increases in the existing schemes. iCPDA generates fewer messages than iPDA but requires complex computation for privacy preservation and uses longer messages than iPDA. Moreover, in iCPDA, the sampled data of sensor nodes is sent in the direction opposite to the sink node (data is transmitted from the cluster head to the cluster members) during the privacy preservation process. Therefore, iCPDA has the worst performance among the three schemes. In contrast, every sensor node in our scheme sends only one message (the aggregated data) to its parent node because it checks the integrity of the sensed data without communicating with other sensor nodes.

Average data transmissions time for iPDA, iCPDA, and our schemes.
Figure 6 compares the three schemes in terms of the detection ratio of polluted messages for integrity checking. Our scheme detects all polluted messages, whereas iPDA and iCPDA detect less than 30% of polluted messages. The reason is that every node in our scheme checks the integrity of the incoming data received from its lower-level nodes. In contrast, only the sink node can check the integrity of the aggregated data in iPDA, and only the sink node and the cluster heads can perform integrity checking in iCPDA.

Integrity checking.
5. Conclusion
In this paper, we proposed an efficient and general scheme that aggregates sensitive data while protecting data integrity in environments that generate private data, such as patients’ health monitoring applications. To maintain data privacy, our scheme applies the additive property of complex numbers: sampled data are customized and put into complex number form before being transmitted towards the sink node. As a result, it protects the trend of the private data of a sensor node from being known by its neighboring nodes, including data aggregators, in WSNs. Moreover, it remains difficult for an adversary to recover sensitive information even if data are overheard and decrypted. Meanwhile, data integrity is protected by using the imaginary part of the complex-number-form customized data at the cost of just two extra bytes. Through simulation results, we have shown that our scheme is much more efficient than the iPDA and iCPDA schemes in terms of communication and computation overheads, data propagation delay, and integrity checking.
As future work, we will design a scheme that preserves data integrity and sensitive data under collusive attacks and provide more simulation results for it. Moreover, we will improve our privacy-preserving data aggregation scheme to support MAX and MIN aggregations.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (Grant no. 2013010099). This research was also supported by the Brain Korea 21 PLUS Project, NRF.
