Abstract
Security is always a hot topic in wireless sensor networks (WSNs). Privacy-preserving data aggregation has emerged as an important concern in designing data aggregation algorithm. This paper proposes a precision-enhanced and encryption-mixed privacy-preserving data aggregation (PEPDA). The objective is to reduce collision during data transmission and energy consumption and to compensate loss caused by the collision. Based on the Slice-Mix-AggRegaTe (SMART) scheme, it optimizes data slicing by using small data packet, node classifying, and positive and negative data slicing techniques. It also describes a randomized time slot and a data compensation algorithm. Theoretical analysis and simulation show that PEPDA demonstrates a good performance in terms of accuracy, complexity, and security.
1. Introduction
Wireless sensor network (WSN) has received considerable attention during last decade. It has been developed for a wide variety of applications, including military sensing and tracking, environment and security monitoring, and equipment and human monitoring and tracking. Sensor networks usually consist of a large number of ultrasmall autonomous devices. Each device, called a node, is battery powered and equipped with integrated sensors, digital signal processors (DSPs), and radio frequency (RF) circuits. Because of special characteristics and limitations of wireless sensor networks, we face an important challenge in security issue, particularly for the applications where wireless sensor networks are developed in a hostile environment or used for some crucial purposes. For example, an adversary can easily listen to the traffic and mislead communications between nodes. Usually, one of the objectives to develop a sensor network is to collect data. We have therefore to establish a secure network and data aggregation mechanism, together with designing secure protocols to deal with problems about key agreement and encryption in communications and to develop privacy-preserving data aggregation algorithms.
Sensor nodes collect data from where they are deployed and forward the corresponding data to sink node. If some sensors are compromised, the aggregated result will be ill-performed; Chan et al. [1] and Yang et al. [2] have introduced intrusion detection to identify ill-performed aggregation. These are passive privacy-preserving schemes. Moreover, some positive privacy-preserving schemes then are proposed by using cryptographic mechanism to establish secure communication links. A key predistribution scheme was first presented by Eschenauer and Gligor in [3], and a series of improved key distribution schemes [4–6] were described after that. The predistribution keys can be used to construct a hop-by-hop secure data aggregation algorithm. It is a simple and effective way to employ the encryption in data aggregation. However, the encryption and decryption operations have to be executed at each node. Therefore, data aggregating cost is relatively high. In order to get efficiency in privacy-preserving data aggregation, homomorphic encryption was introduced to construct an end-to-end secure data aggregation algorithm. This technique allows arithmetic operations to be performed on ciphertext directly. Note that the schemes using key distribution can ensure data not to be revealed by attackers from outside of network. However, a more stringent scenario may ask for guarantees of in-network confidentiality, which means that individual sensitive data should not be disclosed to any node in the network, including parent node or neighboring node. Some approaches are presented in [7–9] to address these issues. Meanwhile, a typical scheme, called SMART, is proposed by He et al. in [7], which slices individual sensitive data into a set of pieces and sends them to corresponding associated nodes. The SMART scheme guarantees privacy-preserving against attacks from outside and inside of a network by using encryption with the predistributed keys and slicing the data, respectively. It has attracted much more attention in research of privacy-preserving data aggregation.
The objective of this paper is to evaluate both security vulnerability and efficiency in data aggregating schemes, particularly for the SMART scheme, and to propose a novel optimal approach, because efficiency and privacy are two important factors considered in designing data aggregation algorithm. The network's whole lifetime is tied up with node's individual energy consumptions which are spent on processing instructions, computations of CPU, send and receive operations, and so forth. Based on SMART scheme, we propose a PEPDA scheme by optimizing some parameters to reduce data collision, data loss and overhead, then to prolong the lifetime of a WSN. Compared with SMART scheme, the proposed PEPDA scheme demonstrates better performances in terms of piece accepting rate, aggregation accuracy, energy consumption, and privacy-preserving efficacy.
The rest of this paper is organized as follows. Section 2 gives a summary of related work. Section 3 evaluates limitations in SMART scheme and introduces our improvement assumptions. Section 4 describes the PEPDA approach with five optimizing factors. Section 5 provides detailed PEPDA protocol. Section 6 analyzes performance of PEPDA scheme. Section 7 gives conclusions and sketches some future work.
2. Related Work
Security of data aggregation in WSNs has been investigated during last decade. Several references give a review about it [10]. Obviously, it is a good way to use cryptography to protect privacy of data.
Privacy-preserving data aggregation schemes using cryptographic mechanisms can be classified into two types: end-to-end encryption scheme and hop-by-hop encryption scheme [2, 7, 8, 11]. The end-to-end encryption scheme aims to establish secure link between base station (BS) and individual sensor node. Sensitive data are encrypted before forwarding upstream; BS then extracts original data using agreed key with each node, making intermediate nodes transparent during data transmission process. However, end-to-end encryption without aggregation is very power-consuming, because each encrypted data is transmitted to BS directly. Along with the fact that nodes closer to BS consume more energy as more data pass through them, the efficiency of end-to-end encryption without aggregation is debatable. To tackle this problem, homomorphic encryption technique is introduced by Castelluccia et al. in [12] and de Cristofaro in [13] to achieve in-network aggregation with end-to-end encryption. Some schemes were described to deal with addition operations in data aggregation with homomorphic encryption, such as to find sum or average value. Homomorphic encryption makes it possible to aggregate data without doing encryption and decryption at intermediate nodes. However, it is not easy to find out operation satisfying the homomorphic properties.
In hop-by-hop encryption scheme, upon receiving an aggregated data, the node decrypts it, aggregates with its own data, encrypts the newly aggregated data, and then forwards upstream. The encrypt and decrypt operations are performed by using certain key distribution scheme. Obviously, hop-by-hop encryption scheme is not an efficient design due to frequent intermediate encrypt and decrypt operations which brings about extra energy consumption and computational delay. Moreover, underlying privacy vulnerability is exposed when decrypted data are eavesdropped. Particularly, we face a challenge from inner attack. Piece slicing technique in SMART is a solution to this problem. In-network aggregation with end-to-end encryption scheme, on the contrary, provides better efficiency and does not have to worry about privacy vulnerability during intermediate node aggregation. Along with the attractive advantages, in-network aggregation with end-to-end encryption scheme designs however have to deal with certain problems in key distribution phase, which are elaborated by Feng et al. in [14].
Privacy-preserving data aggregation (PDA) scheme presented by He et al. in [7] consists of two schemes: cluster-based private data aggregation (CPDA) and SMART. Privacy performance is improved in these two schemes. However, neither of them is efficient. The former one is of computational complexity and big computational burden. Limitations of the latter scheme will be investigated in Section 3.
Some improved schemes are proposed based on the PDA scheme. Yang et al. present an energy-saving and privacy-preserving data aggregation (ESPART) scheme [8], which shows a good performance in both energy consumption and privacy-preserving efficacy. As a result, the lifetime of network could be prolonged. Work presented in [11] introduces a scheme which applies the additive property of complex numbers in order to combine sensor data and preserve data privacy during transmissions to the query server. The performance evaluation shows that it is more efficient than the PDA scheme in terms of both communication and computation costs. Work presented in [9] specializes in nonlinear aggregation functions instead of traditional additive function. The presented K-indistinguishable privacy-preserving data aggregation (KIPDA) scheme achieves the goal of privacy-preserving upon MAX and MIN aggregation functions by obfuscating data being forwarded. Zhang et al. proposed schemes to support both additive aggregation functions and nonadditive ones such as Max/Min, Median, and Histogram at the sacrifice in data accuracy [15]. A formal treatment to the security of concealed data aggregation (CDA) and the more general private data aggregation (PDA) is given in [16]. It analyzed security by comparing with SMART scheme. Despite the existence of schemes for privacy-preserving data aggregation, a rigorous analysis and optimization for SMART are still missing in the literature.
3. Background and Assumptions
3.1. SMART Scheme
Before introducing our proposal, we make a review about SMART scheme presented in [7], which consists of three steps.
Step 1 (Slicing).
Each sensor node i randomly selects a set
Step 2 (Mixing).
When a node j receives an encrypted slice, it decrypts the data using the shared key with the sender. Then, it sums up all the received slices.
Step 3 (Aggregating).
All nodes aggregate the data according to the TAG protocol in [17] and send the result to the query server.
3.2. Analysis of SMART Scheme
This subsection evaluates SMART scheme in terms of collision and slice failed node (SFN). Assume that each node slices data into 3 pieces and forwards 2 of them to neighbors shown in Figure 1. We denote

An example of slicing and forwarding in SMART.
Another problem is the SFN. In SMART scheme
Figure 2 shows the variation of SFN with the slicing number J. The number of SFN increases with the increase of slicing number J. This implies that the more pieces a node goes to slice, the more difficult it is to find enough corresponding destination nodes to send. One reason is that communication range of a sensor node is limited. Figure 2 also reveals that almost all nodes become SFN when slicing size J tends to 11, which indicates that SMART degenerates into TAG gradually [17].

The relation between SFN and J.
The above analyses inspire us to optimize SMART scheme with some factors to reduce the collision rate and to increase data aggregation accuracy. Five factors will be evaluated, respectively, which are shown in Figure 3. In order to reduce collision rate, a randomized time slot and node choosing technique are developed, while to reduce collision loss, small data packet, positive and negative piece slicing, and compensation methods are presented. All are discussed in Sections 4 and 5.

Improvement outline.
4. Overview of Proposed Approach
4.1. System Model
Consider a network with N nodes. Each node is marked from 1 to N, and the
Epoch duration presents the amount of time in data aggregation process, which is divided into four intervals: tree formatting phase, collusion phase, postback phase, and aggregation phase. Tree formatting phase expresses the time interval assigned for N nodes to establish a treelike hierarchical structure, collusion phase is a time interval for nodes to send sliced pieces to others, postback phase describes the time interval for nodes to send acknowledgment, and aggregation phase specifies the time interval for network to aggregate according to certain aggregation protocol.
4.2. The Proposed Approach
This subsection describes how to optimize SMART scheme. Detailed algorithms will be presented in Section 5.
4.2.1. Randomized Time Slot Factor
In SMART scheme, each node slices data into J pieces and sends them to neighbors. In order to reduce collision rate, a random sending time schedule is used during collusion phase, instead of spontaneously sending sliced pieces at the same time. Figure 4 demonstrates a piece forwarding time diagram. The first phase from A to B is tree build duration, while the second phase from B to D is collusion duration. Assume that slice number J is 3, therefore, every node has 2 pieces to send. The piece forwarding moments in SMART are B and C, respectively, (C is the midpoint of the line BD). In PEPDA scheme, the forwarding times are set randomly in time slots BC and CD, respectively. This forwarding mechanism demonstrates a good performance in reducing data collision which will be shown later.

Piece forwarding moment diagram.
4.2.2. Partial Factor
Note that the node in the set of SFN cannot find enough neighbor nodes to send
4.2.3. Small Data Factor
In SMART scheme,
4.2.4. Positive and Negative Factor
Figure 5 illustrates three special slicing cases. Assume that slicing number J is 3 and all of the corresponding sent pieces are dropped as a result of collision. Data

Three slicing cases.
From the analysis of case 1 and case 2, we notice that aggregation accuracy can be improved if a negative piece is sent. One reason is that using a negative piece increases proportion of the data kept by the node itself and decreases the influence of data loss caused by collision. As it is shown in case 2,
4.2.5. Compensation Factor
During the collusion phase, some pieces get lost because of collision, which finally influences the aggregation result at the sink node. If a node knows whether a piece is received by a neighbor successfully or has the loss rate, it can compensate for aggregating data and forward the result upstream during the data aggregation phase. This process needs to solve two problems: one is how to get the loss rate; the other is how to calculate the compensation. In PEPDA scheme, an ACK message will be sent to the neighbor to get the loss rate, and also an algorithm is presented to determine the compensation in Section 5.
5. Algorithms and Their Property
This section describes details of PEPDA scheme. The randomized time slot factor, the partial factor, the positive and negative factor, and the small data factor are used in the collusion phase, while the compensation factor is taken in the postback phase and the aggregation phase.
5.1. Tree Formatting Phase
An aggregation tree is constructed in the following way according to the standard aggregation protocol TAG [17].
Step 1.
Sink node 1 marks its tree level 0 and broadcasts a Hello message which contains its level information
Step 2.
On receiving a Hello message, the node drops the message if it is already in the aggregation tree. Otherwise the node extracts
Step 3.
Loop Step 2 until all nodes are added to the aggregation tree.
As aggregation tree is being constructed, are the node set
Upon receiving a Hello packet during the tree formatting phase, the node i records the source address number of this Hello packet into a memory space such as a neighboring table. The node set
This phase establishes also shared keys used in encryption/decryption. We refer to the existing random key distribution mechanism proposed in [3].
5.2. Collusion Phase
This phase in PEPDA scheme is quite different from that of SMART scheme. In the tree formatting phase, the node set
Based on the technique of the positive and negative factor and the small data factor, the sent pieces are calculated in the following way.
Assume that the , positive ⋮ The odd piece is positive and the even one emerges as a negative value. If J is an odd number, the numbers of positive and negative piece are the same, that is, We have
The piece calculating method shows the following effects.
Now we define a sending time schedule based on technique of randomized time slot factor. Let a single time slot be ⋮
Then, only the node from node set T sends the piece
Theorem 1.
Under the same collision rate p, piece loss in PEPDA is less than that in SMART.
Proof.
In SMART scheme, for the ith node, we have
The sum of sent pieces at ith node is
The total loss
Similarly, in PEPDA scheme, the sum of sent pieces at ith node is
Then the loss is
Therefore, the total loss
We need now to prove
For
If N is an even number, we have
This gives the conclusion. The theorem somehow indicates that under the same collision rate, aggregation accuracy in PEPDA is superior to that of SMART. Moreover, the collision rate in our proposed scheme is reduced; accordingly, the practical aggregation accuracy will be even better as a result of smaller piece loss.
5.3. Postback Phase
This phase estimates the data loss rate for each node, which will be used in calculating the compensation factor.
During the collusion phase, an ACK forwarding table is established. After the collusion phase, each node sends ACK messages to nodes recorded in the table. We could do this step once a node is stored in the table. But this will cause more data collision. So the ACK message is sent after collusion phase to avoid collision between piece packet and ACK packet. Each node in the node set T will receive the ACK message from neighbor. Let
According to this rate, the node i calculates the compensation factor as follows:
5.4. Aggregation
Before aggregating, node
For an individual node, the total piece value to be sent is
From the formulas (12) and (13), it indicates that the compensation
6. Simulation and Analysis
This section evaluates performances of PEPDA scheme in terms of privacy-preserving efficacy, piece accepting rate, aggregation accuracy, and power analysis. The simulation environment is TOSSIM under TinyOS. The parameters in simulation are shown in Table 1. A topology of nodes is shown in Figure 6.
Simulation parameters.

Nodes distribution.
6.1. Privacy-Preserving Efficacy
We denote by
In SMART scheme, the privacy is broken only when
According to (14), theoretical privacy-preserving efficacy is shown in Figure 7, which indicates that

Theoretical value of
Under this circumstance,

Practical value of
In PEPDA scheme, only if an eavesdropper breaks all the incoming and outgoing links, along with the end-to-end encryption key of a node, will it be able to crack the private data held by this node. Therefore,
Figure 9 shows the privacy-preserving efficacy

As mentioned in [14], applying homomorphism encryption technique in end-to-end encryption needs to tackle with several underlying issues, for example, capacity of confidentiality protection. Our scheme would not have to worry about this problem. If

Infected rate.
6.2. Piece Accepting Rate
Because collision happens during the collusion phase, some data pieces are lost. It influences effectively aggregation accuracy. To reduce the collision rate is one of the objectives in designing data aggregation approach. As a metric, piece accepting rate is evaluated. It is defined by
Figure 11(a) shows the relationship between slicing number J and piece accepting rate by using the partial factor. It indicates that with the increase of slicing number J, piece accepting rate increases. Particularly, piece accepting rate approaches 1 when

Piece accepting rate influenced by the factors.
6.3. Aggregation Accuracy
Figure 12 shows the aggregation accuracy of PEPDA with respect to slicing number J. The accuracy curve rises as J increases at beginning and then keeps steady, though there is tiny fluctuation within, because we use the compensation factor and the small data factor. However, for SMART scheme, the aggregation accuracy decreases as J increases.

Accuracy
Figure 13 gives an accuracy comparison between SMART and PEPDA schemes with

Accuracy comparison.
6.4. Complexity Analysis
This subsection focuses on evaluating complexity of schemes in terms of communication overhead and computation overhead.
6.4.1. Communication Overhead
Each node needs 2 basic messages in both SMART and PEPDA schemes. One is a Hello message to accomplish tree formation; another is for data aggregation [7]. Except for these common overheads, in PEPDA, an extra communication overhead consists of collusion overhead and ACK forwarding overhead. In fact, each node in the set T sends
Figure 14 gives communication overheads of PEPDA. It shows that the total communication overhead is approximately twice as much as collusion communication overhead. Therefore, we have

Communication overhead in PEPDA.
6.4.2. Computation Overhead
Both in SMART and PEPDA, a node executes the following process during collusion duration:
calculate the piece size (slice) → encrypt → send receive → decrypt → sum (mix).
Therefore, the computational energy consumption can be determined by the following equation:
In SMART, all nodes perform
Detailed values of parameters for (1) in SMART and PEPDA.
The energy consumption of encrypting and decrypting 10 bits of data on the MICAz and TelosB architectures with IDEA, RC4, and RC5 algorithms for hop-by-hop encryption can be found in [9] (Table 3). Figure 15 illustrates the total energy cost of each scheme with the change of slicing number J. Total energy cost of PEPDA is lower than SMART.
Energy consumption of common operations on MICAz Mote and TelosB Mote [9].

Total energy cost of SMART and PEPDA.
Table 4 summarizes properties of SMART, ESPART, and PEPDA in terms of security, data accepting rate, aggregation accuracy, and overhead. The PEPDA scheme demonstrates a good performance compared with the other two methods.
Performance comparison.
7. Conclusion
Accuracy and privacy preserving are two important challenges in designing data aggregation algorithm in WSNs. Based on SMART scheme, the five factors are used to optimize the algorithms of data aggregation. The objective is to reduce the collision rate and collision loss. From this point, the randomized time slot factor and the partial factor are developed to decrease the collision rate, while the small data factor, the positive and negative factor, and the compensation factor are designed to improve the collision loss. We propose a novel privacy-preserving data aggregation scheme based on the five optimized factors. Analysis and simulation show that the proposed PEPDA scheme demonstrates a good performance in terms of accuracy, complexity, and security.
From the point of view of security, PEPDA uses the same mechanism as that in SMART. It is interesting to design privacy-preserving data aggregation schemes by combining hop-by-hop encryption with end-to-end encryption in PEPDA.
Footnotes
Acknowledgments
This work is supported by the National Basic Research Program (973 Program) of China under Grant no. 2011CB302903, the National Natural Science Foundation of China under Grants nos. 61272084, 61202004, and 61202353, the Key Project of Natural Science Research of Jiangsu University under Grant no. 11KJA520002, and Specialized Research Fund for the Doctoral Program of Higher Education under Grant no. 20113223110003.
