Abstract
Wireless sensor networks are always deployed in remote and hostile environments to gather sensitive information, in which sensor nodes are apt to encounter some serious leakage of sensitive data. Hence, privacy-preserving is becoming an increasingly important issue in security data aggregation for wireless sensor networks. In this paper, we propose a balance privacy-preserving data aggregation model (BPDA) based on slicing and mixing technology. Compared to fixed or random slicing, BPDA model gives a balance slicing mechanism to ensure that slice can be sent to the nodes which have lower privacy preservation and enhance the privacy-preserving efficacy. Furthermore, according to the influence of the node degree and energy, three different schemes are presented to keep the privacy-preserving data aggregation balance. Theoretical analysis and simulation show that BPDA model demonstrates a good performance in terms of privacy-preserving efficacy and communication overhead and prolongs the lifetime of network.
1. Introduction
A wireless sensor network (WSN) is a typical ad hoc network which is highly distributed and self-organized. It usually consists of plenty of small sensor nodes which gather the data from its monitoring physical or environment conditions (e.g., the temperature, the sound, etc.) and send their data to the destination (base station) directly or via multihop [1, 2]. WSN has many popular applications [3, 4], such as military surveillance, industrial process monitoring and control, air pollution monitoring, and machine health monitoring. Sensor node has typical weakness such as processing capability, storage capacity, and limited energy. In particular, the sensor nodes are always deployed in the harsh environment, without being recharged or replaced. Therefore, energy efficiency in in-network data processing is very important for WSN.
In WSN, sensor nodes collect regional information and upload them to the base station, where the base station disposes these data to obtain the result. There are plenty of redundant data in the process of uploading. For example, hundreds of sensor nodes are used to collect the temperature of an area while the manager just wants to know the maximum temperature. So, it is not necessary to send all the temperature data but a derivative such as maximum to base station. Data aggregation [5, 6] aims to aggregate redundant data at intermediate sensor nodes applying a suitable aggregation function on the received data. Aggregation reduces the amount of network traffic which helps to reduce energy consumption on sensor nodes.
WSN is always deployed in unsecured and untrusted environment, which makes it exposed to all kinds of intrusions, and encounters some serious security issue. Some works [7–12] studied security of data aggregate in WSN. These schemes use cryptographic mechanism to establish secure communication links for data aggregation. In some special scenario, the individual sensitive data should not be disclosed to any node in the network, including parent node or neighboring node. This is privacy-preserving [13, 14] in WSN, which keeps private data from being intercepted and used by adversaries and untrusted nodes and maintains data privacy of a sensor node from other trusted neighboring nodes in the WSN. Nowadays, privacy-preserving is becoming an increasingly important issue for security of WSN [15–24]. SMART (Slice-Mix-AggRegaTe) [18] is a typical scheme, which slices individual sensitive data into a fixed set of pieces and sends them to corresponding associated nodes. Afterwards, some improved approaches, such as iPDA [19], PEPDA [23] and ESPART [22], were proposed.
In this paper, we propose a balance privacy preserving-data aggregation (BPDA) model for WSN. Our work focuses on the distribution mechanism of slice for privacy data, which considers balance slices distribution based on the random distribution. It reduces the redundancy of the privacy preservation efficacy and prolongs the lifetime of the WSN.
The remainder of this paper is organized as follows. In Section 2, the related work is summarized. In Section 3, preliminaries of our work are described. A balanced privacy-preserving data aggregation model is proposed in Section 4. Section 5 analyzes the privacy preservation efficiency of proposed schemes. Performance evaluation and analysis are described in Section 6. Finally, the conclusion of this paper is given.
2. Related Work
Recently, secure data aggregation is becoming an important issue for wireless sensor networks. Cryptograph has been an efficient mechanism to secure data aggregation. Generally, there are two typical encryption methods: end-to-end scheme and hot-by-hop scheme. End-to-end scheme [15–17] needs to establish secure link between base station and each sensor node before data transmission, and then encrypted data is transmitted to base station directly. Hot-by-hop scheme [18–23] needs sensor node to encrypt data before sending and decrypt them after receiving. The shortcoming of this scheme is that it cannot provide data confidentiality in the node during the process of decryption and encryption.
Some existed works on secure data aggregation focused on symmetric key cryptography to achieve end-to-end security. Recently, homomorphic encryption technique is introduced to achieve in-network aggregation, which allows the ciphertext to be aggregated directly, and then the receiver verifies if decrypted aggregation result matches the result of aggregation operations performed on plaintext. Castelluccia et al. [15] proposed a homomorphic encryption scheme based on addition operation named AHE. AHE is a simple and provably secure encryption scheme that allows efficient additive aggregation of encrypted data. Only one modular addition is necessary for ciphertext aggregation. CDA [16] is an approach that conceals sensitive data end-to-end but still provides efficient and flexible in-network data aggregation. The aggregating intermediate nodes are not able to read the sensitive plaintext data. Ozdemir and Xiao [17] proposed a novel integrity protecting hierarchical concealed data aggregation protocol, which employs an elliptic curve cryptography-based homomorphic encryption algorithm. The scheme can offer data integrity and confidentiality along with hierarchical data aggregation. In addition, during the decryption of aggregated data, the base station is able to classify the encrypted and aggregated data based on the encryption keys. But homomorphism based secure data aggregation schemes need more computation overhead, and they cannot be used in the network which is divided into plenty of clusters. These schemes were described to deal with addition operations in data aggregation with homomorphic encryption, such as finding sum or average value. Homomorphic encryption makes it possible to aggregate data without doing encryption and decryption at intermediate nodes. However, it is not easy to find out operation satisfying the homomorphic properties.
Meanwhile, a typical slicing technology is introduced into privacy-preserving data aggregation in WSN. He proposed SMART scheme [18] firstly which includes three steps of slicing, mixing, and aggregation. In slicing step, each node slices its private data into J pieces randomly and keeps one of the J pieces by itself while sending the remaining
In ESPART model, a const MinDeg is set. If the indegree of a node is less than MinDeg, data in this node needs to be sliced. When preserving privacy, it begins at the nodes whose indegree equals 1 in the WSN. These nodes slice one piece of data to their neighbor. And then do the same thing to the nodes whose indegree equals 2, till all the indegree of nodes in the WSN is not less than MinDeg, and then the privacy-preserving process is ended. The process of mix and aggregation is the same as SMART model.
All the models based on SMART above send the slice to the neighbors randomly. In SMART scheme, a slice increases both the privacy-preserving efficacy of sending node and receiving node. But randomly slicing may lead the indegree of some nodes to getting large which is a redundancy of the privacy-preserving efficacy to the WSN.
The redundancy of the privacy-preserving efficiency means that the privacy-preservation efficacies of some nodes are far larger than other nodes. The redundancy costs more communication overhead which will shorten the lifetime of WSN.
In this paper, we present a balance privacy-preserving data aggregation model. The balance mechanism in the model ensures that slice can be sent to the nodes which have lower privacy preservation and enhances the privacy-preserving efficacy. At the same time, the model has less communication overhand and can prolong the lifetime of wireless sensor networks
3. Preliminaries
In this section, we explain our network model, as well as our assumptions and the key pre-distribution scheme used in our model.
3.1. Network Model
Here, we consider a WSN network including N nodes and the network is connected. All the nodes build a graph
Sensor nodes collect various data from monitoring environment and send them to the base station with suitable data aggregation schemes. In our model, we consider an additive aggregation function. It is a basic aggregation function because plenty of aggregation functions, such as count, average, and variance, can be deduced to the additive aggregation function. Data aggregation function is usually defined as follows:
3.2. Key Distribution
To prevent attackers from eavesdropping, some messages are usually encrypted before sending the data. The following is the brief review of the random key distribution mechanism proposed in [25] which will be used in our model.
Firstly, a large key pool of K keys and their corresponding identities are generated. Each sensor node in WSN chooses k keys randomly from the key-pool and finds out which neighbors share a common key with itself by exchanging discovery messages. A secure link exists between two neighboring nodes only if they share a key. If two neighboring nodes cannot share a key but they can be connected by a link consisting of some nodes, this link can be the secure link between these two nodes.
In the random key distribution mechanism mentioned above, the probability that any pair of nodes possess at least one common key is
Assume there are 10000 keys in the key pool, that is,
4. Balance Privacy-Preserving Data Aggregation Model
This section describes the details of the balance privacy-preserving data aggregation model (BPDA).
BPDA model considers the balance of the privacy-preserving efficacy in the whole WSN. In BPDA model, when nodes slice the data and send them to the neighbors, a balance mechanism is used to ensure that these slices will be sent to the nodes which have a low privacy preservation efficacy. This mechanism holds all the nodes at a similar privacy preservation efficacy, reduces the redundancy of the privacy preservation efficacy, and prolongs the lifetime of the WSN.
Figure 1 is an example to show the difference of balance slicing scheme and random slicing scheme. After building a tag tree, nodes 1 to 6 prepare to slice the data. In random slicing scheme (a), the minimum degree of all six nodes is 3 while degrees of node 3 and 5 reach 5, so the two nodes have privacy-preserving redundancy. So, we can adopt balance slicing scheme (b) which only increases the degree of those nodes whose degree is 3. So, only the degree of node 3 is 4, while others are 3. Less degree leads to fewer slices which can reduce communication overhead.

Random and balance slicing.
BPDA model consists of three phases as shown in Figure 2.

The process of BPDA.
(1) Preparing Phase. An aggregation tree is constructed according to the standard aggregation protocol TAG. Each node records its own degree and computes the threshold of slice to prepare for data aggregation.
(2) Privacy-Preserving Phase. Firstly, the nodes which need to be preserved are determined, then balance slicing scheme is used to preserve the privacy data of these nodes.
(3) Data Aggregation Phase. All nodes aggregate the data according to the TAG protocol and send the data to the base station.
4.1. Preparing Phase
In the preparing phase, after establishing the TAG aggregation tree, each node records its own degree and then broadcasts the degree to its neighbors in one hop.
Next, each node prepared to utilize slicing and mixing technology in order to preserve data privacy. The number of slice plays an important role for the privacy-preservation which decides the minimum of the privacy preservation efficacy to the WSN. In the existing schemes, the number of slice is estimated according to administrator experience. In our BPDA scheme, we will give a principle to decide the slicing number.
In many applications, the network manager may expect that the exposed probability of the privacy data is not more than a const Q. A degree threshold MinDeg is computed according to the experience probability. We assume that
In this paper, the data collected by node i is exposed only if all the messages both sent to and received from this node are exposed. Obviously, the sum of these messages equals the degree of this node, so we get
In BPDA model, the threshold of the degree can be set according to the process above, so we have
Figure 3 shows the relationship of the degree threshold MinDeg and the exposed probability q. Given

The relation of q and MinDeg at
4.2. Privacy-Preserving Phase
After the preparing phase, the base station computes the whole time of privacy-preserving phase
The values of MinDeg,
All of the sensor nodes begin to slice according to the time
At first, node i compares its
If
This round of slicing operation is finished and next round is ready.
If node i receives a slice from another node k as
If
Algorithm 1 shows the details of slicing.
Estimate the time Compute the whole slicing time Broadcast the values MinDeg,
Receive the values MinDeg,
If Find the neighbors of node i Compute the connected probability of each neighbor Confirm the receiving node (assume that is node i) Produce a slice Send the Update the Update the If receive a Receive Update the End If Else If receive a Receive Update the End If End If
In BPDA model, the receiving node is determined by the above balance slice algorithm instead of being selected randomly. In this process, the energy and the degree are the main factors to be considered. The following sections present three different algorithms according to the energy factor, the degree factor, and the both factors.
4.2.1. Energy Based
In energy based algorithm, a threshold
Firstly, a receiving node should be determined by the connected probability. The connected probability in this part is as follows:
Secondly, if
This algorithm balances the energy consumption and prolongs the lifetime in the WSN, but it may cause some redundancy of the privacy preservation efficacy. The model using this algorithm is called E-BPDA model.
4.2.2. Degree Based
In degree based algorithm, only one rule is considered; that is, a node with higher degree has lower probability to connect and to be connected. The connected probability is as follows:
This algorithm reduces the redundancy of the privacy preservation and balances the privacy preservation of the whole WSN, but it may cause a little unbalance of the energy consumption. The model using this algorithm is called D-BPDA model.
4.2.3. Both Energy and Degree Based
Energy based algorithm and degree based algorithm are complementary to each other. So the cooperation of these two types is considered. Firstly, similarly to energy based algorithm,
Firstly, a receiving node should be determined by the connected probability.
The connected probability in this part is as follows:
Secondly, if
This algorithm reduces the redundancy of the privacy preservation efficacy and balances the energy consumption at the same time. The model using this algorithm is called C-BPDA model.
4.3. Data Aggregation Phase
In this phase, each node sends its data to the base station along the aggregation tree.
5. Analysis of Privacy Preservation Efficacy
An evaluation method is necessary to compare different privacy-preserving schemes. One of such methods is proposed in [13] and is used by many other papers which can be described as follows.
Firstly, it assumes that
Then, the probability that the private data of node s is exposed for a given q under either condition above in SMART algorithm is as follows:
Obviously,
Actually, this evaluation considers the privacy preservation of the whole network instead of some certain node. As shown in Table 1, there are two networks, NW1 and NW2. Each of them has 8 nodes. Nodes in NW1 are not of the same degree, but in NW2 every node has the same degree of 3.
The degree distributions of two networks.
According to the Table 1 and formula (18), we have
If
6. Simulation
In this section, a wireless sensor network with 800 nodes is considered, and these nodes are randomly deployed over 400 × 400 areas. The energy of each node is 0.5 J. We apply TAG scheme [25] which is a typical data aggregation scheme in the simulation. We study the performances of BPDA model in four aspects with simulation which are degree distribution, privacy preservation efficacy, communication overhead, and lifetime. BPDA models will be compared with SMART model and ESPART model in these performances.
6.1. Degree Distribution
In this section, a node with degree of 2 is regarded as privacy-preserved enough.
Figure 4 shows degree distribution in different models. TAG is a data aggregation scheme without privacy consideration and the basis of the other models. In TAG model, the minimum degree is 1 and the maximum degree is 9 while 80 percent of nodes in network take the minimum degree. After privacy preserving, all the schemes increase the minimum degree to 2 and the maximum degree is increased too. In three BPDA models, the D-BPDA and C-BPDA only increase the maximum degree from 9 to 10. In the E-BPDA model, the maximum degree increases to 11 which is the same in the ESPART model. Meanwhile, the SMART model increases the maximum degree to 16. The increasing of maximum degree means that some nodes which need not to be privacy preserved are preserved. This is a main reason that causes the redundancy of the privacy preservation.

The degree distribution in different models.
On the other hand, the degree of more than 50 percent of nodes in three BPDA models is 2 while ESPART and SMART schemes increase more nodes’ degrees which are redundancy.
In three BPDA models, E-BPDA considers so many energy balances that its redundancy is the most. D-BPDA has the less redundancy by considering how to reduce it. The C-BPDA which combines both D-BPDA and E-BPDA leads less redundancy than the E-BPDA.
6.2. Privacy Preservation Efficacy
Here, the evaluation method of the privacy preservation efficacy in [22] is adopted.
Figure 5 shows the exposed probability of nodes in different models. In Figure 5, the exposed probability of nodes in BPDA models is higher than that of SMART and ESPART models because this evaluation method works from a global view of the whole WSN. In many cases, the larger the sum of degrees is, the lower exposed probability the model has. It seems that SMART and ESPART models have more ability on privacy preservation because they pay much more on the redundancy when some nodes have high privacy preservation with rather high degree after the operation. The BPDA models consider the redundancy problem and put the algorithms only effect on the nodes with minimum degree. Although their exposed probabilities are higher than others’, they are still kept in the similar level.

The exposed probability of nodes in different algorithms.
6.3. Communication Overhead
As to the communication overhead, the amount of the sending data in slicing step is considered. Tables 2 and 3 show the amount of sending data of different schemes SMART, ESPART, D-BPDA, E-BPDA, C-BPDA, and their percentage to SMART at conditions
The communication overhead and percentage to SMART, at
The communication overhead and percentage to SMART, at
In both Tables 2 and 3, all of the sending data in three BPDA models are less than those in the other two models. So the BPDA models reduce the communication overhead obviously.
When
Similar to
From the data of Tables 2 and 3, we can see that he BPDA models send less data which means they have a lower communication overhead. The ESPART model and BPDA models are closer in sending data with the increasing J. As a general rule, the BPDA models can preserve the data privacy well while using slice with
In three BPDA models, the E-BPDA model considers more of the energy balance of the whole network, so it causes the higher communication overhead. Other two models both consider the degree balance which causes lower communication overhead. And in D-BPDA and C-BPDA, the communication overheads are at the same level.
6.4. Lifetime
In the simulation of lifetime, we assume that all sensor nodes have an initial energy which is 0.5 J. The data packet size is 1000 bits. The minimum degree in the network is 2 after privacy preservation. A WSN cannot operate when more than 20% of the sensor nodes are out of work. And the number of nodes in network is 800. Therefore, the network lifetime is defined as the time when 160 sensor nodes are discharged.
Nodes consume energy both in sending and receiving data according to [26, 27]. In this paper, we use the model that the pass loss exponent is 2. The model is as follows.
A k-bit data packet is transmitted and the energy consumption of sending node is given by
A k-bit data packet is transmitted, and the energy consumption of receiving node is given by
Figure 6 shows that the drained nodes appear in about 700th round in the three BPDA schemes. And there is a bifurcation point at about the 900th round. Before the demarcation point, the increasing of the drained nodes in all models is at the same trace. After the point, the drained nodes in D-BPDA models increase to 160 in about 200 rounds, and then the lifetime is up. In the E-BPDA and C-BPDA models, the increasing of drained nodes is slower than D-BPDA model. So it can prolong the lifetime when considering the energy balance which can balance the energy consumption of the network.

The lifetime in different algorithms.
7. Conclusion
In wireless sensor networks, sensitive information that sensor nodes gathered is prone to be leaked for the hostile environment. Privacy-preserving has become an important issue in data aggregation. A balance privacy-preserving data aggregation model based on slicing and mixing technology is proposed in this paper. Firstly, a degree threshold is computed according to security requirement of the WSN. Compare with fixed or random slicing, the proposed slicing method emphasizes that sensor node sends the slices to its neighbors refer to the degree threshold and ensures that the slices can be sent to the nodes which have lower privacy preservation. So, it reduces the redundancy and increases the privacy-preservation efficacy. Furthermore, according to the influence factor in real application, energy based E-BPDA, degree based D-BPDA, and both energy and degree C-BPDA three different schemes are presented to keep the privacy-preserving data aggregation balance. Simulation shows that E-BPDA model has a longer lifetime, D-BPDA model has a lower communication overhead, and the C-BPDA combines the advantage of the E-BPDA and D-BPDA.
Footnotes
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is supported by the National Natural Science Foundation of China under Grant 61201159, the Beijing Natural Science Foundation under Grant (4132057), and the Beijing Municipal Education Commission on Projects (SQKM201510016013).
