
Abstract

Security is always a hot topic in wireless sensor networks (WSNs). Privacy-preserving data aggregation has emerged as an important concern in designing data aggregation algorithms. This paper proposes a precision-enhanced and encryption-mixed privacy-preserving data aggregation (PEPDA) scheme. The objective is to reduce collisions during data transmission, to reduce energy consumption, and to compensate for the loss caused by collisions. Based on the Slice-Mix-AggRegaTe (SMART) scheme, it optimizes data slicing by using small data packets, node classification, and positive and negative data slicing techniques. It also describes a randomized time slot and a data compensation algorithm. Theoretical analysis and simulation show that PEPDA performs well in terms of accuracy, complexity, and security.

1. Introduction

Wireless sensor networks (WSNs) have received considerable attention during the last decade. They have been developed for a wide variety of applications, including military sensing and tracking, environment and security monitoring, and equipment and human monitoring and tracking. Sensor networks usually consist of a large number of ultrasmall autonomous devices. Each device, called a node, is battery powered and equipped with integrated sensors, digital signal processors (DSPs), and radio frequency (RF) circuits. Because of the special characteristics and limitations of wireless sensor networks, security is an important challenge, particularly for applications where the network is deployed in a hostile environment or used for some crucial purpose. For example, an adversary can easily listen to the traffic and mislead communications between nodes. Usually, one of the objectives of deploying a sensor network is to collect data. We therefore have to establish a secure network and data aggregation mechanism, design secure protocols for key agreement and encryption in communications, and develop privacy-preserving data aggregation algorithms.

Sensor nodes collect data from where they are deployed and forward the corresponding data to the sink node. If some sensors are compromised, the aggregated result will be incorrect; Chan et al. [1] and Yang et al. [2] have introduced intrusion detection to identify such incorrect aggregation. These are passive privacy-preserving schemes. Active privacy-preserving schemes have also been proposed, which use cryptographic mechanisms to establish secure communication links. A key predistribution scheme was first presented by Eschenauer and Gligor in [3], and a series of improved key distribution schemes [4–6] were described after that. The predistributed keys can be used to construct a hop-by-hop secure data aggregation algorithm. It is a simple and effective way to employ encryption in data aggregation. However, the encryption and decryption operations have to be executed at each node. Therefore, the data aggregation cost is relatively high. In order to achieve efficiency in privacy-preserving data aggregation, homomorphic encryption was introduced to construct an end-to-end secure data aggregation algorithm. This technique allows arithmetic operations to be performed on ciphertext directly. Note that the schemes using key distribution can ensure that data are not revealed to attackers from outside the network. However, a more stringent scenario may ask for guarantees of in-network confidentiality, which means that individual sensitive data should not be disclosed to any node in the network, including a parent node or a neighboring node. Some approaches are presented in [7–9] to address these issues. A typical scheme, called SMART, is proposed by He et al. in [7]; it slices individual sensitive data into a set of pieces and sends them to the corresponding associated nodes. The SMART scheme preserves privacy against attacks from outside and inside the network by using encryption with the predistributed keys and by slicing the data, respectively. It has attracted considerable attention in research on privacy-preserving data aggregation.

The objective of this paper is to evaluate both security vulnerability and efficiency in data aggregation schemes, particularly the SMART scheme, and to propose a novel optimized approach, because efficiency and privacy are two important factors in designing data aggregation algorithms. The lifetime of the whole network is tied to the energy consumption of individual nodes, which is spent on processing instructions, CPU computations, send and receive operations, and so forth. Based on the SMART scheme, we propose the PEPDA scheme, which optimizes some parameters to reduce data collision, data loss, and overhead and thereby prolongs the lifetime of a WSN. Compared with the SMART scheme, the proposed PEPDA scheme demonstrates better performance in terms of piece accepting rate, aggregation accuracy, energy consumption, and privacy-preserving efficacy.

The rest of this paper is organized as follows. Section 2 gives a summary of related work. Section 3 evaluates limitations in SMART scheme and introduces our improvement assumptions. Section 4 describes the PEPDA approach with five optimizing factors. Section 5 provides detailed PEPDA protocol. Section 6 analyzes performance of PEPDA scheme. Section 7 gives conclusions and sketches some future work.

2. Related Work

Security of data aggregation in WSNs has been investigated during the last decade; a review is given in [10]. Cryptography is a natural way to protect the privacy of data.

Privacy-preserving data aggregation schemes using cryptographic mechanisms can be classified into two types: end-to-end encryption schemes and hop-by-hop encryption schemes [2, 7, 8, 11]. An end-to-end encryption scheme aims to establish a secure link between the base station (BS) and each individual sensor node. Sensitive data are encrypted before being forwarded upstream; the BS then extracts the original data using the key agreed with each node, making intermediate nodes transparent during the data transmission process. However, end-to-end encryption without aggregation is very power-consuming, because each encrypted data item is transmitted to the BS directly. Given also that nodes closer to the BS consume more energy as more data pass through them, the efficiency of end-to-end encryption without aggregation is debatable. To tackle this problem, the homomorphic encryption technique is introduced by Castelluccia et al. in [12] and de Cristofaro in [13] to achieve in-network aggregation with end-to-end encryption. Some schemes were described to deal with addition operations in data aggregation with homomorphic encryption, such as finding the sum or average value. Homomorphic encryption makes it possible to aggregate data without encryption and decryption at intermediate nodes. However, it is not easy to find operations satisfying the homomorphic properties.

In a hop-by-hop encryption scheme, upon receiving aggregated data, the node decrypts it, aggregates it with its own data, encrypts the newly aggregated data, and then forwards it upstream. The encrypt and decrypt operations are performed by using a certain key distribution scheme. A hop-by-hop encryption scheme is not an efficient design, because the frequent intermediate encrypt and decrypt operations bring about extra energy consumption and computational delay. Moreover, an underlying privacy vulnerability is exposed when decrypted data are eavesdropped. In particular, we face a challenge from inner attacks. The piece slicing technique in SMART is a solution to this problem. In-network aggregation with end-to-end encryption, on the contrary, provides better efficiency and does not have to worry about privacy vulnerability during aggregation at intermediate nodes. Despite these attractive advantages, designs based on in-network aggregation with end-to-end encryption have to deal with certain problems in the key distribution phase, which are elaborated by Feng et al. in [14].

The privacy-preserving data aggregation (PDA) scheme presented by He et al. in [7] consists of two schemes: cluster-based private data aggregation (CPDA) and SMART. Privacy performance is improved in these two schemes. However, neither of them is efficient. The former has high computational complexity and a heavy computational burden. Limitations of the latter scheme will be investigated in Section 3.

Some improved schemes are proposed based on the PDA scheme. Yang et al. present an energy-saving and privacy-preserving data aggregation (ESPART) scheme [8], which shows a good performance in both energy consumption and privacy-preserving efficacy. As a result, the lifetime of network could be prolonged. Work presented in [11] introduces a scheme which applies the additive property of complex numbers in order to combine sensor data and preserve data privacy during transmissions to the query server. The performance evaluation shows that it is more efficient than the PDA scheme in terms of both communication and computation costs. Work presented in [9] specializes in nonlinear aggregation functions instead of traditional additive function. The presented K-indistinguishable privacy-preserving data aggregation (KIPDA) scheme achieves the goal of privacy-preserving upon MAX and MIN aggregation functions by obfuscating data being forwarded. Zhang et al. proposed schemes to support both additive aggregation functions and nonadditive ones such as Max/Min, Median, and Histogram at the sacrifice in data accuracy [15]. A formal treatment to the security of concealed data aggregation (CDA) and the more general private data aggregation (PDA) is given in [16]. It analyzed security by comparing with SMART scheme. Despite the existence of schemes for privacy-preserving data aggregation, a rigorous analysis and optimization for SMART are still missing in the literature.

3. Background and Assumptions

3.1. SMART Scheme

Before introducing our proposal, we review the SMART scheme presented in [7], which consists of three steps.

Step 1 (Slicing).

Each sensor node i randomly selects a set $S_{i}$ of $J$ sensor nodes within h hops and slices its data $d_{i}$ randomly into J pieces. One of the J pieces is kept at node i itself; the remaining $J - 1$ pieces are encrypted with the keys shared with the recipients and sent to nodes randomly selected from the node set $S_{i}$.

Step 2 (Mixing).

When a node j receives an encrypted slice, it decrypts the data using the shared key with the sender. Then, it sums up all the received slices.

Step 3 (Aggregating).

All nodes aggregate the data according to the TAG protocol in [17] and send the result to the query server.
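The three steps can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation from [7]: encryption, radio collisions, and the TAG tree are omitted, the function names `slice_reading` and `smart_round` and the four-node topology are hypothetical, and each sent piece is drawn as a random fraction of what remains, which matches the form used later in equation (3).

```python
import random

def slice_reading(reading, num_pieces, rng):
    """Return (kept_piece, sent_pieces); num_pieces - 1 pieces are sent."""
    rest = reading
    sent = []
    for _ in range(num_pieces - 1):
        piece = rest * rng.random()   # random fraction of what is left
        sent.append(piece)
        rest -= piece
    return rest, sent

def smart_round(readings, neighbors, num_pieces, seed=0):
    """Slice every reading and mix the pieces; returns the mixed value per node."""
    rng = random.Random(seed)
    mixed = {node: 0.0 for node in readings}
    for node, reading in readings.items():
        kept, sent = slice_reading(reading, num_pieces, rng)
        mixed[node] += kept
        # Choose J - 1 distinct recipients (assumes enough neighbors exist).
        targets = rng.sample(neighbors[node], num_pieces - 1)
        for target, piece in zip(targets, sent):
            mixed[target] += piece    # encryption/decryption omitted here
    return mixed

# Hypothetical fully connected four-node network.
readings = {1: 36.5, 2: 37.0, 3: 38.2, 4: 36.9}
neighbors = {1: [2, 3, 4], 2: [1, 3, 4], 3: [1, 2, 4], 4: [1, 2, 3]}
mixed = smart_round(readings, neighbors, num_pieces=3)
total = sum(mixed.values())   # aggregation step: a plain SUM over all nodes
```

When no piece is lost, `total` equals the sum of the original readings up to floating-point rounding, while no single mixed value reveals the reading of its node.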

3.2. Analysis of SMART Scheme

This subsection evaluates the SMART scheme in terms of collision and slice failed nodes (SFNs). Assume that each node slices its data into 3 pieces and forwards 2 of them to neighbors, as shown in Figure 1. We denote by $S_{i j}$ a piece sent from node i to node j. If $S_{14}$ and $S_{41}$ are forwarded at the same time, a collision happens between node 1 and node 4, which affects the aggregation result. The worst situation is that each piece collides with another. In fact, the aggregation accuracy is around 40% for $J = 3$, because the piece accepting rate is only about a quarter. This inspires us to design a new accuracy-oriented privacy-preserving scheme.

Figure 1

An example of slicing and forwarding in SMART.

Another problem is the SFN. In the SMART scheme, $J - 1$ pieces are encrypted and sent to nodes randomly selected from the node set $S_{i}$. Unfortunately, for some nodes the number of nodes in the set $S_{i}$ is less than $J - 1$. Therefore, the encrypted pieces cannot all be sent to neighbors, which affects the privacy-preserving efficacy of SMART. We define these nodes as SFNs. For example, for a network with size $N = 600$, hop size $h = 1$, and slicing number $J = 3$, there are on average 24 SFNs. If the value $| S_{i} |$ of node i is lower than 2, it cannot find enough destination nodes to send its 2 sliced pieces.

Figure 2 shows the variation of SFN with the slicing number J. The number of SFN increases with the increase of slicing number J. This implies that the more pieces a node goes to slice, the more difficult it is to find enough corresponding destination nodes to send. One reason is that communication range of a sensor node is limited. Figure 2 also reveals that almost all nodes become SFN when slicing size J tends to 11, which indicates that SMART degenerates into TAG gradually [17].

Figure 2

The relation between SFN and J.

The above analyses inspire us to optimize SMART scheme with some factors to reduce the collision rate and to increase data aggregation accuracy. Five factors will be evaluated, respectively, which are shown in Figure 3. In order to reduce collision rate, a randomized time slot and node choosing technique are developed, while to reduce collision loss, small data packet, positive and negative piece slicing, and compensation methods are presented. All are discussed in Sections 4 and 5.

Figure 3

Improvement outline.

4. Overview of Proposed Approach

4.1. System Model

Consider a network with N nodes. The nodes are numbered from 1 to N, and the $ID$ of the sink is 1. Assume that each node collects body temperature, which fluctuates from 35°C to 43°C. Denote by $D_{i}$ the reading collected by node i; the lower and upper bounds of $D_{i}$ are 35 and 43, respectively. There are many types of aggregation functions, such as SUM, MAX, and AVERAGE. Here we only deal with the SUM aggregation function $y (t) = \sum_{i = 1}^{N} f (D_{i} (t))$, as other functions can be reduced to the SUM model [12]. The aggregation accuracy is defined as

\begin{matrix} p_{a} = \frac{D_{1}^{''}}{\sum_{i = 1}^{N} D_{i}}, \end{matrix}

(1)

where $D_{1}^{''}$ represents the final aggregation result at the sink node.

The epoch duration is the amount of time taken by the data aggregation process, which is divided into four intervals: the tree formatting phase, the collusion phase, the postback phase, and the aggregation phase. The tree formatting phase is the time interval assigned for the N nodes to establish a treelike hierarchical structure, the collusion phase is the time interval for nodes to send sliced pieces to others, the postback phase is the time interval for nodes to send acknowledgments, and the aggregation phase is the time interval for the network to aggregate according to a certain aggregation protocol.

4.2. The Proposed Approach

This subsection describes how to optimize SMART scheme. Detailed algorithms will be presented in Section 5.

4.2.1. Randomized Time Slot Factor

In the SMART scheme, each node slices its data into J pieces and sends them to neighbors. In order to reduce the collision rate, a random sending time schedule is used during the collusion phase, instead of all nodes sending their sliced pieces at the same time. Figure 4 shows a piece forwarding time diagram. The first phase, from A to B, is the tree build duration, while the second phase, from B to D, is the collusion duration. Assume that the slicing number J is 3, so every node has 2 pieces to send. The piece forwarding moments in SMART are B and C, respectively (C is the midpoint of the line BD). In the PEPDA scheme, the forwarding times are set randomly in the time slots BC and CD, respectively. This forwarding mechanism performs well in reducing data collision, as will be shown later.

Figure 4

Piece forwarding moment diagram.

4.2.2. Partial Factor

Note that a node in the set of SFNs cannot find enough neighbor nodes to send $J - 1$ pieces. Therefore, at least one edge has to carry more than one piece. This is one of the causes of collision. As a remedy, we divide the nodes into two subsets T and F based on the condition $| S_{i} | \geq J - 1$, where $i = 1, \dots, N$. The node set F contains all SFNs, while the rest are in the node set T. Only nodes from the set T participate in piece slicing and mixing. Therefore, communication overhead is cut down, and the collision rate and energy consumption are reduced.
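As a small illustration of the partial factor, the following sketch splits nodes into T and F using the condition $| S_{i} | \geq J - 1$. The function name `partition_nodes` and the five-node neighbor sets are hypothetical and chosen only for the example.

```python
def partition_nodes(neighbor_sets, num_pieces):
    """Split nodes into T (enough neighbors for J - 1 pieces) and F (SFNs)."""
    needed = num_pieces - 1
    set_t = {node for node, s in neighbor_sets.items() if len(s) >= needed}
    set_f = set(neighbor_sets) - set_t
    return set_t, set_f

# Hypothetical neighbor sets S_i for a five-node network.
neighbor_sets = {1: {2, 3}, 2: {1}, 3: {1, 4, 5}, 4: {3}, 5: {3}}
set_t, set_f = partition_nodes(neighbor_sets, num_pieces=3)
# With J = 3, nodes 1 and 3 have at least 2 neighbors and fall into T;
# nodes 2, 4, and 5 are SFNs and fall into F.
```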

4.2.3. Small Data Factor

In the SMART scheme, $J - 1$ pieces are forwarded to neighbors. It is possible that large pieces are sent out while a small piece is kept by the node itself; in this circumstance, if a collision occurs, most of the data will be lost and the aggregation accuracy suffers. In order to improve the performance of the SMART scheme, we define a small data factor L and send only small fragments to neighbors. A detailed algorithm is given in the next section.

4.2.4. Positive and Negative Factor

Figure 5 illustrates three special slicing cases. Assume that the slicing number J is 3 and that all of the sent pieces are dropped as a result of collision. The data $D_{i} (i = 1, 2, 3)$ are 40, as used in (2). The number on each line represents the data sent or received by a node. According to the definition of the aggregation accuracy, we have the following results:

\begin{matrix} p_{case 1} = \frac{(40 - 10 - 10) \times 3}{40 \times 3} = 0.5, \\ p_{case 2} = \frac{[40 - 15 - (- 10)] \times 3}{40 \times 3} = 0.875, \\ p_{case 3} = \frac{[40 - (- 15) - 10] \times 3}{40 \times 3} > 1 . \end{matrix}

(2)

Figure 5

Three slicing cases.

From the analysis of case 1 and case 2, we notice that aggregation accuracy can be improved if a negative piece is sent. One reason is that using a negative piece increases the proportion of the data kept by the node itself and decreases the influence of data loss caused by collision. As shown in case 2, if $\sum_{m} \text{Positive Piece}_{m} \geq \sum_{n} | \text{Negative Piece}_{n} |$, aggregation accuracy will be improved. Otherwise, aggregation accuracy will be distorted (see case 3). This negative piece technique is used in the PEPDA scheme.
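The three values in (2) can be reproduced with a short calculation. The sketch below assumes, as in (2), three nodes with a reading of 40 each and that every sent piece is lost; the function name is hypothetical.

```python
def accuracy_if_all_sent_lost(reading, sent_pieces, num_nodes=3):
    """Aggregation accuracy when every sent piece is dropped by collision."""
    kept = reading - sum(sent_pieces)          # what each node still holds
    return (kept * num_nodes) / (reading * num_nodes)

case1 = accuracy_if_all_sent_lost(40, [10, 10])    # two positive pieces
case2 = accuracy_if_all_sent_lost(40, [15, -10])   # positive part dominates
case3 = accuracy_if_all_sent_lost(40, [-15, 10])   # negative part dominates
print(case1, case2, case3)   # 0.5 0.875 1.125
```

Case 3 exceeds 1 because losing a net negative amount leaves the node holding more than its original reading, which is the distortion described above.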

4.2.5. Compensation Factor

During the collusion phase, some pieces get lost because of collision, which finally influences the aggregation result at the sink node. If a node knows whether a piece is received by a neighbor successfully or has the loss rate, it can compensate for aggregating data and forward the result upstream during the data aggregation phase. This process needs to solve two problems: one is how to get the loss rate; the other is how to calculate the compensation. In PEPDA scheme, an ACK message will be sent to the neighbor to get the loss rate, and also an algorithm is presented to determine the compensation in Section 5.

5. Algorithms and Their Property

This section describes details of PEPDA scheme. The randomized time slot factor, the partial factor, the positive and negative factor, and the small data factor are used in the collusion phase, while the compensation factor is taken in the postback phase and the aggregation phase.

5.1. Tree Formatting Phase

An aggregation tree is constructed in the following way according to the standard aggregation protocol TAG [17].

Step 1.

Sink node 1 marks its tree level 0 and broadcasts a Hello message which contains its level information $L_{v}$ .

Step 2.

On receiving a Hello message, the node drops the message if it is already in the aggregation tree. Otherwise the node extracts $L_{v}$ value from the packet and marks its tree level $L_{v} + 1$ . Accordingly, the source node in the packet becomes parent node of this node. Then the node continues to broadcast Hello message containing level information $L_{v}$ of its own.

Step 3.

Loop Step 2 until all nodes are added to the aggregation tree.

As the aggregation tree is being constructed, the node set $S_{i} (i = 1, \dots, N)$ is established in the following way.

Upon receiving a Hello packet during the tree formatting phase, the node i records the source address number of this Hello packet into a memory space such as a neighboring table. The node set $S_{i}$ is then selected from such kind of neighboring table according to a fixed value (say $| S_{i} | = J$ ).
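The tree formatting steps and the neighboring table can be sketched as a breadth-first flood. This is a simplified model, assuming lossless and ordered Hello delivery, which a real radio does not guarantee; the function name `build_aggregation_tree` and the four-node adjacency list are hypothetical.

```python
from collections import deque

def build_aggregation_tree(adjacency, sink=1):
    """Flood Hello messages from the sink; return levels, parents, neighbor tables."""
    level = {sink: 0}
    parent = {sink: None}
    heard_from = {node: [] for node in adjacency}   # neighboring table per node
    queue = deque([sink])
    while queue:
        sender = queue.popleft()
        for receiver in adjacency[sender]:
            heard_from[receiver].append(sender)     # record the Hello source
            if receiver in level:
                continue                            # already in the tree: drop
            level[receiver] = level[sender] + 1     # L_v + 1
            parent[receiver] = sender
            queue.append(receiver)                  # rebroadcast later
    return level, parent, heard_from

# Hypothetical topology: the sink (node 1) reaches node 4 through node 2 or 3.
adjacency = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}
level, parent, heard_from = build_aggregation_tree(adjacency)
# level is {1: 0, 2: 1, 3: 1, 4: 2}; heard_from[4] is [2, 3]
```

The set $S_{i}$ would then be drawn from `heard_from[i]`.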

This phase also establishes the shared keys used in encryption and decryption. We rely on the existing random key distribution mechanism proposed in [3].

5.2. Collusion Phase

This phase in the PEPDA scheme is quite different from that of the SMART scheme. In the tree formatting phase, the node set $S_{i}$ is set up. All nodes are then divided into two subsets according to the condition $| S_{i} | \geq J - 1$. Nodes satisfying the condition are classified into the node set T, and nodes violating it into the node set F. Each node $i (i = 1, \dots, | T |)$ from the node set T slices its data $D_{i}$ into J pieces. One of the J pieces is kept at node i itself; the remaining $J - 1$ pieces are encrypted and sent to nodes randomly selected from the node set $S_{i}$. When a node receives an encrypted piece, it decrypts the data using its shared key with the sender. Meanwhile, the node extracts the source address of this piece packet and adds it to an ACK forwarding table, which is used for forwarding ACK messages later. The encrypting and decrypting operations are performed with the keys distributed by the random key distribution mechanism mentioned above.

Based on the technique of the positive and negative factor and the small data factor, the sent pieces are calculated in the following way.

Assume that the $J - 1$ pieces to be sent by each node are $P_{1}, P_{2}, \dots, P_{J - 1}$. Define $L = (43 - 35) / (J - 1) = 8 / (J - 1)$ and choose $r \in (0,1)$; then we have

$P_{1} = L r = L {(- 1)}^{1 - 1} r^{1} = L r^{1}$, positive

$P_{2} = L r (- r) = L {(- 1)}^{2 - 1} r^{2} = - L r^{2}$, negative

$P_{3} = L r (- r) (- r) = L {(- 1)}^{3 - 1} r^{3} = L r^{3}$, positive

⋮

$P_{J - 1} = L {(- 1)}^{J - 2} r^{J - 1}$.

The piece calculating method shows the following effects.

(i) The odd pieces are positive and the even pieces are negative.

(ii) If J is an odd number, the numbers of positive and negative pieces are the same, that is, $(J - 1) / 2$; otherwise, if J is an even number, there is one more positive piece.

(iii) We have $\sum_{m} \text{Positive Piece}_{m} \geq \sum_{n} | \text{Negative Piece}_{n} |$.
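The piece formula and the three effects above can be checked numerically. The following sketch assumes the reading bounds 35 and 43 from the system model; the function name `pepda_pieces` and the choice $J = 5$, $r = 0.5$ are illustrative only.

```python
def pepda_pieces(num_pieces, r, low=35.0, high=43.0):
    """Return the J - 1 sent pieces P_m = L * (-1)**(m - 1) * r**m."""
    small_factor = (high - low) / (num_pieces - 1)   # L = 8 / (J - 1)
    return [small_factor * (-1) ** (m - 1) * r ** m
            for m in range(1, num_pieces)]

pieces = pepda_pieces(num_pieces=5, r=0.5)
print(pieces)   # [1.0, -0.5, 0.25, -0.125]

positive_sum = sum(p for p in pieces if p > 0)    # 1.25
negative_sum = sum(-p for p in pieces if p < 0)   # 0.625
```

Here J is odd, so there are two positive and two negative pieces, and the positive pieces outweigh the negative ones because $r^{m}$ decreases with m for $r \in (0,1)$.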

Now we define a sending time schedule based on the randomized time slot factor. Let a single time slot be $t_{s} = T_{collusion} / (J - 1)$, where $T_{collusion}$ represents the time spent in the collusion phase. The forwarding moments are determined as follows.

$t_{1}$ : a random moment between 0 and $t_{s}$ ;

$t_{2}$ : a random moment between $t_{s}$ and $2 t_{s}$ ;

⋮

$t_{J - 1}$ : a random moment between $(J - 2) t_{s}$ and $(J - 1) t_{s}$ .

Then, only the node from node set T sends the piece $P_{i}$ at the time $t_{i} (i = 1, 2, \dots, J - 1)$ to neighbor.
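A minimal sketch of this schedule is given below. It assumes forwarding moments are drawn uniformly within each slot (the text only says "random"); the function name `forwarding_times` and the 10-unit collusion duration are hypothetical.

```python
import random

def forwarding_times(t_collusion, num_pieces, rng):
    """Draw one forwarding moment per slot: t_m lies in [(m-1)*t_s, m*t_s]."""
    t_s = t_collusion / (num_pieces - 1)            # length of one time slot
    return [rng.uniform(m * t_s, (m + 1) * t_s)     # uniform draw is an assumption
            for m in range(num_pieces - 1)]

rng = random.Random(42)
times = forwarding_times(t_collusion=10.0, num_pieces=3, rng=rng)
# With J = 3 there are two slots, [0, 5] and [5, 10]; piece P_1 is sent at
# times[0] and piece P_2 at times[1].
```

Because every node draws its own moments, two neighbors are unlikely to transmit at exactly the same instant, which is the intended effect of the randomized time slot factor.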

Theorem 1.

Under the same collision rate p, piece loss in PEPDA is less than that in SMART.

Proof.

In SMART scheme, for the ith node, we have

\begin{array}{l} P_{1} = D_{i} r_{1}, \text{ the rest is } D_{i} - D_{i} r_{1} = D_{i} (1 - r_{1}), \\ P_{2} = D_{i} (1 - r_{1}) r_{2}, \text{ the rest is } D_{i} (1 - r_{1}) - D_{i} (1 - r_{1}) r_{2} \\ = D_{i} (1 - r_{1}) (1 - r_{2}), \\ \vdots \\ P_{J - 1} = D_{i} r_{J - 1} \prod_{j = 1}^{J - 2} (1 - r_{j}), \end{array}

(3)

where $r_{1}, r_{2}, \dots, r_{J - 1} \in (0,1)$.

The sum of sent pieces at ith node is $\sum_{m = 1}^{J - 1} P_{m}$ . If the rate of lost piece is p, then the loss is

\begin{array}{l} p \sum_{m = 1}^{J - 1} P_{m} = p D_{i} [r_{1} + (1 - r_{1}) r_{2} + (1 - r_{1}) (1 - r_{2}) r_{3} + \dots \\ + r_{J - 1} \prod_{j = 1}^{J - 2} (1 - r_{j})] . \end{array}

(4)

The total loss $R_{SMART}$ is

\begin{array}{l} \sum_{i = 1}^{N - 1} p D_{i} [r_{1} + (1 - r_{1}) r_{2} + (1 - r_{1}) (1 - r_{2}) r_{3} + \dots \\ + r_{J - 1} \prod_{j = 1}^{J - 2} (1 - r_{j})] . \end{array}

(5)

Similarly, in PEPDA scheme, the sum of sent pieces at ith node is

\begin{matrix} \sum_{m = 1}^{J - 1} P_{m} = D_{i} [r_{1} - r_{1}^{2} + r_{1}^{3} + \dots + {(- 1)}^{J - 2} r_{1}^{J - 1}] . \end{matrix}

(6)

Then the loss is

\begin{matrix} p D_{i} [r_{1} - r_{1}^{2} + r_{1}^{3} + \dots + {(- 1)}^{J - 2} r_{1}^{J - 1}] . \end{matrix}

(7)

Therefore, the total loss $R_{PEPDA}$ is

\begin{matrix} \sum_{i = 1}^{N} p D_{i} [r_{1} - r_{1}^{2} + r_{1}^{3} + \dots + {(- 1)}^{J - 2} r_{1}^{J - 1}] . \end{matrix}

(8)

We need now to prove $R_{SMART} > R_{PEPDA}$ . In fact, for $N = 1$ , we have $R_{SMART} = R_{PEPDA}$ .

For $N \geq 2$ and N is an odd number, we have

\begin{array}{l} R_{SMART} - R_{PEPDA} \\ = (1 - r_{1}) \times [r_{2} + (1 - r_{2}) \times r_{3} + \dots + r_{n} \\ \times \prod_{j = 2}^{n - 1}  (1 - r_{j})  + r_{1}^{2} + r_{1}^{4} + \dots + r_{1}^{n - 1}] > 0 . \end{array}

(9)

If N is an even number, we have

\begin{array}{l} R_{SMART} - R_{PEPDA} \\ = (1 - r_{1}) \times [r_{2} + (1 - r_{2}) \times r_{3} + \dots + r_{n} \\ \times \prod_{j = 2}^{n - 1} (1 - r_{j}) + r_{1}^{2} + r_{1}^{4} + \dots + r_{1}^{n - 2}] \\ + r_{1}^{n} > 0 . \end{array}

(10)

This gives the conclusion. The theorem suggests that, under the same collision rate, the aggregation accuracy of PEPDA is superior to that of SMART. Moreover, the collision rate in our proposed scheme is reduced; accordingly, the practical aggregation accuracy will be even better as a result of the smaller piece loss.
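The inequality can also be checked numerically on the bracketed factors of (4) and (7), taken exactly as written in the proof, that is, per unit of $D_{i}$ and per unit of p. The sketch below is a sanity check on random draws, not a proof; the function names are hypothetical and at least two sent pieces ($J \geq 3$) are assumed.

```python
import random

def smart_sent_fraction(rs):
    """Bracket of (4): r1 + (1-r1) r2 + (1-r1)(1-r2) r3 + ..."""
    rest, sent = 1.0, 0.0
    for r in rs:
        sent += rest * r
        rest *= 1.0 - r
    return sent

def pepda_sent_fraction(r1, num_sent):
    """Bracket of (7): alternating sum r1 - r1**2 + r1**3 - ..."""
    return sum((-1) ** (m - 1) * r1 ** m for m in range(1, num_sent + 1))

rng = random.Random(1)
gaps = []
for _ in range(1000):
    num_sent = rng.randint(2, 10)                  # J - 1 sent pieces
    rs = [rng.uniform(0.01, 0.99) for _ in range(num_sent)]
    gaps.append(smart_sent_fraction(rs) - pepda_sent_fraction(rs[0], num_sent))

smallest_gap = min(gaps)   # positive on every draw in this experiment
```

For example, with $r_{1} = r_{2} = 0.5$ the SMART bracket is 0.75 while the PEPDA bracket is 0.25, so a lost fraction p costs three times as much in SMART.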

5.3. Postback Phase

This phase estimates the data loss rate for each node, which will be used in calculating the compensation factor.

During the collusion phase, an ACK forwarding table is established. After the collusion phase, each node sends ACK messages to the nodes recorded in the table. An ACK could be sent as soon as a node is stored in the table, but this would cause more data collision. So the ACK messages are sent after the collusion phase to avoid collision between piece packets and ACK packets. Each node in the node set T receives ACK messages from its neighbors. Let $B_{i}$ be the number of ACK messages received by node i; then the difference $(J - 1) - B_{i}$ gives the number of packets lost in the collusion phase, and the rate of lost pieces is

\begin{matrix} \tau_{i} = \frac{J - 1 - B_{i}}{J - 1} . \end{matrix}

(11)

According to this rate, the node i calculates the compensation factor as follows:

\begin{matrix} C_{i} = L (r - r^{2}) \frac{J - 1 - B_{i}}{J - 1} . \end{matrix}

(12)
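Equations (11) and (12) translate directly into code. The sketch below assumes the reading bounds 35 and 43 from the system model for computing L; the function names and the sample values ($J = 5$, three ACKs, $r = 0.5$) are hypothetical.

```python
def loss_rate(num_pieces, acks_received):
    """Equation (11): fraction of the J - 1 sent pieces that got no ACK."""
    sent = num_pieces - 1
    return (sent - acks_received) / sent

def compensation(num_pieces, acks_received, r, low=35.0, high=43.0):
    """Equation (12): C_i = L * (r - r**2) * tau_i, with L = 8 / (J - 1)."""
    small_factor = (high - low) / (num_pieces - 1)
    return small_factor * (r - r ** 2) * loss_rate(num_pieces, acks_received)

tau = loss_rate(num_pieces=5, acks_received=3)          # 1 of 4 pieces lost
c_i = compensation(num_pieces=5, acks_received=3, r=0.5)
print(tau, c_i)   # 0.25 0.125
```

A node that receives all $J - 1$ ACKs has a loss rate of 0 and adds no compensation.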

5.4. Aggregation

Before aggregating, node $i$ adds $C_{i}$ to itself as compensation, and all nodes in the network then do in-network aggregation following TAG protocol proposed in [17].

For an individual node, the total piece value to be sent is

\begin{array}{l} P_{1} + P_{2} + \dots + P_{J - 1} \\ = L r^{1} - L r^{2} + \dots + L {(- 1)}^{J - 2} r^{J - 1} \\ = L (r - r^{2} + r^{3} + \dots + {(- 1)}^{J - 2} r^{J - 1}) \\ > L (r - r^{2}), r \in (0,1) . \end{array}

(13)

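Formulas (12) and (13) indicate that the compensation $C_{i}$ used in the PEPDA algorithm is a minimum estimate of the lost pieces: it applies the loss rate in (11) to $L (r - r^{2})$, which by (13) is a lower bound on the total piece value sent.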

6. Simulation and Analysis

This section evaluates performances of PEPDA scheme in terms of privacy-preserving efficacy, piece accepting rate, aggregation accuracy, and power analysis. The simulation environment is TOSSIM under TinyOS. The parameters in simulation are shown in Table 1. A topology of nodes is shown in Figure 6.

Table 1

Simulation parameters.

Category	Parameter	Value
Radio parameters	White Gaussian noise	4 dB
Radio parameters	Noise floor	−105 dB
Topology parameters	Number of nodes	600
Topology parameters	Terrain dimensions	400 meters × 400 meters

Figure 6

Nodes distribution.

6.1. Privacy-Preserving Efficacy

We denote by $P (q)$ the probability that private data is disclosed and take it as a privacy-preserving efficacy metric, where q represents the probability that link level privacy is broken.

In SMART scheme, the privacy is broken only when $J - 1$ outgoing links and all the incoming links are cracked. Accordingly, $P (q)$ can be approximately defined as

\begin{matrix} P (q) = q^{J - 1} \sum_{k = 0}^{d_{in_{\max}}} p (\text{in-degree} = k) q^{k}, \end{matrix}

(14)

where $p (\text{in-degree} = k)$ is defined as the probability that the in-degree of a node is k and $d_{in_{\max}}$ represents the maximum in-degree in a network.

According to (14), theoretical privacy-preserving efficacy is shown in Figure 7, which indicates that $P (q)$ decreases with the increase of slicing number J. The theoretical value cannot be reached due to the problem of SFN, which limits outgoing links, so that the incoming links get reduced as well. In fact, a practical definition of $P (q)$ should be calculated as

\begin{matrix} P (q) = \sum_{k = 0}^{d_{\max}} p (\text{degree} = k) q^{k}, \end{matrix}

(15)

where $p (\text{degree} = k)$ represents the probability that the degree of a node is k, and $d_{\max}$ is the maximum degree in a network. Note that the degree of a node includes both in-degree and out-degree values.
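Both definitions are sums of the form $\sum_{k} p (k) q^{k}$ over a degree distribution, so they are easy to evaluate once that distribution is known. The sketch below uses made-up distributions purely for illustration; they are not the distributions of the simulated network, and the function names are hypothetical.

```python
def weighted_power_sum(q, distribution):
    """Return sum_k p(k) * q**k for a {k: probability} distribution."""
    return sum(p * q ** k for k, p in distribution.items())

def smart_theoretical(q, num_pieces, in_degree_dist):
    """Equation (14): J - 1 outgoing links and all incoming links are broken."""
    return q ** (num_pieces - 1) * weighted_power_sum(q, in_degree_dist)

def smart_practical(q, degree_dist):
    """Equation (15): based on the total degree of a node."""
    return weighted_power_sum(q, degree_dist)

# Made-up distributions (probabilities sum to 1), for illustration only.
in_degree_dist = {1: 0.5, 2: 0.5}
degree_dist = {2: 0.5, 3: 0.5}
theoretical = smart_theoretical(0.1, 3, in_degree_dist)   # about 0.00055
practical = smart_practical(0.1, degree_dist)             # about 0.0055
```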

Figure 7

Theoretical value of $P (q)$ in SMART.

Under this circumstance, $P (q)$ fluctuates according to slicing number J, which is shown in Figure 8. Although there is a tiny fluctuation, $P (q)$ keeps decreasing with the increase of slicing number J overall.

Figure 8

Practical value of $P (q)$ in SMART.

In PEPDA scheme, only if an eavesdropper breaks all the incoming and outgoing links, along with the end-to-end encryption key of a node, will it be able to crack the private data held by this node. Therefore, $P (q)$ can be approximately defined as

\begin{array}{l} P (q) = P_{e} (P_{T} q^{J - 1} \sum_{k = 0}^{T_{\max}} p_{T} (\text{in-degree} = k) q^{k} \\ + P_{F} \sum_{l = 0}^{F_{\max}} p_{F} (\text{in-degree} = l) q^{l}), \end{array}

(16)

where $P_{e}$ is the probability that an end-to-end encryption key is cracked and can be approximately valued as $P_{e} = 1 / N$; $P_{T}$ and $P_{F}$ are the percentages of the sets T and F, respectively, that is, $P_{T} = | T | / N$ and $P_{F} = | F | / N$; $T_{\max}$ and $F_{\max}$ are the maximums of the in-degree in the node sets T and F, respectively. $p_{T} (\text{in-degree} = k)$ is the probability that the in-degree of a node in the node set T is k, namely, $p_{T} (\text{in-degree} = k) = N (\text{in-degree} = k) / | T |$, where $N (\text{in-degree} = k)$ is the number of nodes whose in-degree equals k. We have the same definition for $p_{F} (\text{in-degree} = l) = N (\text{in-degree} = l) / | F |$.
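Equation (16) can be evaluated in the same way. The sketch below assumes $| F | = N - | T |$, since every node belongs to exactly one of the two sets; the network size, set sizes, and in-degree distributions are made-up illustrative values, and the function name is hypothetical.

```python
def pepda_disclosure(q, num_pieces, n_nodes, size_t, dist_t, dist_f):
    """Equation (16) with P_e = 1/N, P_T = |T|/N, and P_F = |F|/N."""
    p_e = 1.0 / n_nodes
    p_t = size_t / n_nodes
    p_f = (n_nodes - size_t) / n_nodes           # assumes |F| = N - |T|
    term_t = p_t * q ** (num_pieces - 1) * sum(
        p * q ** k for k, p in dist_t.items())
    term_f = p_f * sum(p * q ** l for l, p in dist_f.items())
    return p_e * (term_t + term_f)

# Made-up example: 10 nodes, 8 of them in T; every T node has in-degree 1
# and every F node has in-degree 0.
p_q = pepda_disclosure(q=0.1, num_pieces=3, n_nodes=10, size_t=8,
                       dist_t={1: 1.0}, dist_f={0: 1.0})
# p_q is about 0.02008 = 0.1 * (0.8 * 0.01 * 0.1 + 0.2 * 1.0)
```

The factor $P_{e}$ scales the whole expression, which reflects the extra end-to-end key that an eavesdropper must also break.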

Figure 9 shows the privacy-preserving efficacy $P (q)$ of the PEPDA scheme. As is illustrated in Figure 9, $P (q)$ increases as J grows, which differs from the behavior in the SMART scheme. In the original SMART scheme, the numbers of outgoing and incoming links of a node increase as J grows, making it tougher to crack private data. Therefore, privacy-preserving efficacy improves. Figures 8 and 9 imply that PEPDA demonstrates a better performance than SMART in terms of privacy preserving.

Figure 9

$P (q)$ in PEPDA.

As mentioned in [14], applying the homomorphic encryption technique in end-to-end encryption requires tackling several underlying issues, for example, the capacity of confidentiality protection. Our scheme does not have this problem. If $D_{i}^{'} = D_{i} + k_{i}$ and $D_{i}^{'}$ becomes $D_{i}^{''}$ after collusion communication, $D_{i}^{''}$ in node set T is no longer $D_{i}^{'}$, as these nodes have done slice and mix operations; $D_{i}^{''}$ in node set F may not be $D_{i}^{'}$ either, as these nodes may receive pieces from nodes in node set T. Therefore, even though the range of $k_{i}$ is known by the adversary, it is still difficult to guess the range of $D_{i}$, which identifies the original private data in a node. In the collusion phase, if a node belongs to node set F and receives a piece from node set T, it is called an infected node. We define the infected rate (IR) as IR $= | N_{I} | / | F |$, where $| N_{I} |$ is the number of infected nodes. IR reveals how strongly the set T influences the node set F; the bigger the IR value is, the more difficult it is for an adversary to extract the original private data in node set F. As is illustrated in Figure 10, more than half the nodes from node set F are infected when J is lower than 7. The curve reaches its peak when the slicing number J is 4. The set F contains more than 550 nodes when $J = 8$, as shown in Figure 2. Only about 50 nodes in the set T slice data into J pieces.

Figure 10

Infected rate.

6.2. Piece Accepting Rate

Because collisions happen during the collusion phase, some data pieces are lost, which directly affects aggregation accuracy. Reducing the collision rate is therefore one of the objectives in designing a data aggregation approach. We evaluate the piece accepting rate as a metric. It is defined by

\begin{matrix} p_{r} = \frac{\sum_{i = 1}^{N} N_{i}^{r}}{\sum_{j = 1}^{N} N_{j}^{s}}, \end{matrix}

(17)

where $N_{i}^{r}$ is the number of pieces received by node i, while $N_{j}^{s}$ is the number of pieces sent by node j.
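As a small illustration of (17), the sketch below computes the piece accepting rate from per-node counters. The function name `piece_accepting_rate` and the sample counters are hypothetical and serve only to show the arithmetic.

```python
def piece_accepting_rate(received, sent):
    """Compute p_r = sum_i N_i^r / sum_j N_j^s as in (17).

    received: per-node counts of received pieces (N_i^r)
    sent:     per-node counts of sent pieces (N_j^s)
    """
    total_sent = sum(sent)
    if total_sent == 0:
        raise ValueError("no pieces were sent, so p_r is undefined")
    return sum(received) / total_sent


# Toy example: two nodes send two pieces each; one piece is lost to collision.
sent = [2, 2, 0, 0]
received = [0, 1, 1, 1]
p_r = piece_accepting_rate(received, sent)
print(p_r)
```

Here three of the four sent pieces arrive, so the rate is 3/4.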

Figure 11(a) shows the relationship between the slicing number J and the piece accepting rate when the partial factor is used. The piece accepting rate increases with the slicing number J and approaches 1 when $J = 11$. As described in Figure 2, the set F contains almost all nodes in this case, so only a few nodes need to slice their data, and piece collision is reduced. Figure 11(b) shows how the factors influence the piece accepting rate for $J = 3$. The curves of the SMART scheme and the partial factor scheme overlap. The reason is that the number of nodes in the set F is too small, $| F | = 24$ in this case, so the partial factor has little influence on piece collision. However, the randomized time slot factor increases the piece accepting rate significantly, because the piece sending schedules are optimized and the collision rate is reduced. Compared with the SMART scheme, Figure 11(b) shows that the randomized time slot factor yields a good piece accepting rate, which in turn enhances aggregation accuracy.

Figure 11

Piece accepting rate influenced by the factors.

6.3. Aggregation Accuracy

Figure 12 shows the aggregation accuracy of PEPDA with respect to the slicing number J. The accuracy curve rises as J increases at the beginning and then remains steady with only tiny fluctuations, because we use the compensation factor and the small data factor. For the SMART scheme, however, the aggregation accuracy decreases as J increases.

Figure 12

Accuracy $P_{a}$ versus slicing number J.

Figure 13 compares the accuracy of the SMART and PEPDA schemes for $J = 3$. The accuracy of the PEPDA scheme is twice that of the SMART scheme. This results from using five factors that optimize the aggregation algorithm from different sides. The randomized time slot factor and the partial factor are used to reduce the collision rate, while the small data factor and the positive and negative factor are applied to reduce the loss caused by collisions. The compensation factor is employed to correct the remaining loss. Figure 12 demonstrates the good performance of the PEPDA scheme with the five factors.

Figure 13

Accuracy comparison.

6.4. Complexity Analysis

This subsection focuses on evaluating complexity of schemes in terms of communication overhead and computation overhead.

6.4.1. Communication Overhead

Each node needs 2 basic messages in both the SMART and PEPDA schemes. One is a Hello message to accomplish tree formation; the other is for data aggregation [7]. Besides these common overheads, PEPDA incurs an extra communication overhead that consists of the collusion overhead and the ACK forwarding overhead. Each node $i$ in the set T sends $(J - 1)$ pieces in the collusion phase and $B_{i}$ ACK messages in the postback phase. The total overhead of collusion communication is then $(J - 1) | T | + B_{1} + B_{2} + \dots + B_{| T |}$, where $B_{i}$ $(i = 1, 2, \dots, | T |)$ is the number of ACK messages for the $i$th node in T, which reflects its piece accepting rate. For the SMART scheme, collusion communication is exactly the extra communication overhead, which is $(J - 1) N$.
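The two overhead expressions above can be compared numerically. The following minimal sketch assumes that the ACK counts $B_{i}$ are available as a list with one entry per node in T; the function names and the toy numbers are hypothetical.

```python
def pepda_extra_overhead(j, acks):
    """Extra messages in PEPDA: (J - 1)|T| pieces plus B_1 + ... + B_|T| ACKs.

    j:    slicing number J
    acks: list of B_i values, one per node in set T (assumed input format)
    """
    return (j - 1) * len(acks) + sum(acks)


def smart_extra_overhead(j, n):
    """Extra messages in SMART: every one of the N nodes sends J - 1 pieces."""
    return (j - 1) * n


# Toy example: N = 10 nodes, of which |T| = 4 slice their data, with J = 3.
acks = [2, 2, 1, 2]
pepda = pepda_extra_overhead(3, acks)
smart = smart_extra_overhead(3, 10)
print(pepda, smart)
```

With these toy values PEPDA sends 8 pieces and 7 ACKs, compared with 20 pieces in SMART.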

Figure 14 gives the communication overheads of PEPDA. It shows that the total communication overhead is approximately twice the piece-sending overhead of the collusion phase. Therefore, we have $(J - 1) | T | \approx B_{1} + B_{2} + \dots + B_{| T |}$, which implies that $B_{i} \approx (J - 1)$ $(i = 1, 2, \dots, | T |)$ in this case. This is because the randomized time slot factor increases the piece accepting rate and makes it approach 1.
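The step from the observation in Figure 14 to the estimate of $B_{i}$ can be written out as follows. This is a sketch under two assumptions that the text does not state explicitly: $C_{\text{total}}$ is a symbol introduced here for the total extra overhead, and each node in T receives at most one ACK per piece it sent, so that $B_{i} \le J - 1$.

```latex
\begin{aligned}
C_{\text{total}} &= (J - 1) | T | + \sum_{i = 1}^{| T |} B_{i} \approx 2 (J - 1) | T | \\
\Longrightarrow \quad \sum_{i = 1}^{| T |} B_{i} &\approx (J - 1) | T | \\
\Longrightarrow \quad \frac{1}{| T |} \sum_{i = 1}^{| T |} B_{i} &\approx J - 1
\end{aligned}
```

Because every $B_{i}$ is bounded above by $J - 1$ under the stated assumption, an average close to $J - 1$ is only possible if almost every individual $B_{i}$ is close to $J - 1$.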

Figure 14

Communication overhead in PEPDA.

6.4.2. Computation Overhead

Both in SMART and PEPDA, a node executes the following process during the collusion phase:

(i) calculate the piece size (slice) → encrypt → send

(ii) receive → decrypt → sum (mix).

Therefore, the computational energy consumption can be determined by the following equation:

\begin{matrix} E_{j} = N_{e} Enc + N_{d} Dec + N_{c} Cal, \end{matrix}

(18)

where Enc and Dec are the energy costs of one encryption and one decryption of a 10-bit value, Cal represents the energy required to perform one computational instruction, and $N_{e}$, $N_{d}$, and $N_{c}$ are the numbers of encryption, decryption, and computation operations, respectively.
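The per-node process (i) and (ii) described above can be illustrated with a SMART-style slice and mix sketch. It is not the PEPDA algorithm itself: encryption and decryption are omitted, every node slices, delivery is assumed lossless, and the piece range of -100 to 100, the function names, and the toy topology are all assumptions made for illustration.

```python
import random


def slice_value(value, j, rng):
    """Split an integer reading into j integer pieces that sum to the reading."""
    pieces = [rng.randint(-100, 100) for _ in range(j - 1)]
    pieces.append(value - sum(pieces))  # last piece makes the total exact
    return pieces


def slice_and_mix(readings, neighbors, j, seed=0):
    """Return the mixed value held by each node after one collusion round.

    readings:  dict node -> private integer reading
    neighbors: dict node -> list of at least j - 1 neighbor ids
    """
    rng = random.Random(seed)
    mixed = {node: 0 for node in readings}
    for node, value in readings.items():
        pieces = slice_value(value, j, rng)
        mixed[node] += pieces[0]  # one piece stays at the sender
        targets = rng.sample(neighbors[node], j - 1)
        for target, piece in zip(targets, pieces[1:]):
            mixed[target] += piece  # receiver sums the pieces it gets (mix)
    return mixed


readings = {"a": 20, "b": 31, "c": 17}
neighbors = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
mixed = slice_and_mix(readings, neighbors, j=3)
print(sum(mixed.values()))
```

When no piece is lost, the sum of the mixed values equals the sum of the original readings, which is why collisions translate directly into aggregation error.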

In SMART, all nodes together perform $(J - 1) N$ operations in each of the encryption and decryption phases to achieve hop-by-hop encryption, and $(J - 1) N$ computing operations in each of the slicing and mixing phases. In PEPDA, as a result of SFN, the slice and mix computing operations, together with the encrypt and decrypt operations, decline to $(J - 1) | T |$ each (see Table 2). To obtain more precise results, we take MICAz and TelosB as examples [9]. The MICAz has a bus width of 8 bits and runs at 7.37 MHz, and the TelosB features a 16-bit microcontroller running at 4 MHz.

Table 2

Detailed values of parameters for (18) in SMART and PEPDA.

Scheme	$N_{e}$	$N_{d}$	$N_{c}$
SMART	$(J - 1) N$	$(J - 1) N$	$2 (J - 1) N$
PEPDA	$(J - 1) | T |$	$(J - 1) | T |$	$2 (J - 1) | T |$
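Equation (18) and the operation counts in Table 2 can be combined in a few lines. In the sketch below the unit costs `enc`, `dec`, and `cal` are arbitrary placeholders, not the measured values from [9], and the node counts are toy numbers chosen only to show how the result scales with $| T |$ and $N$.

```python
def operation_counts(j, nodes):
    """Return (N_e, N_d, N_c) following Table 2.

    nodes is N for SMART or |T| for PEPDA.
    """
    ops = (j - 1) * nodes
    return ops, ops, 2 * ops


def computation_energy(n_e, n_d, n_c, enc, dec, cal):
    """Evaluate (18): E = N_e * Enc + N_d * Dec + N_c * Cal."""
    return n_e * enc + n_d * dec + n_c * cal


# Placeholder unit costs in arbitrary energy units (assumed, not measured).
ENC, DEC, CAL = 2.0, 2.0, 0.5

# Toy network: N = 100 nodes, |T| = 40 slicing nodes, J = 3.
e_smart = computation_energy(*operation_counts(3, 100), ENC, DEC, CAL)
e_pepda = computation_energy(*operation_counts(3, 40), ENC, DEC, CAL)
print(e_smart, e_pepda)
```

Because all three counts scale linearly with the number of slicing nodes, the energy ratio between the two schemes equals $| T | / N$ whatever unit costs are used.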

The energy consumption of encrypting and decrypting 10 bits of data on the MICAz and TelosB architectures with the IDEA, RC4, and RC5 algorithms for hop-by-hop encryption can be found in [9] (Table 3). Figure 15 illustrates how the total energy cost of each scheme changes with the slicing number J. The total energy cost of PEPDA is lower than that of SMART.

Table 3

Energy consumption of common operations on MICAz Mote and TelosB Mote [9].

Operations	MICAz	TelosB
Compute for 1 clock tick	3.5 nJ	1.2 nJ
Transmit 1 bit	0.6 μJ	0.72 μJ
Receive 1 bit	0.67 μJ	0.81 μJ
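As a worked example of the radio figures in Table 3, the sketch below estimates the energy needed to move one data piece across one hop. It assumes a 10-bit piece that is transmitted once and received once, with no header or retransmission overhead; the function name and dictionary layout are hypothetical.

```python
# Per-bit radio costs in microjoules, taken from Table 3.
RADIO_COST_UJ = {
    "MICAz": {"tx": 0.6, "rx": 0.67},
    "TelosB": {"tx": 0.72, "rx": 0.81},
}


def piece_radio_energy_uj(mote, bits=10):
    """Energy in microjoules to transmit and receive one piece of `bits` bits."""
    cost = RADIO_COST_UJ[mote]
    return bits * (cost["tx"] + cost["rx"])


for mote in RADIO_COST_UJ:
    print(mote, round(piece_radio_energy_uj(mote), 2))
```

Under these assumptions one piece costs roughly 12.7 μJ on MICAz and 15.3 μJ on TelosB, so every piece that PEPDA avoids sending saves both radio and computation energy.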

Figure 15

Total energy cost of SMART and PEPDA.

Table 4 summarizes the properties of SMART, ESPART, and PEPDA in terms of security, piece accepting rate, aggregation accuracy, and overhead. The PEPDA scheme demonstrates good performance compared with the other two methods.

Table 4

Performance comparison.

Evaluation	SMART	ESPART [8]	PEPDA
Privacy-preserving efficacy	Excellent	Excellent	Excellent
Piece accepting rate	Low	Ideal	High
Aggregation accuracy	Low	Ideal	High
Communication overhead	Large	Fair	Small
Computation overhead	Fair	Small	Small

7. Conclusion

Accuracy and privacy preservation are two important challenges in designing data aggregation algorithms for WSNs. Based on the SMART scheme, we use five factors to optimize data aggregation, with the objective of reducing the collision rate and the loss caused by collisions. The randomized time slot factor and the partial factor are developed to decrease the collision rate, while the small data factor, the positive and negative factor, and the compensation factor are designed to reduce the loss caused by collisions. We propose a novel privacy-preserving data aggregation scheme based on these five factors. Analysis and simulation show that the proposed PEPDA scheme demonstrates good performance in terms of accuracy, complexity, and security.

From the point of view of security, PEPDA uses the same mechanism as SMART. An interesting direction for future work is to design privacy-preserving data aggregation schemes that combine hop-by-hop encryption with end-to-end encryption in PEPDA.

Acknowledgments

This work is supported by the National Basic Research Program (973 Program) of China under Grant no. 2011CB302903, the National Natural Science Foundation of China under Grants nos. 61272084, 61202004, and 61202353, the Key Project of Natural Science Research of Jiangsu University under Grant no. 11KJA520002, and Specialized Research Fund for the Doctoral Program of Higher Education under Grant no. 20113223110003.

References

1. Chan, Perrig, Song. Secure Hierarchical In-Network Aggregation in Sensor Networks. ACM CCS, 2006, Alexandria, Va, USA.

2. Yang, Wang, Zhu, Cao. SDAP: a secure hop-by-hop data aggregation protocol for sensor networks. Proceedings of the 7th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MOBIHOC '06), May 2006, Florence, Italy, pp. 356-367. 2-s2.0-33748089962.

3. Eschenauer, Gligor V. D. A key-management scheme for distributed sensor networks. Proceedings of ACM Conference on Computer and Communications Security (CCS '02), November 2002, Washington, DC, USA, pp. 41-47.

4. Chan, Perrig, Song D. X. Random key predistribution schemes for sensor networks. Proceedings of Symposium on Security and Privacy (SP '03), May 2003, Carnegie Mellon Univ., Pa, USA, pp. 197-213.

5. Liu D. G., Peng. Establishing pairwise keys in distributed sensor networks. ACM Transactions on Information and System Security, 2005, 8(1), pp. 41-77.

6. Ruj, Nayak, Stojmenovic. Fully secure pairwise and triple key distribution in wireless sensor networks using combinatorial designs. Proceedings of the 30th IEEE International Conference on Computer Communications (IEEE INFOCOM '11), April 2011, Shanghai, China, pp. 326-330. 2-s2.0-79960859750. doi:10.1109/INFCOM.2011.5935175.

7. W. B., Liu, Nguyen. PDA: privacy-preserving data aggregation in wireless sensor networks. Proceedings of the 26th Annual IEEE Conference on Computer Communications (IEEE INFOCOM '07), May 2007, Anchorage, Alaska, USA, pp. 2045-2053.

8. Yang, Wang A. Q., Chen Z. Y. An energy-saving privacy-preserving data aggregation algorithm. Chinese Journal of Computers, 2011, 34, pp. 792-800. doi:10.3724/SP.J.1016.2011.00792.

9. Groat M. M., Hey, Forrest. KIPDA: K-indistinguishable privacy-preserving data aggregation in wireless sensor networks. Proceedings of the 30th IEEE International Conference on Computer Communications (IEEE INFOCOM '11), April 2011, pp. 2024-2032. 2-s2.0-79960854358. doi:10.1109/INFCOM.2011.5935010.

10. Ozdemir, Xiao. Secure data aggregation in wireless sensor networks: a comprehensive overview. Computer Networks, 2009, 53(12), pp. 2022-2037. 2-s2.0-67549118456. doi:10.1016/j.comnet.2009.02.023.

11. Bista, Kim, Chang. A new private data aggregation scheme for wireless sensor networks. Proceedings of Computer and Information Technology (CIT '10), June 2010, West Yorkshire, UK, pp. 273-280.

12. Castelluccia, Mykletun, Tsudik. Efficient aggregation of encrypted data in wireless sensor networks. Proceedings of the 2nd Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous '05), July 2005, San Diego, Calif, USA, pp. 109-117. 2-s2.0-33749525209. doi:10.1109/MOBIQUITOUS.2005.25.

13. de Cristofaro. A secure and privacy-protecting aggregation scheme for sensor networks. Proceedings of IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WOWMOM '07), June 2007, Espoo, Finland, pp. 1-5. 2-s2.0-47749118726. doi:10.1109/WOWMOM.2007.4351796.

14. Feng T. M., Wang, Zhang W. S., Ruan. Confidentiality protection for distributed sensor data aggregation. Proceedings of the 27th Conference on Computer Communications (INFOCOM '08), April 2008, Phoenix, Ariz, USA, pp. 56-60.

15. Zhang W. S., Wang, Feng T. M. $GP^{2}S$: generic privacy-preservation solutions for approximate aggregation of sensor data, concise contribution. Proceedings of the 6th Annual IEEE International Conference on Pervasive Computing and Communications (PerCom '08), March 2008, Hong Kong, pp. 179-184.

16. Chan A. F. A security framework for privacy-preserving data aggregation in wireless sensor networks. ACM Transactions on Sensor Networks, 2011, 7(4), pp. 29-45.

17. Madden, Franklin M. J., Hellerstein J. M. TAG: a tiny aggregation service for ad-hoc sensor networks. Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI '02), 2002, New York, NY, USA, pp. 131-146.

Precision-Enhanced and Encryption-Mixed Privacy-Preserving Data Aggregation in Wireless Sensor Networks
