Abstract
With the advancement of wireless communication technologies, mobile phones, PDAs, and car embedded devices are equipped with sensors, such as sound and image. People can apply these devices to form a new sensing network called people-centric sensing network. And this network offers new opportunities for cooperative sensing applications. However, it introduces some challenges, including security challenge and robust challenge. As sensor nodes need to send their individual sensed data to an aggregator node and these data are related to users' real life, privacy-preserving data aggregation is a challenge issue. As a node could become offline or a message could be lost before reaching the aggregator, retaining the correctness of the aggregate computed is important. In this paper, we present the design of PDA, a novel privacy-preserving robust data aggregation scheme in people-centric sensing system. Based on K-anonymity, homomorphic encryption, and secret sharing, PDA can support a wide range of statistical additive and non-additive aggregation functions such as Sum, Subtraction, Average, Count, Max/Min, and Median without leaking individual sensed data. Moreover, PDA is robust to node failure and data loss. We also evaluate the efficacy and efficiency of PDA. The result shows that our scheme can achieve the security and robust goal under a reasonable cost.
1. Introduction
Nowadays, technological advances in sensing, computation, storage, and wireless communications turn the mobile devices carried by people into a global mobile sensing device [1]. For example, currently smartphone is equipped with accelerometers, audio, location, and image sensors. So the mobile device's owner becomes sensor custodian that can use the device to collect meaningful context data [2]. Therefore, people as individuals or special interest groups can apply the new sensing devices to form sensing networks which are called people-centric sensing networks [3]. This transformation provides a chance to create intelligence systems that collect data from widespread public participations [4]. There are many systems in present, such as BikeNet [5], CitySense [6], Mobiscopes [7], Urban Sensing [8], SenseWeb [9], and CarTel [10]. They could be used to monitor environmental pollution, temperature, or noise intensity of urban areas. In order to declare the impact of people-centric sensing networks, we will introduce the CitySense. CitySense passively “senses” the most popular places based on actual real-time activity and displays a live heat map. The application intelligently leverages the inherent wisdom of crowds without any change in existing user behavior, in order to navigate people to the hottest spots in a city [6]. So we could find the most popular places to visit easily through CitySense. That is, people-centric sensing networks have a great potential to provide large-scale, flexible, and global sensing network services.
Even though people-centric sensing networks can offer many benefit's to users, there are still some challenges. One of them is that the mobile device collects context information closely related to user's real life. So the concern of user privacy is a challenge for people-centric sensing networks. As users usually do not gain a direct benefit from reporting data, users are unwilling to share their data if their privacy is at risk [11]. The procedure of data aggregation is the most likely to by attacked by adversary. The other challenge is that a node could become offline or a message could be lost before reaching the aggregator. So we should consider a solution that ensures that these problems could not affect the correctness of the aggregate computed. These challenges motivate our design of PDA, a novel privacy-preserving robust data aggregation scheme in people-centric sensing system.
Although the people-centric sensing network is becoming popular, there is relatively little work focusing on its privacy aspects and robust to node or communication failure. The shortage of prior work can be concluded in three points. The first point is that prior work has generally focused on preserving user's privacy to a certain extent [11–14]. However, these researches are not robust to node or communication failure. The second point is that these researches rarely address the privacy issues of data aggregation which is the most likely to be attacked by adversary [15–17]. The third point is that most of prior work did not suit for the peculiarity of this network. For example, the devices are no longer owned by a single authority but belong to individuals; therefore, the devices could be mobile and malicious.
Though the above analysis, we could know that none of the existing schemes could overcome the challenges that one mentioned above in people-centric sensing networks. The contributions of this paper are summarized as follows.
We formulate the threat model in people-centric sensing networks. These models enable researchers to study the privacy-preserving in people-centric sensing networks conveniently. We present a novel scheme, PDA, to protect users' privacy against threats and be robust to node or communication failure in people-centric sensing networks. This scheme is based on K-anonymity, homomorphic encryption, and secret sharing. And it can support a wide range of statistical additive and nonadditive aggregation functions such as Subtraction, Sum, Average, Count, Max/Min, and Median.
The paper is organized as follows. We present the related work in Section 2. In Section 3, we introduce the basic of K-anonymity, homomorphic encryption, and secret sharing. We present the system architecture, threat model, and design goal in Section 4. We present our scheme in detail in Section 5. Then, we give the evaluation of PDA in Section 6. Conclusion could be found in Section 7.
2. Related Work
Although the people-centric sensing network is becoming popular, there is relatively little work focusing on its privacy aspects and robust to node or communication failure.
In [11], AnonySense architecture for anonymous tasking and reporting has been proposed. Shi et al. [12] present the PriSense scheme to protect privacy during the aggregation process of data. And this scheme support additive and nonadditive aggregation functions. Feng et al. [13] present a scheme to realize the data aggregation. However, it only supports additive aggregation functions such as max/min and median at the sacrifice in data accuracy. Zhang et al. [14] also present a scheme which supports both additive aggregation functions and nonadditive ones. These researches can preserve users' privacy to a certain extent. However, these researches are not robust to node or communication failure.
Wagner [15] present a scheme that protect the data aggregation in the presence of false data injection attack by a few malicious nodes and provide guidelines for selecting appropriate aggregation functions in a sensor network. Chan et al. [16] design an algorithm which the base station could detect if the computed aggregate was falsified. Conti et al. [17] present a scheme to preserve the privacy of robust data aggregation in wireless sensor network. However, these researches did not address the privacy issues of data aggregation which is the most likely to be attacked by adversary. In addition, as the devices are no longer owned by a single authority but belong to individuals in people-centric sensing network, the devices could be mobile and malicious. Therefore, these researches are not suitable for people-centric sensing applications.
Therefore, the novelty and the main differences with the existing work in the literature can be concluded in four points.
PDA is looking at the new sensing network called people-centric sensing network. Most of the existing schemes cannot suit for the peculiarity of this network. For example, the devices are no longer owned by a single authority but belong to individuals; therefore, the devices could be mobile and malicious. The problem that PDA solved contains security challenge and robust challenge. To the best of my knowledge, PDA is the first scheme that can solve these two problems of people-centric sensing network. Moreover, we not only consider the external internal threat, but also the internal threat. This is out of reach for a lot of existing work. PDA is a clever use of K-anonymity, homomorphic encryption, and secret sharing. There is little research to use these schemes for robust challenge. Moreover, the combination of these schemes works very well.
3. Basic Schemes
In this section, we will introduce the basic schemes which are implemented in our scheme.
3.1.
Anonymity
Society is experiencing exponential growth in the number and variety of data collections containing person-specific information as computer technology, network connectivity, and disk storage space become increasingly affordable. But the security of the database is a big challenge. Some researches propose that we can get a certain protection by removing some sensitive message. However, in most of these cases, the remaining data can be used to reidentify individuals by linking or matching the data to other data or by looking at unique characteristics found in the released data.
In order to solve this problem, Samarati [18] propose K-anonymity in 1998. In K-anonymity, an attacker cannot distinguish the specific individual who belongs to the private information by adding a certain amount of fake individuals. Therefore, we can prevent the disclosure of personal privacy.
K-anonymity: Let
Table 1 provides an example of a table
An example of K-anonymity.
As the sensing data will be aggregated to aggregation servers and it could be malicious, we could guarantee security based on K-anonymity.
3.2. Homomorphic Encryption
Homomorphic encryption technology was originally developed in 1978 by Rivest [20]. It is an encryption transformation technology that allows operation of the ciphertext. Homomorphic encryption was first used to encrypt the statistics. It is ensured that the user can operate on sensitive data by the homogeneity of the algorithm without revealing the data information. That is, homomorphic encryption is a form of encryption which allows specific types of computations to be carried out on ciphertext and obtain an encrypted result which matches the decrypted result of operations performed on the plaintext. A cryptosystem which supports both addition and multiplication is known as fully homomorphic encryption. Using such a scheme, any circuit can be homomorphically evaluated, effectively allowing the construction of programs which may be run on encryptions of their inputs to produce an encryption of their output. This is equivalent to not knowing that the problem also gives the answer to the problem.
Homomorphic encryption is built on the foundations of algebra theory. Then we will introduce the basic idea of homomorphic encryption. Assuming that E and D are the encryption and decryption functions, plaintext data is a finite set
3.3. Secret Sharing
Secret sharing distributes, preserves, and restores the secret method. And it is an important tool to achieve secure multiparty computation. Secret sharing refers to method for distributing a secret among a group of participants, each of whom is allocated a share of the secret. The secret can be reconstructed only when a sufficient number, of possibly different types, of shares are combined together; individual shares are of no use on their own [23]. Secret sharing technology extends to many areas and practice systems, such as video, voice, and other fields. It is a technology which combines theory and practice.
The general type of secret sharing is threshold method, generally with
The basic method of secret sharing contains Shamir secret sharing method, Blakley secret sharing method, Karin-Greene-Hellman secret sharing method and so on.
4. System Architecture
In this section, we present the system model, threat model, and design goal.
4.1. System Model
The architecture of people-centric sensing system is demonstrated in Figure 1.

System model.
Then, we give the components of people-centric sensing system.
The mobile nodes (MNs) are all kinds of devices with sensing the environment and reporting the sensed data, such as PDAs, smartphones, and laptops. And these MNs are carried by person or vehicles. Each MN has a unique ID in this system. The aggregation servers (ASs) provide network access services for MNs. Each AS is in charge of a certain region referred to as an area. Each MN in an area should send its sensed data to the AS which is in charge of the area. The AS should support a wide range of statistical additive and nonadditive aggregation functions such as Sum, Subtraction, Average, Count, Max/Min and Median. The server should realize three main functions. The first one is that the server manages MNs' identity-related information and stores the fake IDs table for K-anonymity. The second one is that the server is responsible for registering MNs that wish to participate. The third one is that the server stores the data that was computed by the ASs. It also provides services for the applications based on the data.
Without loss of generality, we give the assumption of PDA. We assume that all the mobile nodes can access to an AS. We assume that all ASs can access to the server. That is to say, each MN can access to the server through ASs. We assume that MNs report the data to the ASs when they achieve a request from the server. The server sends the request which specify what sensors to report and when to sense.
4.2. Threat Model
In people-centric sensing networks, the data which is received from MNs unavoidably relates to a lot of personal information. If the information is known by adversary, the user could suffer many troubles. In this section, we will analyze the threat mode. As the procedure of data aggregator is the most likely to be attacked by adversary, we are mainly concerned with the threat of this procedure. We separate the threats into two classes: the internal threat and the external threat.
(1) The Internal Threat. The main threat of internal attackers is that the ASs and MNs could be malicious. That is to say, the ASs and MNs could be compromised and controlled by adversaries. This model is rational in that the number of ASs and MNs is enormous and the ASs and MNs are owned by third-parties. Therefore, it is hard to guarantee that the entire ASs and MNs are reliable. The malicious ASs and MNs could reveal the privacy easily if we take no measures.
(2) The External Threat. As the internet is an open channel, the communication link is unsafe and unreliable. The external threat mainly contains two parts.
Part 1. The adversary may eavesdrop on communications between MNs and AS. Part 2. The adversary may attempt to insert some false data to the procedure of data aggregation. That is, the adversary may damage the data integrity.
4.3. Design Goal
Our design goal in our paper is satisfying the following requirements.
As the aggregation server and nodes could be compromised and controlled by adversaries, we should prevent the sensed data of an individual node from being disclosed to the aggregation server and nodes. As the communication link is unsafe and unreliable, we should prevent the sensed data being leaked. Moreover, we must guarantee the integrity of sensed data. As the nodes could become offline or a message could be lost before reaching the aggregation server, we should ensure that these problems could not affect the correctness of the aggregate computed.
5. Scheme Design
In this section, we describe our PDA scheme in detail. As we focus on the privacy-preserving robust data aggregation of people-centric sensing networks, we should support a wide range of statistical additive and nonadditive aggregation functions safely. The additive and nonadditive aggregation functions include Sum, Subtraction, Average, Count, Max/Min and Median. We will use Sum as an example to declare the scheme's realization. That is, we should prevent the sensed data of an individual node from being disclosed to the AS and other MNs during the process of computing. Moreover, we should be robust to node or communication failure. We could get the issue model as follow.
Issue Model. There are n MNs
In order to solve the issue, we present a novel scheme, PDA. The whole idea of scheme is shown in Figure 2. In order to understand the scheme intuitional, we will introduce an instance. In the instance, Alice and Bob have sensed data for computing. We assume

The idea of PDA.
There are five steps to realize our scheme.
5.1. Service Registration
Before the aggregation process of data, each MN needs to register for the service at our scheme. The service registration contains initialization and MN's registration.
During the initialization, the server should periodically generate fake IDs for K-anonymity and store them in a fake ID pool. The fake IDs can be generated using a cryptographic hash function, such as SHA-1. The equation is given as follows:
During MN's registration, the MN should send a requested message to the server. After receiving the request, the server verifies if the MN is legal or not. After authentication, the server should pick k fake IDs from its fake ID pool. And one of them is used to replace the identity of the MN. Let this fake ID be
5.2. Authentication
As the feature of people-centric sensing network, the MNs are always moving from an AS to another one. To let the AS authenticate their identity, the MNs should send an authentication request to the AS. The request contains the MN's ID table. After receiving the authentication request, the AS forwards this request to the server. Upon receiving the message, the MS should verify the ID table's legality. If the verification succeeds, it sends a reply to the AS. On the reception of the reply, the AS sends an OK message to the MN which contains some parameters. These parameters contain a prime number P for homomorphic encryption, a prime number Q, and a random number m for secret sharing. The meaning of these parameters will be recommended in later subsection. After authentication, the communication link is established between the MN and the AS. Then the MNs could send the sensed data to the AS after receiving a task.
In our instance, we take
5.3. Homomorphic Encryption
After the MNs receive a task from the server, they should send the sensed data to the AS. Before sending data, the MNs will execute the data processing, homomorphic encryption. In this paper, the important point is proposing PDA, so we will use a simple algorithm to achieve homomorphic encryption. The implementation process is shown as follows. A large prime P getting from the AS is used as the key. a and b are sensed data. The encryption algorithm is
In our instance, Alice and Bob use homomorphic encryption algorithm to encrypt
5.4. Secret Sharing
As the MNs could become offline or a message could be lost before reaching the AS, we should ensure that these problems could not affect the correctness of the aggregate computed. In our paper, we could achieve this goal though secret sharing.
The general type of secret sharing is threshold method, generally with

The principle of Shamir secret sharing.
The procedure of the algorithm is (assuming the secret is K) as follows.
Selecting a Polynomial. At first, we should select a polynomial
Getting the Variables. Then, we should select n groups data
Computing the Point. At last, we should compute the point
Therefore, we split the secret K to
After homomorphic encryption, the MN will operate the data which was encrypted by homomorphic encryption through secret sharing. That is, the MN will slice the data to t points. Then, the MN will send these points to t third-parties. At the same time, the MN should pick an ID from its fake ID table and send it to the third-parties.
In our instance, Alice and Bob will perform secret sharing operation for the ciphertext. We assume that the polynomial
5.5. AS Operation
After the third-parties receive the points and the ID, they should forward the information to the AS. Upon receiving the points and the ID, the AS should take three steps to get the sum of the sensed data.
Step 1 (recovering the data).
The AS picks the points which have the same ID as a group. And it recovers the data for each group based on secret sharing. The procedure of recovering is as follow. At first, the AS should pick m points and select a polynomial
Step 2 (computing the sum of data).
After recovering the data, we could compute the sum of data. In our instance, the sum of data is
Step 3 (getting the result).
As we conduct the data using homomorphic encryption, the data after computing is not the final result. In order to get the result, we should take the process of decryption which is using P modulo of the data. In our instance, the result is
Through the above analysis, we compute the sum of sensed data during the procedure of aggregation. Similarly, we could support a wide range of statistical additive and nonadditive aggregation functions [12].
6. Evaluation
In this section, we evaluate the security and performance of PDA.
6.1. Security Evaluation
In this part, we analyze the security of our scheme based on the threat model discussed in Section 3. We assume that the threats contain that (1) the ASs could be compromised by adversaries; (2) the MNs could be malicious; (3) the adversary could eavesdrop all of the communications in the network; (4) the adversary could attempt to insert some false data to the procedure of data aggregation.
For ease of threat, we explain the security features of our scheme.
The ASs could be compromised by adversaries. In PDA, the AS only gets the sensed data which has been encrypted by homomorphic encryption algorithm. As the AS cannot get the parameter r, it could not decrypt the data. Moreover, the AS cannot get the MNs which own the sensed data real ID as fake IDs. Therefore, the attacker obtains no useful information from the AS to compromise a single node's privacy. The MNs could be malicious. In PDA, the MN acts as one of third-part in the procedure of data aggregation. Each MN can only get one part of the other MNs' sensed data based on secret sharing. And fewer than m players cannot reconstruct the data. Moreover the sensed data is encrypted by homomorphic encryption algorithm. Therefore, we can compromise the privacy of a non-captured node leveraging the information. The adversary could eavesdrop all of the communications in the network. In PDA, each link between AS and MN transfers only one part of sensed data based on secret sharing. And fewer than m players cannot reconstruct the data. If the MN eavesdropped all of the communications in the network, it can reconstruct the sensed data. However, the sensed data is encrypted by homomorphic encryption algorithm. As the adversary cannot get the parameter r, it could not decrypt the data. Therefore, the adversary obtains no useful information to compromise a single node's privacy. The adversary could attempt to insert some false data to the procedure of data aggregation. In PDA, each sensed data is attached with the owner's ID. And the ID is picked from the MN's fake ID table randomly. As the adversary cannot get the fake ID table, he does not know the valid ID. The false data which was sent by the adversary is not accepted by the AS. Therefore, the adversary cannot damage the data integrity.
Through the above analysis, PDA is able to meet the security requirements of privacy-preserving data aggregation.
6.2. Performance Evaluation
In this section, we focus on the efficiency of PDA from robustness, computational complexity, and communication complexity.
6.2.1. Robustness
As the MNs could become offline or a message could be lost before reaching the AS, we should ensure that these problems could not affect the correctness of the aggregate computed. In our paper, we could achieve this goal though secret sharing. As the AS get a share from n third-parties, it can reconstruct the data but no group of fewer than
6.2.2. Computational Complexity
For the people-centric sensing system, it is assumed that there are n MNs for computing. In order to compute all of the MNs' data, we should compute each two MNs' data. Therefore, the computing complexity of this process is
6.2.3. Communication Complexity
The primary aspect to measure the performance of a scheme is its computational complexity. But, for the problem of privacy-preserving robust data aggregation, computational complexity cannot fully describe the performance of a scheme. As the participants and third-parties will communicate each other, the communication complexity is also an important aspect to measure the performance of PDA.
The communication complexity contains the cost of secret sharing and the third-party operation. It is assumed that the number of MNs is m and the number of third-parties is n. As the MNs will send the secret input data to the third-parties and the third-parties will send the result to the AS, the communication complexity of PDA is
6.3. Comparison
In this section, we summarize the features of our proposal compared with other mainly relevant algorithms in Table 2.
Comparing the features with other schemes.
The feature additive and nonadditive aggregation indicates if the scheme support a wide range of statistical additive and nonadditive aggregation functions such as Sum, Subtraction, Average, Count, Max/Min, and Median. In columns 2, 3, and 4, these features indicate if the scheme protects privacy against outside eavesdropper, other MNs, or the AS. The features of node failure and data loss resilience refer to whether the AS can compute the correct aggregate when a few MNs become offline or a few messages could be lost. In columns 6 and 7, these features denote the scheme's computational and communication complexity. The last feature indicates if the scheme is suitable for the people-centric sensing networks.
Through the above analysis, our scheme is secure, scalable, and resilient to node failure and data loss.
7. Conclusion
We present PDA, a novel privacy-preserving robust data aggregation scheme in people-centric sensing system. Based on K-anonymity, homomorphic encryption, and secret sharing, PDA can support a wide range of statistical additive and nonadditive aggregation functions such as sum, subtraction, average, count, max/min and median without leaking individual sensed data. Moreover, PDA is robust to node failure and data loss. We also evaluate the efficacy and efficiency of PDA. The result shows that our scheme can achieve the security and robust goal under a reasonable cost. We believe a privacy-aware system will make people-centric sensing networks more acceptable.
Footnotes
Acknowledgments
The work described in this paper is partially supported by the Grants of the National Basic Research Program of China (973 project) under Grants nos. 2009CB320503, 2012CB315906; the project of National Science Foundation of China under Grants nos. 61070199, 61103189, 61103194, 61103182, 61202488, and 61272482; the National High Technology Research and Development Program of China (863 Program) nos. 2011AA01A103, 2012AA01A506, and 2013AA013505; the Research Fund for the Doctoral Program of Higher Education of China under Grants nos. 20114307110006, 20124307120032; the program for Changjiang Scholars and Innovative Research Team in University (no. IRT1012); Science and Technology Innovative Research Team in Higher Educational Institutions of Hunan Province (network technology); and Hunan Province Natural Science Foundation of China (11JJ7003).
