Abstract
Spatial crowdsourcing is an emerging outsourcing platform that allocates spatio-temporal tasks to a set of workers. Then, the worker moves to the specified locations to perform the tasks. However, it usually demands workers to upload their location information to the spatial crowdsourcing server, which unavoidably attracts attention to the privacy-preserving of the workers’ locations. In this article, we propose a novel framework that can protect the location privacy of the workers and the requesters when assigning tasks to workers. Our scheme is based on mathematical transformation to the location while providing privacy protection to workers and requesters. Moreover, to further preserve the relative location between workers, we generate a certain amount of noise to interfere the spatial crowdsourcing server. Experimental results on real-world data sets show the effectiveness and efficiency of our proposed framework.
Introduction
Recent advances in large-scale utilization of mobile Internet devices and pervasive computing technology result in the emergence of crowdsourcing. 1 Traditionally, complex and complicated tasks can only be performed by an employ or contractor. However, in crowdsourcing system, tasks will be executed efficiently and accurately by a large crowd of individuals. In that case, spatial crowdsourcing (SC) 2 is emerging as a transformative platform that distributes spatio-temporal tasks to a set of specified workers. As far as we concerned, it has been extensively applied in environmental sensing, recording video, urban planning, and other particular scenes.
SC usually consists of workers, task requesters, and a centralized spatial crowdsourcing server (SC server). First of all, workers and task requesters need to follow the rules to register on the SC server. After that, the requesters are allowed to publish a series of tasks to the SC server, and the SC server sends tasks to workers who complete tasks at the lowest expense. Obviously, the workers are authorized to determine whether they finish the nearest task around them or not. According to the taxonomy in Kazemi and Shahabi, 2 SC is categorized into two totally different types of SC named as worker-selected tasks (WST) mode and sever-assigned tasks (SAT) mode. In WST mode, the SC server publishes the spatial tasks online and workers arbitrarily choose tasks near them without the need to send their specific locations to the server, that is to say, workers do not reveal their positions to the SC server in this mode. Unfortunately, it is rather complicated for us to obtain the optimal task assignment strategies in that workers do not have a global system view. On the contrary, in SAT mode, the SC server assigns location-based tasks to the nearest worker according to the specific location of workers, so we find that it necessary to send their surroundings and positions to server in advance. In addition, although the optimal task allocation scheme can be achieved easily, we are obliged to expose the private information to the SC server inevitably. Suppose the following reasonable, but terrible situation: in order to make a considerable reward, the SC server may trade personal privacy with other agencies. Then, these agencies may make the best use of the information to take unfavorable actions against workers, such as physical surveillance, identity theft, and breach of sensitive information. Therefore, the SC server should not be completely trusted in some degree. It is imperative to take corresponding measures to protect the privacy of the workers and then a large number of workers are willing to participate in crowdsourcing system eventually.
The hot issue about location privacy for location-based service has been extensively studied in the literature.3–11 Kido et al. 3 designed some interfering locations without revealing the precise location of each worker, then they send the wrapped and unrealistic location to service provider. Some existing works adopt novel cryptographic techniques called Private Information Retrieval (PIR) 4 to protect workers’ privacy. To provide location-based anonymous services, previous works develop a novel algorithm based on the spatial and temporal protection. In the literature,5–8k-anonymity focuses on the problem that the user’s location cannot be distinguished from other k users. Except that, a trusted third party named location anonymizer is introduced in k-anonymous technology, in which it is permitted to blur the location to cloaked spatial areas as well. Literature9–11 makes a further improvement in security of their proposed model, they developed an emerging mechanism of peer-to-peer anonymization. Aforementioned schemes about location privacy are not applicable to the SC due to the complex application scenarios in SC.
In recent works, differential privacy was widely used to protect workers’ position privacy in To et al. 12 However, it merely protected the location privacy of the workers, but leaked the requestors’ location. In the work of Shen et al., 13 they keep the premise that workers have to know the exact location of the requester in advance and then they adopt additive homomorphic encryption to protect the workers’ location privacy. Certainly, Shen et al. 13 create a privacy threat because there is no trust relationship between requesters and workers.
In order to solve the problem of location privacy protection in SC, we propose an effective and efficient solution. In our article, we use two-dimensional coordinates to represent the latitude and longitude of the worker. We introduce a privacy service provider (PSP), 13 which is also a trusted third party. The workers and the task requesters first transform their location coordinates locally and ensure that the accuracy of the assignment has not affected after the transformation. Then, the task requesters send the transformed coordinates directly to the SC server. Also, workers send the transformed location to the PSP, then the PSP adds some noise to these locations from the workers. Next, PSP publishes them to the SC server along with the noise. The responsibility of server is to calculate the distance between the task coordinate and the uploaded coordinates by workers, finally sends it back to the PSP. The PSP informs the worker to complete the task once it finds the worker closest to the task. Clearly, it may satisfy the premise that the task is within the maximum travel distance (MTD) of the worker. If the worker agrees to perform this task, he or she can contact the requester directly through the communication channel between worker and requester. 12 The workers and requesters do not understand the coordinates how to transformed, let alone SC server and PSP. In this way, the SC server is not able to infer the precise location of the workers and requesters. Furthermore, the real location of each worker is also not exposed to requesters and other workers. The main contributions of this article are as follows:
We identify the specific privacy challenges of task assignment in SC system, and we propose a novel framework to protect the location privacy of workers and task requesters, which can be a footstone for future studies in this area.
We introduce a secure task allocation algorithm that preserves workers’ privacy information, and we prove its security from the theoretical analysis.
We conduct extensive experiments on real-world data sets to show the effectiveness and efficiency of our designed framework.
The remainder of this article is organized as follows. We illustrate the privacy framework in section “Privacy framework.” Section “Experimental evaluation” describes the experimental evaluation, and related work and conclusion are presented in sections “Related work” and “Conclusion,” respectively.
Privacy framework
System model
We study the problem of location privacy protection during task assignment in the SAT mode. As shown in Figure 1, our system architecture consists of four parts: requesters, workers, PSP, and the SC server. The requester submits the task to the SC server, and the server then assigns it to workers who are close to the task. In the whole process, the exact location of workers and requesters cannot be disclosed to the SC server. Because the SC server may leak the location privacy of the workers and requesters, particularly as there may be many entities with different backgrounds, such as private companies, government agencies, and academic institutions. In addition, our users have signed a service agreement with the PSP, so there has established a trust relationship between users and PSP. Meanwhile, we assume that the PSP would not collude with the SC server.

Privacy framework for spatial crowdsourcing.
The requesters do not send the task location directly to the SC server when publishing a task (Step 1), instead, they transform the location locally through application on the mobile devices before that. The location of all workers has been changed in the same way, then the workers send the changed location to the PSP first (Step 2). The PSP sends these locations to the SC server after further processing. Therefore, the data arriving at the server is not the real coordinate location of the worker and the requester, but it does not affect the optimal allocation of the task. The SC server computes the distance between the task and all the workers based on these coordinates and then sends these distances back to the PSP (Step 3). The PSP finds the workers who are closest to the task and informs him or her to complete the task. If the worker agrees to perform this task, he sends the consent message directly to the requester through the communication channel between worker and requester (Step 4). Upon receipt of the consent message, the requesters send the task location directly to the worker. Finally, the workers move to the mission location to perform the task.
Secure task assignment
We first transform the location of the worker and the requester when the requester releases a task. For the location (
where a, b, and c are the three variables, and their values are randomly assigned while a is non-zero. The three variables are shared between the worker and the requester’s applications, and they will not be disclosed to any other party including the workers and requesters. Meanwhile, in order to enhance the uncertainty of the relative location between the workers, we exchange the vertical and horizontal coordinates of the workers and requesters. The MTD of each worker is also changed into a times of the original value. Each worker and requester perform above operations locally with application on their mobile devices. The distance from the worker to the task after location transformation is expressed by dis
It can be seen that the distance from the worker to the task is a times of the original. Therefore, the alteration of the locations has no influence on the optimal allocation of tasks. After that, the requesters send the FROM_REQUESTER message including task id and location to the SC server, and the workers send the FROM_WORKER message containing worker id, the location, and MTD to the PSP.
Workers can send the FROM_WORKER message to the PSP in two ways: proactive and on-demand. In proactive mode, local applications on mobile devices regularly update the values of a, b, and c, then the workers perform location transformation and send them to the PSP. Compared with the on-demand mode, this mode possesses a faster response. The communication overhead is related to the upload period time in this mode.
In on-demand mode, the local application updates the values of a, b, and c only when the requester submits the task, then it immediately enforces above operations and sends message to the PSP. The communication cost in this mode is associated with the number of workers and noises, and it raises the average task assignment time at the same time.
Upon receiving the FROM_WORKER message, the PSP stores it in a data structure called an HashMap, that is, the location as the key and the collection containing the sequence (id, MTD) as the corresponding value. We set the value in the HashMap as a collection since a coordinate may corresponds to multiple workers. Then, the PSP produces a certain amount of noise locations and sends a FROM_PSP message to the SC server. Note that the FROM_PSP message only contains the location of the workers and the noise locations, and they are mixed together. The SC server computes the distance from the task location to all the coordinates that come from the FROM_PSP message and then sends the FROM_SC-SERVER message containing the original location and the corresponding distance to the PSP. Afterward, the PSP finds the location with the smallest distance according to FROM_SC-SERVER messages.
As shown in Algorithm 1, the PSP obtains the sequence ((
The HashMap in line 4 saves the sequence ((x, y),(id,MTD)) to improve search efficiency. In line 10, we only calculate the square of the distance in order to reduce the computational overhead. The SC server does not know which worker performs the specified task, which effectively protects the worker’s location privacy. Obviously, our algorithm supports multi-task allocation as long as each message contains the task id. In the whole process above, it can be seen that the task can be accurately assigned to the most suitable worker without revealing the real location of the workers and requesters to the SC server.
Privacy security analysis
We mainly consider the privacy issues of requesters and workers during task assignment. The privacy issue after the worker accepts the task is outside our scope. Our crowdsourcing model consists of four main parts: requesters, workers, PSP, and SC server. We assume that the PSP has signed a service agreement with the worker, so a trust relationship is established between the PSP and the worker. However, the SC server is an untrustworthy entity because it can be associated with a wide variety of organizations. The location of the workers and requesters was mathematically transformed by the mobile application before reaching the SC server, so the SC server did not know their exact location. We assume that the mobile application vendor is also a trusted entity and will not collude with the server and PSP. Therefore, the values of a, b, and c are known only to the mobile application vendor and will not be exposed to the server. However, the PSP adds noise to the worker’s location to prevent the SC server from inferring the exact location from the worker’s relative position. Different levels of privacy protection can be achieved by controlling the amount of noise added. We discuss the relationship between the amount of added noise and the level of privacy protection in detail, and theoretically illustrate the security of our proposed solution in the experimental section.
Experimental evaluation
We conducted a series of experiments to evaluate the performance of the proposed scheme for location privacy protection of workers and requesters. Section “Experimental methodology” shows the experimental methodology, followed by experimental results in section “Experimental results.”
Experimental methodology
Experiments were run with two real-world data sets: Gowalla and Yelp. The Gowalla data set is a location-based social network data set that contains the check-in history of users. Suppose that Gowalla users are the SC workers and their locations are the most recent check-in points. We also regard these previous check-in points of users as the location of tasks. The Yelp data set comes from the user’s reviews on local business, such as restaurants. We consider Yelp users as the workers of the SC system and their check-in as the locations. The user’s review on restaurant represents his or her reception of an SC task, and the task location is the position of restaurant.
We suppose that the maximum travel distance for each worker is the same in our experiments. By default, we set MTD to 8 km for the Gowalla data set and MTD to 2 km for the Yelp data set. The experimental noise data are randomly selected from the locations that are not used in the two data sets. In our article, we generate noise in a linear proportion to the number of workers. It is assumed that the number of workers is W and the interference coefficient is k, and the linear interference function is
Definition 1
If the total number of workers is W and the linear interference coefficient is k, the level of privacy protection is measured by
A larger value of U corresponds to a higher level of privacy protection. The relationship between the level of privacy protection and the linear interference coefficient is shown in Figure 2. We can see that the greater the k, the higher the level of privacy protection, and it increases to more than 0.9 when k = 3. We believe that when U is greater than 0.9, the worker’s location privacy is effectively protected. Therefore, our scheme is able to protect location privacy, it hardly demands huge expenditure of noise as well. We run each experiment 10 times, then record the average runtime and the average communication overhead.

The level of privacy protection.
Experimental results
Performance evaluation
We compare the performance of proactive mode to on-demand mode on Gowalla data set and Yelp data set, respectively. The number of workers is varied from 200 to 1000 and the number of tasks is set to 10. We set the interference coefficient k to 0, 3, 4, and 5 in each figure. It can be seen in Figure 3 that as the number of workers increases, the running time increases slowly, especially in proactive mode. Moreover, the running time is also close when k ranges from 3 to 5 whether in on-demand mode or proactive mode, and it is within an acceptable range compared to the running time at k = 0, which shows that the addition of a certain amount of noise does not significantly affect the efficiency of the SC system. It also can be seen that the performance in proactive mode is significantly better than the on-demand mode by comparing Figure 3(a) with Figure 3(b)–(d). It is mainly because that in proactive mode, the worker already sends the FROM_WORKER message to the PSP before the release of the task, which saves a lot of time for the task assignment. It takes only several seconds to assign 10 tasks when the number of workers is 1000, which proves that our proposed framework is efficient.

Runtime comparison: (a) on-demand mode, Gowalla; (b) proactive mode, Gowalla; (c) on-demand mode, Yelp; and (d) proactive mode, Yelp.
Communication cost evaluation
We evaluate the communication overhead in both the proactive and on-demand modes. We record the total size of the sent data in the task assignment process. For simplicity, we only show the experimental results on the Gowalla data set. Figure 4(a) and (b) shows the relationship between the communication overhead and the number of workers. It goes without saying that there is a approximately linear correlation between the communication overhead and the number of workers in both two modes. Furthermore, the communication overhead in the proactive mode is obviously lower than the on-demand mode, because the period time of worker to send message to the PSP is long, so the communication cost is reduced eventually. In the proactive mode, the communication cost is related not only to the number of workers but also to the sending cycle. In addition, the communication cost also changes with different values of the coefficient k. In our scheme, the communication expenditure can be controlled within a certain range when the value of k is not extremely large. For 1000 workers, it costs less than 2600 KB to assign 10 tasks in the on-demand mode, less than 2000 KB in the proactive mode. The experimental results show that the communication expenditure of our framework is acceptable.

Communication cost: (a) on-demand mode, Gowalla and (b) proactive mode, Gowalla.
Related work
Crowdsourcing has been investigated in both the research community 14 and the enterprise, whereas SC aroused concerns2,15,16 only in recent years. Previous discussion on the problem of privacy-preserving mainly concentrates on location privacy in location-based services, and the precise locations are concealed by obfuscation17,18 or aggregation. 19 But they cannot be applied to SC directly.
Several works about the issue of location privacy in SC have been proposed. The work in To et al. 12 introduces an analytical model for task assignment that explores fundamental trade-off among the three criteria—privacy, utility, and system expense—and it adopts differential privacy to sanitize the location of workers. The paper by To et al. 20 provides privacy guarantees to workers based on geocasting and differential privacy. The method in Gong et al. 21 proposes a scheme that can achieve differential privacy for the location of the workers when allocating tasks to the works. The approach in Wang et al. 22 generates a novel framework to identify and eliminate risks in such systems, and has both practical and academic value. However, none of them consider how to preserve the privacy of the task requesters.
The research in Shen et al. 13 and Liu et al. 23 resolves the issue of privacy-preserving in SC using homomorphic encryption. The former shows a secure task assignment protocol based on a semi-honest third party and additive homomorphic encryption. The latter introduces a dual-server design and proposes a novel secure indexing technique called SKD-tree to improve system efficiency. Despite this, the performance of their SC framework is still undermined owing to the complexity of homomorphic encryption. Finally, the recent work in Zhu et al. 24 proposes a distributed spatial clustering algorithm to avoid revealing the location privacy of the workers to the requester and the SC server. However, the task assignment results of the solution are not optimal, and the author assumes all the workers are reliable and honest, which is not realistic in practice.
Conclusion
In this article, we introduced a novel privacy-aware scheme which allocates tasks to the workers without violating the location privacy of the workers and requesters. We alter the location information of the workers and the requesters in the same way before they are uploaded to the SC server or the PSP. In addition, we produce noise with a linear proportion to the number of workers to further enhance the security of the relative location between workers. We proposed the proactive and on-demand modes to adapt to different circumstances of the SC system. Our experiments on real-world data sets demonstrated the effectiveness and efficiency of the proposed framework from running time and communication overhead.
Footnotes
Handling Editor: Wei Li
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partially supported by the National Natural Science Foundation of China (NSFC) (under grant nos 61772491 and U1709217), Natural Science Foundation of Jiangsu Province (under grant no. BK20161256), and Anhui Initiative in Quantum Information Technologies (AHY150300).
