Abstract
Mobile crowd-sensing is a prospective paradigm especially for intelligent mobile terminals, which collects ubiquitous data efficiently in metropolis. The existing crowd-sensing schemes based on intelligent terminals mainly consider the current trajectory of the participants, and the quality highly depends on the spatial-temporal coverage which is easily weakened by the mobility of participants. Nowadays, public transports are widely used and affordable in many cities around the globe. Public transports embedded with substantial sensors act as participants in crowd-sensing, but different from the intelligent terminals, the trajectory of public transports is schedulable and predictable, which sheds an opportunity to achieve high-quality crowd-sensing. Therefore, based on the predictable trajectory of public transports, we design a novel system model and formulate the selection of public transports as an optimization problem to maximize the spatial–temporal coverage. After proving the public transport selection is non-deterministic polynomial-time hardness, an approximation algorithm is proposed and the coverage is close to 1. We evaluate the proposed algorithm with samples of real T-Drive trajectory data set. The results show that our algorithm achieves a near optimal coverage and outperforms existing algorithms.
Keywords
Introduction
With the rapid advance of sensor technology, communication, and mobile computing, mobile crowd-sensing 1 has become a paradigm attracting much attention for collecting environmental information and distributing to the general public. With the help of mobile crowd-sensing, the cost of data collection and dissemination over wide range of area can be significantly reduced. Intelligent terminals in different places can easily collect ubiquitous data and share it with potential users in neighborhood.2,3 Equipped with various onboard sensors such as GPS, video cameras, gas sensors, and communication modules, vehicles are considered as intelligent terminals and become powerful participants in data collection. Then public services such as traffic monitoring4,5 environment monitoring 6 and urban Wi-Fi characterization, 7 etc. are greatly facilitated.
A vehicle-based mobile crowd-sensing system is typically composed of two parts: 8 cloud management platform (CMP) and vehicles embedded with various sensors. An example is shown in Figure 1. The CMP is responsible for selecting a set of vehicles to carry out crowd-sensing tasks and to process data dissemination. The vehicles can be considered as sensing nodes distributed in the city area. In general, it is important to decide which public transports (PTs) to participate in collaborative sensing. Because multiple vehicles may introduce redundancy since only one in a segment is sufficient to conduct the task. Furthermore, modern crowd-sensing application is not only initiated by data centers deployed for large companies, like Google, Alibaba, and Facebook, but also for individuals. It is affordable for large companies to select all participants to carry out task, but the small ones are not able to pay for this. Therefore, the budget of CMP needs to be limited2,9,10 and it should constrain the number of selected crowd-sensing participants.

An example of vehicle-based crowd-sensing application.
The quality of vehicle-based crowd-sensing is easily influenced by the changing trajectory of vehicles. 11 On one hand, if there is no vehicle operating in a specific region at one time, the collected data will be discrete in time. On the other hand, in different time periods, if a region is covered utmost for once, which means the data will be discrete in time. Therefore, the performance of crowd-sensing is sensitive to space and time; then the spatial–temporal coverage (STC) is regarded as a fundamental indicator of the vehicle-based mobile crowd-sensing. 1 Specifically, STC intends to cover as many regions as possible and make sure one region is covered at least once for a period of time. In reality, we are supposed to be aware that the STC is easily weakened by the trajectory of vehicles as they move randomly. However, PTs strictly follow a schedule and route; the trajectory of PTs is predictable in spite of the highly dynamic mobility. Considering the future trajectory of PTs, the performance can be effectively improved, which is different from smartphone-based crowd-sensing only considering current trajectory. 12
In this article, we investigate how to achieve an improved crowd-sensing by PTs with the predictable trajectory and limited budget of CMP. Analyzing the relationship between STC and the predictable trajectory of PTs, we establish a novel system model by jointly considering the current and future trajectory of PTs and propose an algorithm to select PTs to carry out crowd-sensing tasks. Furthermore, we prove the selection of public transport (SPT) problem is non-deterministic polynomial-time hardness (NP-hard) and the proposed algorithm can achieve a performance guarantee not less than
This article is organized as follows. Section “Related works” reviews the related work. Section “System model and problem formulation” introduces the system model and formulates the SPTs as an optimization problem. In section “Solution to the SPTs,” we propose a novel algorithm to solve the selection problem of PTs and analyze the performance guarantee of this algorithm. Performance evaluation and analysis are provided in section “Validations.” Finally, section “Conclusion” draws the conclusion of this article.
Related works
In recent years, many researchers focus on studying the vehicular application of crowd-sensing, for example, traffic accident evidence collection,5,13 city block monitoring, 14 bike-net for cyclist experience mapping, 15 and architectures of crowd-sensing. Authors of the literature10,16–18 proposed the participants recruitment system and formulated the recruitment of participants as a constrained coverage problem but did not consider the mobility of vehicle. Authors in Lee et al. 14 constructed a surveillance system based on vehicle with constraint network bandwidth. Gerla et al. 16 introduced a crowd-sensing service based on vehicle embedded with cameras to deliver images on demand to users. Han et al. 19 proposed an incentive mechanism for participant recruitment who interacted with a task requestor in a random order for maximizing the values of finished task. Authors of Han et al. 20 and Kang et al. 21 studied location-based crowd-sensing systems and mainly considered both spatial and temporal coverage based on current location of participants. However, these crowd sensing systems assume that the task initiators are capable of selecting all participants to perform the task and the trajectory of the participants is known. In practice, these assumptions only apply to the scenarios with small number of participants, unlimited budget of the initiators, and immovable participants. Different from the problems above, we make a step forward: not only considering the current and future trajectory of candidate, but also highlighting the limited budget. Then we establish a novel system model and formulate the SPTs as an optimization problem solved by a performance guarantee approximation algorithm.
System model and problem formulation
System model
We divide a target region R into a series of small segments. Let R denotes the set of small segments,
where the size of
In practice, because nearby PTs usually upload overlapped information which brings in redundancy, so we do not anticipate all the PTs are involved in crowd-sensing. In order to limit the number of involved PTs, we assume PTs need to be paid a sensing reward (SR) from CMP.2,9,10 Next, we define the SR.
Definition 1
SR. A PT is selected to collect data and usually granted a reward correspondingly. Let
With limited budget of CMP, not all PTs participate in crowd-sensing. We adopt an indication vector
where
As mentioned above, the quality of crowd-sensing is related to STC. Next, we define the notion of spatial–temporal coverage (STC).
Definition 2
STC determines the quality of crowd-sensing. Formally, it can be defined as
The union operator in equation (5) is used for obtaining the union set of PTs covering all the target regions at the same period of time and the sums operator in equation (5) is used for calculating the total number of all the target regions covered in all periods of time. These two steps are very important; it is helpful to avoid selecting a set of PTs with the same trajectory.
In Kang et al.,
21
we have designed a specific hardware system, which can collect various information such as temperature, humidity, air quality, flow of traffic, longitude, latitude. Based on the hardware system, we give an example to explain the STC. In Figure 2, if users request to collect traffic information in region R which is divided into a serial of segments as, R = {AB, AD, BC, BE, DE, EF, EH, DH, CF}. The scheduled trajectory of {Bus1, Bus2, Bus3, Bus4} is {BC, AB, AD, DE}, {BC, BE, EH}, {EH, HD, AD, AB, BE}, {EF, BE, AB, AD, DH}, respectively. In a period of time {

An example explains the notion of spatial–temporal coverage.
If the CMP is capable of selecting two PTs to collect information. Two following cases are considered:
It can be seen that the set of {Bus1, Bus2} covers five different places in space and the segment of {BC} is covered three times, so the total STC is 7. On the contrary, the set of {Bus3, Bus4} only covers four different segments in space and the total STC is 5. Therefore, we are more willing to select {Bus1, Bus2} to sense.
Problem statement
Based on the system model, we are ready to formalize the SPTs as a optimization problem for maximizing the STC with limited budget.
Definition 3
SPT problem is to determine a set of vehicle under the budget constraint
Actually, the collected data may have varying importance degrees at different segments and times, for example, we are more interested in hotspot with high traffic flow. Therefore, we introduce priority power (PP) to indicate a PT with a higher priority which is more likely to be selected to perform crowd-sensing. Analyzing historical data, it is easy to acquire traffic performance index (TPI)
24
which is the congestion level of each segment. Let
Definition 4
PP is the priority of a vehicle to be selected to sense, which is a function of
Therefore, PP is expressed as
With the PP, the STC can be redefined as
and the SPTs can be rewritten as
Traditionally, there is supposed to be an optimal solution for SPTs with time efficiency. Unfortunately, it turns to be a NP-hard problem, even though the trajectory of PTs is predictable. In the next section, we will prove SPTs is NP-hard and propose an improved approximation algorithm based on greedy algorithm.
Solution to the SPTs
Complexity analysis of SPTs
Theorem 1
The SPTs is NP-hard, even though the trajectories of all vehicles are predictable.
Proof
To prove SPTs problem is NP-hard, we should demonstrate it falls in NP first. Assuming there is a possible solution
To prove SPTs are NP-hard further, we can make a reduction from budgeted maximum coverage problem (MCP), which is a known NP-complete 25 to SPTs. The MCP is defined as follows.
Given a collection of sets
where
Approximate algorithm to solve SPTs
As SPTs have been proved to be NP-hard, it becomes computationally impracticable to select an optimal set of PTs when the total number PTs is large. As for a metropolis like the City of Beijing, the number of PTs under operations is over 30,000 per day by the end of 2016. 24 To achieve a desired computational efficiency, we propose an approximate algorithm called efficient combination query algorithm (ECQA) to solve SPTs. The ECQA adopts a greedy strategy to solve SPTs. The greedy policy is to select one PT with most reward efficiency (RE), until the total SR exceeds the limited budget of CMP. Next we define the RE.
Definition 5
RE is the marginal STC gained per unit SR. Mathematically, the RE for selecting PT i is computed as follows
The algorithm goes for many rounds. A PT with maximum RE is selected in each round. In equation (14), where
The pseudo-code of ECQA 1.
The ECQA has a performance guarantee
Theorem 2
The ECQA can achieve a worst performance guarantee to be
where Opt is optimum solution.
Proof
Let us redefine
where
where
where
Assuming
From equations (19) and (20), the following inequality can be hold
where e is less than 3, hence
if and only if
proves the theorem.
Validations
Extensive simulations have been conducted to evaluate the performance of our proposed algorithm. The traffic trace data set we used, the simulation setup, the compared algorithms, and the performance comparison and discussion are presented as follows.
Real traffic track used and simulation setup
In our simulation, we adopt the T-Drive trajectory data set27,28 which contains 1-week trajectory of 10,357 buses. The data set contains 4 basic characteristics, i.e., identification, arrival time, longitude and latitude. For instance, it can be denoted as [id: 10002, arrival time: 2008-02-03 10:06:48, longitude: 116.41904, latitude: 39.93963]. The total trajectory number of buses in this data set is about 15 million and the overall length of the trajectory reaches 9 million km. We import the data samples into the Google Global Mapper. As shown in Figure 3, the distribution of the trajectory of vehicles can cover the whole traffic network of Beijing. We extract a subset of buses from the whole data set for validation, and the trajectory of the subset spans from 6 a.m. to 10 p.m. on 3 February 2008. In our simulations, the size of target region is

The vehicle trajectory distribution in Beijing Metropolis according to the data set.
Algorithm in comparison
As mentioned above, the quality of crowd-sensing is determined by STC; we evaluate how the total SR
Figures 4–9 illustrate the performance with the variation of the total SR

The STC versus total sensing reward

The performance guarantee versus total sensing reward

The STC versus the number of period T.

The performance guarantee versus the number of period T.

The STC versus the initial size of solution q.

The performance guarantee versus the initial size of solution q.
Conclusion
In this article, we have introduced crowd-sensing into vehicular network to construct a vehicle-based crowd-sensing network. The quality of crowd-sensing is sensitive to the trajectory of participants, so we took advantage of scheduled mobility pattern of PTs with predictable trajectory. We have analyzed the relationship between STC and the predictable trajectory of PTs and designed a system model by considering both the current and future trajectory. Then we have formulated the SPTs as a optimization problem, which is proved to be a NP-hard problem. In order to maximize STC in polynomial time, we have proposed ECQA to achieve a performance guarantee close to 1. Finally, the simulations have been performed by adopting the T-Drive trajectory data set, and the results show the proposed ECQA outperforms other existing algorithms.
Footnotes
Handling Editor: Christos Anagnostopoulos
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This article was supported by the National Science Foundation of China, no. 61271186 and National Key R&D Program of China, no. 2017YFC0804404, and Beijing Municipal Commission of Education (The City’s Vehicle Sensing Grid Construction Based on Public Transportation Network).
