Sparse sensing data–based participant selection for people finding

Abstract

With the emergence of participatory sensing, crowdsensing-based lost people finding is arising. Because lost people finding is a special location-centric task, participant selection is a key factor in its success or failure. Besides traditional factors such as Quality of Information contribution, a candidate’s spatial proximity to the lost person is crucial in the participant selection procedure. To evaluate how likely a candidate is to approach the lost person, the probability distribution of the lost person’s tracing points should be predicted. However, sparse sensing data has been a bottleneck in estimating people’s probable position. To overcome this issue, we study the problem of selecting optimal participants according to each candidate’s spatial proximity and Quality of Information contribution. In the first procedure, we propose a Received Signal Strength Indicator Positioning–Determined Compressive Sensing algorithm to interpolate missing Received Signal Strength Indicator data. In the second procedure, a location-important marking method is put forward to select a set of high-quality data for estimating the missing person’s location. In the third procedure, a double-condition greedy participant selection approach, which guarantees candidates’ spatial proximity and Quality of Information contribution, is executed to select optimal participants. Simulation results demonstrate that the proposed mechanism outperforms the other algorithms in both positioning accuracy and quality of uploaded sensing data.

Keywords

People finding, location-centric, sparse, compressive sensing, participant selection

Introduction

Missing people are a serious real-world problem: a statistical report claims that more than 1,370 people aged 60 years or over go missing every day in China (see http://www.thestar.com.my/tech/tech-news/2016/10/11/online-app-helps-find-100-lost-elderly-in-china/). Locating or tracking a lost person is usually a challenging long-term task, which may last for several hours or even a couple of years. There are two main reasons. First, it may take a long time to start the search, because (1) a careless carer or parent may not notice the situation until the person has been missing for a long time and (2) police will only begin looking for a missing person a specific amount of time after the person was last seen. Second, it takes a long time to actually find the accurate location of the lost person. The major reason is that existing technical means (e.g. smart phone–based anti-lost applications) are not applicable, especially in some extreme cases: (1) a mobile phone can easily be lost by the elderly or children, so it is hard to locate a missing person by mobile phone–based positioning; (2) the short battery life of a mobile phone limits the prime time of searching, and once the battery runs out, it is unlikely that lost people can be found with mobile phone positioning; and (3) lost people may not have the opportunity to use mobile phones to share their locations with rescuers in some extreme cases, like abduction or kidnapping. Hence, an option is to develop a tiny portable device with long battery life that can be hidden somewhere on the lost person’s body.

Existing locating or tracking solutions mainly rely on GNSS (Global Navigation Satellite System, for example, GPS), mobile base stations, and RF (Radio Frequency, for example, Wi-Fi, Zigbee, RFID (Radio-Frequency Identification), and Bluetooth) technologies. Given the particularity of the lost people finding task, these solutions are not applicable, for the following reasons: (1) The GPS-based positioning approach offers good location accuracy, but the GPS module is power-hungry. This shortcoming means that GPS-based solutions do not suit long-term tasks. (2) For base station–based positioning technologies, the positioning accuracy is highly dependent on the number of surrounding base stations. It is hard to guarantee that lost people are located in an area where the required accuracy can be achieved. (3) Zigbee-, RFID-, or Bluetooth-based solutions can only locate nearby objects because of their low transmit power.

Crowdsensing, a type of participative activity in which individuals of varying knowledge, heterogeneity, and motivation are recruited to fulfill a task, has emerged as a popular data collection paradigm.^1–4 By magnifying data-sensing ability, crowdsensing can expand the temporal–spatial data-sensing range of the lost people finding task. Motivated by this benefit, a crowdsensing-based lost people finding model is proposed in this article. The model consists of three parts: a task management platform, a number of candidate participants, and a target person (the lost person). We assume that (1) the lost person wears a lightweight BTLE (Bluetooth Low Energy) bracelet, which can transmit a Bluetooth signal over a long time period; (2) every candidate participant holds a smart mobile device, which can report its own GPS coordinates and the received RSSI (Received Signal Strength Indicator) value to the task management platform; and (3) the task management platform is responsible for publishing the lost people finding task, selecting appropriate participants, estimating the lost person’s possible location from the reported RSSI values, and delivering incentives to participants. The whole process can be briefly broken down into the following steps. Once the task management platform publishes a people finding task, candidates dispersed over a wide area, motivated by the incentive, may apply for the task. The task management platform iteratively selects a portion of the candidates as participants to search for the lost person by sensing the Bluetooth signal of the bracelet. Then, once enough sensing data have been collected, the platform starts a positioning calculation process to estimate the lost person’s location. If the lost person is found according to the estimated position, the task terminates; otherwise, a new round of participant selection is initiated.

Two important issues are settled by the crowdsensing-based lost people finding model. First, because a BTLE device is adopted as the signal generator, energy consumption is no longer a constraint for a long-term people finding task. Second, by iteratively involving a number of people scattered over a wide area as participants to collect the lost person’s Bluetooth signal, the effective range of positioning is greatly expanded. However, the greatest challenge of this model lies in how to locate the lost people with inadequate sensing data. The random movement of the lost people, together with the large search area, leaves participants only a small chance of coming close enough to the lost people to receive their BTLE signal.

Therefore, the key problem we study in this article is finding lost people with a crowdsensing-based paradigm, within which data sparseness is the major challenge. This problem can be turned into a participant selection optimization problem. In traditional crowdsensing tasks, Quality of Information (QoI) is the only consideration in participant selection; it is a widely used index to describe the task publisher’s requirement or to evaluate the actual performance of crowdsensing tasks. Broadly speaking, it relates to the ability to judge whether information is fit for use for a particular purpose. QoI can be characterized by a number of attributes, including sensing locations, the required amount of data at each location, the incentive each participant requires, the reputation of the participant, and the energy efficiency of the sensing device.⁵ For lost people finding, appropriate participants who are geographically close to the lost people may contribute more effective sensing data. Thus, in addition to the QoI index (for the lost people finding task, QoI is assumed to be characterized by the participant’s reputation value, incentive request, and device energy efficiency), we also take the candidate’s location as a key factor in selecting appropriate participants. To address the problem of participant selection in people finding, we propose a three-phase approach. First, an RSSI Positioning–Determined Compressive Sensing (RPD-CS) algorithm is designed to interpolate missing RSSI data. Second, a location-important marking method is put forward to select a set of high-quality data for estimating the missing person’s location with the three-point localization method. Third, a double-condition greedy (DC-greedy) participant selection approach, which guarantees candidates’ spatial proximity and QoI contribution, is executed to select optimal participants.

The remainder of this article is organized as follows: section “Related work” reviews the related literature. The detailed scenario and processing flow are introduced in section “Scenario description.” The model and detailed algorithms are elaborated in section “Model and algorithm description.” Section “Experimental evaluation” presents the experimental evaluation, and section “Conclusion” summarizes the article and draws a conclusion.

Related work

Our work builds upon and extends prior work in (1) crowdsensing, (2) participant selection mechanisms, (3) positioning technologies, and (4) compressive sensing (CS). We therefore review the literature in these four areas.

Crowdsensing

Crowdsensing, also referred to as mobile sensing, is an emerging technique in which a group of individuals with mobile devices collectively share sensing data and use the extracted information to measure and analyze phenomena of common interest. Mobile devices equipped with various sensors have become ubiquitous, and the embedded sensors can sense light, noise, location, and movement. With this huge number of data sources, mobile sensing is applied in environmental monitoring, road and traffic control, large-scale event monitoring, and other scenarios. Air quality estimation is a typical application of mobile sensing. Feng et al.⁶ designed a fine-grained PM 2.5 monitoring approach based on abundant images collected through mobile sensing. Besides air quality prediction, electromagnetic environment monitoring is also well studied. For example, Longo et al.⁷ presented an electromagnetic device-mediated monitoring approach to gather and process the electromagnetic field level sensed by the 3G/4G antennas on personal communication devices. Zappatore et al.⁸ presented a mobile crowd sensing platform that exploits smartphones’ built-in microphones as sound sensing devices for creating large-scale noise maps and for suggesting suitable noise reduction interventions to city managers. To facilitate public facility management and provide convenient services for smart cities, Wang et al.⁹ developed a mobile crowd sensing platform (PublicSense) for participatory public information reporting, intelligent issue classification, and intelligent tagging. Traffic event detection is another area where mobile sensing can play an important role. Wang et al.¹⁰ presented a pedestrian safety alarm system that leverages mobile crowd sensing and crowd intelligence aggregation to detect temporary obstacles and issue effective alerts to distracted walkers. Rui et al.¹¹ studied the problem of allocating location-dependent tasks in vehicular crowdsensing applications. 
Similarly, Kong et al.¹² proposed a system called CrackDetector to detect cracks and estimate their types (i.e. horizontal, vertical, net) and size (i.e. the width and the length of the crack) with sensing data collected through mobile crowd sensing. The applications most similar to our task are crowdsensing-based anti-lost systems.^13,14 These works leverage social features, but they do not explore how to locate lost items with limited sensing data.

Participant selection

Existing works on participant selection for location-centric crowdsensing tasks mainly take the data coverage ratio as the primary factor. In the early stage, researchers focused on how to select a predefined number of participants so as to maximize spatial coverage.¹⁵ Later, some works realized that as spatial coverage goes up, data redundancy in some regions inevitably increases as well. Redundant data in sensing regions not only increases the cost but also leads to a decline in QoI. Thus, researchers attempted to balance spatial coverage against redundancy.¹⁶ Since then, besides data redundancy, various other factors have been introduced into the participant selection procedure. For example, Li et al.¹⁷ designed a reputation-based incentive scheme for data dissemination, which can be used to motivate participants to deliver reliable data messages in mobile participatory sensing networks. Messaoud and Ghamri-Doudane¹⁸ designed a Fair QoI and Energy-aware Mobile Sensing Scheme (F-QEMSS) to ensure both requestors’ and participants’ satisfaction by taking full account of various factors related to QoI; a heuristic tabu search algorithm is used to maximize the overall QoI. In contrast to these works, we do not take the coverage ratio as the major metric in participant selection decisions. Instead, location importance is introduced as a major metric, with which the probability of a participant’s spatial proximity to the lost people can be well represented.

Positioning techniques

Positioning has emerged as a critical function in people’s daily life and in military applications. Positioning solutions fall into several major groups depending on the type of propagation medium they are based on. The first group is RF-based technologies. RF systems estimate location by measuring different properties of RF signals. RF technologies include Wi-Fi, WSN (Wireless Sensor Network), Bluetooth, RFID, NFC (Near Field Communication), and two emerging technologies, that is, UWB (Ultra Wide Band)¹⁹ and LoRaWAN (Long Range WAN).²⁰ The second group is IR (infrared) and visible-light-based. IR-based solutions use IR signals to determine an object’s position, and the systems are usually made up of a network of IR sensors linked by wires and connected to a centralized location or server. IR positioning systems have a good battery life and are low in cost at room level, but their coverage range and accuracy are limited by Line-of-Sight (LOS) problems.²¹ Visible light positioning technologies use visible light to determine the position of an object. They are mainly based on the intensity modulation and direct detection scheme.²² The main drawback of visible light positioning is that it has limitations, related to coverage area and computational cost, which may be overlooked.²¹ The third group is ultrasonic-based. Ultrasonic positioning systems rely on time-of-flight (ToF) measurement of an ultrasonic signal, computing distance from the propagation time and the velocity of sound. They offer a number of advantages over other location systems in terms of low system cost, reliability, and high energy efficiency.²³ From the perspective of implementation principles and application scenarios, existing positioning techniques can be grouped into the following categories. (1) Client-based positioning and server-based positioning. In client-based positioning approaches, positioning is done directly on devices. 
Applications running on the client can analyze the received signals of Wi-Fi access points or beacons and match the signal strength against a local database. By contrast, in server-based positioning mode, dedicated hardware captures the unique signals sent out by a Wi-Fi enabled device or a Bluetooth beacon and then transmits them to a server, which calculates the position via fingerprinting based on the signal strength and coordinates. (2) Outdoor-oriented and indoor-oriented positioning. Outdoor-oriented positioning techniques are mainly developed based on a reference station or a network of stations. Outdoor positioning can also rely on ubiquitous mobile network base stations, but the precision can only reach the order of several meters at best and depends heavily on the number of surrounding base stations. For indoor-oriented techniques, there is no prevailing standard. Wi-Fi and Bluetooth/BTLE are the most widely adopted technologies in indoor positioning due to their ubiquitous deployment.²⁴ They both use ranging based on RSSI measurements, which yields accuracy in the order of meters. (3) LOS and NLOS (Non-Line-of-Sight) scenarios. According to the quality of the wireless transmission channel in different scenarios, positioning approaches can be classified into LOS and NLOS modes. Plenty of research on positioning methods in LOS environments has been conducted.²⁵ NLOS refers to radio transmission across a path that is partially obstructed, usually by a physical object in the innermost Fresnel zone. In an NLOS environment, only signals that are reflected or diffracted can reach the receiver, causing NLOS errors in Time-of-Arrival measurement.

CS

CS, a novel signal processing theory, is attracting increasing attention due to its powerful capability of reconstructing a sparse matrix based on its low-rank property.²⁶ Since its appearance, CS has inspired a wave of research in a broad range of applications including statistics, image processing, signal recovery, and machine learning. For missing data recovery, it has been adopted in network traffic estimation,²⁷ vehicular infotainment systems,²⁸ localization in mobile networks,²⁹ and lost data reconstruction in sensor networks.³⁰ CS has also inspired a new research trend in crowdsensing. Wang et al.^31,32 proposed a framework called compressive crowdsensing task allocation (CCS-TA) to dynamically select a minimum number of sub-areas for sensing task allocation in each sensing cycle, while deducing the missing data of unallocated sub-areas under a probabilistic data accuracy guarantee. Cheng et al.³³ designed a framework called DECO to detect false values for crowdsensing in the presence of missing data. By applying a tailored spatio–temporal CS approach, their framework can detect false data and estimate both false and missing values for data correction. Yu et al.³⁴ proposed a Compressed Sensing–based and Implicit Cooperativity–based Data Gathering algorithm (CIDG) for mobile crowdsensing systems; they discovered and defined implicit cooperativity between data, exploited it to estimate un-transmitted data, and finally recovered the original data with a compressed sensing reconstruction algorithm. However, two deficiencies prevent existing general CS algorithms from fitting neatly the demands of lost people finding. The first is that most CS algorithms have high time complexity, especially for matrices with high sparsity. The second is that the sensing data (RSSI) is sensitive to the external environment, so the effect of noise should be taken into account in the data imputation process. 
Therefore, a temporal–spatial RPD-CS algorithm is proposed in this article. On the one hand, RPD-CS balances the requirements of positioning accuracy and efficiency; on the other hand, by incorporating the effect of noise, the recovered data can best fit the characteristics of temporally and spatially distributed RSSI data.

Scenario description

Figure 1 describes the whole process model. A lost people finding task is released by the crowdsensing platform at time $t_{0}$ , claiming a set of QoI requirements. Broadly speaking, QoI relates to the ability to judge whether information is fit for use for a particular purpose.¹⁶ For the purpose of this article, it is assumed that QoI is characterized by a set of attributes, like the sensing region, the sensing time period, the participant’s reputation, and the incentive budget the publisher is willing to afford. A portable device with a low-power wireless transmitting module (e.g. BTLE) is worn on the lost person’s body. It guarantees that the lost person’s device can constantly transmit a signal over a long time period, so that participants get more opportunities to successfully capture the signal with their mobile phones. For convenience, the searching area is divided into grids of fixed size according to the task publisher’s requirements. Until the lost people are eventually located, the crowdsensing platform launches a participant selection procedure in each successive sensing cycle. For example, at the end of the $t_{n - 1}$ time slot, a new round of the participant selection procedure is initiated for the next sensing cycle $t_{n}$ (Figure 1(a)). Given the particularity of the task, a candidate participant’s proximity to the lost person is a significant factor in participant selection. During the data collecting stage, only the fraction of selected participants who come within effective communication distance of the lost people can receive the radio signal. In most cases, the amount of received radio signals is not sufficient to determine the lost people’s accurate position. To make the best use of the limited sensing data, a CS-based missing data inferring algorithm is conducted to infer the missing data (RSSI) that should have been sensed in other grids (the grids filled with blue color in Figure 1(b)). 
Based on the properties of the positioning algorithm, a portion of the imputed data with more suitable spatio–temporal distribution features (corresponding to the grids filled with pink color in Figure 1(b)) is further selected for subsequent location calculation. With the sensed RSSI data and the selected imputed RSSI data, the lost people’s probable location can be deduced (Figure 1(c)). If the deduced position happens to be the lost people’s current accurate location, the whole process terminates. Otherwise, combined with the lost people’s historical deduced positions generated in previous sensing cycles, we can construct a possible trajectory of the lost people and consequently predict their next position in the $t_{n + 1}$ cycle (Figure 1(d)). Again, based on the assumption that a candidate participant’s location importance depends on its proximity to the lost people’s probable position, a new round of participant selection is activated for the $t_{n + 1}$ cycle (Figure 1(e)).

Figure 1.

Processing flow of crowdsensing-based lost people finding: (a) selected candidate participants for the $t_{n}$ time slice, (b) data collection and missing data deducing at the $t_{n}$ time slice, (c) estimated position of the lost people at the $t_{n}$ time slice, (d) predicted position of the lost people at the $t_{n + 1}$ time slice according to the updated accumulated predicted trajectory, and (e) selected candidate participants for the $t_{n + 1}$ time slice.

To accomplish the above elaborated tasks, four challenges should be properly addressed.

How to infer missing data (RSSI) which should have been sensed in vacant grids? Sparseness of sensing data makes it difficult to locate the lost people’s tracing points. Imputed missing data can, to some extent, compensate for the position deviation caused by insufficient information. In consideration of the wireless signal’s attenuation characteristic, a temporal–spatial RPD-CS algorithm is proposed.

How to select suitable data to deduce the lost people’s possible position at each task cycle? The quality of each piece of reference data (RSSI value) differs considerably with regard to a specific positioning algorithm. Data refinement is an essential stage for extracting the best-fitting set of sensed/imputed data for subsequent position calculation. Thus, a location-important imputed data marking algorithm is proposed.

How to predict the lost people’s next position from the estimated tracing points? Considering the temporal and spatial continuity of moving objects, the historical inferred trajectory of the lost person determines his or her next position to a large extent. For this scenario, a Markov chain is adopted to solve the problem.

How to select optimal participants for each sensing cycle? Spatial proximity is of crucial importance for participant selection in crowdsensing-based lost people finding scenario. Besides, QoI contribution is also an important factor that should be balanced. Thus, a DC-greedy algorithm is proposed, not only to ensure spatial proximity’s significance but also to accommodate the concerns of a participant’s QoI contribution.

Model and algorithm description

CS-based missing data imputation

CS is an effective means to recover a low-rank matrix which has lots of missing data. The basic principle of CS is that a sparse signal can be recovered through optimization from far fewer samples than the Shannon–Nyquist sampling theorem³⁵ requires. Two conditions are necessary under which data recovery is possible with CS. The first condition is sparsity, which requires the signal to be sparse in some domain. The second condition is incoherence, applied through the restricted isometry property, which is sufficient for sparse signals. In most cases, only a few participants can receive the signal sent from the lost people, so the sensing data are sparse in both spatial and temporal dimensions. It is not difficult to reveal the low-rank structure of the sensing data matrix. Meanwhile, the missingness pattern meets the requirement of a purely random distribution. Thus, CS is applicable for inferring the missing RSSI values. Key concepts in CS-based data imputation are defined below.

Definition 1

Environment matrix (EM): denoted as $X = (x (i, j))_{n \times t}$ , which describes environmental information existing in the objective world. Here, $x (i, j)$ refers to the ideal RSSI value which should be sensed in the $i$th grid at the $j$th sensing cycle.

Definition 2

Binary matrix (BM): denoted as $B = (b (i, j))_{n \times t}$ , which describes which data are missing in the data collecting activity

b (i, j) = \begin{cases} 0 & \text{if } x (i, j) \text{ is missing} \\ 1 & \text{otherwise} \end{cases}

(1)

Definition 3

Sensing matrix (SM): denoted as $S = (s (i, j))_{n \times t}$ , which records the real sensing data collected by participants. It can be deduced as the element-wise product of $X$ and $B$ . Here, $s (i, j)$ denotes the actual RSSI value which has been sensed in the $i$th grid at the $j$th sensing cycle

S = X ° B

(2)

Here, the operation ° denotes Hadamard product, which is a binary operation that takes two matrices of the same dimensions and produces another matrix where each element $i, j$ is the product of elements $i, j$ of the original two matrices.
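Definitions 1 to 3 can be illustrated with a small NumPy sketch. The matrix sizes and RSSI values below are made up for illustration only and do not come from the article's experiments.

```python
import numpy as np

# Toy environment matrix X: 3 grids (rows) x 4 sensing cycles (columns).
# The RSSI values (in dBm) are illustrative assumptions.
X = np.array([[-60.0, -62.0, -65.0, -70.0],
              [-75.0, -72.0, -68.0, -66.0],
              [-80.0, -79.0, -77.0, -74.0]])

# Binary matrix B: 1 where a participant actually sensed a value, 0 where it is missing.
B = np.array([[1, 0, 0, 1],
              [0, 1, 0, 0],
              [0, 0, 0, 1]])

# Sensing matrix S = X ° B; the * operator on NumPy arrays is element-wise.
S = X * B

# Fraction of grid/cycle cells that carry real sensing data.
observed_ratio = B.mean()
```

Only 4 of the 12 cells are observed here, which mirrors the sparse situation the section describes. Missing cells show up as 0 in `S`, so `B` is needed to tell a missing entry apart from a genuine reading.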

Definition 4

Reconstruction matrix (RM) is generated by interpolating missing data in $SM$ to approximate $EM$ . Here, $\tilde{x} (i, j)$ represents the inferred value for the entry $x (i, j)$

\tilde{X} = (\tilde{x} (i, j))_{n \times t}

(3)

Definition 5

Noise Matrix (NM) is an $n \times t$ matrix, which records the influence of noise interference in radio signal transmission process. In a real-world environment, the noise interference will affect the actual RSSI value sensed in each grid. To depict the impact, $Z = (z (i, j))_{n \times t}$ is defined

z (i, j) = - 10 \times η \times \lg (\frac{d_{ij}}{d_{0}}) - σ_{a}

(4)

The path loss exponent $η$ is generally set within $[1.6, 1.8]$ under LOS or $[4, 6]$ under NLOS; $d_{ij}$ is the Euclidean distance between the $i$th grid and the information source (lost person) at the $j$th cycle, $d_{0} = 1 m$ is a reference distance for Bluetooth, and $σ_{a}$ is the Gaussian random noise.³⁶ The NM is introduced to better depict the attenuation of the RSSI value during radio signal propagation. Thus, the relationship between $EM$ and $SM$ becomes

S = X ° B + Z

(5)
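Equation (4) can be evaluated directly, as in the following minimal sketch. It assumes that $\lg$ is the base-10 logarithm and that $σ_{a}$ is zero-mean Gaussian with standard deviation `sigma`; the function name `noise_term` and the sample distances are hypothetical.

```python
import numpy as np

def noise_term(d, eta, sigma, d0=1.0, rng=None):
    """Evaluate z = -10 * eta * log10(d / d0) - sigma_a for distances d (in meters).

    Assumption: sigma_a is drawn from N(0, sigma^2); the article only states
    that it is Gaussian random noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = np.asarray(d, dtype=float)
    sigma_a = rng.normal(0.0, sigma, size=d.shape)
    return -10.0 * eta * np.log10(d / d0) - sigma_a

# Distances from three grids to the lost person; eta = 1.7 lies in the LOS range [1.6, 1.8].
z_los = noise_term([1.0, 5.0, 20.0], eta=1.7, sigma=2.0, rng=np.random.default_rng(0))
```

With `sigma=0` the term reduces to the deterministic attenuation, which is 0 at the reference distance and decreases by `10 * eta` for every tenfold increase in distance.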

As stated in equation (3), the goal of the data recovery task is to estimate $\tilde{X}$ . According to Singular Value Decomposition (SVD), any $n \times t$ matrix $X$ can be decomposed into three matrices

X = U \Sigma V^{T} = \sum_{i = 1}^{\min (n, t)} σ_{i} μ_{i} υ_{i}^{T}

(6)

So, through an inverse process, we can also create an $r$ -rank approximation $\tilde{X}$ by utilizing only the $r$ largest singular values

\sum_{i = 1}^{r} σ_{i} μ_{i} υ_{i}^{T} = \tilde{X}

(7)
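Equations (6) and (7) can be checked numerically with NumPy's SVD routine. The sketch below uses a synthetic matrix of exact rank 2; the helper name `rank_r_approximation` is hypothetical.

```python
import numpy as np

def rank_r_approximation(X, r):
    """Keep only the r largest singular values of X, as in equation (7)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Scale the first r left singular vectors by their singular values, then recombine.
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))  # synthetic matrix of rank 2

X1 = rank_r_approximation(X, 1)  # rank-1 approximation, loses information
X2 = rank_r_approximation(X, 2)  # rank-2 approximation, reproduces X
singular_values = np.linalg.svd(X, compute_uv=False)
```

For this example, the Frobenius error of the rank-1 approximation equals the second singular value, and the rank-2 approximation recovers `X` up to rounding.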

This $\tilde{X}$ is known as the best $r$ -rank approximation that minimizes the error measured by Frobenius norm. Thus, we propose to find $\tilde{X}$ as follows

\begin{matrix} \text{Objective}: & \min \ rank (\tilde{X}) \\ \text{Subject to}: & B ° \tilde{X} + Z = S \end{matrix}

(8)

It is still difficult to solve this minimization problem with equation (8), since it is non-convex. To solve this problem, we take advantage of the SVD-like factorization according to equation (6)

\tilde{X} = U \Sigma V^{T} = L R^{T}

(9)

where $L = U \Sigma^{1 / 2}$ and $R = V \Sigma^{1 / 2}$ . Hence, we just need to find the matrices $L$ and $R$ which minimize the sum of their squared Frobenius norms

\begin{matrix} \text{Objective}: & \min (∥ L ∥_{F}^{2} + ∥ R^{T} ∥_{F}^{2}) \\ \text{Subject to}: & B ° (L R^{T}) + Z = S \end{matrix}

(10)
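A short worked step may help explain why the Frobenius norms in equation (10) can stand in for the rank in equation (8). It relies only on the factorization in equation (9) and on the orthonormality of the columns of $U$ and $V$; the symbol $∥ \cdot ∥_{*}$ (the sum of singular values, or nuclear norm) is introduced here for illustration and is not used elsewhere in the article.

```latex
\begin{aligned}
\| L \|_{F}^{2} &= \operatorname{tr}(L^{T} L)
  = \operatorname{tr}(\Sigma^{1/2} U^{T} U \Sigma^{1/2})
  = \operatorname{tr}(\Sigma) = \sum_{i} \sigma_{i}, \\
\| R^{T} \|_{F}^{2} &= \| R \|_{F}^{2}
  = \operatorname{tr}(\Sigma^{1/2} V^{T} V \Sigma^{1/2})
  = \sum_{i} \sigma_{i}, \\
\| L \|_{F}^{2} + \| R^{T} \|_{F}^{2} &= 2 \sum_{i} \sigma_{i} = 2 \, \| \tilde{X} \|_{*} .
\end{aligned}
```

For this particular factorization, the objective of equation (10) therefore equals twice the sum of the singular values of $\tilde{X}$ . Penalizing that sum pushes small singular values toward zero, which favors low-rank solutions while avoiding the non-convex rank function.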

Instead of solving equation (10) directly, we solve the following equation using Lagrange multiplier method

Min (∥ B ° (L R^{T}) + Z - S ∥_{F}^{2} + λ (∥ L ∥_{F}^{2} + ∥ R^{T} ∥_{F}^{2}))

(11)

where the Lagrange multiplier $λ$ allows a tunable trade-off between rank minimization and fitting accuracy. To solve the problem in equation (11), multiple iterations are required to approximate the minimum value of the objective function. However, for the sake of computational efficiency, the number of iterations can be reduced as long as the positioning accuracy remains acceptable. Thus, we exploit a positioning error coefficient $ξ$ , which is determined by the accuracy of the specific positioning algorithm, to tune the number of iterations. The positioning error coefficient $ξ$ is related to the number of iterations $k$

ξ (k) = ∥ {\tilde{X}}_{next} - {\tilde{X}}_{pre} ∥_{F}^{2}

(12)

For simplicity, the missing data imputation algorithm described above is named RPD-CS; its detailed description is given in Algorithm 1.

Algorithm 1: RPD-CS based Data Imputation
Input:
$B_{n \times t}$ :binary matrix;
$S_{n \times t}$ :sensing matrix;
$Z_{n \times t}$ :noise matrix;
$r$ :rank bound;
$λ$ :tradeoff coefficient;
$ξ$ :positioning error coefficient;
Output: reconstruction matrix $\tilde{X}$ ;
$L \leftarrow randomMatrix (n, r)$ ;
$R \leftarrow zeros (t, r)$ ;
$X_{pre}, X_{next} \leftarrow zeros (n, t)$ ;
while $∥ X_{next} - X_{pre} ∥_{F}^{2} > ξ$ do
$X_{next} = L R^{T}$ ;
$R \leftarrow Inverse (B, L, λ, r, S)$ ;
$L \leftarrow Inverse (B^{T}, R^{T}, λ, r, S^{T})$ ;
$v \leftarrow ∥ B ° (L R^{T}) + Z - S ∥_{F}^{2} + λ (∥ L ∥_{F}^{2} + ∥ R^{T} ∥_{F}^{2})$ ;
if $v < v'$ then
$L' \leftarrow L; R' \leftarrow R; v' \leftarrow v$ ;
$X_{next} = L' R'^{T}$ ;
end
end
$X' \leftarrow L' R'^{T}$ ;
return $X'$
//return solution to contradictory equation
Procedure $Y = Inverse (B, L, λ, r, S) :$
for $i = 0; i \leq t - 1; i++$ do
$P_{i} \leftarrow [Diag (B (:, i)) * L; \sqrt{λ} * I_{r}]$ ;
$Q_{i} \leftarrow [S (:, i); 0_{r}]$ ;
$Y (:, i) = (P_{i}^{T} * P_{i}) \ (P_{i}^{T} * Q_{i})$ ;
end
return $Y$ ;

The RPD-CS algorithm solves the optimization problem in an iterative manner. $L$ is initialized randomly, so $R$ can be computed by solving the following contradictory (overdetermined) equation in the least squares sense

[\begin{matrix} B ° (L R^{T}) \\ \sqrt{λ} R^{T} \end{matrix}] = [\begin{matrix} S \\ 0 \end{matrix}]

(13)

This equation can be rewritten as

[\begin{matrix} Diag (B_{(i)}) * (L R_{(i)}^{T}) \\ \sqrt{λ} R_{(i)}^{T} \end{matrix}] = [\begin{matrix} S_{(i)} \\ 0 \end{matrix}]

(14)

where $i$ ranges from 1 to $t$ . This is a combination of multiple standard linear least squares problems. We then have $R_{(i)}^{T} = (P_{i}^{T} * P_{i}) \ (P_{i}^{T} * Q_{i})$ , where $P_{i} = [Diag (B_{(i)}) * L; \sqrt{λ} I_{r}]$ and $Q_{i} = [S_{(i)}; 0_{r}]$ . This procedure is reflected by the sub-function Inverse in Algorithm 1.
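The alternating procedure can be sketched in NumPy as follows. This is an illustrative reading of Algorithm 1 and equations (11) to (14), not the authors' implementation. The names `solve_factor` and `rpd_cs`, the `max_iter` cap, the explicit refresh of `X_pre` on every pass, and the choice to fit against $S - Z$ on observed cells (so that each step minimizes equation (11)) are assumptions of this sketch. The test data are synthetic.

```python
import numpy as np

def solve_factor(B, F, lam, T):
    """Column-wise regularized least squares, in the spirit of Inverse in Algorithm 1.

    B: (k, m) 0/1 mask, F: (k, r) factor held fixed, T: (k, m) targets.
    Returns Y with shape (r, m).
    """
    r, m = F.shape[1], B.shape[1]
    Y = np.zeros((r, m))
    for i in range(m):
        # P_i = [Diag(B_i) * F; sqrt(lam) * I_r],  Q_i = [T_i; 0_r]
        P = np.vstack([B[:, [i]] * F, np.sqrt(lam) * np.eye(r)])
        Q = np.concatenate([T[:, i], np.zeros(r)])
        Y[:, i] = np.linalg.solve(P.T @ P, P.T @ Q)
    return Y

def rpd_cs(B, S, Z, r, lam, xi, max_iter=100, seed=0):
    """Alternate between the two factors until successive estimates differ by at most xi."""
    rng = np.random.default_rng(seed)
    n, t = B.shape
    T = B * (S - Z)                       # assumption: remove the noise term on observed cells
    L = rng.normal(size=(n, r))           # random initialization of L
    X_pre = np.zeros((n, t))
    best_v, best_X = np.inf, X_pre
    for _ in range(max_iter):             # max_iter is a safety cap added in this sketch
        Rt = solve_factor(B, L, lam, T)               # (r, t), plays the role of R^T
        L = solve_factor(B.T, Rt.T, lam, T.T).T       # (n, r)
        X_next = L @ Rt
        # Objective of equation (11)
        v = (np.linalg.norm(B * X_next + Z - S) ** 2
             + lam * (np.linalg.norm(L) ** 2 + np.linalg.norm(Rt) ** 2))
        if v < best_v:
            best_v, best_X = v, X_next
        if np.linalg.norm(X_next - X_pre) ** 2 <= xi:  # stopping rule of equation (12)
            break
        X_pre = X_next
    return best_X

# Synthetic check: a rank-2 matrix with roughly 70% of its cells observed and no noise.
rng = np.random.default_rng(1)
X_true = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 15))
B = (rng.random((20, 15)) < 0.7).astype(float)
Z = np.zeros((20, 15))
S = X_true * B + Z
X_hat = rpd_cs(B, S, Z, r=2, lam=1e-3, xi=1e-10, max_iter=300)
obs_err = np.linalg.norm(B * (X_hat - X_true)) / np.linalg.norm(B * X_true)
rel_err = np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true)
```

A larger `xi` ends the loop earlier at the cost of a coarser reconstruction, which corresponds to the trade-off between efficiency and positioning accuracy discussed for equation (12).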

Location-important data marking

The data in different grids have different impacts on positioning the lost people’s possible location. Two assumptions are made. (1) The impact of a grid is partly determined by its previous effectiveness. In other words, if the lost people’s location in previous time slots has been determined with the data associated with this grid, it has greater impact than other grids. (2) The closer a grid is to the grids where an RSSI signal has been captured in the present or past task cycles, the more important the grid is. So, two criteria, that is, Cumulative Impact and Temporal–Spatial Proximity, are defined to evaluate a grid’s importance in position calculation. Cumulative Impact is a metric that reflects the cumulative contribution of a grid, while Temporal–Spatial Proximity depicts the reliability of an imputed data item in both temporal and spatial dimensions.

Definition 6

Cumulative Impact Vector (CIV) is an $n \times 1$ vector, that is, $W_{j} = (w_{j} (i))_{n \times 1}$ , where $w_{j} (i)$ denotes the cumulative contribution of the $i th$ grid at the $j th$ cycle. In order to improve the positioning accuracy in each cycle, multiple groups of RSSI data are used as input. Each component result deduced from an individual group affects the final positioning result, so the gap between each component and the final result indicates how much the corresponding sensing data/referred data contributes. Thus, the more a grid has contributed in the past, the more salient it is in determining the lost people’s location. Initially, $W_{0}$ is set to 0, and it is updated iteratively in each task cycle by accumulating its value (see equation (18)).

Definition 7

Temporal–Spatial Proximity Vector (TSPV) is an $n \times 1$ vector, that is, $R_{j} = (r_{j} (i))_{n \times 1}$ . The underlying concept of the temporal–spatial proximity of grid $i$ is as follows: the smaller the temporal/spatial distance between grid $i$ and the grids that have actual sensed data, the higher the confidence in the imputed data. So, the temporal–spatial proximity of the $i th$ grid at the $j th$ sensing cycle is defined as equation (15)

r_{j} (i) = \sum_{\forall m, S [m, j] \neq 0} \frac{1}{| i - m |} + \sum_{\forall k, S [i, k] \neq 0} \frac{1}{| j - k |}

(15)

Based on the two criteria discussed above, the importance of each data item is defined as equation (16)

weight (i, j) = w_{j} (i) + r_{j} (i)

(16)
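As an illustration of equations (15) and (16), the following Python sketch computes the temporal–spatial proximity and the weight of one missing cell. It assumes that the sensing matrix is stored as a list of rows, that a value of 0 marks a missing reading (as the condition $S [i, k] \neq 0$ suggests), and that index differences stand in for spatial and temporal distance; the function names are hypothetical.

```python
def temporal_spatial_proximity(S, i, j):
    """r_j(i): inverse-distance sums over sensed cells in column j and row i.

    S is a list of rows (n grids x t cycles); 0 marks a missing reading.
    """
    n, t = len(S), len(S[0])
    # Spatial term: other grids m with a real reading in cycle j.
    spatial = sum(1.0 / abs(i - m) for m in range(n) if m != i and S[m][j] != 0)
    # Temporal term: other cycles k in which grid i has a real reading.
    temporal = sum(1.0 / abs(j - k) for k in range(t) if k != j and S[i][k] != 0)
    return spatial + temporal

def weight(S, W, i, j):
    """weight(i, j): cumulative impact w_j(i) plus proximity r_j(i)."""
    return W[i] + temporal_spatial_proximity(S, i, j)
```

Here `W` is assumed to hold the current cumulative impact of each grid, one value per grid.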

Accordingly, top $k$ -ranked referred grids are then marked as location-important grids. The detailed description of data marking is elaborated in Algorithm 2.

Algorithm 2: Location-important Data Marking
Input:
$S_{n \times t}$ :sensing matrix
$W_{n \times t}$ :weight matrix
$k$ :maximum number of marked missing points
Output: markable matrix $M$
$M \leftarrow ones (n, t)$ ;
for $\underline{col in S . columns ()}$ do
Init $markList$ as $EmptyList$ ;
for $\underline{row in S . rows ()}$ do
if $S [row, col] == 0$ and not adjacent to known points then
$weight \leftarrow 0$ ;
//calculate weight of missing data by columns;
for $k in S . columns ()$ do
if $S [i, k]! = 0 and j! = k$ then
$weight + = 1 / \| j - k \| + W [k]$ ;
end
end
//calculate weight of missing data by rows;
Init the marked point $P$ with $row, col$ and $weight$ saved, then add $P$ to $markList$ ;
end
end
Get the set of top k points $topList$ in $markList$ ;
for $\underline{item in topList}$ do
Get the row index $mRow$ and column index $mCol$ of the item;
$M [mRow, mCol] \leftarrow 0$ ;
end
end
return $M$
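The marking step of Algorithm 2 can be sketched as follows. This is a simplified illustration under stated assumptions: the weights are taken as a precomputed $n \times t$ list of lists, the adjacency filter ("not adjacent to known points") is omitted, and ties are broken by the lower row index, which the algorithm does not specify.

```python
def mark_location_important(S, weights, k):
    """Return M with 0 at the top-k weighted missing cells of each column, 1 elsewhere."""
    n, t = len(S), len(S[0])
    M = [[1] * t for _ in range(n)]
    for col in range(t):
        # Candidate cells are the missing readings (value 0) in this column.
        missing = [(weights[row][col], row) for row in range(n) if S[row][col] == 0]
        # Highest weight first; ties go to the lower row index (assumption).
        missing.sort(key=lambda p: (-p[0], p[1]))
        for _, row in missing[:k]:
            M[row][col] = 0
    return M
```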

Position estimation with three-point localization

The three-point localization method is adopted for estimating the lost people’s tracing point, so $\binom{k}{3}$ groups of input data are available for calculation. Here, $k$ is the number of selected top-ranked data from $\tilde{X}$ . Suppose the selected $k$ RSSI data in the $j th$ cycle are represented as $(rssi_{j, 1}, \dots, rssi_{j, k})$ ; the distance $d_{j, i}$ between the $i th$ signal receiver and the sender can then be deduced with the RSSI distance model as $d_{j, i} = 10^{(| rssi_{j, i} | - A) / (10 * μ)}$ . Here, $A$ denotes the signal strength ( $[dBm]$ ) at a point 1 m away from the signal sender, and $μ$ represents the path loss coefficient. Let $(x_{j, 1}, y_{j, 1}), (x_{j, 2}, y_{j, 2}), \dots, (x_{j, k}, y_{j, k})$ denote the geographic coordinates of each selected grid, while $(x_{j}, y_{j})$ denotes the coordinates of the signal sender (the lost people) in the $j th$ cycle. Then, it can be deduced that $(x_{j} - x_{j, i})^{2} + (y_{j} - y_{j, i})^{2} = d_{j, i}^{2}, (1 \leq i \leq k)$ .

With each of the $\binom{k}{3}$ combinations, the coordinates of the signal sender $(x_{j, o}^{*}, y_{j, o}^{*}), (1 \leq o \leq \binom{k}{3})$ can be resolved with least squares estimation (LSE).³⁷ As mentioned in the previous subsection, in order to refine the positioning result, the geometric center of the individual coordinates is adopted to represent the estimated location of lost people

\begin{cases} x_{j}^{*} = \frac{\sum_{o} x_{j, o}^{*}}{\binom{k}{3}} \\ y_{j}^{*} = \frac{\sum_{o} y_{j, o}^{*}}{\binom{k}{3}} \end{cases}, \quad 1 \leq o \leq \binom{k}{3}

(17)
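The chain from RSSI to the centroid in equation (17) can be sketched in Python. This is a minimal illustration, not the paper's code: the function names are hypothetical, $A$ is treated as the absolute value of the 1 m reference strength, and each three-point fix is obtained by subtracting the first circle equation from the other two and solving the resulting 2 × 2 linear system, which coincides with the least squares solution when the three anchors are not collinear. Skipping collinear triples is an added assumption.

```python
from itertools import combinations

def rssi_to_distance(rssi, A, mu):
    """RSSI distance model: d = 10 ** ((|rssi| - A) / (10 * mu))."""
    return 10 ** ((abs(rssi) - A) / (10 * mu))

def trilaterate(p1, p2, p3):
    """Each p is (x, y, d). Returns (x, y), or None if the anchors are collinear."""
    (x1, y1, d1), (x2, y2, d2), (x3, y3, d3) = p1, p2, p3
    # Linearized system obtained by subtracting circle 1 from circles 2 and 3.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        return None
    # Cramer's rule for the 2 x 2 system.
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)

def estimate_position(anchors):
    """Centroid of the fixes from every 3-combination of (x, y, d) anchors."""
    fixes = [trilaterate(*c) for c in combinations(anchors, 3)]
    fixes = [f for f in fixes if f is not None]
    cx = sum(f[0] for f in fixes) / len(fixes)
    cy = sum(f[1] for f in fixes) / len(fixes)
    return (cx, cy), fixes
```

With noise-free distances every combination returns the same point, so the centroid equals the true position; with noisy RSSI the individual fixes scatter and the centroid averages them.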

It is important to note that the positioning precision largely determines the success of the lost people finding task. Given most people’s ability to recognize faces, precision on the order of meters is an appropriate requirement. As stated in the previous subsection, the Cumulative Impact of the $i th$ grid is reflected by the cumulative contribution it has made to determining the composite positioning result. Thus, the formal definition of the Cumulative Impact of the $i th$ grid is given in equation (18). Here, the denominator is set to 3, which indicates that the contribution of a group of data is evenly assigned to three grids, since three-point localization is adopted in this model

w_{j} (i) \mathrel{+}= \frac{\sqrt{(x_{j, o}^{*} - x_{j}^{*})^{2} + (y_{j, o}^{*} - y_{j}^{*})^{2}}}{3}, \quad \text{if node } i \text{ belongs to combination } o

(18)
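The update in equation (18) can be sketched as below. The sketch assumes that `fixes` lists one estimate per three-grid combination, in the same order that `itertools.combinations` enumerates `grid_ids`, and that `W` is indexable by grid id; these names and the ordering convention are illustrative assumptions.

```python
from itertools import combinations
from math import hypot

def update_cumulative_impact(W, grid_ids, fixes, centre):
    """Add each combination's distance-to-centroid, split three ways, to its grids."""
    for combo, (fx, fy) in zip(combinations(grid_ids, 3), fixes):
        # Gap between this component result and the composite result, divided by 3.
        share = hypot(fx - centre[0], fy - centre[1]) / 3
        for g in combo:
            W[g] += share
    return W
```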

Position contribution

Assume that the lost people’s random walk follows a Markov process.³⁸ Let $P (X_{n + 1})$ denote the probability distribution of the lost people’s state in the $(n + 1) th$ time slot. By the Markov property, it depends only on the previous state, that is, $P (X_{n + 1} = x | X_{1}, X_{2}, \dots, X_{n}) = P (X_{n + 1} = x | X_{n})$ . Suppose the historical estimated tracing points are $(g_{1}, \dots, g_{n})$ , where $g_{j}$ is the $j th$ deduced location with coordinates $(x_{j}^{*}, y_{j}^{*})$ . Suppose also that the transition probability matrix $[P_{st}]_{n \times n}$ is defined, where $P_{st}$ denotes the probability that the lost people move from the $s th$ grid to the $t th$ grid in the next time slot.

It is worth noting that $P (X_{n + 1} = x)$ also indicates how likely a participant is to receive the signal within an area of radius $γ$ centered at $x$ . Here, $γ$ is the maximum transmission range of the radio signal. Therefore, we can define the position contribution matrix $ℓ_{n \times 1}$ as

ℓ (i) = \max_{\forall k, d_{ik} < γ} \{ P (X_{n + 1} = k) \}

(19)
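Equation (19) can be illustrated with a short Python sketch. It assumes a one-step prediction in which the next-state distribution is the row of the transition matrix for the current grid $s$, and that grid centres are given as planar coordinates; the function name and argument layout are hypothetical.

```python
from math import hypot

def position_contribution(P, s, coords, gamma):
    """l(i) = max of P(X_{n+1} = k) over grids k within range gamma of grid i.

    P: transition matrix as a list of rows; s: index of the current grid;
    coords: list of (x, y) grid centres; gamma: maximum transmission range.
    """
    next_probs = P[s]  # one-step Markov prediction from the current grid
    n = len(coords)
    ell = []
    for i in range(n):
        xi, yi = coords[i]
        # Grid i itself is always included because its distance is 0 < gamma.
        reachable = [next_probs[k] for k in range(n)
                     if hypot(xi - coords[k][0], yi - coords[k][1]) < gamma]
        ell.append(max(reachable))
    return ell
```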

Participant selection

To select an optimal set of participants in each task cycle, a DC-Greedy algorithm is proposed. Position contribution $ℓ$ is considered as the precondition for improving the usability of data, while the QoI contribution that participants can bring, including the reputation value $Θ$ , incentive request $Φ$ , and energy efficiency of device $Ω$ , is considered as the second condition to ensure fairness. The QoI contribution of participant $i$ is defined as follows, where $α, β, γ$ are the coefficients of the corresponding factors

Qc (i) = α \times Θ [i] + β \times Ω [i] - γ \times Φ [i]

(20)

Definition 8

Participants matrix (PM): denoted as $PM = (pm (i, j))_{n \times t}$ , which indicates whether a participant in the $i th$ grid is chosen in the $j th$ time slot

pm (i, j) = \begin{cases} 1 & \text{if the participant in this grid is chosen} \\ 0 & \text{otherwise} \end{cases}

(21)

Algorithm 3 describes the detailed procedure of DC-Greedy. First, grids with the same position contribution value are grouped into a sub-region, and then all grids are sorted according to the position contribution in descending order. Second, in each region, an optimal set of users is selected according to their QoI contribution. In addition, in order to ensure the necessary coverage ratio, at most $n * ℓ (i)$ participants can be recruited from one region. After the two loops of greedy selection, a set of participants is finally selected for a task cycle.

Algorithm 3: Double-Condition Greedy Algorithm for Participant Selection
Input:
$G$ :set of sensing grids
$N$ :set of users
$n$ :number of users
$ℓ$ :the position contribution matrix
$Θ, α$ :reputation value matrix, reputation weight coefficient
$Ω, β$ :energy efficiency matrix, energy weight coefficient
$Φ, γ$ :bid amount matrix, bid weight coefficient
Output:
$X$ :set of selected users
$PM$ :participants matrix
$Region = generateSubRegions (G, ℓ)$ ;
while $\underline{Region is not empty}$ do
Select an area $Region_{i} \in Region$ in descending order of $ℓ$ ;
$subNum = 0$ ;
for $\underline{i in N}$ do
$Qc (i) = α \times Θ [i] + β \times Ω [i] - γ \times Φ [i]$ ;
end
while $\underline{subNum < = n \times ℓ [i] and exist users in Region_{i}}$ do
Select the user U with maximum $Qc$ in $Region_{i}$ ,suppose that the user is in grid $k$ ;
$X \leftarrow U$ ;
$PM [k] = 1$ ;
$subNum + +$ ;
end
$Region \leftarrow Region - Region_{i}$ ;
end
end
return $X, PM$
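The two-condition selection can be sketched in Python as follows. This is an illustrative reading of Algorithm 3 under several assumptions: users are plain dictionaries with hypothetical keys `id`, `grid`, `theta`, `omega`, and `phi`; regions are formed by exact equality of the position contribution value; the per-region quota is taken as the integer part of $n \times ℓ$, following the "at most" wording of the text; and the function returns the chosen grids for a single cycle instead of the full participants matrix.

```python
def qoi_contribution(theta, omega, phi, alpha, beta, gamma):
    """Qc = alpha * reputation + beta * energy efficiency - gamma * incentive request."""
    return alpha * theta + beta * omega - gamma * phi

def dc_greedy(users, ell, alpha=1/3, beta=1/3, gamma=1/3):
    """Pick users region by region (descending l), best Qc first, up to int(n * l) each."""
    n = len(users)
    # First condition: group users by the position contribution of their grid.
    regions = {}
    for u in users:
        regions.setdefault(ell[u["grid"]], []).append(u)
    selected, chosen_grids = [], set()
    for value in sorted(regions, reverse=True):
        quota = int(n * value)
        # Second condition: rank the region's users by QoI contribution.
        ranked = sorted(
            regions[value],
            key=lambda u: qoi_contribution(u["theta"], u["omega"], u["phi"],
                                           alpha, beta, gamma),
            reverse=True,
        )
        for u in ranked[:quota]:
            selected.append(u["id"])
            chosen_grids.add(u["grid"])
    return selected, chosen_grids
```

Processing regions in descending order of $ℓ$ gives priority to grids near the predicted position, while the quota keeps any single region from absorbing all recruits.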

Privacy preserving

Privacy preserving is crucial in crowdsensing applications. In order to protect participants’ sensitive information, an encryption method is used to prevent their personal identity and location information from being distorted or misappropriated. In the initial phase, the task platform generates a pair of keys $\{K_{pub}, K_{pri}\}$ for encrypting and decrypting the uploaded data (the participant’s ID, his own GPS coordinates, and the received RSSI data). Then, the platform broadcasts $K_{pub}$ to all candidates. After that, each participant encrypts his data with $K_{pub}$ according to an Asymmetric Cryptographic Algorithm (ACA),³⁹ and then reports the encrypted data packet to the platform. The encrypted packet is of the form

\{ID | coordinates | RSSI\}_{K_{pub}}

(22)

After receiving the encrypted data packets from participants, the platform decrypts them with $K_{pri}$ .

Experimental evaluation

Setup

In order to assess the proposed model, two open access datasets are used. The first dataset (http://crawdad.org/kth/rss/20160105) was collected by a robot in both indoor (KTH) and outdoor (Dortmund) environments. The second dataset (https://www.researchgate.net/publication/286170356) was collected by an unmanned aerial vehicle in a mixed indoor–outdoor environment. The robot and the unmanned aerial vehicle were both equipped with a wireless data receiver to capture the radio signal transmitted by sensors that were randomly deployed in different places. Both datasets contain RSS (Radio Signal Strength) data received by the robot or unmanned aerial vehicle, recorded as RSSI values in dBm. We decided to simulate the lost people finding scenario with these datasets for the following reasons. First, the communication mode between the robot/aerial vehicle and the wireless sensors coincides with that between the participants and lost people. Second, the datasets contain a sequence of time-stamped points with the wireless data receiver’s latitude, longitude, and received RSSI value, which is well suited for simulating the data characteristics of participants in the lost people finding scenario. Third, the wireless data transmitters are randomly deployed across the task area, which is partially consistent with the random-walking behavior of the lost people. Fourth, the opportunistic communication mode between the robot/aerial vehicle and the wireless sensors is similar to the encountering problem between participants and lost people. Thus, neglecting the impact of the conscious psychological factors of both participants and lost people, the lost people finding scenario can be simulated with these datasets, given the similar communication mode, data characteristics, and opportunistic encountering character.

To simulate the crowdsensing-based lost people finding scenario, the following preprocessing steps are required. First, we store all the time-stamped points along with the receiver’s latitude, longitude, and RSSI value in a geographical MySQL database, and the entire region is divided into grids of fixed size. Second, the wireless data transmitters deployed in different grids are emulated as lost people, and the integrated path between adjacent transmitters is deemed the lost people’s trajectory. Third, the wireless data receivers (robot/unmanned aerial vehicle) are regarded as participants, and their reputation, incentive request, and energy efficiency values are allocated with a random number generator. Only a portion of the sensing data (RSSI) in some grids is available in the dataset, so the data show obvious sparseness in both temporal and spatial dimensions. Table 1 describes the detailed information of each dataset.

Table 1.

Detailed information of datasets.

	Indoor	Outdoor	Mixed
Coverage area	$300 m^{2}$	$10,000 m^{2}$	$2850 m^{2}$
Grid size	$5 m \times 5 m$	$20 m \times 20 m$	$10 m \times 10 m$
Known nodes	$1689$	$5690$	$\approx 7500$
Sparsity	$50 %$	$66.7 %$	$78.9 %$
Time slots	$10$	$30$	$11$

All the environment settings and experiments are implemented in Python scripts, and a server with an Intel Core i7-8700K CPU (3.70 GHz) and 64 GB of DDR4 memory is used as the experimental platform.

Parameter tuning

Before evaluating the performance of our proposed model, a series of parameters have to be tuned. The number of iterations in the missing data imputation procedure has an important impact on the data recovery accuracy. The number of selected location-important data is another important parameter, which may affect the final result of lost people finding. This part elaborates the parameter tuning process.

Iteration times tuning

As mentioned in subsection “CS-based missing data imputation,” data imputation is computationally intensive, so a proper number of iterations is needed to balance the computation cost and the imputation accuracy.

First, in order to measure the accuracy of data imputation, the index $Error Ratio of Data Recovery$ is defined as

ϵ = \frac{\sqrt{\sum_{i, j : b (i, j) = 0} (x (i, j) - \tilde{x} (i, j))^{2}}}{\sqrt{\sum_{i, j : b (i, j) = 0} x (i, j)^{2}}}

(23)
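Equation (23) restricts both sums to the cells that were missing ($b (i, j) = 0$). A minimal Python sketch, assuming the matrices are given as lists of rows and using a hypothetical function name:

```python
from math import sqrt

def error_ratio(X, X_hat, B):
    """Relative recovery error over the cells where B is 0 (the imputed cells)."""
    num = den = 0.0
    for x_row, xh_row, b_row in zip(X, X_hat, B):
        for x, xh, b in zip(x_row, xh_row, b_row):
            if b == 0:
                num += (x - xh) ** 2   # squared recovery error
                den += x ** 2          # squared true value
    return sqrt(num / den)
```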

In effect, $ϵ$ reflects the ratio of the overall deviation of the recovered data to the real environment. In order to find a proper number of iterations, experiments are conducted in indoor, outdoor, and indoor–outdoor mixed environments. The clear trend in Figure 2 shows that, as the number of iterations grows, the error ratio declines and gradually reaches a steady state. Since an open environment facilitates wireless signal propagation, the performance in the outdoor condition is better than in the indoor and mixed environments. With the RPD-CS algorithm, $ϵ$ reaches its minimum value when the number of iterations is set to 200, 150, and 200, respectively, in the three environments. So, in subsequent experiments, these iteration parameters are adopted. As mentioned in the previous subsection, $ξ$ is a parameter that balances the iteration times and positioning accuracy. Given that the iteration times have been determined as 200, 150, and 200, the corresponding positioning error coefficients $ξ$ are set to 1.2, 0.4, and 0.5, respectively, according to their relation defined in equation (12).

Figure 2.

Data imputation with different CS algorithms: (a) indoor, (b) outdoor, and (c) indoor–outdoor mixed.

Comparison experiments are also conducted on the basic CS algorithm and its enhanced version, Spatial–Temporal Compressive Sensing (ESTI-CS).³⁰ Figure 2 illustrates the performance of each algorithm in different environments. Our proposed RPD-CS algorithm outperforms the other two algorithms, since it takes signal attenuation into consideration and reduces unnecessary iterations while keeping a relatively accurate data recovery. Taking the outdoor dataset as an example, the error ratio of RPD-CS is about 16% and 10% lower than that of CS and ESTI-CS, respectively, at steady state.

Marking ratio tuning

Recall that, in order to improve the positioning accuracy, at most the top $k$ ranked location-important data are selected for calculating the lost people’s possible position. The setting of $k$ may affect the model’s performance, so this part elaborates the process of setting this parameter.

First, in order to investigate how data marking affects positioning accuracy, index Mean Deviation of Positioning is introduced as

η = \sum_{j : 0 \to t - 1} \frac{distance (g_{j}, g_{j}^{*})}{t}

(24)

Here, $g_{j}$ indicates the calculated position of lost people in the $j th$ task cycle, while $g_{j}^{*}$ is the corresponding actual position, and the function $distance (g_{j}, g_{j}^{*})$ represents the absolute distance. Generally, $η$ describes the average deviation over the past task cycles. As can be seen in Figure 3, $η$ decreases substantially as the marking ratio $θ$ increases and finally stabilizes when $θ$ becomes greater than 0.4. Here, the marking ratio $θ = k / n$ is the proportion of the number of marked location-important grids $k$ to the total number of task grids $n$ . So, in subsequent experiments, at most 40% of grids are marked for each time cycle.
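As a small illustration of equation (24), the index can be computed as below, assuming positions are planar $(x, y)$ pairs and the distance is Euclidean; the function name is hypothetical.

```python
from math import hypot

def mean_positioning_deviation(estimated, actual):
    """Average distance between estimated and actual positions over t cycles."""
    total = sum(hypot(ex - ax, ey - ay)
                for (ex, ey), (ax, ay) in zip(estimated, actual))
    return total / len(actual)
```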

Figure 3.

Performance of positioning accuracy.

As shown in Table 2, RPD-CS achieves a 2%–4%, 10%–14%, and 4%–6% lower Error Ratio of Data Recovery than ESTI-CS and CS for the indoor, outdoor, and mixed environments, respectively. Meanwhile, RPD-CS outperforms ESTI-CS and CS in Mean Deviation of Positioning, keeping the positioning deviation near 10 m.

Table 2.

Performance comparison in data imputation and position calculation.

	Indoor				Outdoor				Mixed
	$ϵ_{\min}$	$π_{\min}$	$ξ_{\min}$	$η_{\min}$	$ϵ_{\min}$	$π_{\min}$	$ξ_{\min}$	$η_{\min}$	$ϵ_{\min}$	$π_{\min}$	$ξ_{\min}$	$η_{\min}$
CS	$50.2 %$	$\approx 80$	$2.7$	$24.2 m$	$44.0 %$	$\approx 100$	$0.45$	$18.9 m$	$60.0 %$	$\approx 100$	$1$	$38.7 m$
ESTI-CS	$48.3 %$	$\approx 200$	$1.4$	$14.6 m$	$40.1 %$	$\approx 100$	$0.45$	$12.8 m$	$58.2 %$	$\approx 100$	$1$	$22.3 m$
RPD-CS	$46.2 %$	$\approx 150$	$1.8$	$8.5 m$	$30.1 %$	$\approx 50$	$0.47$	$6.5 m$	$53.8 %$	$\approx 200$	$0.5$	$10.7 m$

CS: Compressive Sensing; ESTI-CS: enhanced version Spatial–Temporal Compressive Sensing; RPD-CS: Received Signal Strength Indicator Positioning–Determined Compressive Sensing.

Performance evaluation

Based on the pre-set parameters, the following experiments are designed to evaluate the performance of our proposed participant selection approach from three aspects: how much QoI gain the selected participants can contribute, how likely the selected participants are to approach the lost people, and how many rounds it takes to finally locate the lost people.

As previously stated, participants’ reputation, incentive request, and energy efficiency are randomly allocated in the simulation environment. Considering their equal importance, we set each of their weights, that is, $α$ , $β$ , and $γ$ , to $1 / 3$ in the following experiments.

QoI contribution comparison

QoI is an important index for evaluating the data quality in crowdsensing tasks. An appropriate participant selection algorithm can yield a higher QoI gain. So, in this part, we compare our proposed DC-Greedy algorithm with other participant selection approaches according to the QoI Contribution Ratio. Based on the definition of QoI Contribution (see equation (20)), we further define the QoI Contribution Ratio $ρ$ as

ρ = \frac{\sum_{j \in X} Qc (j)}{\sum_{i \in N} Qc (i)}

(25)

In effect, it reflects the relative quality of the selected participants: the greater the index $ρ$ is, the more participants with high reputation, high energy efficiency, and low incentive request are recruited. Figure 4 describes the changes of the QoI Contribution Ratio with varying user coverage. Here, the $user coverage τ$ is defined as $τ = n / g$ , where $n$ represents the number of users and $g$ represents the number of sensing grids. User coverage $τ$ indicates how many grids can be covered by the candidate participants on average. As illustrated in Figure 4, DC-Greedy outperforms Greedy,⁴⁰ QEMSS,¹⁸ and CCS-TA³¹ in indoor, outdoor, as well as indoor–outdoor environments. It is also interesting to see that QoI increases with user coverage in the initial phase, but then declines as user coverage grows further. This occurs because a higher user coverage may bring in more inappropriate candidate users.
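The ratio in equation (25) is simply the QoI contribution of the selected users divided by that of all candidates. A minimal sketch, assuming the per-user $Qc$ values are held in a dictionary keyed by a hypothetical user id:

```python
def qoi_contribution_ratio(qc, selected):
    """rho: summed Qc of the selected users over summed Qc of all users."""
    return sum(qc[u] for u in selected) / sum(qc.values())
```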

Figure 4.

User coverage versus QoI contribution ratio: (a) indoor, (b) outdoor, and (c) indoor–outdoor mixed.

Proximity comparison

In order to further analyze the performance of participant selection, the following experiments are conducted to measure the degree of participants’ proximity to the lost people. Thus, the index of Deviation of Distance for Participant Selection is introduced as

μ = \sum_{j : 0 \to t - 1} \sum_{i \in X} \frac{distance (g_{ij}, g_{j}^{*})}{| X |}

(26)

where $g_{ij}$ denotes the location of the $i th$ participant in the $j th$ time slot, $g_{j}^{*}$ is the actual position of lost people in the $j th$ time slot, and $| X |$ represents the number of participants. Figure 5 shows how $μ$ changes with varying user coverage. As can be seen, under different user coverage settings, DC-Greedy achieves the minimum deviation, which means the participants selected with DC-Greedy have a greater opportunity to get close to the lost people and finally find them.
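Equation (26) can be sketched as follows, assuming planar coordinates, Euclidean distance, and a hypothetical layout in which `positions[i][j]` is the location of the $i th$ selected participant in the $j th$ time slot.

```python
from math import hypot

def deviation_of_distance(positions, actual):
    """mu: per-slot mean participant-to-target distance, summed over all slots."""
    num_participants = len(positions)
    total = 0.0
    for j, (ax, ay) in enumerate(actual):
        for i in range(num_participants):
            px, py = positions[i][j]
            total += hypot(px - ax, py - ay) / num_participants
    return total
```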

Figure 5.

User coverage versus deviation of distance: (a) indoor, (b) outdoor, and (c) indoor–outdoor mixed.

Table 3 shows that the DC-Greedy algorithm achieves a 12%–23%, 13%–29%, and 0%–7% higher QoI Contribution Ratio than Greedy, QEMSS, and CCS-TA for the indoor, outdoor, and mixed environments, respectively. Meanwhile, only the DC-Greedy algorithm keeps the index Deviation of Distance for Participant Selection near 10 m for the indoor and outdoor environments and near 20 m in the mixed environment.

Table 3.

Performance of participant selection.

	Indoor		Outdoor		Mixed
	$ρ_{avg}$	$μ_{avg}$	$ρ_{avg}$	$μ_{avg}$	$ρ_{avg}$	$μ_{avg}$
Greedy	$0.412$	$58.4 m$	$0.356$	$49.6 m$	$0.338$	$55.6 m$
QEMSS	$0.475$	$33.2 m$	$0.518$	$31.4 m$	$0.416$	$34.6 m$
CCS-TA	$0.363$	$29.2 m$	$0.400$	$27.5 m$	$0.333$	$40.09 m$
DC-Greedy	$0.590$	$15.2 m$	$0.648$	$9.7 m$	$0.402$	$20.12 m$

QEMSS: Quality of Information and Energy-aware Mobile Sensing Scheme; DC-greedy: double-condition greedy.

Achievement orientation comparison

In this part, experiments are conducted to compare the overall performance of different schemes. More specifically, we combine different data imputation algorithms and participant selection algorithms to see how many rounds they take to finally locate the lost people. In order to facilitate comparison, all schemes adopt the same location-important data marking approach and three-point localization method. As illustrated in Table 4, the combination of RPD-CS and DC-Greedy outperforms the other schemes. It takes only 8, 15, and 7 rounds, respectively, to finally locate the lost people. The major reason is that the RPD-CS and DC-Greedy model can provide relatively accurate imputed data for calculating lost people’s possible position, while a QoI-driven participant selection approach ensures that more suitable participants are recruited for the lost people finding task.

Table 4.

Achievement orientation comparison.

	Indoor	Outdoor	Mixed
CS+Greedy	$Failed$	$24$	$11$
ESTI-CS+QEMSS	$Failed$	$23$	$10$
ESTI-CS+CCS-TA	$10$	$26$	$Failed$
RPD-CS+DC-Greedy	$8$	$15$	$7$

CS: Compressive Sensing; ESTI-CS: enhanced version Spatial–Temporal Compressive Sensing; QEMSS: Quality of Information and Energy-aware Mobile Sensing Scheme; RPD-CS: Received Signal Strength Indicator Positioning–Determined Compressive Sensing; DC-greedy: double-condition greedy.

Conclusion

Participatory sensing has opened a new methodology for addressing the lost people finding issue. However, people’s random walk in a large open area brings challenges for accurately positioning or tracing the target person. To address this problem, we present a novel framework. First, in order to solve the sparse sensing data problem, an RPD-CS algorithm is proposed to infer the missing RSSI data in numerous task grids. Second, a location-important imputed data marking method is applied to improve the quality of data for estimating people’s position with the three-point localization algorithm. Third, trajectory prediction is applied to predict the probability distribution of lost people’s next position from the series of estimated historical tracing points. Finally, a DC-Greedy participant selection approach, which guarantees candidates’ spatial proximity and QoI contribution, selects the most appropriate set of participants. Experimental evaluation shows that our solution achieves higher quality of sensing data than other algorithms while ensuring the required accuracy of position calculation. There are many potential future directions of this work. It would be interesting to further study the incentive and privacy mechanisms in our future work.

Footnotes

Handling Editor: Zhiyuan Tan

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partly supported by the National Natural Science Foundation of China (No. 61602051, 61802022 and 61802027) and the Fundamental Research Funds for the Central Universities (No. 2017RC11).

ORCID iD

Ye Tian

References

Gao

Liu

Wang

et al . Maximizing data credibility under budget constraint for participatory sensing. In: 2015 IEEE 17th International Conference on High Performance Computing and Communications, New York, 24–26 August 2015, pp.134–139. New York: IEEE.

Gao

Liu

Wang

et al . A survey of incentive mechanisms for participatory sensing. IEEE Comm Surveys Tutor 2015; 17(2): 918–943.

Wang

Zhang

Wang

et al . Sparse mobile crowdsensing: challenges and opportunities. IEEE Comm Magazine 2016; 54(7): 161–167.

Zhang

Liu

Tang

et al . Learning-based energy-efficient data collection by unmanned vehicles in smart cities. IEEE Trans Ind Informat 2018; 14(4): 1666–1676.

Tian

Sangaiah

et al . Privacy-preserving scheme in social participatory sensing based on Secure Multi-party Cooperation. Comp Comm 2018; 119: 167–178.

Feng

Wang

Tian

et al . Estimate air quality based on mobile crowd sensing and big data. In: 2017 IEEE 18th international symposium on a world of wireless, mobile and multimedia networks (WoWMoM), Macau, China, 12–15 June 2017, pp.1–9. New York: IEEE.

Longo

Zappatore

Bochicchio

. Towards massive open online laboratories: an experience about electromagnetic crowdsensing. In: Proceedings of 2015 12th international conference on remote engineering and virtual instrumentation (REV), Bangkok, Thailand, 25–27 February 2015, pp.43–51. New York: IEEE.

Zappatore

Longo

Bochicchio

. Using mobile crowd sensing for noise monitoring in smart cities. In: 2016 international multidisciplinary conference on computer and energy science (Splitech), Split, 13–15 July 2016, pp.1–6. New York: IEEE.

Wang

Guo

et al . PublicSense: a crowd sensing platform for public facility management in smart cities. In: 2016 International IEEE conferences on ubiquitous intelligence & computing, advanced and trusted computing, scalable computing and communications, cloud and big data computing, internet of people, and smart world congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), Toulouse, 18–21 July 2016, pp.114–120. New York: IEEE.

10.

Wang

Guo

Wang

et al . CrowdWatch: dynamic sidewalk obstacle detection using mobile crowd sensing. IEEE Internet Things J 2017; 4(6): 2159–2171.

11.

Rui

Zhang

Huang

et al . A location-dependent task assignment mechanism in vehicular crowdsensing. Int J Distribut Sens Networks. Epub ahead of print 20 September 2016. DOI: 10.1177/1550147716669627.

12.

Kong

Chen

et al . Detecting type and size of road crack with the smartphone. In: 2017 IEEE international conference on computational science and engineering (CSE) and IEEE international conference on embedded and ubiquitous computing (EUC), Guangzhou, China, 21–24 July 2017, pp.572–579. New York: IEEE.

13.

Harburg

Kim

Gerber

et al . CrowdFound: a mobile crowdsourcing system to find lost items on-the-go. In: Proceedings of the 33rd annual ACM conference extended abstracts on human factors in computing systems, 2015, pp.1537–1542, https://www.scholars.northwestern.edu/en/publications/crowdfound-a-mobile-crowdsourcing-system-to-find-lost-items-on-th

14.

Song

Shin

Jang

et al . Effective opportunistic crowd sensing IoT system for restoring missing objects. In: 2015 IEEE international conference on services computing, 2015, pp.293–300, https://ieeexplore.ieee.org/document/7207366

15.

Reddy

Estrin

Srivastava

. Recruitment framework for participatory sensing data collections. In: Floréen

Krüger

Spasojevic

(eds) International conference on pervasive computing. Berlin; Heidelberg: Springer, 2010, pp.138–155.

16.

Song

Liu

et al . QoI-aware multitask-oriented dynamic participant selection with budget constraints. IEEE Trans Vehic Tech 2014; 63(9): 4618–4632.

17.

Wang

et al . Reputation-based incentives for data dissemination in mobile participatory sensing networks. Int J Distribut Sens Networks 2015; 11(12): 172130.

18.

Messaoud

Ghamri-Doudane

. Fair QoI and energy-aware task allocation in participatory sensing. In: 2016 IEEE wireless communications and networking conference, 2016, pp.1–6. https://ieeexplore.ieee.org/document/7565025

19.

Liu

Tang

. An overview of location semantics technologies and applications. Int J Semantic Comput 2015; 9(3): 373–393.

20.

Fargas

Petersen

. GPS-free geolocation using LoRa in low-power WANs. In: 2017 global internet of things summit (Giots), 2017, pp.1–6, https://ieeexplore.ieee.org/document/8016251

21.

Sakpere

Oshin

Mlitwa

NBW

. A state-of-the-art survey of indoor positioning and navigation systems and technologies. South Afr Comp J 2017; 29(3): 145–197.

22.

Aminikashani

Kavehrad

. Indoor positioning with OFDM visible light communications. In: 2016 13th IEEE annual consumer communications & networking conference (CCNC), 2016, pp.505–510, https://ieeexplore.ieee.org/abstract/document/7444832

23.

Han

Zhu

et al . An indoor ultrasonic positioning system based on TOA for Internet of Things. Mobile Informat Syst 2016; 2016: 4502867.

24.

Shang

et al . Improving Wi-Fi indoor positioning via AP sets similarity and semi-supervised affinity propagation clustering. Int J Distribut Sens Networks 2015; 11(1): 109642.

25.

Chen

Gesbert

. Optimal positioning of flying relays for wireless networks: a LOS map approach. In: 2017 IEEE international conference on communications (ICC), 2017, pp.1–6, https://ieeexplore.ieee.org/abstract/document/7996921

26.

Donoho

Javanmard

Montanari

. Information-theoretically optimal compressed sensing via spatial coupling and approximate message passing. IEEE Trans Informat Theory 2013; 59(11): 7434–7464.

27. Malboubi, Peng, Sharma, et al. A learning-based measurement framework for traffic matrix inference in software defined networks. Comp Electr Eng 2018; 66: 369–387.

28. Guo, Song, et al. A survey on compressed sensing in vehicular infotainment systems. IEEE Comm Surveys Tutor 2017; 19(4): 2662–2680.

29. Rallapalli, Qiu, Zhang, et al. Exploiting temporal stability and low-rank structure for localization in mobile networks. In: Proceedings of the 16th annual international conference on mobile computing and networking, 2010, pp.161–172, https://dl.acm.org/citation.cfm?id=1860015

30. Kong, Xia, Liu, et al. Data loss and reconstruction in sensor networks. In: 2013 Proceedings IEEE INFOCOM, 2013, pp.1654–1662, https://ieeexplore.ieee.org/document/6566962

31. Wang, Zhang, Pathak, et al. CCS-TA: quality-guaranteed online task allocation in compressive crowdsensing. In: Proceedings of the 2015 ACM international joint conference on pervasive and ubiquitous computing, 2015, pp.683–694, https://dl.acm.org/citation.cfm?id=2807513

32. Wang, Ngai, et al. Energy-efficient collaborative outdoor localization for participatory sensing. Sensors 2016; 16(6): E762.

33. Cheng, Niu, Kong, et al. Compressive sensing based data quality improvement for crowd-sensing applications. J Network Comp Appl 2017; 77: 123–134.

34. Xia, Zhou. Compressed sensing and implicit cooperativity based data gathering algorithm in mobile crowdsensing systems. In: IEEE SmartWorld, ubiquitous intelligence and computing, advanced and trusted computing, scalable computing and communications, cloud and big data computing, internet of people and smart city innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), 2018, pp.1124–1129, https://www.semanticscholar.org/paper/Compressed-Sensing-and-Implicit-Cooperativity-Based-Yu-Xia/3753595844658236f8a324f689fcedc95693418f

35. Dias, Rane, Bandewar. Survey of compressive sensing. Int J Sci Eng Res 2012; 3(2), https://pdfs.semanticscholar.org/f832/a500c88b126bc84e587905040a9781f3021e.pdf

36. Artemenko, Nayak, Menezes, et al. Evaluation of different signal propagation models for a mixed indoor-outdoor scenario using empirical data. In: Mitton, Erol, Gallais (eds) International conference on ad hoc networks. Cham: Springer, 2015, pp.3–14.

37. Huang, Wang, Chen, et al. Double least-squares projections method for signal estimation. IEEE Trans Geosci Remote Sens 2017; 55(7): 4111–4129.

38. Tompitak, Mossalam, Barkema, et al. Can Markov chain models predict nucleosome positioning? Biophys J 2016; 110(3): 404a.

39. Krumm. A survey of computational location privacy. Person Ubiquitous Comp 2009; 13(6): 391–399.

40. Gedeon, Schweizer. Understanding spatial and temporal coverage in participatory sensor networks. In: 2015 IEEE 40th local computer networks conference workshops (LCN Workshops), 2015, pp.699–707, https://ieeexplore.ieee.org/document/7365917