Geolocating a WeChat user based on the relation between reported and actual distance

Abstract

The combination of social networks and the Internet of Things has raised a new wave of network technology application. However, the presence of malicious social network users poses a potential threat to Internet of Things security, and the research on social network user geolocation technology is of great significance. The accuracy of existing geolocation methods for WeChat users depends on the stable correspondence between the reported distance and actual distance. In view of the difficulty to pinpoint users’ location in the real world due to WeChat location protection strategy, a WeChat user geolocation algorithm based on the reported and actual distance relation analysis is proposed. The proposed algorithm selects the target reported distance and determines the initial target space based on statistical characteristics of the relation between the reported distance and actual distance. What is more, stepwise strategies are taken to improve the accuracy rate of space partition. Experimental results show that, on the premise that target users can be discovered, the proposed algorithm could achieve higher accuracy compared with the classical space partition–based algorithm and the heuristic number theory based algorithm. The highest geolocation accuracy is within 10 m and 56% of geolocation results are within 60 m.

Keywords

Internet of Things security location-based social networks statistical characteristics optimization parameters stepwise strategies WeChat user geolocation

Introduction

In recent years, the Internet of Things (IoT) has developed rapidly with various applications such as industry, agriculture, transportation, logistics, and smart home,¹ which provide convenient and intelligent services for people’s production and life. Location information is one of the basic elements of information interaction in IoT, and the basis of location based services (LBS) provided by IoT.^2–4 With the rapid development of mobile Internet and the wide application of smart hardware devices, more and more social media platforms are trying to associate mobile social system with IoT.⁵ The social network is extended to be a broader concept beyond the people-to-people interaction, which also includes the people-to-thing and thing-to-thing interactions.⁶ Tencent, for example, one of the largest Chinese social media platforms, which owns WeChat, QQ, and other mobile messaging systems, launched WeChat hardware platform in 2014 which provides IoT solutions for connecting things and things as well as people and things.⁷ WeChat is a popular social network application in China with 963 million active accounts in over 200 countries by June 2017.⁸ In the same year, Facebook released its own IoT platform, Parse. The combination of mobile social networks and IoT makes mobile social networks can not only meet the demand of users for people-to-people interaction,^9–11 but also promote further development of IoT. As there are a great number of mobile social network users, malicious users among them may conduct rumor spreading, virus dissemination, and other malicious behaviors,^12–14 which seriously endanger the health of network environment and the security of IoT. Carrying out the research on mobile social network user geolocation technology is significant to provide time-efficient support for pinpointing the location of malicious users as well as geolocating IoT security busters from social networks. Meanwhile, the research can raise awareness of location privacy protection for ordinary users and help service providers afford more secure services.^15,16 Such research, combined with network application technologies such as multimedia steganography detection^17,18 and key nodes discovery of anonymous communication networks,^19–21 can discover and geolocate Internet-sensitive targets, so as to provide technical support to maintain the network security.

For the advantage that mobile social network applications can access the location of devices conveniently, a wide variety of location-based social networks (LBSNs) and particular LBSs sprang up,^22–24 which make it possible to geolocate social network users. It has been reported that the Egyptian government used the trilateration geolocation method based on the location-based characteristics of gay dating apps to geolocate imprison users.²⁵ This article is focusing on WeChat to explore whether WeChat users can be geolocated based on the provided LBS. The existing WeChat user geolocation algorithms can be divided into two categories: number theory based geolocation algorithm and successive approximation–based geolocation algorithm.

Number theory based geolocation algorithm abstracts the relation between the reported distance and actual distance as an ideal mathematical model. By setting probes equidistantly with certain rules in the region where the target user is located, the reported distances of the target user collected from multiple probes are constraint solved, and therefore the location of the target user is pinpointed. Xue et al.²⁶ have proposed one-dimensional adversarial method first to detect the target user’s location along a line and further extended the method to two-dimensional space. What is more, the method is proved theoretically that it can geolocate WeChat users with high accuracy under ideal conditions. Peng et al.²⁷ analyzed the influence of noise on the reported distance based on hypothesis, and a heuristic number theory based geolocation algorithm was proposed. By deploying multiple probes along the horizontal and the vertical edge, middle probes which get the minimum reported distance in each edge are selected as the predicted coordinates. Cheng et al.²⁸ pointed out that the existence of noise makes geolocation errors of number theory based algorithm increase as the actual distance between the probes and the target user increases. In addition, placement strategies for the first probe were proposed to improve the practicality of the number theory–based geolocation algorithm. Number theory based geolocation algorithm has high geolocation accuracy in theory, but in practical as there is no stable relation between the reported distance and actual distance, the gap between the actual results and theoretical precision is large and the theoretical precision is hard to achieve.

Successive approximation–based geolocation algorithm first determines the target user’s location within a certain region. Then the region is divided into several subregions by collecting target users’ reported distance from different positions. By determining in which subregion the target user is located, the size of the potential region is reduced. Constantly repeating the procedure so that the real location of the target user is limited within a small region. Ding et al.²⁹ geolocated WeChat user based on the improved triangle geolocation algorithm, which utilizes the band-like reported distance characteristic of WeChat. The center of the intersection of rings determined by several probes was taken as the location of the target user. In order to break the minimum reported distance limit of WeChat, Li et al.³⁰ have proposed a geolocation algorithm based on space partition. The algorithm first determines the target user within the minimum reported distance, delimits the target space based on the minimum reported distance, and then partitions the target space until the threshold is achieved.

Space partition–based geolocation algorithm is less time consuming and easy to implement. As shown in a previous paper,³⁰ the algorithm is able to geolocate 50% of users in less than 40 m and the average geolocation accuracy is about 51 m. However, actual tests show that, affected by the location protection strategy update of WeChat, there is no stable correspondence between the reported distance and actual distance. The algorithm is difficult to geolocate target users with high precision under the current conditions. To analyze the correspondence between the reported distance and actual distance and, furthermore, improve the geolocation accuracy for WeChat users in actual environment, a geolocation algorithm based on the relation between the reported distance and actual distance is proposed.

In this article, we improve space partition–based geolocation algorithm by selecting optimization parameters based on statistical characteristics of the relation between the reported distance and actual distance, and stepwise strategies are proposed to improve the accuracy rate of space partition. Experimental results show that the proposed algorithm can geolocate WeChat users with higher accuracy compared with the classical space partition–based algorithm and the heuristic number theory based algorithm, and the highest geolocation accuracy is less than 10 m.

The novelty and contributions of this work are summarized as follows:

We perform real-world test of WeChat “People Nearby” service. Based on the test, we find that there is an unstable relationship between the reported and actual distances, which makes the existing method ineffective to geolocate WeChat users precisely.

We perform statistical analysis of the relation between the reported and actual distances, and determine the target reported distance and the initial target space based on the statistical characteristics.

We propose stepwise strategies to deploy probes, which improve the accuracy of finding subspace where the target user is located.

The proposed algorithm is potentially effective for LBSNs, the reported distance of which is highly confused and has segmented properties.

The rest of the article is organized as follows: In section “Problem statement,” LBSN and the related work are discussed. Section “WeChat user geolocation algorithm based on optimization parameter selection of space partition” describes the proposed geolocation algorithm. Critical steps of the algorithm are analyzed in detail in sections “Determining the initial target space based on statistical characteristics of reported distance” and “Partitioning target space based on stepwise strategies,” respectively, which is followed by section “Experimental results and analysis” describing the geolocation experiment. Finally, section “Conclusion” concludes the article.

Problem statement

In this section, LBSNs are classified according to the way users’ location is shared; what is more, location-based social discovery (LBSD) services and location protection strategies of WeChat are introduced. Then theories and shortcomings of the classical space partition–based geolocation algorithm are discussed.

Classification of LBSNs

In LBSN, users get LBSs such as restaurant recommendation, friend suggestion, and navigation by sharing mobile device’s location to the server. According to the way in which users’ location is disclosed by the server, existing LBSNs can be divided into two categories: direct location sharing and indirect location sharing.

In direct location sharing LBSN, a normal user can get the exact location information of the user whose location is submitted to the server, such as Foursquare, Weibo, and other check-in applications. Taking Weibo for example, users can add a location tag in the published content. Other users will see the location tag when they browse the content, so as to know where the content was published. Some direct location sharing LBSNs also support sharing their location to a specific set of users, instead of exposing their location to all users, such as Google Latitude.

In indirect location sharing LBSN, the server hides the exact location information of users by utilizing methods such as obfuscation or setting granularity, for instance, WeChat, Momo, and Skout. WeChat is a typical kind of indirect location sharing LBSN. When a WeChat user is querying to the server to find nearby people, the exact location of the nearby user is hidden and coarse-grained distance information is provided by the server, such as user A is within 300 m. Based on the reported distance information, the exact location of users cannot be determined directly.

LBSD services and location protection strategies of WeChat

WeChat provides users with LBSD services based on the location of mobile devices. WeChat can obtain the user’s location using the positioning system embedded in the mobile device. When the current location of the mobile device is needed to provide LBS for users, WeChat issues a locate request to the operating system and get a position result from the operating system based on the available location source (global positioning system (GPS), WiFi, or Cell ID).

LBSD is an important kind of LBSs, which provides service for users to find others close to their geographical location, and relative distances between them are reported by the server.^31–33“People Nearby” service in WeChat is one typical kind of LBSD service. By querying “People Nearby,” users can find other users (friends or strangers) with geographic proximity and get the reported relative distance. After that, the user can be discovered by other users in the same way if his location is not cleared manually. WeChat reports the relative distance between users in bands of 100 m until the reported distance is within 1000 m, and then in bands of 1000 m. The screenshot of “People Nearby” in WeChat is shown in Figure 1.

Figure 1.

“People Nearby” service in WeChat.

To defend against the traditional triangle geolocating attack and protect users’ location privacy, most LBSD services of social network applications such as WeChat report the relative distance between users in concentric bands. By setting the minimum reported distance (100 m) and reporting relative distance in bands, the server does not report accurate relative distance directly. What is more, there is no stable correspondence between the reported distance and actual distance. The reported distance is not always correct. For example, if the actual distance between two users is 350 m, the reported distance may be 300 m or smaller. In this way, it is difficult to determine the location of users even if the reported distance is known.

Analysis of original space partition–based geolocation algorithm

Space partition–based geolocation algorithm is designed to break the minimum reported distance limit of LBSD and further enhance the geolocation accuracy. Implementation of the algorithm should meet the following prerequisites:

The reported distance of the target user from the probe is the minimum reported distance, which is 100 m for WeChat.

WeChat reports the relative distance between the probe and the target user in bands of $K$ , and the relation between the reported distance $W_{d}$ and the actual distance $d$ can be formalized as follows

W_{d} = (⌊ \frac{d}{K} ⌋ + 1) * K

(1)

where K = 100 when $d < 1000$ and K = 1000 when $d \geq 1000$ .

The illustration of space partition–based geolocation algorithm is shown in Figure 2. Noting that the moving direction in the figure does not represent the actual trajectory of the probe, it just shows the difference of orientation before and after the changing of probe’s location. The arrows in the following figures have the same meaning unless otherwise specified.

Figure 2.

Illustration of space partition–based geolocation algorithm.

For the simplicity of problem presentation, the algorithm considers the space determined by the minimum reported distance as a box rather than a circle. What is more, fake location provider–based location spoofing method is used to fake the location of probes, which is a virtual Android machine. To avoid redundant presentation, in the rest of the article, probes will query WeChat server to find nearby users after its location is changed, unless otherwise specified.

First, the location of probes is dynamically changed to scan the potential area where the target user is located, until the target user occurs in the query result list with the minimum reported distance. So the initial target space is determined as a box with 200 m side length. The current position of the probe is taken as the intermediate geolocation result. Second, the position of the probe is shifted relative to the last intermediate geolocation result; consequently, the space determined by the minimum reported distance overlaps half of the initial target space. Third, the half space where the target user is located is determined by judging whether the reported distance of the target user from the shifted probe changes. If it changes, it is derived that the target user is in the non-overlapped half. Otherwise, the target user is located in the overlapped half. In this way, the potential area is reduced to half after each round check. The intermediate geolocation result is modified to make it in the center of current potential area. Repeating the partition procedure for multiple rounds until the expected accuracy is achieved. The last intermediate result is taken as the geolocation result of the target user.

The space partition–based geolocation algorithm takes the region determined by the minimum reported distance as the target space and decides the subspace where the target user is located by checking whether the reported distance of the target user is increasing when probe’s location is shifted. The geolocation accuracy of the algorithm depends on the strict correspondence between the reported distance and the actual distance, given by formula (1). If the correspondence is satisfied, the algorithm will achieve high geolocation accuracy. However, actual test shows that there is no strict correspondence between the reported distance and actual distance. The situation that the reported relative distance is 200 m or higher is frequent even if the actual distance between two users is less than 100 m. Under the current conditions, the initial target space delimited by the algorithm may have failed to cover the location of the target user. What is worse, the algorithm may identify the target user’s location in wrong subspace easily even if the initial target space covers the target users’ location, which will enlarge the geolocation error. So there are some challenges that we must face: (1) How to select the reasonable target reported distance as the beginning of the geolocation? (2) How to delimit the initial target space that covers the location of the target user with high probability? (3) How to find the right subspace with high success rate?

In this article, we select the target reported distance and determine the initial target space based on the statistical characteristics of the reported distance instead of the minimum reported distance. What is more, we adopt different query point selection strategies during different procedures to improve the accuracy of subspace judging. In this way, the geolocation accuracy can be significantly improved.

WeChat user geolocation algorithm based on optimization parameter selection of space partition

The update of privacy protection policy of WeChat can be taken as the possible reason for the fact that there is no strict correspondence between the reported distance and actual distance in “People Nearby” service. In this section, we select optimal parameters that determine the more valid target space and decrease the probability of misjudgment by analyzing the relation between the reported distance and actual distance. In addition, stepwise strategies are adopted to improve the accuracy rate of space partition.

Assuming that the algorithm takes $W p_{i}$ as the target reported distance, which means that the probe starts to geolocate the target user when the reported distance of the target user shown in the query result list is $W p_{i}$ . For the same reason with the space partition–based geolocation algorithm, we consider the target space determined by $W p_{i}$ as a box, which centered on the position of the current probe and takes 2D as the side length. Taking R as the effective actual distance determined by $W p_{i}$ , when the actual distance is less than R, the probability that the reported distance of the target user is not greater than $W p_{i}$ is higher compared with the probability that the reported distance is greater than $W p_{i}$ . The framework of the proposed geolocation algorithm is shown in Figure 3 and the main steps are as follows.

Figure 3.

Framework of WeChat user geolocation algorithm based on the relation between the reported and actual distances.

Data collection

This step includes recording the reported distance under different actual distances by dynamically adjusting the location of WeChat users at even intervals. The target reported distance $W p_{i}$ and the corresponding effective value $R$ are obtained based on statistical analysis of the collected actual distance/reported distance pairs.

Initial target space determination

This step includes scanning the potential area by modifying the probe’s location until the probe finds the target with the target reported distance. Utilizing the statistical characteristics of the probability distribution of the actual distance upper limit corresponding to the reported distances, the maximum actual distance $D$ between the probe and the target user is determined. Therefore, the initial target space is determined as a box with a side length of 2D.

Narrowing the initial target space

First, multiple query points are selected on one side of the initial space and the distance between the adjacent query points is $R$ . Second, probes are deployed in these points so that the union of areas determined by all probes with the side length of $2 R$ can cover half of the initial target space. So the initial target space is divided into two equal subregions: the overlapped region and the non-overlapped region. Third, each probe collects the reported distance of nearby users several times and the most frequently reported distance of the target user is taken as the reported distance of the target user.

If and only if the reported distances obtained by all the probes are higher than the target reported distance, the location of the target user is judged in the non-overlapped region; otherwise, the target user is determined in the overlapped region. This procedure is repeated until the larger side length of the current target space is less than $2 R$ , so that the current target space can be divided into two equal regions by deploying only one probe.

Geolocating the target user

This step includes selecting a single query point in each round, just as the original space partition–based algorithm, and partitioning the space based on the strategy of taking the most frequent reported distance as the reported distance of the target user. Iterating this step until the side length of the current target space is reduced to the preset threshold and the central point of the final target space is selected as the geolocation result.

Particularly, the key points of the proposed algorithm are as follows: how to determine the target space and how to partition the target space correctly, which will be discussed in sections “Determining the initial target space based on statistical characteristics of reported distance” and “Partitioning target space based on stepwise strategies,” respectively.

Determining the initial target space based on statistical characteristics of reported distance

How to determine the target reported distance and the corresponding geographical spatial scope are the first problems to be solved geolocating the target user. In this article, we determine the initial target space range based on the statistical characteristics of the actual distance upper limit corresponding to each reported distance (from 100 to 900 m).

Selection of target reported distance

The selection of the target reported distance should follow the following principles:

The upper limit of actual distances corresponding to the target reported distance is relatively stable.

The smaller reported distance should be selected under the same condition.

The target reported distance affects the scale of the initial target space range directly. The more stable the interval of actual distance corresponding to the target reported distance is, the easier it is to determine the initial space range.

The original space partition–based geolocation algorithm selects the minimum reported distance as the target reported distance and finds the subspace by checking whether the reported distance of the target user obtained by shifted probe is changed. The geolocation accuracy depends heavily on the stability between the minimum reported distance and the actual distance, as shown in formula (1). To verify the rationality of the algorithm, a real-world test is made. In our test, two WeChat accounts running on the mobile phone are deployed. As GPS could achieve the highest localization accuracy, the location of both accounts is faked by the same fake GPS application, Traveling Around. So we can set the location of WeChat accounts by inputting the latitude and longitude of the desired location, instead of moving in the real world. The experimental setting of the test is shown in Figure 4.

Figure 4.

Experimental settings of the minimum reported distance test.

The actual distance between two WeChat users, user A and user B, is adjusted at the intervals of 10 m. The actual distance when the reported distance of user A from user B is changed from 100 m to 200 m is recorded first, as X[i] in Figure 4. Taking the above process as one round test, 100 rounds of test are made. The frequency of the recorded actual distances is shown in Figure 5.

Figure 5.

The frequency of the recorded actual distances.

We can see from Figure 5 that, in practice, the minimum reported distance of WeChat is highly obfuscated. About 68% of the recorded actual distance is less than 100 m, which means that, in original space partition–based geolocation algorithm, when the reported distance of the target user is changed, the target user may be determined in the wrong subspace with a high probability. So it is not appropriate to select the minimum reported distance as the target reported distance under current conditions.

To solve the problem, we determined the target reported distance according to statistical characteristics of the reported distance. We take the variance of the actual distance upper limit distribution as the measure and the reported distance with minimum variance is taken as the target reported distance, which makes it easier to determine the target space range and reduces the possibility of misjudgment on geolocation procedure. Selecting the smaller reported distance if there is more than one reported distance with the minimum variance can reduce the side length of the initial space as well as the geolocation error caused by miscarriage of justice.

This study does not consider the actual distance lower limit distribution corresponding to the target reported distance and takes the target space as the box rather than a concentric ring, which can reduce the complexity of determining the initial space range and increase the possibility that the user’s location is covered by the determined space. What is more, if the reported distance is not too large, the determined box is approximately equal to the concentric ring.

Determination of initial target space range

As discussed in section “ Problem statement,” the original space partition–based geolocation algorithm takes the initial target space as a box with double reported distance (200 m) as the side length, when the reported distance of the target user is 100 m. However, for there is no stable correspondence between the reported distance and actual distance and we determine the target reported distance according to statistical characteristics of the reported distance, it is not suitable to select side length based on the same strategy.

To find the reasonable boundary of the initial target space, the cumulative probability of the actual distance upper limit distribution is used. The actual distance upper limit is the largest actual distance corresponding to the target reported distance in each round test. Table 1 shows the example of actual distance upper limit selection based on one round real-world test data.

Table 1.

The example of actual distance upper limit selection.

Actual distance (m)	Reported distance (m)
140	200
150	200
160	200
170	200
180	300

As shown in Table 1, if the target reported distance is 200 m, 170 m is taken as one actual distance upper limit record because it is the largest actual distance that makes the reported distance 200 m in this round test.

Calculating the value $D$ that makes the cumulative probability of the actual distance upper limit distribution for $W p_{i}$ up to $P$ , that is

P_{i} (d \leq D) = P

(2)

where $d$ is the actual distance. The initial target space where the target user is located is determined as a box centered on the location of the probe, with $2 D$ as the side length. The probability that the target user is covered by the determined initial target space is more than $P$ when the target user is found by the probe with the target reported distance.

The value of $D$ is determined according to the cumulative probability instead of the maximum actual distance value, which can make the target space cover the location of the target user with a high probability (if $P$ is large enough) as well as narrow the initial target space to reduce time consumption.

Partitioning target space based on stepwise strategies

Since the initial space is determined, we get the rough range where the target user is located. To geolocate the target user, we must partition the initial space to achieve higher accuracy. This study delimits the effective space range of the target reported distance and proposes stepwise strategies to improve the accuracy rate of space partition.

Delimiting effective space of target reported distance

For the value $D$ is selected to ensure that the determined initial target space covers the target user’s location with a high probability, the reported distance is very likely to be larger than $W p_{i}$ even if the actual distance is smaller than $D$ . For example, if the actual distance between the target user and the probe is smaller than but approximately equal to $D$ , the reported distance is more likely to be larger than $W p_{i}$ . If we shift the probe by $D$ in each round partition and the position of the target user is close to the middle line of the current space, wrong subspace will be selected based on the original subspace judgment strategy.

To mitigate this kind of misjudgment, we delimit the effective distance $R$ which is smaller than $D$ . Taking $R$ as the effective distance relative to $W p_{i}$ , meanwhile, as the distance between adjacent query points in each round partition. When the actual distance between the target user and the probe is less than $R$ , the probability that the reported distance of the target user is not greater than $W p_{i}$ is higher compared with the probability that the reported distance is greater than $W p_{i}$ , and the effective space bounded by $2 R$ is determined. When a probe is shifted by $R$ in one dimension of the target space, the probability that the reported distances of users in the overlapped space are no more than $W p_{i}$ is higher.

Improving the accuracy of space partition by stepwise strategies

The effective space with a side length of $2 R$ cannot cover half of the initial target space as $R$ is smaller than $D$ . To improve the accuracy of space partition, multiple query points are selected on one side of the current target space instead of only one query point. The selection of query points needs to meet the following conditions:

The union region of effective spaces defined by all query points can cover half of the current target area.

Users in the overlapped region have a lower probability that the reported distance from at least one query point is larger than $W p_{i}$ .

Users in the non-overlapped region have a higher probability that the reported distances from all query points are larger than $W p_{i}$ .

This article selects query points with the intervals of $R$ and sets the location of probes in these query points. The diagram of query point selection is shown in Figure 6.

Figure 6.

The diagram of query point selection.

Deploying probes at the location of query points, the actual distance between the adjacent probes is $R$ . If the target user is in the overlapped space, there is at least one probe, the actual distance of which from the deployed probes is more likely to be $W p_{i}$ . Each probe queries the target user repeatedly. For at least one probe, it will obtain more reported distances that are no more than $W p_{i}$ with a high probability. The most frequent reported distance is taken as the reported distance of the target user for each probe and it is determined in which half space the target user is located according to the reported distance obtained by probes, which can reduce the rate of misjudgment in probability. Repeating the above process until the larger edge of the current target space is less than $2 R$ , so that the effective space can cover half of the current target space. Then one probe is enough to partition the current space. The diagram of single-probe geolocation procedure is shown in Figure 7.

Figure 7.

The diagram of single-probe geolocation procedure.

Only one probe is deployed in each round and the probe is shifted by $R$ after each round partition, just as the original space partition–based geolocation algorithm. As the actual distance between points in the overlapped region and the probe is less than $R$ , the reported distance from the probe is more likely to be $W p_{i}$ if the target user is in the overlapped region.

Adopting the same subspace judgment strategy as in the previous procedure can reduce the rate that the location of the target user is predicted in the wrong subspace.

Experimental results and analysis

To verify the effectiveness of the proposed algorithm, we implement real-world geolocation experiments using three Xiaomi mobile phones running different WeChat accounts. In the procedure of data collection, we use two of these devices to collect reported distances under different actual distances. In the procedure of geolocation, two devices are taken as the target users and the other one is taken as the probe. We successively changed the locations of the devices to simulate different target users and probes. Traveling Around, a fake GPS location application in Android, is used to fake geographic locations of probes and target users. The parameters of the proposed algorithm are determined by statistical analysis of the collected reported distance data, and the comparison with the original space partition–based geolocation algorithm and the heuristic number theory based algorithm²⁷ is reported.

Experimental settings

In terms of parameter determination, the geographic locations of two WeChat accounts are faked by the same application, just as in the minimum reported distance test in section “Determining the initial target space based on statistical characteristics of reported distance.” The location of user A is set first, and user A leaves a record in the position using “People Nearby” service. Then, the location of user B is changed dynamically to make the actual distance between users A and B enlarge in increments of 10 m. User B queries nearby users to obtain the reported distance of user A after each changes the location. In this way, the reported distances of user A under different actual distances are recorded.

In terms of geolocation comparison, the proposed algorithm is compared with the original space partition–based algorithm and the heuristic number theory based algorithm. We select a geographical area of size 1000 m × 1000 m as our test field. The location of two devices is faked within the test field randomly using the fake location application. WeChat accounts running in the corresponding devices are taken as two target users with known GPS coordinates. After the two users are geolocated, the locations of the two devices are faked randomly again with new GPS coordinates, until 50 target users are deployed. In order to ensure the fairness of the experimental comparison, we employ the same scanning strategy with the original space partition–based algorithm and query over the test field at particular intervals of 100 m, as shown in Figure 8.

Figure 8.

The diagram of test field scan.

If the target user is found in the querying result list and the reported distance of the target user is the respective target reported distance for each algorithm, the geolocation procedure is triggered. The threshold for partition is 5 m. Experimental setting of the heuristic number theory based algorithm is in accordance with Peng et al.,²⁷ and we deploy 41 edge probes along the longitudinal edge and 41 edge probes along the latitudinal edge of the test area. What is more, 100 query points along the longitudinal axis and 100 query points along the latitudinal axis are selected.

Parameter determination

Taking it as one round test when the actual distance between users A and B is changed from 0 to 1000 m, and 100 rounds of test are carried out. The maximum actual distance corresponding to $W p_{i}$ in each round test is defined as the upper limit of the actual distance, so that each reported distance has 100 records. The variance of the actual distance upper limit for each reported distance is calculated, as shown in Figure 9.

Figure 9.

The variance of actual distance upper limit for each reported distance.

The figure shows that variances for each reported distance are fairly large, and variances have an increasing tendency with the increase of the reported distance. The value of 200 m is taken as the target reported distance because it has smaller variance. It is worth noting that the variance of 100 m reported distance is abnormally large. This abnormal phenomenon is caused by frequent appearance of the following situation: even if two users are geographically close to each other, the reported distance of user A from user B is 200 m or higher, rather than 100 m, which agrees well with the result of tests in section “Selection of target reported distance.” The reason for this may be that WeChat thought it privacy-threating if report relative distance with 100 m when the actual distance between users is short. Hence, the reported distance of 100 m will be avoided deliberately.

The probability distribution of the actual distance upper limit for the reported distance of 200 m is shown in Figure 10. It is observed that the actual distance upper limit of 200 m reported distance obeys the normal distribution approximately, and the upper value has the maximum probability at about 170 m. The probability that the reported distance changes to 300 m is higher when the actual distance between users is larger than 170 m. Based on the strategy of parameter selection, the values that make the cumulative probability of the actual distance upper limit probability distribution up to 95% and 50% are calculated, respectively, and the corresponding actual distances are taken as the initial target space range $D$ and the effective distance $R$ . In this way, the probability that the determined initial target space covers the location of target users is greater than 95%, and the determined effective space range meets the constraints. The obtained values are 225 and 173 m for $D$ and $R$ .

Figure 10.

Probability distribution of actual distance upper limit for the target reported distance.

Comparison of experimental results

In the experiment of geolocating 50 target users, the original space partition–based algorithm discovers 46 target users within the minimum reported distance successfully, and 46 results are obtained. Nevertheless, all target users appear in the querying result list with 200 m reported distance.

The reason for the fact that four target users are failed to be found by the original space partition–based geolocation algorithm may be that the existence of noise makes the WeChat server reported 200 m or higher to the probe when the probe is scanning the test field, even if the actual distance between the target users and the probe is less than 100 m. When the location of probes is shifted, the same situation comes up again until the potential area is fully scanned. For the proposed algorithm, as the shifting distance is 100 m, there are at least two query points, the actual distance of which to the target user is less than $100 \sqrt{2} m$ . As shown in Figure 10, the reported distance from the probe is more likely to be 200 m.

The geolocation error distribution is shown in Figure 11 and the geolocation result comparison is shown in Table 2.

Figure 11.

Geolocation error distribution.

Table 2.

Geolocation result comparison.

	Proposed algorithm	Space partition–based algorithm	Heuristic number ttheory based algorithm
Minimum error (m)	8.2	22.1	15.8
Average error (m)	56.6	68.9	63.1
Middle error (m)	55.4	69.7	63.3

As shown in Figure 11 and Table 2, under the same experimental conditions, the minimum positioning error of the proposed algorithm is less than 20 m; however, all the positioning errors of the original algorithm are higher than 20 m. The average error of the proposed algorithm is 56.6 m, which is lower than 68.9 m for the original algorithm and 63.1 m for the heuristic number theory based algorithm. Moreover, 56% of the localization error for the proposed algorithm is less than 60 m, which is higher than 34.8% of the original algorithm and 42% of the heuristic number theory based algorithm. The middle error of the proposed algorithm is 55.4 m, which is lower than 69.7 m for the original algorithm and 63.3 m for the heuristic number theory based algorithm. It is an interesting phenomenon that the difference between the middle and average errors is very small, which means that the distribution of geolocation errors for the three algorithms is both quite uniform.

The possible reason for the high geolocation error of the original space partition–based algorithm is that, even if the reported distances of the target users are the minimum reported distance, the actual relative distances between the target users and the probe are greater than 100 m. The initial target space has failed to cover the location of target users. What is more, frequently misjudge during space partition enlarges the deviation between geolocation results and actual location of target users. For the heuristic number theory based algorithm, the unstable relationship between the reported and actual distance makes the probes determine biased position on each dimension, which makes it hard to get accurate coordinates of target users. What is more, the procedure of deploying probes along the longitudinal and latitudinal axes will enlarge the geolocation error. It is obvious to see from the geolocation result that the proposed algorithm can geolocate WeChat users accurately in practical environment. The factors that affect the geolocation accuracy of the proposed algorithm includes errors of fake location applications, location providers of the device, low-probability misjudgment during space partition, and the time that users’ location is cached in the WeChat server.

Conclusion

The combination of social networks and IoT makes the security of IoT vulnerable to the threat of malicious users from social networks. So it makes sense to study how to geolocate malicious social network users. LBSD services of mobile social networks report the relative distance of nearby users in concentric bands. However, there is no strict correspondence between the reported distance and the actual distance. In this article, the relationship between the reported distance and actual distance of WeChat is analyzed. We improve space partition–based geolocation algorithm by selecting optimization parameters based on statistical characteristics of the relation between the reported distance and actual distance, and stepwise strategies are proposed to improve the accuracy rate of space partition. Experimental results show that the proposed algorithm has higher geolocation accuracy compared with the original space partition–based geolocation algorithm and the heuristic number theory based geolocation algorithm in several ways. It should be noted that the proposed algorithm is effective only when the target user uses the LBSD service and can be discovered by other users. In future work, we will focus on the difference of the relation between the reported and actual distances in different orientations, as well as the relation between the proximity of users in query result lists and the actual relative distance to the probe. The research hopes to provide time-efficient technical support for geolocating malicious LBSN users and raise the awareness of ordinary users for location privacy protection.

Footnotes

Handling Editor: Gary Leavens

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work presented in this paper was supported by the National Key R&D Program of China (Nos 2016YFB0801303 and 2016QY01W0105), the National Natural Science Foundation of China (Nos U1636219, 61379151, 61572052, 61572445, 61672354, and 61772549), the Key Technologies R&D Program of Henan Province (No. 162102210032), and the Plan for Scientific Innovation Talent of Henan Province (No. 2018JR0018).

References

Atzori

Iera

Morabito

. The Internet of Things: a survey. Comput Netw 2010; 54(15): 2787–2805.

Sun

Wang

. Survey on the location privacy preservation in the internet of things. J Softw 2014; 25(1): 1–10 (in Chinese with English abstract).

Ziegeldorf

Morchon

Wehrle

. Privacy in the Internet of Things: threats and challenges. Secur Commun Netw 2015; 7(12): 2728–2742.

Elkhodr

Shahrestani

Cheung

. A review of mobile location privacy in the Internet of Things. In: International conference on ICT and knowledge engineering, Bangkok, Thailand, 21–23 November 2013, pp.266–272. New York: IEEE.

Atzori

Iera

Morabito

. The Social Internet of Things (SIoT)—when social networks meet the Internet of Things: concept, architecture and network characterization. Comput Netw 2012; 56(16): 3594–3608.

Atzori

Iera

Morabito

. From “smart objects” to “social objects”: the next evolutionary step of the internet of things. IEEE Commun Mag 2014; 52(1): 97–105.

WeChat hardware platform, http://iot.weixin.qq.com/

Tencent announces 2017 second quarter and interim results, https://www.tencent.com/en-us/company.html

Kwak

Lee

Park

. What is Twitter, a social network or a news media? In: Proceedings of the international conference on world wide web, Raleigh, NC, 26–30 April 2010, pp.591–600. New York: ACM.

10.

Nemelka

Ballard

Liu

. You can yak but you can’t hide. In: Proceedings of the ACM conference on online social networks, Palo Alto, CA, 2–3 November 2015, p.99. New York: ACM.

11.

Wang

. Whispers in the dark: analysis of an anonymous social network. In: Proceedings of the Internet measurement conference, Vancouver, BC, Canada, 5–7 November 2014, pp.137–150. New York: ACM.

12.

Jiang

Wen

. Identifying propagation sources in networks: state-of-the-art and comparative studies. IEEE Commun Surv Tut 2017; 19(1): 465–481.

13.

Wang

Wen

Xiang

. Modelling the propagation of worms in networks: a survey. IEEE Commun Surv Tut 2014; 16: 942–960.

14.

Wen

Haghighi

Chen

. A sword with two edges: propagation studies on both positive and negative information in online social networks. IEEE T Comput 2015; 64(3): 640–653.

15.

Shokri

Theodorakopoulos

Papadimitratos

. Hiding in the mobile crowd: location privacy through collaboration. IEEE T Depend Secure 2014; 11(3): 266–279.

16.

Polakis

Volanis

Athanasopoulos

. The man who was there: validating check-ins in location-based services. In: Computer security applications conference, New Orleans, LA, 9–13 December 2013, pp.19–28. New York: ACM.

17.

Luo

. Selection of rich model steganalysis features based on decision rough set α-Positive region reduction. IEEE T Circ Syst Vid. Epub ahead of print29January2018. DOI: 10.1109/TCSVT .2018.2799243.

18.

Zhang

Qin

Zhang

. On the fault-tolerant performance for a class of robust image steganography. Signal Process 2018; 146(5): 99–111.

19.

Ling

Luo

. TorWard: discovery, blocking, and traceback of malicious traffic over Tor. IEEE T Inf Foren Sec 2015; 10(12): 2515–2530.

20.

Ling

Luo

. Tor bridge discovery: extensive analysis and large-scale empirical evaluation. IEEE T Parall Distr 2015; 26(7): 1887–1899.

21.

Ling

Luo

. Protocol-level hidden server discovery. In: Proceedings of the 32nd IEEE International Conference on Computer Communications (INFOCOM’13), Turin, 14–19 April 2013, pp.1043–1051. New York: IEEE.

22.

Zheng

. Location-based social networks: users. In: Zhou

Zheng

(eds) Computing with spatial trajectories. New York: Springer, 2010, pp.243–276.

23.

Hoang

Asano

Yoshikawa

. Your neighbors are my spies: location and other privacy concerns in dating apps. In: Proceedings of the 18th international conference on advanced communication technology, Pyeongchang, South Korea, 31 January–3 February 2016, pp.715–721. New York: IEEE.

24.

Chithrapavai

. Preserving location privacy in geosocial applications. In: Online international conference on green engineering and technologies, Coimbatore, India, 27 November 2015, pp.1–4. New York: IEEE.

25.

Washington Post. Could using gay dating app Grindr get you arrested in Egypt?https://www.washingtonpost.com/news/worldviews/wp/2014/09/12/could-using-gay-dating-app-grindr-get-you-arrested-in-egypt/?utm_term=.f4394783b390

26.

Xue

Liu

Ross

. I know where you are: thwarting privacy protection in location-based social discovery services. In: Proceedings of the IEEE conference on computer communications workshops, Hong Kong, China, 26 April–1 May 2015, pp.179–184. New York: IEEE.

27.

Peng

Meng

Xue

. Attacks and defenses in location-based social networks: a heuristic number theory approach. In: Proceedings of the international symposium on security and privacy in social networks and big data, Hangzhou, China, 16–18 November 2015, pp.64–71. New York: IEEE.

28.

Cheng

Mao

Xue

. On the impact of location errors on localization attacks in location-based social network services. In: Wang

Ray

Alcaraz Calero

. (eds) Proceedings of the international conference on security, privacy and anonymity in computation, communication and storage. Cham: Springer, 2016, pp.343–357.

29.

Ding

Peddinti

Ross

. Stalking Beijing from Timbuktu: a generic measurement approach for exploiting location-based social discovery. In: Proceedings of the 4th ACM workshop on security and privacy in smartphones & mobile devices, Scottsdale, AZ, 7 November 2014, pp.75–80. New York: ACM.

30.

Zhu

Gao

. All your location are belong to us: breaking mobile social networks for automated user location tracking. In: Proceedings of the 15th ACM international symposium on mobile ad hoc networking and computing, Philadelphia, PA, 11–14 August 2014, pp.43–52. New York: ACM.

31.

Polakis

Argyros

Petsios

. Where’s Wally? Precise user discovery attacks in location proximity services. In: Proceedings of the ACM SIGSAC conference on computer and communications security, Denver, CO, 12–16 October 2015, pp.817–828. New York: ACM.

32.

Wang

Xue

Liu

. Data-driven privacy analytics: a WeChat case study in location-based social networks. In: Xu

Zhu

(eds) Wireless algorithms, systems, and applications. Cham: Springer, 2015, pp.561–570.

33.

Huang

Narayanan

. Trilateration-based localization algorithm using the Lemoine point formulation. IETE J Res 2014; 60(1): 60–73.