Abstract
WiFi-based indoor localization has attracted recent research attention. Large space layout is a special and more complex indoor environment. Most existing indoor localization methods lead to poor accuracy and many of them are not suitable for large space environments. In this article, we propose a novel approach for indoor localization and navigation. In our approach, the expensive training is avoided by utilizing the concept of pre-scheduled path and automatically mapping the WiFi fingerprints to it. For online tracing, we utilize historical sensor data to delineate users’ trajectory and calculate the similarity to all possible paths on the map, then the system chooses the most similar one as the result. The proposed work is evaluated and compared with previous methods. The results show that our approach improves accuracy by 80%.
Introduction
In recent years, with the development of ubiquitous computing and mobile computing, there is a growing use of location-based services (LBS). As the precondition of indoor LBS, mobile indoor localization attracts increasing interests. Although the accuracy of previous methods can reach to the centimeter level using ultra wideband (UWB),1,2 infrared and radio frequency (RF),3,4 beacon, 5 and multi-antennas, 6 those systems need to deploy extra facilities, such as UWB senders and receivers. Given that many buildings are equipped with WiFi access points (APs), WiFi-based localization system becomes widely acceptable by users.
However, WiFi-based localization is particularly challenging since the radio signal is often affected by some external factors, such as reflection, refraction, multi-path, shadow fading, scattering, and temporal dynamics.7–9 In a large space indoor environment, these effects lead to poor accuracy if previous indoor localization methods are used for large space environments.
Currently, large space indoor layout (such as super market and shopping malls) is common in companies due to the need for large space utilization and flexible decoration. However, the large space also causes incomprehensive position cognition when an unfamiliar visitor enters. For many large space indoor environments, unfamiliar visitors generally desiderate navigation or localization service to help them perceive surroundings. Therefore, solving the localization problem in large space indoor environments is of significant value.
In most buildings, signals may encounter a considerable drop when passing through a wall. 10 Owing to the fact that the received signal strength (RSS) values of the same AP are significantly different in different rooms, and the diversity of RSS can be used to distinguish different rooms. In large space indoor environments, such approach is not possible because of lacking of walls.
Compared with other indoor conditions, indoor localization is more necessary for large space environments. In general buildings, rooms are labeled by interrelated room id/name (e.g. continuous numbers). A person can identify his location by the label when he is in the corridor, as long as the room is not too large to tell the exact location. For large space buildings, there often exists obstructions (such as containers, bookshelves, and people), thus it is difficult to tell the location for a new visitor.
In many indoor localization systems, there is a simplification that the collected WiFi fingerprint database is static. The generated data set will be used forever. However, in the real world, the WiFi signal strength will change over temperatures, 11 devices, 8 and even the human body that can also absorb signals. 12 Therefore, the RSS value varies with different temperatures, human traffic, and devices.
The static database cannot adapt the fickle RSS values and will lead to a huge positioning error. To achieve a high-accuracy localization, the fingerprint database should be updated frequently. However, updating the fingerprint database is very costly. There are also a lot of indoor positioning systems without training. For instance, Bisio et al. 13 generate the database through finite-difference time-domain (FDTD) simulations of the electromagnetic propagation that needs to measure the dielectric properties of all the materials in the indoor environment. Redpin 14 is also a training-less indoor localization system which relies heavily on users’ interaction. WILL is used to map WiFi fingerprints to rooms in a wall-partitioned indoor environment only, but not fitting large space areas. 10 Hence, a high-accuracy indoor localization without training is necessary for large space environments.
In this study, we obtain users’ motion by analyzing the sensor raw data from mobile phones. By mapping the motion to a map, we successfully remove the labeling process of traditional approaches. To achieve competitive localization accuracy, we leverage the trajectory mapping and fingerprint set matching. As a result, our approach is not desired for labeling measured data with corresponding locations and get a good localization accuracy. Our contributions can be summarized as follows:
We present a low-cost indoor localization system by avoiding the data collection phase by making pre-scheduled paths. When collectors walk along the path, the system automatically collects WiFi fingerprints and then maps them to the floor plan.
We propose a high-accuracy indoor localization algorithm for large space environments. The localization in this algorithm is computed by a trajectory mapping method. The trajectory is formed through measuring the inertial measurement unit (IMU).
We implement a demo system to evaluate the performance of our method. Our results show a substantial improvement in accuracy.
The rest of this article is organized as follows. In section “Related work,” we present the basic techniques of estimation of position using a WiFi device. We introduce our system design and present the detailed algorithm in section “System design.” In section “Simulation,” we describe the system implementation and report evaluation results. Finally, section “Conclusion and future work” concludes the article with future directions.
Related work
Previous WiFi-based indoor localization mechanisms (Figure 1) are generally divided into two categories: fingerprinting based and model based.

Taxonomy of indoor localization systems.
The fingerprinting-based approach contains two stages, offline stage and online stage. In the offline stage, the RSS collector needs to take a mobile device and walk through the building to mark locations and collect RSS values. Then, store RSS values and corresponding coordinates into fingerprint database. In the online stage, the mobile device collects the real-time RSS values and matches them to the RSS values stored in the fingerprint database and then determines the location of the user. Most of these methods utilize RF signals. Radio detection and ranging (RADAR) 15 is an early system using this technique. Horus 16 improves upon RADAR by employing a stochastic description of the RSS-location relationships and using a maximum likelihood–based method to estimate locations. However, this technique faces a great deficiency that it is pretty costly to collect and label the training data in a large scale building.
The model-based method applies a geometrical way to work out locations, and the information used in the location calculation contains RF propagation distances and all AP locations. These approaches assume the signal strength is consistent with a specific model (such as log-distance path loss model). Due to the effects of multi-path and shadow fading, the WiFi strength is quite noisy and highly depends upon environment. Although some previous works have been introduced, for instance, Wang et al. 17 present a curve right–based indoor localization approach to construct a fitted RSS-distance function for each transmitter, PinPoint, 18 Cricket, 5 and VOR, 19 there is always a large error for this kind of method.
There are many studies presented to improve the accuracy of fingerprint-based indoor localization. Since smartphones have been equipped with a number of sensors, leveraging inertial sensors to improve localization accuracy is gradually possible. Based on the inertial sensor data, we are able to know and record users’ movements, such as sitting, walking, and running.20–23 The combination of IMU and the WiFi RSS is introduced by Yang et al., 24 and Zhang et al.25,26 make a fusion of WiFi RSS, IMU, and map information to get a better accuracy. Also, the filter technique is widely used to improve accuracy, such as Chen et al. 27 and Evennou and Marx. 28 utilize Kalman filter with WiFi RSS and IMU for precision localization. Sheng et al. 29 recognize that RSS measurement points cannot be collected densely in large open indoor environments. However, they do not deal with the complexity in large space environments.
Different from the previous works, our indoor localization approach is a combination of WiFi RSS, IMU, and map information and adapts the large space layout (also fits for general environment) without the expensive training. Thus, users will not be involved in data collection.
System design
Motivation
The most costly and error-prone stage in the RSS-based system is the offline fingerprint database manual collection. Because of the time varying characteristic of WiFi signal (as shown in Figure 2), the labeled data are often the average value of many samples at the same position. 30 For orientation aware scenario, the labeled data at known locations should be collected toward different orientations. 31 Assuming that once the collection and labeling job for a collector takes a minute, the distance among reference points is 2 m. For a building which has five floors and each floor is 200 m×100 m, there will be at least 416 h of collection work to training a fingerprint database, which is evidently a huge labor time. Therefore, the RSS-based indoor localization method without or with a little survey is more efficient and interesting.

RSS value changes over time.
Because of the expensive cost, when the fingerprint database is generated, it is rarely updated, which cannot fit the change caused by variable environments (temperature, human density, devices, etc.), and will lead to low localization accuracy.
For the large space layout building, owing to lacking of walls, the considerable drop while passing through a wall is missing. This makes it more challenging for the large space indoor localization system.
Taking the above into consideration, we present an indoor localization approach for accurate positioning and navigation for large space indoor environments.
System architecture
In this work, as shown in Figure 3, the localization approach consists of four components: coarse trajectory forming (CTF), fingerprint mapping (FM), tracking service (TS), and WiFi fingerprint database. In this section, we present high-level architecture of our method.

System architecture.
There is little value to localize a user at an independent time point, but the trajectory with time property is valuable for both users and organizations who want to analyze user behavior. Consequently, the temporal and spatial attributes of user’s localization are essential for an indoor localization system.
In this view, we should solve tracking problem instead of localization problem, and the tracking task is not a simple combination of independent localization tasks in a consecutive time point.
Due to the randomness of human movement, the well-known Kalman filter that is well used in a linear situation is not appropriate for indoor tracking. In this work, we record a fixed size of historical information for tracking user trajectory.
CTF is the foundation of FM and TS. During the movement, collectors/users do not need to collect data on purpose, just carry mobile devices (such as cell phone and tablet) and turn on WiFi switch, then system will auto-collect WiFi signal strength, accelerometer sensor data, and magnetometer sensor data with timestamp. Then, the component forming a coarse trajectory can be constructed by the sensor data.
FM is used by collectors who only move around the building (this role can be carried out by security guards, cleaning staff, etc.). These collectors need to wander along the pre-scheduled path and walk around the building periodically. Figure 4 shows an example of pre-scheduled path for librarian in Library of Jilin University. The coarse trajectory generated in this walking will be mapped onto the pre-scheduled path. Then, the corresponding WiFi fingerprints in the trajectory are mapped onto the floor plan. Eventually, every point in the floor plan owns a fingerprint. The database will be updated when the new data belonging to an existing pre-scheduled path come.

An example of pre-scheduled path for librarian in JLU library.
TS is utilized by users who want to employ the localization service. The system collects the WiFi signal strength with sensor data and then stores these data temporarily. By historical sensor data, the system can form the user’s trajectory and then consult the database to obtain the most possible path.
Fingerprint database is the fundamental component for FM and TS. It stores the output of FM that are fingerprints with corresponding coordinates and is the input of TS. Updating fingerprint database is triggered by FM and TS.
In our approach, training work is greatly simplified by removing the location labeling. Collectors only need to carry their mobile devices and walk along the pre-scheduled paths. Then, the fingerprint database will be auto-generated. We use tracking to replace localizing to achieve temporal and spatial consecutiveness. Since our approach achieves a good localization accuracy in a more complex large space indoor environment, it will also fit for more general scenarios.
CTF
WiFi RSS values, accelerometer sensor data, and magnetometer sensor data are collected by collectors’ mobile devices. The record can be represented as
where
We can get the length of user’s trajectory by the accelerometer raw data with timestamp. However, high-performance sensors are prohibitively expensive, and due to the existing noise from sensor raw data, error accumulation is also large. 32 Fortunately, we can adopt the time-labeled accelerometer raw data to count steps. Then, the steps can be utilized to estimate the walking distance. 33
We consider the case that user’s trajectory T can be separated into a set of lines
Since the directions of steps are obtained by magnetometer sensor data and there may be an angle a between the mobile devices in hand and facing direction, the measured direction
The CTF algorithm is presented for forming the coarse trajectory. The input in CTF algorithm is the original records of sensor data, and output is the coarse trajectory of users. There are two for loops in CTF. The outer loop processes each record, and the inner loop transforms the raw accelerometer data to steps and deals with the steps in each record. Therefore, the algorithm can be done in O(n) time, n is the total step number. Let
Mapping fingerprints
The coarse trajectory cannot be directly used for localization because the component
Since the coarse trajectory from a collector is a tracking of pre-scheduled path, each line from this trajectory can be mapped to a real line in the floor plan. According to the
Assuming a user walks along this line with a fixed step length, then the length of each step in this line is
where n is the step number in this line. Then, the coordinate of starting point for ith step is
and the coordinate of ending point for ith step is
Then, the coordinate of WiFi signal’s fingerprint can be obtained by its corresponding record. The mapping relationship of fingerprint and coordinate can be stored in database as
Since the RSS value varies with external factors (human density, temperature, etc.), to achieve an accurate localization result, the fingerprint database should be updated with the change of environment. Because the mapping fingerprint operation occurs as long as the collectors wander around periodically, the update is triggered frequently. When the TS is available for users, the query data are not only used for localization but also updates the fingerprint database. Therefore, the fingerprint database is suitable for environments.
The following FM algorithm is presented for mapping the WiFi signal fingerprints to the coordinate of the floor plan. In FM, the input is the pre-scheduled path with its coordinate, the collected records of WiFi fingerprints, and sensor data, and the output is the map set of fingerprint and coordinate. If there are some similar pre-scheduled paths (as shown in Figure 5, the shapes of these paths are same but part of the length of lines are different), the approach cannot distinguish from the correct mapping. Therefore, the pre-scheduled path should be one of the inputs of FM. There are two for loops in this algorithm; from line 2 to line 6, that is, the outer loop and each loop processes one virtual line from the trajectory. The inner loop, from line 5 to line 6, deals with one single virtual line, and each loop handles one record in

An example of similar pre-scheduled paths.
TS
Different from the traditional ways, fingerprints obtained from FM are not the average value of multiple measurements. To overcome this shortcoming about inaccurate fingerprints, we leverage the trajectory matching to fetch all similar path sets. Then, we adopt the fingerprints matching to find out the most possible path from the path set.
The TS stores a fixed size of historical records
Assuming users walk in a constant step length, the possible length is from
For each
Then, the tracking mapping problem can be described as follows
where
Essentially, TS in our approach is a graph mapping problem. The fingerprints are the nodes, and edges are the geographical links with orientation attribute. The problem can be described as mapping a sub-graph to the whole graph and tell which is the corresponding newest node.
The graph mapping problem is NP-hard. Thus, there are many possibilities for a specific trajectory and the calculation of likelihood is costly. Therefore, we need to narrow down the number of possible trajectories. Here, a cluster technique is presented to classify the WiFi fingerprints.
Considering the nature of signal propagation that the closer to the signal source is the greater RSS will achieve. Therefore, for the fingerprints surrounding AP k, the kth RSS is bigger than the others (the distance to AP, k, is shorter than the distances to other APs). Then, we can use the index of the biggest RSS in a fingerprint as its category. As a result, the fingerprints belonging to category k are closer to AP k than others, which means fingerprints are geographically separated into n pieces.
The cluster of a fingerprint F is k if
In this approach, only a sorting operation is used to estimate which cluster a fingerprint belongs to. Thus, our cluster approach is much less complex than the ones proposed before (such as the approaches by Frey and Dueck 34 and Feng et al.30,35).
Since the cluster function is a deterministic function, the cluster of each WiFi fingerprint will not be changed after it is measured. To expedite online tracking, clustering can be done offline.
In our approach, the possible trajectory is filtrated considering whether the last fingerprint and the ending record’s fingerprint fall into the same cluster. If they belong to the same cluster, then add the trajectory to the possible trajectory set.
Another way to reduce the potential trajectories is to narrow down the gap of
The TS algorithm is presented for localizing the users’ positions. The inputs of this algorithm are the floor plan of the building, the historical records, and the WiFi fingerprint database, and the output is the trajectory which is corresponding to the historical tracking data. In this algorithm, there is a double-deck loop. The outer loop iteratively finds out the most likely length of user’s steps. The inner loop fetches the most likely trajectory for the given length of a step. The variable c in TS denotes the cluster of
The WiFi fingerprints measured by TS are not only used to localize mobile devices but also update fingerprint database to keep the database latest with the environment varying.
Simulation
Simulation setup
All experiments in this article are conducted in part of second floor in the Library of Jilin University. We deploy 12 APs in the 20 m×20 m area. The distribution of clusters generated by clustering is demonstrated in this section. We illustrate how the AP number, update period, and historical time effect the positioning accuracy. And the stability and performance are also considered in this section.
Comparison schema
In many indoor localization approaches, K-nearest neighbors (KNN) and Kalman filter based are two classic indoor localization approaches and used in many research works.27,36
KNN goes through the fingerprint database and picks up k reference points whose fingerprints match best to the real-time one.
Let
where O represents the entire fingerprints’ database or fingerprints’ set acquired from coarse localization stage, and F is the real-time WiFi fingerprint.
Kalman filter can be illustrated as
where process noise
where
The prediction for system status is
The correction for system status is
The final position estimation is obtained by the Kalman filter that integrates system state estimation with real-time WiFi fingerprint. The parameter
Localization accuracy
Since we choose the strongest RSS index as the indicator to cluster fingerprints, and the nearer the reference point to the AP, the stronger the RSS value, the fingerprints belonging to the same cluster are geographically adjacent. Moreover, the area of one cluster is related to the corresponding AP’s power. The clustering result is shown in Figure 6. Each colored point is a fingerprint, and the fingerprints with the same color belong to the same cluster.

Cluster of WiFi fingerprints at JLU library.
In a complex indoor environment, the coverage range of a single AP is limited, and the weak RSS will lead to a poor result. Thus, we evaluate relationship between the number of APs and the localization accuracy. Figure 7 shows that the mean errors of systems, respectively, using KNN, Kalman filter, and TS sharply drop as the AP number varies from 3 to 6, and then the downtrend of mean error decreases slowly. To further study their relationship, we use the concept of WiFi density.

AP number versus mean error, update period 0.5 h.
As shown in Figure 8, with the strengthening of WiFi density, the mean error of the localization systems goes down. As one AP may not cover the whole field, the minimum WiFi density and the AP number are not in linear correlation. Here, we use a homogeneous deployment way to maximize the minimum WiFi density. As the result indicates, the curves almost match the tail of Figure 7.

Density versus mean error, update period 0.5 h.
The RSS value is effected by temperature and human traffic; thus, the localization accuracy is correlative with the freshness of WiFi fingerprints. As shown in Figure 9, when the update period is exponentially enlarging, the mean errors nearly linearly increase. As the time goes on, since the variation of temperature and human traffic will not exceed the maximum value, the mean errors decline is almost stable. From Figure 9, we can see whether the WiFi fingerprints update timely, and the localization accuracy can improve about 10%.

Update period versus mean error, nine APs.
The TS stores a fixed number of historical records, and the records are used to describe the trajectory of users. Thus, the more records there are, the less particular information can be used which leads to less matched paths. Since TS compares the likelihood of the whole historical fingerprints with the possible trajectory fingerprints stored in database, the longer the fixed number is, the less influence caused by RSS fluctuating will be. As illustrated in Figure 10, the mean errors go down as the historical time magnifies. Particularly, when there is only one record, the approach degenerates to be KNN.

Mean error versus historical time.
Stability is also an importance attribute for the localization approach. Figure 11 shows a 10-min execution for TS. Let

Mean error versus execution time, historical time 5 s.
Reducing the training time is an effective way for service providers to cut their cost. Comparing with the traditional way, as illustrated in Figure 12, our approach is much less costly in training. The traditional training method needs to label the location and collect the average RSS value which cost about a minute for an reference point (RP). In our approach, the collector only needs to walk by. Although, we do not use the average RSS value, we still get a high localization accuracy because of the trajectory matching.

Training time.
We further compare our approach with the KNN method and Kalman Filter method. The KNN is just a connection of individual localization results. Kalman Filter method uses the Kalman filter in the tracking. The cumulative distribution function (CDF) is shown in Figure 13. The results of all methods are Laplace distributed, and TS has a smaller

CDF of mean error, update period 0.5 h, nine APs.
Conclusion and future work
In this work, we propose a training-less indoor localization approach by automatically collecting the WiFi RSS and sensor data and then mapping the WiFi fingerprints to the coordinates in the corresponding pre-scheduled path. Through calculating the likelihoods with all possible trajectories formed by historical sensor data and then choosing the most similar one as the localization result, TS achieves an accurate positioning approach for the large space environment which is prevalent in modern buildings. Since our approach achieves a good localization accuracy in a more complex large space indoor environment, it will also fit for more general scenarios.
In this work, we assume users always hold the mobile device in hand, but he or she may put it in his pocket and watch it when necessary, which makes the magnetometer sensor useless. Therefore, in the future, we will consider the relationship of the value of sensor data with the step length and the situation with missing magnetometer data.
Footnotes
Academic Editor: Gang Wang
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the National Key Research and Development Program of China (Grant Nos 2016YF B0201503 and 2016YFB0701101), National Natural Science Foundation of China (NSFC) (Grant Nos 61602205, 51627805, and 61170004), Specialized Research Fund for the Doctoral Program of Higher Education (20130061110052), Major Special Research Project of Science and Technology Department of Jilin Province (20160203008GX), Key Science and Technology Research Project of Science and Technology Department of Jilin Province (20140204013GX), and EUFP7 Projects EVANS (GA-2010-269323) and MONICA (GA-2011-295222).
