Abstract
This paper presents a nonparametric bootstrap multihop localization algorithm for large-scale wireless sensor networks (WSNs) in complex environments. Unlike most of the existing schemes, this work is based on the consideration that it is not feasible to obtain a lot of available distance measurements sample for estimation and to get exact noise distributions or enough prior information for conventional statistical methods, which is a situation commonly encountered in complex environments practically. For the first time, we introduce a nonparametric bootstrap method into multihop localization to build confidence intervals for multihop distance estimation, which can eliminate the risk of small sample size and unknown distribution. On this basis, we integrate the interval analysis method with bootstrap approach for ordinary nodes localization. To reduce the computational complexity, boxes approach is utilized to approximate the irregular intersections. Simulation results show that our proposed scheme is less affected by the variation of unknown distributions and indicate that our method can achieve high localization coverage with relatively small average localization error in large-scale WSNs, especially in sparse and complex network with smaller connectivity and anchor percentage.
1. Introduction
In the last few years, there has been a rapidly growing interest in extensive monitoring applications of wireless sensor networks (WSNs). One important reason is that, as opposed to traditional solutions, WSNs can be rapidly and randomly deployed in a large-scale monitoring region by plane or unmanned aerial vehicle (UAV), of which the environments of monitoring region are always complex even inaccessible.
As new technologies, WSNs can provide the means for long-term, all-weather, real-time, accurate, and extensive monitoring unattended, which brought us a convenient way to sense and monitor the complex environments. It is considered as an ideal system for this type of extensive environment monitoring and can fulfill the needs of various practical applications, for example, marine surveillance, ocean scientific exploration, and commercial exploitation.
For most WSNs applications, localization service is an indispensable part and an essential task. The location of the sensor nodes should be determined for meaningful interpretation of the sensed information. In recent years, a certain amount of research work has been conducted in this interesting research area, and a comprehensive survey is provided in [1–3] and the references therein. Some of the solutions are mainly designed for small-scale networks, and although this is also an interesting field, we will not contribute to the region in this paper; instead, we focus on the large-scale localization context.
As we know, out of the consideration of economic rationality, generally, it is too hard to deploy the sensor nodes in a large-scale area with relatively high density and large percentage of anchor nodes. If it is used for monitoring ocean environment, the network will be deployed sparsely and only a minority of anchor nodes can be employed for localization. In these cases, it will be harder for ordinary nodes to communicate with enough anchor nodes directly so as to measure distances, which are necessary for localization.
To solve this problem, three types of localization schemes are proposed for large-scale localization, namely, mobile anchor assisted localization algorithms [4–6], recursive algorithms [7–9], and multihop algorithms [10–12], respectively. Hereinto, mobile anchor assisted localization methods commonly employ some expensive mobile equipment such as Autonomous Underwater Vehicle (AUV) which can roam across the monitoring region to assist localization. In recursive algorithms, some ordinary nodes that have been localized become secondary anchor nodes and broadcast coordinates to assist other nodes in estimating their locations. The localization process will inevitably suffer from the delay and adverse effects of error propagation and accumulation. Whereas for multihop localization, the ordinary node can infer the distances to its nonneighboring anchor nodes by approximating the length of the shortest path to the Euclidean distance, so as to get enough anchor nodes with known distances for localization. They not only can provide better real-time performance but also have no additional equipment requirement. In comparison, the characteristics of multihop methods are more suitable for large-scale WSNs localization. Therefore, multihop localization method has received more and more attention in recent years.
Certainly there are still many challenges for the multihop localization, especially in the complex environments. Here we define the complex environments as multiple complex terrain conditions (e.g., forest, rocks, marsh, underwater, etc.), random and sparse sensor node distribution, irregular radio propagation pattern, various unknown ambient noises, and multiple anisotropic network situations (e.g., H-type network, C-type network, etc.). There is no doubt that these adverse factors will seriously affect the accuracy of the multihop distance estimation which is the basis of the accurate sensor location estimation. To address this problem, most of the existing schemes have adopted conventional statistical methods and considered distance measurements affected by normal distribution noise and then determined the localization uncertainties through Monte Carlo analysis or nonlinear transformation techniques. For example, in [13], a dynamic localization scheme, the Monte Carlo localization (MCL), has been proposed. In [14], based on the discovery that the multihop distance errors obey normal distribution with various biases, a multihop localization algorithm that incorporates the distance estimation bias has been presented. But for the previous methods, only when the measurements sample size is large and the precondition that the errors obey normal distribution is satisfied, they could perform well in online localization.
However, for the large-scale WSNs in complex environment, it is too hard to get exact distribution characteristics of distance measurements and enough a priori information which is necessary for conventional statistical estimation method, and it is impossible to guarantee repeatable ranging conditions for reproducible measurements so as to obtain a large number of available measurements sample for Monte Carlo analysis or other similar methods. Hence, there is a need to develop a novel algorithm for producing coordinates through multihop localization approach as only very few measurements are available. And it is better that this methodology only needs less prior information and even no statistical assumptions on disturbances.
On the basis of analysis on the challenges of complex environments and limitations of existing studies for large-scale localization, a different approach is proposed and investigated, that is, a multihop localization method based on nonparametric bootstrap for large-scale WSNs in complex environments.
The bootstrap is a powerful technique for assessing the accuracy of a parameter estimator in situations where conventional techniques are not valid. The nonparametric bootstrap method was originally introduced by Efron in [15] and used to address CI estimate for statistics based on independent and identically distributed (i.i.d.) random variable from some unknown distribution
This paper is organized as follows. Section 2 introduces the related works and challenges for large-scale WSNs localization. Section 3 presents in details the nonparametric bootstrap and interval analysis method. Section 4 evaluates the performance of our algorithm through experiments, and Section 5 concludes this paper.
2. Related Works
2.1. Mobile Anchor Assisted Localization Algorithms
To solve the localization problems of large-scale WSNs, in [4], the authors employ a single mobile beacon to aid in localization. The sensor locations are maintained as probability distributions that are sequentially updated using Monte Carlo sampling as the mobile anchor node moves over the monitoring area. This method relies on the more powerful anchor to perform the calculation and relieves most localization tasks from the less powerful ordinary nodes.
The authors in [5] proposed a range-free localization scheme for WSNs using mobile anchor nodes equipped with four directional antennas. Therein, each mobile anchor node can determine its position via Global Position System (GPS), and then it broadcasts its coordinates as it moves through the region. The ordinary nodes detect these messages and utilize a simple processing scheme to determine their own coordinates based on those of the anchors.
In the AUV-aided localization scheme proposed in [6], AUVs keep roaming across the underwater sensor field to aid in localization. The AUVs can get coordinates from GPS while floating periodically and dive into a fixed depth and navigate through a predefined route using compass and dead-reckoning. They can estimate the distance between the AUV and ordinary nodes through a request/response message pair exchange which contains its coordinates. Therefore, the ordinary nodes can be localized after the message exchange from three different noncoplanar AUV locations.
Although these protocols exploit the mobility of the mobile equipment to overcome the lack of adequate anchor nodes, the major drawbacks of these mobile anchor assisted localization algorithms are that the mobile machine is too expensive for WSNs, and the slow speed of AUV or other machine always introduces high localization delay. In addition, the movement of the anchor nodes will be severely restricted by the complex environments.
2.2. Recursive Localization Algorithms
Liu and Zhang [7] presented an error control mechanism for recursive localization based on the characteristics of node uncertainty and the active selection strategy of anchor nodes. The error control mechanism only utilizes local knowledge and can mitigate the effect of error propagation for both range and directional sensors to a certain extent.
Yu et al. proposed a two-stage localization approach in [8]. Firstly, localization process starts from the nodes with the largest numbers of neighbor anchors which have the priority. Then, the coordinates of all neighbor nodes are exploited to improve localization accuracy. During the procedure, a number of measures are also taken to ensure the reliability of each location estimate to avoid abnormal errors and reduce error propagation.
Vemula et al. [9] formulated the sensor localization from a probabilistic point of view and incorporated anchor position uncertainty to estimate the distribution of node coordinates, including iterative least squares and Bayesian methods, Monte Carlo importance sampling, and cost-based methods.
These schemes try to inhibit the propagation and accumulation of localization errors, but the high computational complexity and increased communication cost limit their application in practice. Additionally, this kind of method is also easy to cause the localization delay.
2.3. Multihop Localization Algorithms
DV-distance and DV-hop algorithms, as the origination of multihop localization schemes for WSNs, are proposed in [10]. In both algorithms, each anchor node broadcasts a message to its immediate neighbor nodes firstly. Then, the message is propagated in a controlled flood manner so that each ordinary node can estimate the lengths of shortest paths to anchor nodes. When the ordinary node obtains the estimates to enough anchor nodes, its position can be calculated.
Wang and Xiao [11] presented an improved multihop algorithm called i-Multihop which has higher computational complexity. At the beginning, the upper bound constraints are used to filter out the incorrect distance estimations and the estimated position is pinpointed to the intersection constrained by the correct distances. And then, the distance fitting is used to fit correct distance measurements, which makes the final estimated position not to be affected by the layout of anchor nodes.
The authors in [12] proposed three multihop localization schemes based on least squares and multilateration, namely, Taylor-LS, weighted Taylor based least squares, and constrained total least squares, respectively. Additionally, a generalized Cramér-Rao lower bound is developed to analyze the performance of multihop localization approaches.
As we can see, most of the previous schemes considered the measurement uncertainty as normal distribution even zero-mean normal distribution noise and employ conventional error-processing approaches, for example, nonlinear least squares estimator (NLSE), and so forth. However, in complex large-scale WSNs, it is practically too hard to fulfill the requirements which are necessary for these approaches, such as enough a priori information, exact distribution characteristics, and large sample size.
Specifically, for the ordinary nodes, there are not so many anchors that can communicate with them to accomplish distance measurement. The ranging process also cannot be performed again and again due to the limited energy of sensor nodes. Even when cost is not a main concern, it is also impossible to guarantee repeatable conditions for reproducible distance measurement. Hence, it is not practically feasible to obtain a lot of available distance measurements sample for estimation. Moreover, in complex environments, for example, underwater environments, the ranging process will suffer from various non-Gaussian noises, for example, the multipath and waveguide effects, surface scattering, and so forth. Obviously, for the previous noises, it is too hard to get exact distribution characteristics and enough prior information which are necessary for statistical methods. It is also unreasonable to consider that the uncertainties always obey normal distribution. Hence, in this paper, the bootstrap localization method is proposed for large-scale WSNs in complex environments.
3. Nonparametric Bootstrap-Based Localization Algorithm for Large-Scale WSNs
To solve the problems mentioned earlier, in this paper, we propose a novel multihop localization scheme which integrates a nonparametric bootstrap distance estimation method with an interval analysis approach. We firstly utilize the nonparametric bootstrap method to provide a confidence interval (CI) as the estimation for distance measurement. On that basis, interval analysis approach is employed to determine the feasible set and search the optimal estimation of coordinates.
3.1. Short Review of CI Estimate
As we know, CI is an interval about the measurement result within which the values that could reasonably be attributed to the measurand may be expected to lie with a given level of confidence. A common approach to address CI estimate based on the normal distribution can be described as follows. Let
For a given confidence level
It follows that constant c satisfies
Equality (3) can be rewritten as
Interval
However, σ must be practically estimated from the distance measurements, and this method performs well only in the case when n is large as per the central limit theorem. In the case where n is small
If sample size
Let the sample variance
Then, constant c must satisfy
Equality (8) can be rewritten as
Interval
3.2. CI Based on Nonparametric Bootstrap
In most studies, the CI is commonly and directly obtained through the previous two methods, based on the hypothesis that the distance uncertainty obeys normal distribution and the sample size is sufficiently large. However, as we mentioned before, it is not the case for large-scale WSNs localization in complex environments. When improving accuracy in conventional methods is invalid, bootstrap method can be used to address the CI estimate issues. The CIs of distance measurements can be estimated by employing the nonparametric percentile bootstrap method as follows.
Let
The nonparametric percentile bootstrap methodology to determine the upper and lower bounds on the CI for distance will be described in detail in the bootstrap distance estimation subsection.
3.3. Nonparametric Bootstrap-Based Distance Estimation Approach
The basic concept of bootstrap method is to produce a large number of independent bootstrap distance estimates by resampling the original distance estimate; that is,
A bootstrap resample
Let
We sort bootstrap estimates
The desired
Furthermore, note that
Finally, we can obtain the interval
3.4. Determination of Coordinates Based on Interval Analysis Method
When we obtain the CI from a small sample set of distance measurements, the interval analysis method will be utilized to determine the positions of ordinary nodes.
Consider a network consisting of m anchors with known positions and n ordinary nodes which need to be localized. The location of node
Obviously, the desired position estimate is guaranteed to be bounded by an admissible space which can be denoted as

Feasible set of ordinary node
To reduce the computational burden, boxes approach will be utilized to approximate the irregular intersection of all feasible solutions. For each anchor ordinary pair, their admissible space represents a region between two boxes of length

Approximate intersections (top view).
As we know, interval analysis is usually used to model quantities that vary around a central value within certain bounds. With simple operations, the interval analysis allows to consistently deal with problems involving interval data. Based on the interval analysis theory, the admissible space of
Specifically, the area
Using the same previous method, we can get
Therefore, the feasible set
Clearly, the intersection of all subfeasible set
Finally, the coordinates can be given as
4. Performance Evaluation
In this section, we conduct extensive simulations to evaluate the performance of our proposed localization scheme, called Nonparametric Bootstrap Based Multihop Localization Algorithm (NBMLA). All simulations are run in Matlab R2012a. To reduce the influence of outliers, we take the average of 100 simulation runs as the final data points.
4.1. Simulation Settings
In our simulation experiments, 400 nodes with adjustable transmission range R are randomly distributed in a large-scale three dimensional region with a size of 3000 m × 3000 m × 200 m (see Figure 3). We control the density and connectivity of the network by changing the transmission range while keeping the area of deployment the same. Different anchor percentages are also considered in our simulation. Besides our scheme, we also simulate an improved localization algorithm named Taylor-LS for comparison which is almost the same as [12].

Topology of large-scale WSNs.
We mainly consider three performance metrics: localization coverage, localization error, and time complexity. Localization coverage is defined as the ratio of the localized nodes to the total sensor nodes. Average localization error is the average distance between the estimated positions and the real positions of all ordinary nodes. Distribution boxplots of node localization errors show the average errors, median errors, and maximum outliers. The analysis of time complexity mainly focuses on the communication cost and computation complexity.
4.2. Localization Coverage
In this subsection, we analyze the performance of localization coverage with different network connectivity and anchor percentage, respectively. Figure 4 shows the performance of localization coverage with changing network connectivity while anchor percentage is 15%. We can observe that the localization coverage of our scheme increases monotonically with the connectivity ranging from 4 to 13. But when the connectivity reaches a high value, the coverage rate becomes relatively large and will not change much after that. For example, when the network connectivity is 10, the localization coverage becomes 97% and then changes slower. The reason is that when the network connectivity reaches a certain point, most nodes can get enough anchor nodes by multihop approach and localize themselves.

Localization coverage versus network connectivity.
As shown in Figure 5, the localization coverage of our proposed scheme increases with the anchor number when we change the anchor nodes from 20 (5% anchors) to 80 (20% anchors) with step 10, while keeping the network connectivity as 9. We can see that when the anchor number reaches a certain value, for example, 50 anchors, the coverage rate will not change much after that. We can also see that when the anchor rate is 15% and the connectivity is 9, our scheme can localize more than 95% nodes in the large-scale WSNs. Compared with the results of Taylor-LS, NBMLA can achieve better performance in terms of localization coverage, especially in the low anchor percentage situation.

Localization coverage versus anchor number.
4.3. Localization Error
As mentioned before, in complex environment the ranging process usually encountered some non-Gaussian noises which stem from multiple factors, for example, multipath effect, multihop distance estimation, and anisotropic network condition. This makes it harder for ordinary nodes to get exact distribution characteristics of the distance measurement noises. For comparison with classical approaches and showing the performance of our algorithm in non-Gaussian noises context, the measurement noise has been assumed to follow Rayleigh distribution, with three percent of real distances as the standard deviations. In other words, the knowledge of the error bound roughly equals nine percent of real distances. This is a reasonable assumption and can be satisfied by the existing distance measurement technologies when suffering multiple adverse factors.
Figure 6 plots the relationship between the average localization error and network connectivity when the anchor percentage is 15%. Besides the given Rayleigh distribution, we also simulate a normal distribution noise situation with mean value and standard deviation (10 m, 10 m) for comparison with the classical approaches relying on normal distributions hypotheses. We can observe that the average localization error of our scheme decreases significantly with the increase of network connectivity. It should be noted that our scheme can achieve relatively high localization accuracy even with low network connectivity. This indicates the good localization performance of our proposed scheme in sparse region. And it can be seen that our scheme outperforms the Taylor-LS method in both Gaussian and non-Gaussian cases. It indicates that our scheme is less affected by the variation of distributions.

Average localization error versus connectivity.
In Figure 7, we vary the anchor percent, which is ranging from 5% to 20%, and get the accuracy comparisons with different anchors. For our scheme, with the number of anchor nodes varying from 20 to 80, the localization error decreases by 40%. However, the localization error of Taylor-LS decreases only by 20%, when the network connectivity is 9. This suggests that in sparse networks, our scheme can achieve higher localization accuracy just by increasing a little number of anchor nodes.

Average localization error versus anchor number.
4.4. Discussions
Finally, we analyze the time complexity of the schemes which mainly focuses on the communication cost and computation complexity. For a network with M anchor nodes and N ordinary nodes, the overall communication complexity of our scheme is
5. Conclusions
In this paper, we present a novel multihop localization algorithm which integrates the nonparametric bootstrap method with interval analysis approach for large-scale WSNs in complex environments. In the first phase, the nonparametric bootstrap method is utilized to build confidence interval of distance form a small sample set of available measurements with unknown noise distributions. On that basis, we adopt interval analysis approach to estimate the positions of ordinary nodes. The simulation results demonstrate that our algorithm can get high localization coverage and accuracy while resisting against the variation of unknown noise distributions. Compared with the traditional method, the proposed method can greatly improve the average localization accuracy. Further studies are being conducted to extend the method to mobile WSNs in complex environments.
Footnotes
Acknowledgments
The authors are grateful to the anonymous reviewers for their industrious work and insightful comments. This work is supported by the National Natural Science Foundation of China under Grant nos. 61001138 and 61201317.
