Abstract
Data collection from deployed sensor networks can be with static sink, ground-based mobile sink, or Unmanned Aerial Vehicle (UAV) based mobile aerial data collector. Considering the large-scale sensor networks and peculiarity of the deployed environments, aerial data collection based on controllable UAV has more advantages. In this paper, we have designed a basic framework for aerial data collection, which includes the following five components: deployment of networks, nodes positioning, anchor points searching, fast path planning for UAV, and data collection from network. We have identified the key challenges in each of them and have proposed efficient solutions. This includes proposal of a Fast Path Planning with Rules (FPPWR) algorithm based on grid division, to increase the efficiency of path planning, while guaranteeing the length of the path to be relatively short. We have designed and implemented a simulation platform for aerial data collection from sensor networks and have validated performance efficiency of the proposed framework based on the following parameters: time consumption of the aerial data collection, flight path distance, and volume of collected data.
1. Introduction
Data collection is the basic function of WSN and also an important research topic. The main research works regarding the method of data collection based on WSN can be divided into three categories.
The first category is static data collection. The static data collection is a kind of data collection when the sink node is fixed and ordinary nodes upload data using networks through one-hop or multihop routing. This kind of data collection is simple and easy to implement, given extensive research done in the literature. As long as the network is deployed, this kind of data collection can be immediately put into use. However, using this kind of data collection, remote nodes need to upload data through multihop routing paths. The relay nodes will consume more energy than the ordinary nodes, which tend to cause shorter life cycle of relay nodes and eventual disconnection of network segments. Therefore, static data collection is not always an adequate way of collecting data in the large-scale WSN, especially in specific application scenarios.
The second method is data collection with ground-based mobile sink. Such mobile data collector, usually a vehicle, is installed with sink node to visit the network through the ground transportation and collects data of all nodes in the network. For example, in an urban WSN, public transport, like buses and taxi, is used as the mobile data collector. In the underwater WSN, the autonomous underwater robot is used as the mobile data collector and collects data from the WSN that was deployed underwater. The ground data collection could solve the problem of imbalance in energy consumption and collects sensors' data even when the network segments get disconnected. Solving the energy imbalance problem prolongs the lifespan of the network and could be applied in a large-scale WSN. However, this kind of collection is heavily limited by surface transport and its function will be limited when the ground transportation is inconvenient. This can happen in practical scenarios such as in swamp, dense primeval forests, or areas that are blanketed by radiation exposure.
The third method is the data collection with aerial mobile sensor nodes. It uses aerial vehicles to collect data from the WSN deployed on the ground. This kind of data collection not only owns the flexibility of the mobile data collection suitable for large-scale WSN, but also has the following advantages:
Firstly, the deployment environment varies. Aerial data collection uses the Unmanned Aerial Vehicle (UAV) that could navigate automatically as the mobile data collector. It is free from the mobility limitation of ground transportation and could be used in particular monitored regions, which human beings could not approach. Secondly, aerial data collection gathers data much quicker. Compared with the ground data collection, aerial data collection uses controllable aerial vehicle which has a higher speed of movement. It could increase the speed of searching and visiting nodes and shorten the life cycle of data collection when the WSN is a large-scale one. Thirdly, it has a lower latency and higher bandwidth. Compared with the air to surface data collection, the aerial data collection often has fewer obstacles and larger coverage of wireless signal that could lower the communication latency and increase the bandwidth.
The aerial data collection could be applied in the data collection of a large-scale WSN and overcome the limitations of the ground data collection. However, it still has key challenges to be solved, such as the following.
This paper divides the building process of the aerial data collection into
The main contributions of this paper are as follows:
It studies the methods of data collection when a large-scale WSN is deployed in areas where ground transportation is difficult and explains the work procedure of aerial data collection and key problems that need to be solved. It provides a FPPWR (Fast Path Planning with Rule) algorithm aiming at a large-scale WSN. This algorithm ensures a relatively high accuracy and effectively improves the efficiency of the path planning of aerial vehicles. It designs and implements a simulation platform for aerial data collection. This platform could simulate the aerial data collection in a large-scale WSN based on different input parameters and system configurations and provides parameter evaluations and information feedback for networks in the real environment based on the simulation results.
The rest of the paper is organized as follows. In Section 2, we discuss the related works in the literature. Then, we propose the framework of the aerial data collection in Section 3. In Section 4, a fast path planning of the UAV in a large-scale WSN is introduced. Section 5 is about the design and implementation of the simulation platform for the aerial data collection. Finally, we conclude this paper and point out several directions for our future research in Section 6.
2. Related Works
The researches of WSN have been applied in different areas. In [2, 3], the authors generally summarized the application of WSN in the military field, environment monitoring, warehouse management, medical care, and so forth. According to the scale of the sensor networks applied in different fields, sensor networks can be divided into large-scale sensor networks and small-scale sensor networks. Small-scale sensor networks are WSNs that are deployed in homes [4], offices [5], and hospitals [6] that cover a smaller scale area. They are used to provide an intelligent human centered environment, known as the smart environment. For example, in [7], the authors proposed a data collecting and processing model based on the activity pattern, which is constructed on sensor data, and applied the model in the activity identification and behavior prediction of the sensors to improve their life quality. In this model, the bottom is a WSN that is responsible for data collection. Sensor data in this WSN is collected at the server through one-hop or multihop data collection network. To identify the activity of the sensors and analyze their behavior in real time, the authors built the system software on the top of the model through intelligent inference mechanisms and extracted and analyzed sensor data under different data accuracy. Generally speaking, a small-scale WSN has fewer nodes and its deployment is flexible. So, the relatively simple static data collection can be adopted to collect data.
Compared with small-scale networks, large-scale networks are WSN deployed in the city [8], the habitat for wildlife [9], or the water body [10] that covers a large area. Data collection in this kind of sensor network can adopt the ground data collection [11]. Taking [11] as an example, the author proposed the MULE, three-layer data collection with mobile collector. It can be applied to the data collection in a large-scale low-density WSN. When the mobile data collector approaches the sensor nodes, the mobile data collector starts to collect the sensor data with certain signal strength and temporarily stores the data in the mass storage device it carried onboard. In [12], the authors presented a heuristic searching algorithm for the path planning of the mobile data collector's anchor points in the network. Also, in [13], the same authors used the Mixed Integer Linear Programming (MILP) as the basis and proposed a walkthrough path planning algorithm that is specific to one-hop data collection. The work in [14] studied the route between the mobile data collector and the ground network nodes and provided an optimized data collection communication protocol and then increased the energy efficiency and the life cycle. To further increase the efficiency of data collection, [15] used SenCar with multiple antennas that could receive data simultaneously in the network and optimized the data collecting process based on constraints like the energy of nodes and data volumes.
In large-scale sensor networks, some cover special monitoring regions where the environment is severe for ground transportation, for example, swamps, dense primeval forests, and areas blanketed by radiation exposure. In this kind of environment, ground-based mobile data collector cannot follow the scheduled track so the ground data collection is difficult to accomplish. In order to collect data, the developed controllable UAV can be used as a mobile data collector and the ground-based sensor data can be collected through air traffic.
Research work on controllable UAV based aerial data collection from WSN is getting attention these days. In [16], the authors conducted experiments and proved that when the speed of the aerial vehicle is restricted to a certain range, sensor nodes carried on this aerial vehicle conducting wireless communication and data interaction with WSN are similar to the static nodes conducting wireless communication and interaction with WSN. This theory has laid the foundation for the follow-up research works on the aerial data collection. In [17], the authors proposed a data collection MAC layer routing protocol based on the aerial vehicle. In this protocol, the data transmits through an aerial route platform so energy consumption caused by conventional multihop routing can be avoided and the energy efficiency will be improved. In [18], the authors established cooperation between the UAV and the WSN and used the UAV to collect data of the WSN and updated the flight path of the UAV according to the feedback data of the WSN in order to achieve more efficient data collection of the WSN. In [19], the authors proposed that the framework and communication protocol be applied to the airborne WSN of the UAV. Aerial data collection based on the UAV not only collects data in a large-scale WSN, but also solves the problem of the ground data collection limitation when the ground transportation is inconvenient. These research works are mainly focused on solving a practical problem with the UAV in aerial data collection. However, the basic framework for aerial data collection mainly investigates deployment of network and location of sensor nodes and the method of parameters evaluation of aerial data collection. In [18], the author used the aerial data collection based on cluster. The network that consisted of the sensor nodes deployed in the monitored environment was divided into different cluster regions. When the aerial vehicle approaches the cluster region, it only needs to communicate with the head cluster in the data collection region to accomplish data collection in the cluster region. However, other member nodes need to transmit through at least one relay node to upload the data to the aerial vehicle. Considering the notion that the wireless signal covers a certain area, the aerial vehicle only needs to stop once to collect the sensor data through one-hop routing in the same wireless signal. To gain the minimum coverage of the network, the work in [20] proposed one kind of Grid Generating Method that could gain the minimum number of circulars in the global zone through the polynomial time approximation schemes. Aerial data collection requires the sensor nodes to have accurate location information so that the navigation of the UAV could be precise. In [21], the authors proposed a distributed node location algorithm based on probability, which can provide precise location of nodes in outdoor environment with noise interference.
3. The Overall Design of Aerial Data Collection
According to the research of aerial data collection in this paper, the whole aerial data collection process can be divided into five steps. They are (i) the deployment of networks, (ii) the nodes positioning, (iii) anchor points searching, (iv) fast path planning, and (v) data collection. In all the five steps, the fast path planning of the UAV is the key in this paper. The node positioning and anchor points search are the key algorithms this paper adopts, which will be discussed in detail in this chapter. This chapter will also give solutions to the deployment of networks and data collection. The process is shown in Figure 1.

(a) Deployment: deploy sensors with aircraft or other launch vehicles. (b) Localization: unknown nodes locate themselves by a beacon node. (c) Acquire anchor points: calculate the mini coverage of WSN. (d) Fast path planning: plan a flying path for UAV using grid. (e) Data collection: the UAV with the sink node and mass storage is used to collect data.
3.1. The Deployment of Network
The large-scale WSN whose deployment environment is hostile is often deployed through aerial vehicles spreading nodes or nodes being remotely shot. In such conditions, the nodes will be randomly spread in the monitored area. If the spreading of nodes has no rules, then in some areas of the network the nodes will be dense while in other areas of the network the nodes will be sparse or there will be no nodes at all. These will cause monitoring application to be vulnerable to missed sensing or detection. So, in the deployment of sensor network, the deployment of nodes should be fast and efficient, and also the nodes in the monitored region should be relatively well-distributed. Besides, aerial data collection positioning nodes need to be based on the distributed network. In the network, there are two categories of nodes: beacon nodes and unknown nodes [20]. Beacon nodes are the nodes whose geographical position is known, usually implemented by installing GPS module, while the unknown nodes are the nodes whose geographical position is waiting for positioning. In order to acquire the position information of all nodes, the beacon nodes and the unknown nodes should be mixed homogeneously at a proper ratio.
3.2. Node Positioning
After all the nodes have been deployed, the next step is to position the WSN. The location information of beacon nodes can be obtained through the GPS module. So, the key problem is how to use the known location information of beacon nodes to position other unknown ordinary nodes so as to prepare for the navigation of the aerial vehicle and data collection.
In the study of node positioning, positioning algorithm is based on the distance measurement. This algorithm provides relatively accurate positioning but it has high power consumption and restricted requirements of hardware, for example, the Time of Arrival (ToA) based distance measurement, the Time Difference of Arrival (TDoA) distance measurement, the Angle of Arrival (AoA) distance measurement, and the Received Signal Strength Indicator (RSSI) distance measurement.
Through study and comparison, in this paper, the distributed node location algorithm based on probability is adopted because this algorithm can provide relatively accurate positioning with the existence of obstacles and electromagnetic inference and is fit for aerial data collection. The distributed node location algorithm based on probability is an important function and will be implemented in the simulation platform for aerial data collection and will offer node positioning for the simulation system. The process of the distributed node location based on probability will be stated in Sections 3.2.1 and 3.2.2. More details about the algorithm are shown in [22].
3.2.1. The Distributed Node Location Based on Probability
The algorithm described as follows is RSSI based. Through abundant tests and static analysis, when the wireless signal strength is fixed, the relative distance between the unknown nodes and beacon nodes fits the normal distribution [20]. Figure 2(a) shows the probability distribution of relative distance between the unknown nodes and the beacon nodes when the signal strength is 70 (it is a relative value without unit). After fitting into the normal distribution function, the mean value of the normal distribution curve is 25 meters, as is shown in the figure. This means that when the signal strength is 70, the relative distance of 25 meters is more reliable. For different signal strength, the fitting results of relative distances vary. Figure 2(b) [22] shows the probability distribution of different signal strength that varies from 66 to 90.

(a) Probability distribution when signal is 70. (b) Probability distribution of different signal strength (figure source: [22]).
Based on the abovementioned probability distribution, the positioning of unknown nodes could use two beacon nodes at the same time to improve the accuracy of positioning. Figure 3(a) shows the process of two beacon nodes positioning one unknown node.

(a) Location of the unknown node. (b) The final position (figure source: [22]).
When the environment is the same, the normal distribution of signal strength and relative distance is fixed, which can be measured by its mean value and variance and then stored in the nodes that need to be positioned. If the environment where the nodes were deployed has changed, the nodes could change the corresponding mean value and variance to adapt to the new environment and get positioned accurately.
3.2.2. Algorithm Description
(1) (2) open the log file; (3) initialize the position estimate (4) (5) initialize constraint (6) set pointer to mean 1; (7) (8) read mean, stdev; (9) compute new constraint (10) (11) (12) (13) (14) (15) (16)
In this algorithm, the key step is to calculate the value of constraint If the location information comes from the beacon signal, then the value of If the location information comes from an unknown node, then the value of
The calculation of constraint Firstly, visit all the unknown nodes and use global region to initialize the location information Secondly, for every piece of location information received by the unknown node, calculate the value of constraint The value of constraint
3.3. Anchor Point Searching
After finishing the positioning of the WSN, the data centre can get a set comprised of location information of the nodes. The coverage of wireless signal of the sensor nodes can be considered as a fixed radius, which is called the covered circle.
For a large-scale sensor network, the grid division method in [20] is adopted in the aerial data collection to calculate the minimal coverage of the network and is implemented in the simulation platform for aerial data collection and offers the minimal coverage for the simulation system.
The conventional minimum circular coverage often uses the full search method. First, assume the minimum coverage in the whole region contains all the covered circles (which is called the absolute coverage). Then, reduce the number of circles. When the number of circles is reduced to
3.3.1. Grid Division
In the root grid
For the fixed horizontal partitioning index
As is shown in Figure 4, divide the grid

Grid subdivisions of
To obtain the needed optimum solution, several times of subgrid division are needed. When the value of
This optimum solution is the minimum coverage solution in the circular region. After converting the circular region into the corresponding nodes, the current set of circular regions is converted into the corresponding set of nodes, that is,
3.3.2. The Minimum Spanning Circle Algorithm
Algorithm 2 depicts the obtaining of the minimum coverage in one grid. This algorithm considers the nonempty square
(1) Initialize set (2) Initialize Count as the size of (3) (4) cover all elements (5) (6) (7) (8) Set (9) (10)
Algorithm 3 cites Algorithm 2. The main task of this algorithm is to accomplish the grid division of the whole region. And then use Algorithm 2 to calculate the minimum coverage of every nonempty square after the set of squares is obtained through division and combine all the minimum coverage of every square into global minimum coverage. Now, the grid still needs multiple times of division and each time global minimum coverage is obtained. In the end, the global minimum coverage containing the least nodes is the final solution. The key point of the algorithm is to obtain the nonempty square set
(1) (2) (3) (4) (5) Calculate the minimum cover set (6) Merge (7) (8) (9) (10) (11) For all elements in (12)
3.4. The Path Planning of Aerial Vehicles
Nodes in
Figure 1(d) shows the key step of path planning of the aerial vehicle. Figure 1(d) shows the grid division during calculation and the global regional path
3.5. Data Collection
Figure 1(e) depicts the flow chart and schematic of the ground data collection separately. When the nodes in the network finish self-positioning, they disconnect the link and enter low power mode—the wireless module enters the dormant state and the sensor module starts to collect data. The sensor nodes store the data in their own storage space with limited capacity after data collection is finished. After a certain period of time, the aerial vehicle enters the network following a scheduled path and connects with the ground nodes to determine the position of data collection and starts to collect data from the sensor nodes that are in the coverage of the signal. Considering energy saving, the sensor nodes upload data through one-hop routing.
4. FPPWR for Aerial Data Collection
The process of aerial data collection can be briefed as follows. In a series of target locations, the mobile data collector needs to find the path that costs the least, visit the target locations for only once, and return to starting point. This problem can be abstracted as the TSP problem. And how to visit the target location at the least cost is the key of solving this problem. Now, the study of TSP problem can be categorized into the following five methods. First is the dynamic programming. Second is the greedy algorithm. Third is the branch definition method. Fourth is the backtracking method and fifth is the genetic algorithm. These methods could find solutions to the TSP problem, but they have a high time complexity and are easy to plunge into local optima.
When the sensor network has large amount of target nodes, it is time consuming to use the conventional methods to solve TSP and find the flight path of the aerial vehicle. And with the growth of scale, these conventional algorithms even may not find the final solutions to the problem. Besides, the flight path based on conventional methods of solving TSP has irregularity. And this kind of irregularity may easily cause the space discontinuity in data collection and is not good for the aerial data collection in a large-scale WSN. Specific to the above problems, this paper introduces the primary flight path and provides a feasible FPPWR algorithm for aerial data collection. This algorithm mainly utilizes grid division method in [20] and divides the global nodes into every subregion. Then, it combines the paths in the subregions through the primary flight path and finally obtains a global flight path quickly with certain regulations.
4.1. The Related Work of the UAV Path Planning
There are increasing interests in research works about the aerial data collection based on the controllable UAV. The basis of these studies is that the UAV can be controlled flexibly and will accomplish the flight mission following the scheduled flight plan. Lange et al. [23] stated a kind of high performance, controllable UAVs with multiaxis. This UAV can hover or land at the specific location with the help of GPS navigation system when it carries a certain degree of load.
The path planning for aerial data collection is to visit every target node at the lowest cost. In some application, cost can be represented by the Euclidean distance. In [18], sensor networks are often deployed in the large remote depopulated areas and the ground transportation is inconvenient. The authors let several UAVs work together and conducted path planning for the node set that each UAV is responsible for. The process of one UAV visiting nodes can be considered as a TSP. The authors used the greedy algorithm to find the shortest flight path. Although this algorithm can get the flight path fast, it often cannot find the optima. In [24], the authors used the branch definition method that is specific to TSP to solve the TSP. The branch definition method is one of the ways that could find exact solutions to the problem. However, this method is time consuming. When there are too many nodes, the method cannot find the final solution. Now, a better way to solve TSP is the Lin-Kernighan-Helsgaun algorithm [25]. The time complexity of the algorithm could reach
For those sensor networks whose nodes are well-distributed, the path planning of the nodes can be considered as an exception of TSP. In the meantime, considering that the deployment of sensor network is large scale, using grid division could ensure the high accuracy of path planning and considerably reduce the time consumption.
4.2. Models and Definitions
For the large-scale sensor network which is deployed in the monitored environment,

Path planning in the grid where there exist 12
Head node is a node that can gather date in its own coverage in the sensor network, denoted by
Flight path is a path based on the
The primary flight path is a path which can access the directed path of all intermediary squares from the origin square to the end square.
Ideal path is a kind of flight path with certain volatility and following the primary flight path.
Square path, also called local path, means the flight path
FPPWR is the flight path formed by all the grid paths that have a unique directed path.
Ingress node is the first
Out node is the last
As is shown in Figure 5, it is a schematic of the grid division and flight path of 12
The reason for the existence of irregular path is the repetition access of square in the path planning process, which causes the same square to be accessed at least for one time. In order to avoid the existence of the irregular path and find the regularized path, we need to plan the access of square with certain sequence. The process can be summarized as the projecting of primary flight path. In order to achieve the sequential square set
Considering the complexity and efficiency of an algorithm, to obtain an ideal regular path, the whole path can be divided into paths in the local squares. And combine all the paths in the local squares in the primary flight path to obtain a whole path.
From the formula above, the main workload of UAV is square traversal and path planning, and the function of primary flight path is to regularize the path planning. After the path is planned, a sequential node set
4.3. UAV Regularized Fast Path Planning
In this section, the paper describes in detail how to plan the path of square and optimize the whole path, based on primary flight path. Square path planning is a kind of local path planning; there are small quantity node sets in each square, so the path planning would be efficient, which would be narrated in Section 4.3.1 for the path planning of a single square; ingress node and out node must be taken into consideration. To ensure the ingress node and out node of a square, the concept of operator is introduced, and optimal algorithm of path based on paired operators is raised. Details are shown in Section 4.3.2.
Primary flight path is a conceptual directed path, which clarified the access sequence. From the detailed path planning of a square and the optimizing of paired operators, a sequential node set including all the nodes to be accessed can be obtained. Connecting all these nodes by sequence, a directed path would be formed and that is the UAV flight path.
4.3.1. Paths in Square Areas
Because of the random nodes distribution, there could be some empty squares which would be ignored in the path planning. Based on the precedence principle, the other squares can obtain the sequential nonempty square set
With the division of grid, node set
Although
(1) Set (2) Set (3) (4) Set (5) Set (6) (7) (8) (9)
η: The size of suspicious set;
4.3.2. Optimum of the Paired Operators' Path
To optimize the distance from the present square to the next square, the concept of operator is introduced to determine the ingress node and out node of a square. The operator seeking the ingress node is defined as ingress operator denoted by
The key point of paired operators seeking ingress and out node is to choose the right seeking way while guaranteeing the minimal distance when entering into the next square from the present square and improve the efficiency of the algorithm. Aimed at the position of suspicious set, the following will explain the seeking method of corresponding paired operators.
START-END is the suspicious set including ingress node and out node when the aircraft enters and exits from the whole area. In the whole area, entrance and exit mean the beginning and end of data collection, so there would be two suspicious sets:

Workflow of FPPWR.
UP-DOWN is the paired suspicious set made up of two suspicious sets when the aircraft leaves for another line from the present line. If the present square is
LEFT-RIGHT is the paired suspicious set made up of two suspicious sets when the aircraft moves from one square to the next square. In the function of paired operators, if the aircraft is from
After the out and ingress node of square
4.4. Description of Optimized Path Algorithm of Paired Operators
The operator seeking the ingress node is defined as the ingress operator. The operator seeking the out node is defined as the out operator. Suspicious set is the node set which works out the ingress node denoted by
To find the minimal distance from the present square to the next square, out operator and ingress operator need to operate collaboratively to determine the out node and the ingress node. The two operators are called paired operators, and the corresponding out suspicious set and ingress suspicious set are called paired suspicious set.
The key point of paired operators seeking the ingress and the out node is choosing the right seeking way while guaranteeing the minimal distance when entering into the next square from the present square and improving the efficiency of the algorithm. The paired suspicious set, in accordance with the position of out suspicious set and ingress suspicious set, can be divided into three types, START-END, UP-DOWN, and LEFT-RIGHT. As shown in Figure 6, it is an optimized process of paired operators in a suspicious set capacity of 2 and the grid scale of
4.5. Analysis of the Algorithm Complexity
From Algorithm 1, the traversal of all squares and the sort of nodes in every access square are the two main loop parts in this algorithm. For the sort of nodes in a square, we apply the quick sort algorithm which needs lower time complexity to achieve the sequential node set along the primary flying path. According to the quick sorting algorithm, when
4.6. The Result of the Experiment and Its Analysis
Based on Windows operating system, FPPWR sorting algorithm is realized by C# language. The processor is Core i5 dual core, 2.4 GHz. Since this algorithm is focused on projecting large numbers of nodes, as for the node and its input, we can use the function
Table 1 describes the necessary parameter in the experiment, among which the scale of grid decides the number of the squares, generates certain scale of grid, and gets the grid division in minimal covering. The number of nodes decides the number of head nodes in each square. The number of head nodes combined with the grid number parameter can describe the deployment scale of the sensor network. The size of operator decides the number of nodes included in each operator in the experiment, which provides the limitation for suspicious set capacity in the optimized algorithm of the operator path. Figure 7 shows the result of path planning.
Algorithm parameters of the FPPWR.

Experimental result of FPPWR under conditions grid scale:
To highlight the advantage of the FPPWR algorithm, we conduct 13 experiments; the numbers of nodes are 100, 200, 300, 500, 1000, 1500, 2000, 3000, 5000, 10000, 20000, 30000, and 50000; and numbering the experiment is based on the different number of nodes. For each group experiment, besides the number of nodes, other conditions keep the same. The path, respectively, is projected by the FPPWR algorithm, greedy algorithm, and heuristic algorithm; the path distance and time consumption of the algorithm are collected. The following two parameters are used to analyze FPPWR algorithm performance: time consumption in path planning and path distance.
4.6.1. Analysis of Time Consumption
When planning the path in different node scale, time consumed by the FPPWR algorithm is various. From the number of nodes and the time consumed in path planning in Figure 6, we can conclude the relationship between time consumption and number of nodes, shown in Figure 8(a).

(a) Time consumption. (b) Uniform contrast. (c) Nonuniform contrast.
From the figure, it is possible to find that the fewer the nodes, the less the time consumption of FPPWR. When the number of nodes is less than 3000, the time consumed is less than 1 ms; when the number of nodes increases, the time consumption increases linearly.
By contrasting with LKH algorithm and greedy algorithm, the time consumption between them has a huge difference, as shown in Figure 8(a). From the figure, when the number of nodes is less than 3000, the time consumption of these three algorithms is less, while FPPWR algorithm is the least time consuming. When the number of nodes is more than 3000, the time consumption of these three algorithms begins to show a huge difference, and the time consumption of LKH is the largest. From the figure, when the number of nodes reached 50000, the time consumption of LKH is 957820.4 ms, which is about eighty thousand times bigger than the time consumption of FPPWR. The reason for this huge difference is that LKH algorithm and greedy algorithm search the solution accessing target nodes within the global scope, aiming at less path distance, while FPPWR divides the global scope into square area using grid and searches the solution with Row-prior principle. From the experiment, we can conclude that when there are a very large number of sensor nodes in the network, FPPWR algorithm is more advantageous in time consumption of computing the path planning.
4.6.2. Analysis of the Path Distance
Since the minimal path distance can be obtained through heuristic algorithm, the distance obtained through heuristic algorithm is regarded as datum which is used to compare with other distances obtained by other algorithms. After calculating its specific value, we can draw the path distance and node number chart as in Figure 8(b).
From the chart, it is possible to find that when the number of nodes is small, the short distance can be obtained from both heuristic algorithm and greedy algorithm. However, with the number of nodes increasing greatly, the FPPWR algorithm and the other two algorithms have the same effect.
In conclusion, when the number of nodes increases, the consumption of time increases greatly in greedy algorithm and heuristic algorithm. When the number reached 5000, the time consumed by heuristic algorithm is 10 times that of greedy algorithm and thousands of times that of the FPPWR algorithm. Meanwhile, as for the path distance, with the number of nodes increasing, the result of the FPPWR algorithm is close to that of the heuristic algorithm. So, if the path planning is not required to be too accurate, grid division has more advantages than other algorithms.
However, the FPPWR algorithm requires the node to be deployed evenly. For the unevenly deployed nodes, the feasible solution can be obtained through the algorithm but this solution has no advantage in finding the path distance. How the unevenly deployed nodes affect the FPPWR algorithm will be shown in the experiment.
4.7. The Effect of the Uneven Distribution to Algorithms
In this section, the paper focuses on the unevenly deployed sensor network and proves the efficiency of the FPPWR algorithm by experiments. In the experiment, the uneven deployment can be realized by deploying some empty squares. As is shown in the network deployment chart (Figure 8(b)), a flight path is obtained by the FPPWR algorithm. From this chart, it is possible to see, in such network deployment, the FPPWR algorithm cost 0.09 ms, and the path distance is 3029.94 pixels.
From Figure 9, it is possible to see that unevenly deployed nodes make the flight across several empty squares, and the flight distance is extended. In the circumstance of uneven deployment, we can have multigroup experiments of various numbers of nodes, based on the distance drawn by heuristic algorithm, the comparative chart of FPPWR algorithm, greedy algorithm, and heuristic algorithm, shown in Figure 8(c).

Experimental result of FPPWR with WSNs deployed nonuniformly.
From the data of experiment, path distance would be affected heavily citing the FPPWR algorithm when nodes are deployed unevenly: with the node number increased, the path distance deviation is larger in the FPPWR algorithm. So, in the unevenly deployed network, the FPPWR algorithm is ineffective.
5. Simulation Platform for Aerial Data Collection
For the sensor network deployed in large scale, there would be thousands of sensor nodes deployed in the monitored area. During data collection, the aircraft need to traverse all the grids with a sink node to accomplish the task of collecting all the node data. It is a time-consuming process, and the aircraft needs to bring enough fuel and enough storage space for data collection. So, how to precisely ensure that the amount of fuel, the storage space, and the length of time the aircraft needed are enough is significant. Based on this, this paper builds a simulation platform for aerial data collection based on collection steps.
5.1. Explanation for the Simulation Platform
The simulation platform can be divided into the simulation control program and the NS2 simulation routine. The simulation control program is developed by C# programming language in VS2005, which is focused on the allocation of relative data and the display of result. Besides, the following is also to be realized: the allocation of distributed nodes based on probability and the supporting algorithm related to simulation platform, such as Minimum Spanning Circle Method and the FPPWR. NS2 is developed by TCL script, which is mainly to realize the simulation of WSN in different deployment and output the result.
5.1.1. The Explanation of Basic Parameters
Table 2 shows the key parameters and notes in the simulation platform. All the parameters show the information of sensor network and nodes and deploy the corresponding simulation modules.
Algorithm parameters of the FPPWR.
5.1.2. The Effect of the Simulation Platform
The simulation platform is deployed according to the key parameters. Figure 10 shows the network coverage obtained when the network scale is 135 and the deployment is random and well-distributed.

The node coverage of simulation platform.
As is shown in Figure 10, simulation platform provides simulation network, minimal coverage, the path planning, and generation of NS2 script. Their functions are the following.
Simulation network: it generates needed sensor network according to the scale of the network and the deployment method, including ordinary nodes and movable sink nodes.
Minimal coverage: it uses the Minimum Spanning Circle Method to achieve the head nodes through transfer network, covering radius, and so forth.
Path planning: it cites the FPPWR algorithm to make quick path planning in accordance with the returned head nodes in minimal coverage.
Generation of NS2 script: it transfers the data collected by the aircraft and guided by FPPWR algorithm into TCL script and controls the simulation aircraft with sink node in NS2 WSN access node sequentially.
After the four processes, a TCL script would be generated in NS2. To achieve the simulation condition of aerial data collection in NS2, the experiment would track the parameters of received and transmitted data as well as the energy consumption and record the distance and time of the simulated aircraft. Figure 11 shows the simulation condition of sensor network in NS2 when the node scale is 135.

The simulation of network with NS2.
From Figure 11, when the time is 239.2 seconds, the sink node labeled 0 directly approaches the head node labeled 30 from the head node labeled 24 at a speed of 5 m/s. When moved to a head node, the sink node should stay for 20 seconds. Finally, at 1349.9, sink node stops moving, and the simulation network stops.
5.2. Analysis of the Simulation Experiment
After modifying the number of nodes and aircraft speed, respectively, multiple groups of experiments have been conducted, and the simulation results of each group's flight time, distance, and the collected data size are recorded. To express the simulation result of aerial data collection in NS2, the trends have been drawn to reflect the relationship between flight time, flight distance, and the size of collected data.
Figure 12, respectively, describes the relationship between consumed time during collection and the number of nodes. From the chart, with the increased number of nodes, the time consumed during collecting the data is correspondingly increased. Meanwhile, it is possible to find in the figure that the flight speed affects the time consumption greatly, especially when the large scale of node appears. And the increase of speed reduces the time consumption largely.

Relationship between time consumption and number of nodes.
Figure 13 is a relationship chart reflecting the change of flight distance of data collecting aircraft when the number of nodes changes. Since the flight distance is constant when the flight speed changed, the flight distance has no direct relationship with flight speed. And it is possible to find in the figure that the flight distance increases when the number of nodes increases. The detailed relationship has been narrated in the last part.

Relationship between the flight distance and the number of nodes.
Figure 14 is the trends reflecting how the amount of collected data changes when the number of nodes changes. From the chart, it is possible to find that when the number of nodes increases, the amount of data increases. When there are fewer nodes, the amount of data increases slowly. However, when the number of nodes increases, the amount of data increases greatly. The explanation for this phenomenon is that when the aircraft is collecting data, the node is also collecting data, and nodes near the end of the flight path can collect more data. Besides, the amount of collected data is related to flight speed: when the number of nodes keeps the same, the faster the aircraft flies, the less the data can be collected.

Relationship between the amount of data collected and the number of nodes.
Analyze comprehensively: from the simulation result of NS2, when the number of nodes increases and the flight speed remains the same, data collection costs more time, and flight distance and the amount of collected data increase, and the amount of collected data is larger than others. It is possible to find that when the number of nodes reaches 1500, the data collection of a sensor consumed 328.7 hours (13.7 days) and the flight distance is 28.2 km, and the amount of collected data reaches 90447 KB. So, from the simulation platform, a conclusion can be drawn that when the network deployment and parameter allocation are alike, the aircraft would at least fly 28.2 km and bring storage with the capacity of at least 90447 KB.
However, the speed of the aircraft affects a lot the time consumption, flight distance, and the amount of collected data. From the simulation result, the increase of speed would decrease the consumed time and the amount of collected data and improve the efficiency of data collection. These are also the advantages which are based on the aerial data collection based on UAV collecting data from a large-scale WSN.
6. Outlooks and Conclusions
Aimed at large-scale WSN, this paper studies a more practical method of aerial data collection. As for the construction of aerial data collection, it is divided into five procedures: deployment of network, allocation of distributed node, anchor point searching, path planning, and data collection. For the path of UAV and considering the notion that the node deployment is even, this paper raises the regularized fast path algorithm which improves the speed of path planning, as well as spatial continuity of data collection. To evaluate the relevant parameters in data collection, this paper designs and realizes the simulation platform for aerial data collection and takes multigroup experiments when the flight speed and network scale change. From these experiments, it is possible to find that the analysis of NS2 simulation result based on the simulation platform can evaluate the time consumption of aerial data collection, flight distance, and the parameters of storage capacity, which can provide information for practical data collection.
However, by study and analysis, it is possible to find the disadvantages of aerial data collection, which is the key point for the next step.
Firstly, the minimal coverage of data collection is based on the aircraft stay on the midair of the head node and collecting data of the node on its coverage through single-hop routing. This method is simple and efficient and has high efficiency of data collection for the neighboring nodes, but for the farther node, especially the nodes on the periphery, the data inaccuracy and packet lost would be easier because of the attenuation of communication signal. So, in the further study, how to integrate the aircraft with the ground network, determine the best anchor point, and improve efficiency would be the significant problem to be studied.
Then, aerial data collection relies too much on GPS. For the area with weaker signal or special areas without GPS signal, the controllable UAVs based on GPS navigation will lose their function. Considering the anchor capability of WSN, setting “guard” nodes on ground network to collaborate with the aircraft to collect data is the key point to be studied.
Finally, although simulation platform aimed at aerial data collection has a complete frame and functional definition, many aspects still need to be improved. With the application of simulation platform, expanding the shared function library, inputting more parameters, and improving the simulation effect are areas that still need to be studied.
Footnotes
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The work is supported by the National Natural Science Foundation of China under Grant no. 61004112 and the Fundamental Research Funds for the Central Universities (no. CDJZR12180006). The work of Debraj De and Sajal K. Das was partially supported by NSF projects under Awards nos. CNS-1404677, CNS-1355505, and CNS-1545037.
