Abstract
The development of the unmanned aircraft systems is envisioned to greatly reduce the energy consumption of sensor nodes in data gathering process using unmanned aircraft systems as mobile sinks. In traditional sensor networks, compressive sensing and clustering are two key energy-efficient techniques for data gathering. However, how to integrate two techniques into the data gathering for unmanned aircraft system–aided wireless sensor networks effectively is still an open problem. Moreover, most clustering schemes focus on the cluster head selection strategy and simplified the problem of cluster member selection, and most compressive sensing schemes are not integrated with the clustering strategy. To this end, this article studies the problem of integrating compressive sensing with clustering for data gathering in unmanned aircraft system–aided networks. We first give a theoretical formulation of this problem. Considering the non-deterministic polynomial-time hard complexity of the problem, we present two algorithms by jointly considering the compressive ratio variation factor and the distance factor to find near-optimal solutions heuristically. Evaluations based on real data traces show that the proposed algorithms greatly reduced the energy consumption of sensor nodes efficiency.
Introduction
Although the battery-powered wireless sensor networks (WSNs) promise effective ways for plenty of applications such as environmental monitoring, civil infrastructure monitoring, and health care, the energy problem remains one of the major barriers preventing the complete exploitation of this technology.1–7 Recently, the development of the unmanned aircraft systems (UASs) makes it possible to collect data from ground sensor nodes by UASs serving as mobile sinks. Such networks are called UAS-aided networks. 8 It is convenient and useful in many applications of WSNs, especially when large-scale ground sensor nodes are deployed in dangerous areas that are difficult to reach with conventional vehicles. In a UAS-aided network, only a small number of special nodes, cluster heads (CHs), can communicate with the UAS directly. Sensor nodes cannot reach the UAS due to their low-power transmission protocols, but they can communicate with the CHs by multi-hop communications. The UAS glides over the sensor field to collect data from the CHs. Both sensor nodes and CHs are powered by batteries and they usually function unattended.
Data transmission and reception contribute most of energy consumption of sensor nodes, 1 thus an efficient data gathering scheme is critical for energy efficiency of sensor nodes and CHs. The communication techniques between CHs and UAS have been studied in Abdulla et al., 8 in which, the authors propose a game theory method to address the adaptive modulation problem caused by the mobility pattern innate to the UAS. The structure of the UAS-aided networks requires all sensor nodes must be clustered around CHs, so that they can communicate with the UAS by the relay of CHs. Many energy-efficient clustering algorithms have been proposed for WSNs.9,10 However, these works mainly focus on the CH selection strategy. Once CHs are selected, sensor nodes join in clusters simply based on the distances to the CHs, while ignoring the data features.
Compressive sensing (CS) ensures that a sparse signal can be accurately reconstructed with a relative small number of measurements.11,12 This property gives new opportunities to further reduce the energy consumption of sensor nodes by exploiting their data features. The hybrid-CS has been proved that it can not only reduce the amount of transmissions but also balance the traffic load throughout the network.13–17 However, most CS-based works assume that the data of different parts of the sensing field have the same sparsity,14–16,18,19 while the sparsity may be different in different places in real implementation. It has been proved that the amount of measurements can be remarkably reduced by leveraging the property of sparsity variances in the temporal and spatial domains. 17 Moreover, although there are works which integrate the CS with clustering,16,18,20 they are designed for homogeneous WSNs, and they assume that all parts of the sensor field have the same sparsity. The main contributions of this article include the following.
First, we propose a new problem that how to minimize the energy consumption of data gathering based on the integration of CS and clustering for UAS-aided networks. We give a theoretical analysis of this problem and formulate this problem as a mixed-integer programming problem.
Second, considering the non-deterministic polynomial-time hard (NP-hard) complexity, we present two greedy algorithms to find near-optimal solutions of this problem. The first algorithm is data gathering scheme (DGS), which guarantees that the energy consumption increment of the network is minimized when a sensor node joins in a cluster. In DGS, we build a criterion of clustering by jointly considering two factors, which are the data sparsity and the distance between each sensor node and each CH. Each sensor node can decide which cluster should join in according to this criterion. After all clusters are formed, sensor nodes transmit their data to the CH by a spanning tree with the hybrid-CS scheme, 19 then CHs transmit data to the UAS. Although DGS has an outstanding performance in the energy consumption, it causes high computational complexity to form clusters. To further improve the efficiency, we present an improved algorithm, named improved data gathering scheme (IDGS). By dividing all sensor nodes to several layers for each CH and sorting all CHs for each sensor node, IDGS reduces the number of sensor nodes that take part in the clustering process and reduces the candidate CHs for each sensor node to choose.
Third, we evaluate the proposed algorithms based on real data traces for environmental monitoring. 21 The proposed algorithms are compared with two baseline algorithms. The simulation results show that the proposed algorithms are more energy efficient than both two baseline algorithms. We also evaluate the trade-off between the power consumption and the computational complexity for the IDGS. We find that the running time can be significantly decreased with a slight increase in the power consumption compared with DGS. At last, we compare the performance of DGS with a random walk–based CS scheme which reconstructs the whole data of networks without clustering or specific routing.
The rest of the article is organized as follows. We introduce the system model in section “System model.” In section “Problem definition and formulation,” we present the definition of the problem and formulate the problem as a mixed-integer formulation problem. Then, we propose two DGS based on the integration of CS and clustering for UAS-aided networks in section “Solution techniques.” We conduct trace-driven simulations in Section “Simulation.” Section “Conclusion” concludes this article.
System model
In this section, we first introduce the basic theory of CS, then present the network model.
Hybrid-CS basics
CS theory offers a technique to recover certain signals from a small number of measurements with a probability to one, if those signals are sparse or can be sparsely represented in a proper transform domain.11,12,22 A signal x with N elements is defined to
where
where c is a small constant. The measurement matrix
At the sink node, the signal can be reconstructed by employing the following
where transformation matrix
There are many choices of CS schemes for data gathering of WSN.14,23–25 Each of them can be applied in our scheme (to gather measurements and reconstruct data for each cluster). The clustering process of our scheme proposed in section “Solution techniques” explores the effect of the sparsity variety, while it has no relationship with the CS scheme. Different CS schemes can only affect the data gathering process after the cluster has formed. So, the different performances caused using different CS schemes come from the CS scheme itself. One can also apply CS scheme to gather measurements and reconstruct data on the whole network without clustering. However, it ignores the sparsity variety of data in different areas of the sensing field, which may cause low performance. In section “Simulation,” we will compare the performance of our proposed scheme with a random walk–based CS scheme which applies CS on the whole network.
In this article, we choose the hybrid-CS to gather data for each cluster.
14
Hybrid-CS has been proved to be an efficient CS scheme for data gathering in WSN.
14
The main idea of hybrid-CS is that non-CS is applied in the earlier stages of data gathering and CS-based gathering is only applied at nodes whose traffic is greater than the number of measurements. Figure 1 is an example about how hybrid-CS works. In Figure 1, five sensor nodes and one sink node form a chain topology. Assume the required number of measurements M as three.

Examples of (a) non-CS and (b) hybrid-CS.
Network model
Figure 2 shows the UAS-aided network model, which is composed of sensor nodes, CHs, and a UAS. CHs and sensor nodes are randomly deployed in the sensing field and follow a uniform distribution. The random deployment of sensor nodes and CHs will not affect the different sparsity scenarios, since there are some applications of WSN, such as temperature, humidity, and light monitoring over a geographical area, the sparsity varies in spatial domain naturally.

The UAS-aided network topology.
The UAS collects data by gliding around the sensor field. The CHs are equipped with superior hardware that allows them to communicate with the UAS directly during the time slots assigned by the UAS. In some scenarios, the collection route of UAS can affect the power consumption of sensor nodes. The route planning of the UAS is a traditional problem and it is complicated. We would try to solve it in our future works. In this work, we assume the CHs are powerful enough that all CHs can always reach the UAS when the UAS glides over the sensing field, so that we can focus on the clustering problem for saving the power consumption of sensor nodes. The number of CHs is much less than the number of sensor nodes due to the cost consideration. We assume the UAS-aided network is not a real-time system, since the sensor nodes need to store their data before the UAS fly into the communication range of the CH.
Let the set of all sensor nodes be
means
which is the set of sensor nodes that choose
Hybrid-CS is applied when data transmit through
Problem definition and formulation
In this section, we present the problem definition, then theoretically analyze the problem and formulate this problem as a mixed-integer formulation problem.
Problem definition
In this work, we only focus on the energy efficiency of the sensor nodes. We now define the problem of Data Gathering based on the Integration of CS and clustering for UAS-aided networks (DGIU) as follows.
Definition 1
Given a UAS-aided network, the DGIU problem is to find a map
Problem formulation
Let
The power consumptions of transmitting and receiving x bits of data over distance d are given by the following equations 26
where
Thus,
where
Proposition 1
The DGIU problem is NP-hard problem.
Proof
The subproblem of DGIU (construct
Solution techniques
Considering the complexity of the DGIU problem, in this section, we give efficient heuristic clustering algorithms for data gathering in UAS-aided networks.
Analysis
The data gathering process can be divided into two steps: the clustering and the spanning tree construction. In order to address the DGIU problem, we need to find the optimal clustering scheme and the optimal spanning tree construction scheme, such that the energy consumption of hybrid-CS-based data gathering is minimized. A fixed cluster corresponds to a fixed optimal spanning tree and a fixed compression ratio. From equations (7)–(10), the power consumption of sensor nodes in a cluster can be calculated once given the spanning tree and the compression ratio of the cluster. However, since there are
Aside from the optimal spanning tree construction, the power consumption of sensor nodes in a cluster can be affected by two factors: the compression ratio and the distances between sensor nodes and the CH. The lower compression ratio, the more measurements are needed, which results in more data to transmit. The longer distance between a sensor node and the cluster, the more power or more relay hops are needed to finish the transmission. In order to reduce the energy consumption of sensor nodes, for each cluster, it should have a high compression ratio, and sensor nodes should be closed to the CH. Unfortunately, since the distribution of the sparsity ratio is not regular, these two factors are usually not proportional in a cluster. Thus, a trade-off between them is needed to minimize
DGS
Denote the set of sensor nodes that have not joined in any cluster as
where
Since either long transmission distance or more hops can increase the energy consumption, a sensor node tends to join in the closed cluster to alleviate the transmission energy consumption. A matrix
where
In order to trade off the compressive ratio and the distances between sensor nodes and the CH for each cluster, we construct a new matrix D by integrating
where
Figure 3 shows an example that how matrix D is constructed. There are two CHs,
Let

An example of constructing matrix D.
The minimum element of D is
Repeat above process until
Note that, to construct
However, we generate the measurement matrix by the method introduced in Haupt et al. 22 and Zheng et al. 2 In which, each sensor node locally draws element of the random coefficients using its network address as the seed of a pseudo-random number generator. Given the seed values and the addresses of the nodes in the network, the UAS can also easily reconstruct the random coefficients for each sensor node. Since the UAS gathers the address of each sensor nodes in the first-round data collection, it can reconstruct the random coefficients for data retrieval of the Hybrid-CS scheme independently.
Algorithm 1 shows the pseudo-code of this DGS.
The main computation complexity of DGS comes from the multiple iterations of computing
IDGS
In algorithm DGS, the sparsity of data in each cluster needs to be recalculated when cluster member changes, which is the main reason of the high complexity. Reducing the calculation times of the sparsity can reduce the complexity. Actually, if a sensor node is very close or very far to a CH, the distance factor should play more important role in judging whether the sensor node should join in this cluster. However, if a sensor node in the middle area of two or more CHs, the sparsity factor should play more important role in judging which cluster should the sensor node join in. Therefore, the recalculation of the sparsity should be focused on this case. Now we give a scheme which can reduce the complexity of the algorithm DGS based on this property.
With the first-round data (include sensed data and coordinates of all sensor nodes and CHs) collected by the UAS, the UAS can compute the distance between each sensor node and each CH. For each sensor
Let
When all sensor nodes in
The complexity of IDGS is similar to the complexity of DGS, except n is reduced to
Simulation
In this section, we evaluate the performance of the proposed clustering scheme based on real data traces from Sensorscope: Sensor Networks for Environmental Monitoring. 21 These real sensor readings are dense signals but sparse in DCT domain. The performance of DGS is also compared with a CS scheme based on random walk. We conduct all simulations using MATLAB.
Simulation setup
The network for simulation is constructed by 72 sensor nodes. Both the coordinate and the sensed data of each sensor can be downloaded from Sensorscope. 21 In order to construct a UAS-aided network, we assume that m virtual CHs are deployed randomly in the sensing field, and a virtual UAS in charge of gathering data and broadcasting configuration information glides overhead of the sensor field. Since we only focus on the energy efficiency of the sensor nodes, virtual CHs and the virtual UAS will not affect the validity of the testing results. Since the sensed data and the coordinates of sensor nodes are matched, we cannot change the deployment of sensor nodes. In order to get more convincing results, we average the results over 50 random deployments of CHs.
We evaluated the performance from two aspects, one is the energy consumption, the other is the computation complexity. Let each sensor node store 50 data. We compute the total energy consumption and the running time to gather all these data from 72 sensor nodes based on the clustering scheme obtained from each algorithm. The parameters used in our simulation are listed in Table 1. Note that the parameters of the radio model are the same with those in Heinzelman et al. 26
Parameter values in simulation.
CH: cluster head.
We assume all sensed data and all coordinates of sensor nodes and CHs are known to us. With this information, both DGS and IDGS can compute D easily and return a clustering scheme. For each cluster, a graph
The performance of DGS and IDGS
The existing clustering schemes for heterogeneous WSNs mainly focus on the CH selection strategy.9,10 Once CHs are selected, sensor nodes join in clusters only based on the distances to the CHs, while ignoring the data features. However, although some works integrate CS and clustering,16,18 they assume that there is no sparsity variation in the sensing field, which means sensor node joining in clusters still base on the distance. Therefore, in order to evaluate the performance of our proposed scheme, we compare our clustering scheme with the following two baseline DGS: clustering scheme with no CS
Figure 4 shows the variation of the energy consumption with different values of

The power consumption versus
Figure 5 shows the variation of power consumption versus m for

The power consumption versus m.
Although the energy consumption of DGS and IDGS is almost the same in Figure 5, their running time shows great differences in Figure 6. We can see that the running time of DGS increases as m increases, this is because a larger m means more elements in the matrix D need to be calculated, the increased computation burdens cause increased running time. The running time of IDGS is always less than the DGS and shows no obvious relationships with the number of CHs. Since the clustering methods of

The running time versus m.
The trade-off in IDGS
In IDGS, there is a trade-off between power consumption and time complexity according to the values of

The power consumption of IDGS for different values of

Running time of IDGS for different values of
DGS versus random walk–based CS scheme
In this section, we compare the performance of DGS with a random walk–based CS scheme. Several works24,25 have proposed the random walk–based CS scheme. In this kind of CS scheme, random measurements are collected along multiple random paths, the sink can reconstruct the original data when enough measurements are collected. Thus, all data can be reconstructed without clustering or specific routing.
We implement the random walk–based CS scheme in our simulation as follows: at the beginning, m sensor nodes are selected randomly to initialize m independent random walks of fixed length t (some nodes may be selected several times). Each sensor node gets its measurement by random linear combination of its stored data (30 data). At each step of each walk, one node chooses one of its neighbors randomly and performs a linear combination with the measurement of the neighbor. At the end of random walks, m random projections are generated and these projections will be sent to the nearest CH by shortest path routing strategy (the number of CHs is set to 5). When the UAS receives all the projections from each CH, it can reconstruct all the original data using the recovery strategy of CS.
The power consumption and the reconstruction error are computed for performance comparing. The reconstruction error is defined as
Figures 9 and 10 show the variation of power consumption and reconstruction error versus the number of random walks for random walks with walk length

The power consumption of DGS versus random walk.

The reconstruction error of DGS versus random walk.
Conclusion
This article investigates the problem of DGIUs. We first formulate this problem into a mixed-integer programming problem. Considering its NP-hard complexity, we propose two heuristic algorithms to solve this problem. The first algorithm DGS is a greedy algorithm which guarantees the energy consumption increment of the network is minimized when each sensor node joins in a cluster. Although DGS can reduce the energy consumption, it causes a high computational complexity. In order to reduce the computational complexity of DGS, we propose the second algorithm IDGS. IDGS provides a trade-off between the energy consumption and the computational complexity. Thus, given a threshold of the energy consumption, we can find parameters that lead to the minimum computational complexity and vice versa. Simulations based on real data traces show the high efficiency of the proposed algorithms.
Footnotes
Academic Editor: Wei Yu
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (Grant Nos 61672282, 61602238, and 61373131), the Basic Research Program of Jiangsu Province (Grant Nos BK20160805 and BK20161491), the China Postdoctoral Science Foundation (Grant No. 2016M590451), PAPD, and CICAEET.
