The integration of compressive sensing and clustering for data gathering in unmanned aircraft systems

Abstract

Unmanned aircraft systems (UASs) acting as mobile sinks are expected to greatly reduce the energy that sensor nodes spend on data gathering. In traditional sensor networks, compressive sensing and clustering are two key energy-efficient techniques for data gathering. However, how to integrate the two techniques effectively into data gathering for unmanned aircraft system–aided wireless sensor networks is still an open problem. Moreover, most clustering schemes focus on the cluster head selection strategy and simplify the problem of cluster member selection, and most compressive sensing schemes are not integrated with a clustering strategy. To this end, this article studies the problem of integrating compressive sensing with clustering for data gathering in unmanned aircraft system–aided networks. We first give a theoretical formulation of this problem. Because the problem is non-deterministic polynomial-time hard, we present two heuristic algorithms that jointly consider the variation of the compression ratio and the distance factor to find near-optimal solutions. Evaluations based on real data traces show that the proposed algorithms greatly reduce the energy consumption of sensor nodes.

Keywords

Unmanned aircraft system–aided networks; compressive sensing; clustering; data gathering

Introduction

Although battery-powered wireless sensor networks (WSNs) offer effective solutions for many applications, such as environmental monitoring, civil infrastructure monitoring, and health care, energy remains one of the major barriers to the full exploitation of this technology.^1–7 Recently, the development of unmanned aircraft systems (UASs) has made it possible to collect data from ground sensor nodes with UASs serving as mobile sinks. Such networks are called UAS-aided networks.⁸ They are convenient and useful in many WSN applications, especially when large numbers of ground sensor nodes are deployed in dangerous areas that are difficult to reach with conventional vehicles. In a UAS-aided network, only a small number of special nodes, the cluster heads (CHs), can communicate with the UAS directly. Sensor nodes cannot reach the UAS because of their low-power transmission protocols, but they can communicate with the CHs through multi-hop communication. The UAS glides over the sensor field to collect data from the CHs. Both sensor nodes and CHs are battery-powered and usually operate unattended.

Data transmission and reception account for most of the energy consumption of sensor nodes,¹ so an efficient data gathering scheme is critical to the energy efficiency of sensor nodes and CHs. The communication techniques between CHs and the UAS have been studied by Abdulla et al.,⁸ who propose a game-theoretic method to address the adaptive modulation problem caused by the mobility pattern inherent to the UAS. The structure of UAS-aided networks requires that all sensor nodes be clustered around CHs, so that they can communicate with the UAS through the relay of CHs. Many energy-efficient clustering algorithms have been proposed for WSNs.^9,10 However, these works mainly focus on the CH selection strategy. Once CHs are selected, sensor nodes join clusters simply based on their distances to the CHs, ignoring the features of the data.

Compressive sensing (CS) ensures that a sparse signal can be accurately reconstructed from a relatively small number of measurements.^11,12 This property offers new opportunities to further reduce the energy consumption of sensor nodes by exploiting the features of their data. Hybrid-CS has been shown not only to reduce the number of transmissions but also to balance the traffic load throughout the network.^13–17 However, most CS-based works assume that the data in all parts of the sensing field have the same sparsity,^14–16,18,19 whereas in real deployments the sparsity may differ from place to place. It has been shown that the number of measurements can be remarkably reduced by exploiting sparsity variation in the temporal and spatial domains.¹⁷ Moreover, although some works integrate CS with clustering,^16,18,20 they are designed for homogeneous WSNs and assume that all parts of the sensor field have the same sparsity. The main contributions of this article are as follows.

First, we propose a new problem: how to minimize the energy consumption of data gathering based on the integration of CS and clustering in UAS-aided networks. We give a theoretical analysis of this problem and formulate it as a mixed-integer programming problem.

Second, given its non-deterministic polynomial-time hard (NP-hard) complexity, we present two greedy algorithms to find near-optimal solutions to this problem. The first algorithm, the data gathering scheme (DGS), aims to minimize the increase in network energy consumption each time a sensor node joins a cluster. In DGS, we build a clustering criterion that jointly considers two factors: the data sparsity and the distance between each sensor node and each CH. Each sensor node decides which cluster to join according to this criterion. After all clusters are formed, sensor nodes transmit their data to the CH along a spanning tree using the hybrid-CS scheme,¹⁹ and the CHs then transmit the data to the UAS. Although DGS performs well in terms of energy consumption, its cluster formation has a high computational complexity. To further improve efficiency, we present an improved algorithm, named the improved data gathering scheme (IDGS). By dividing the sensor nodes of each CH into several layers and sorting the CHs for each sensor node, IDGS reduces both the number of sensor nodes that take part in the clustering process and the number of candidate CHs each sensor node has to choose from.

Third, we evaluate the proposed algorithms on real data traces from environmental monitoring.²¹ The proposed algorithms are compared with two baseline algorithms, and the simulation results show that they are more energy efficient than both baselines. We also evaluate the trade-off between power consumption and computational complexity for IDGS, and find that, compared with DGS, its running time can be significantly reduced at the cost of a slight increase in power consumption. Finally, we compare the performance of DGS with a random walk–based CS scheme that reconstructs the data of the whole network without clustering or specific routing.

The rest of the article is organized as follows. We introduce the system model in section “System model.” In section “Problem definition and formulation,” we define the problem and formulate it as a mixed-integer programming problem. Then, in section “Solution techniques,” we propose two data gathering schemes based on the integration of CS and clustering for UAS-aided networks. We conduct trace-driven simulations in section “Simulation.” Section “Conclusion” concludes this article.

System model

In this section, we first introduce the basic theory of CS, then present the network model.

Hybrid-CS basics

CS theory offers a technique to recover certain signals from a small number of measurements with probability close to one, if those signals are sparse or can be sparsely represented in a proper transform domain.^11,12,22 A signal x with N elements is said to be $K$-sparse if there is a proper transformation matrix $Ψ$ such that $x = Ψ θ$ and $θ$ has at most K non-zero elements, with the other elements equal or close to zero. If $K \ll N$, instead of collecting x, we can collect

y = Φ x = x_{1} ϕ_{1} + x_{2} ϕ_{2} + \dots + x_{N} ϕ_{N}

(1)

where $Φ$ is an $M \times N$ ($M \ll N$) measurement matrix and $ϕ_{i}$ is the $i$th column of the measurement matrix. CS theory suggests that a $K$-sparse signal can be reconstructed from M measurements with high probability if M satisfies

M \geq cK \log N

(2)

where c is a small constant. The measurement matrix $Φ$ must satisfy the restricted isometry property (RIP). The entries of $Φ$ can be produced by a pseudo-random number generator, and $M = 3K$ to $4K$ is usually sufficient to satisfy equation (2). $ρ = N / M$ is called the compression ratio of CS.
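
As a quick illustration of equation (2) and of the compression ratio, the sketch below computes the smallest admissible M for a K-sparse signal of length N. It is a minimal sketch, assuming the natural logarithm and c = 1 (the text leaves c unspecified); capping M at N is our own addition, since more than N measurements are never needed.

```python
import math


def required_measurements(K: int, N: int, c: float = 1.0) -> int:
    """Smallest integer M with M >= c*K*log(N), as in equation (2).

    Assumptions: natural logarithm; c = 1 by default (illustrative, not from the paper).
    M is capped at N because more than N measurements are never needed.
    """
    if not 0 < K <= N:
        raise ValueError("need 0 < K <= N")
    return min(N, math.ceil(c * K * math.log(N)))


def compression_ratio(N: int, M: int) -> float:
    """rho = N / M, the compression ratio of CS."""
    return N / M


# A 500-sample signal that is 10-sparse in some transform domain
M = required_measurements(K=10, N=500)
rho = compression_ratio(500, M)
print(M, round(rho, 2))  # 63 7.94
```

A larger c or a less sparse signal lowers ρ, which is exactly the per-cluster effect the clustering schemes below try to exploit.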

At the sink node, the signal can be reconstructed by solving the following $ℓ_{1}$ optimization problem

\hat{θ} = \underset{θ \in \mathbb{R}^{N}}{\arg\min} \, ∥ θ ∥_{ℓ_{1}} \quad \text{s.t.} \quad y = Φ Ψ θ, \qquad \hat{x} = Ψ \hat{θ}

(3)

where the transformation matrix $Ψ$ can be obtained from a wavelet or discrete cosine transform (DCT).
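
Equation (3) is normally solved with a linear-programming or convex solver. To keep the example dependency-free, the sketch below swaps in orthogonal matching pursuit (OMP), a greedy sparse-recovery method that is not the $ℓ_{1}$ program itself, with an orthonormal DCT basis as $Ψ$ and a random Gaussian $Φ$. The sizes, seed, and coefficient values are illustrative assumptions, not settings from this article.

```python
import numpy as np


def dct_basis(N: int) -> np.ndarray:
    """Orthonormal DCT synthesis matrix Psi, so that x = Psi @ theta."""
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))  # DCT-II rows
    C[0, :] /= np.sqrt(2.0)
    return C.T  # the inverse of an orthonormal DCT-II is its transpose


def omp(A: np.ndarray, y: np.ndarray, K: int, tol: float = 1e-10) -> np.ndarray:
    """Greedy K-sparse estimate of theta from y = A @ theta (orthogonal matching pursuit)."""
    support = []
    residual = y.copy()
    coef = np.zeros(0)
    for _ in range(K):
        if np.linalg.norm(residual) < tol:
            break
        # add the column most correlated with the current residual
        support.append(int(np.argmax(np.abs(A.T @ residual))))
        # re-fit all selected coefficients by least squares, then update the residual
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    theta = np.zeros(A.shape[1])
    theta[support] = coef
    return theta


rng = np.random.default_rng(0)
N, M, K = 64, 48, 3
Psi = dct_basis(N)
theta = np.zeros(N)
theta[[2, 9, 30]] = [4.0, -3.0, 2.0]              # K-sparse DCT coefficients
x = Psi @ theta                                    # dense time-domain signal
Phi = rng.standard_normal((M, N)) / np.sqrt(M)     # random measurement matrix
y = Phi @ x                                        # measurements, equation (1)
x_hat = Psi @ omp(Phi @ Psi, y, K)                 # reconstruction, x_hat = Psi theta_hat
print(np.max(np.abs(x_hat - x)))
```

OMP needs somewhat more measurements than $ℓ_{1}$ minimization for the same guarantee, which is why M is generous here; swapping in a convex solver would follow equation (3) directly.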

There are many CS schemes for data gathering in WSNs,^14,23–25 and any of them can be applied in our scheme to gather measurements and reconstruct data for each cluster. The clustering process proposed in section “Solution techniques” exploits the variation of sparsity and is independent of the CS scheme: the choice of CS scheme only affects the data gathering process after the clusters have formed, so performance differences between CS schemes stem from the schemes themselves. One can also apply a CS scheme to gather measurements and reconstruct data over the whole network without clustering. However, this ignores the variation of data sparsity across different areas of the sensing field, which may degrade performance. In section “Simulation,” we compare the performance of our proposed scheme with a random walk–based CS scheme that applies CS to the whole network.

In this article, we choose hybrid-CS to gather data for each cluster.¹⁴ Hybrid-CS has been shown to be an efficient CS scheme for data gathering in WSNs.¹⁴ Its main idea is that non-CS forwarding is applied in the early stages of data gathering, and CS-based gathering is applied only at nodes whose traffic would exceed the number of measurements. Figure 1 gives an example of how hybrid-CS works. In Figure 1, five sensor nodes and one sink node form a chain topology. Assume that the required number of measurements M is three, and let $x_{i}$ be the data generated by sensor node $s_{i}$. The red nodes transmit CS-coded data, while the white nodes transmit the original data. $x'_{k} = \sum_{j = 1}^{3} {ϕ'}_{kj} x_{j} + {ϕ'}_{k 4} x_{4}$, $x_{k}^{″} = \sum_{j = 1}^{3} {ϕ ″}_{kj} x_{j} + {ϕ ″}_{k 4} x_{5}$, where $k = 1, 2, 3$, and $ϕ'_{kj}$ and $ϕ_{kj}^{″}$ are the elements of the measurement matrix $Φ'$ of node $s_{4}$ and of the measurement matrix $Φ ″$ of node $s_{5}$, respectively. The total traffic is 15 under non-CS (1 + 2 + 3 + 4 + 5 transmissions) but only 12 under hybrid-CS (1 + 2 + 3 + 3 + 3).
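
The traffic counts of Figure 1 follow from a simple rule, which the sketch below applies to any data-gathering tree: a node forwards the raw readings of its subtree while their number does not exceed M, and otherwise forwards M CS measurements. This is a minimal sketch assuming one reading per node by default; the parent-array encoding (−1 for the sink) and the function name are our own choices.

```python
def hybrid_cs_traffic(parent, M, lam=1):
    """Per-node transmissions under non-CS and hybrid-CS forwarding.

    parent[v] is the index of v's next hop, or -1 if v sends directly to the sink.
    lam is the number of readings each node generates; parent must describe a tree.
    """
    n = len(parent)
    subtree = [lam] * n                    # readings originating in v's subtree
    for v in range(n):
        u = parent[v]
        while u != -1:                     # credit v's readings to all its ancestors
            subtree[u] += lam
            u = parent[u]
    non_cs = subtree[:]                    # non-CS: forward everything received plus own data
    hybrid = [min(s, M) for s in subtree]  # hybrid-CS: never send more than M values
    return non_cs, hybrid


# Chain of Figure 1: s1 -> s2 -> s3 -> s4 -> s5 -> sink, with M = 3
non_cs, hybrid = hybrid_cs_traffic([1, 2, 3, 4, -1], M=3)
print(sum(non_cs), sum(hybrid))  # 15 12
```

The nodes whose subtree exceeds M (here $s_{4}$ and $s_{5}$) are exactly the aggregators, the red nodes in Figure 1.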

Figure 1.

Examples of (a) non-CS and (b) hybrid-CS.

Network model

Figure 2 shows the UAS-aided network model, which is composed of sensor nodes, CHs, and a UAS. CHs and sensor nodes are randomly deployed in the sensing field following a uniform distribution. Random deployment does not remove the variation in sparsity: in WSN applications such as temperature, humidity, and light monitoring over a geographical area, the sparsity of the data naturally varies in the spatial domain.

Figure 2.

The UAS-aided network topology.

The UAS collects data by gliding over the sensor field. The CHs are equipped with superior hardware that allows them to communicate with the UAS directly during time slots assigned by the UAS. In some scenarios, the collection route of the UAS can affect the power consumption of sensor nodes. UAS route planning is a classic and complicated problem, which we leave for future work. In this work, we assume the CHs are powerful enough to reach the UAS whenever it glides over the sensing field, so that we can focus on clustering to save the power of sensor nodes. Owing to cost considerations, the number of CHs is much smaller than the number of sensor nodes. We assume the UAS-aided network is not a real-time system, since the sensor nodes need to store their data until the UAS flies into the communication range of their CH.

Let the set of all sensor nodes be $S = \{s_{1}, s_{2}, \dots, s_{n}\}$ and the set of all CHs be $H = \{c_{1}, c_{2}, \dots, c_{m}\}$, where n and m are the numbers of sensor nodes and CHs, respectively. The sensor nodes must be clustered around the CHs so that they can communicate with the UAS through the relay of the CHs. Let $I (\cdot)$ be a map from S to H, where

I (s_{k}) = c_{i}

(4)

means $c_{i}$ is the CH of $s_{k}$ . Let

C_{i} = \{ s_{k} \mid I (s_{k}) = c_{i} \}

(5)

which is the set of sensor nodes that choose $c_{i}$ as their CH, $i = 1, \dots, m$. Two sensor nodes can communicate with each other if their distance is less than a threshold (e.g. the maximum transmission distance). In the $i$th cluster, a graph $G_{i} (V_{i}, E_{i})$ can be formed, where $V_{i} = C_{i} \cup \{c_{i}\}$ and $E_{i}$ is the set of all links between nodes of $V_{i}$. Then, a spanning tree $T_{i} (V_{i}, \bar{E_{i}})$ of $G_{i}$ rooted at $c_{i}$ can be constructed for data gathering.
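
The construction of $G_{i}$ and of a spanning tree rooted at $c_{i}$ can be sketched as follows. This is an illustrative assumption rather than the article's method: it builds a breadth-first (minimum-hop) tree, not the energy-optimized tree that MECDA_GREEDY produces later in the article, and the node names and coordinates in the example are hypothetical.

```python
import math
from collections import deque


def bfs_spanning_tree(pos, root, r_max):
    """Breadth-first spanning tree of G_i rooted at the CH.

    pos maps node name -> (x, y) and must contain root. Two nodes are linked
    when their distance is below r_max (the maximum transmission distance).
    Returns a parent map (root -> None); nodes unreachable from the root are omitted.
    """
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in pos:
            if v not in parent and math.dist(pos[u], pos[v]) < r_max:
                parent[v] = u          # link (v, u) becomes a tree edge
                queue.append(v)
    return parent


pos = {"c1": (0, 0), "s1": (10, 0), "s2": (20, 0), "s3": (10, 10)}
tree = bfs_spanning_tree(pos, "c1", r_max=12)
print(tree)  # {'c1': None, 's1': 'c1', 's2': 's1', 's3': 's1'}
```

Any sensor missing from the returned map cannot reach its CH through $G_{i}$, which a real clustering step would have to rule out.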

Hybrid-CS is applied when data are transmitted through $T_{i} (V_{i}, \bar{E_{i}})$. The compression ratio of CS depends on the sparsity of the data. When the sensor network covers a wide field, data in different clusters may have different sparsity, so the compression ratios of different clusters differ. We denote the compression ratio of the $i$th cluster as $ρ_{i}$, so the number of measurements required by the $i$th cluster is $M_{i} = N_{i} / ρ_{i}$, where $N_{i}$ is the number of raw data in the $i$th cluster. Note that most existing works ignore the sparsity variation in the temporal and spatial domains,^14–16,18,19 which leads to a relatively high transmission cost of data gathering.

Problem definition and formulation

In this section, we define the problem, then theoretically analyze it and formulate it as a mixed-integer programming problem.

Problem definition

In this work, we focus only on the energy efficiency of the sensor nodes. We define the problem of Data Gathering based on the Integration of CS and clustering for UAS-aided networks (DGIU) as follows.

Definition 1

Given a UAS-aided network, the DGIU problem is to find a map $I (\cdot)$ from S to H and a spanning tree for each cluster such that the total energy consumption of all sensor nodes under hybrid-CS-based data gathering is minimized.

Problem formulation

Let $P (x_{ab})$ denote the power consumption when $x_{ab}$ bits of data are transmitted on link $l_{ab}$, where $l_{ab}$ is the link between $s_{a}$ and $s_{b}$. Let $P_{C_{i}}$ be the power consumption of the sensor nodes in $C_{i}$, and $P_{A}$ be the power consumption of all sensor nodes. Then

P_{A} = \sum_{i = 1}^{m} (P_{C_{i}})

(6)

P_{C_{i}} = \sum_{l_{ab} \in \bar{E_{i}}} P (x_{ab})

(7)

The power consumptions of transmitting and receiving x bits of data over distance d are given by the following equations²⁶

E_{T} (x, d) = \begin{cases} (E_{0} + ε_{fs} d^{2}) \cdot x, & \text{if } d < d_{0} \\ (E_{0} + ε_{mp} d^{4}) \cdot x, & \text{if } d \geq d_{0} \end{cases}

(8)

E_{R} (x) = E_{0} \cdot x

(9)

where $E_{0}$ is the baseline energy to run the transmitter or receiver circuitry, which depends on factors such as digital coding and modulation. $ε_{fs} d^{2}$ and $ε_{mp} d^{4}$ are amplifier energies, which depend on the distance to the receiver and the acceptable bit-error rate. The threshold $d_{0}$ determines whether the free-space (fs) model or the multipath (mp) model applies. Based on equations (8) and (9), $P (x_{ab})$ can be expressed as

P (x_{ab}) = E_{T} (x_{ab}, d_{ab}) + E_{R} (x_{ab})

(10)
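
Equations (8)–(10) translate directly into code. The sketch below uses the radio parameters of Table 1. The crossover distance $d_{0}$ is not given in the text, so it is assumed here to be $\sqrt{ε_{fs} / ε_{mp}}$ (about 87.7 m), the value customarily paired with these parameters in the first-order radio model.

```python
import math

E0 = 50e-9            # J/bit, circuitry energy (Table 1)
EPS_FS = 10e-12       # J/bit/m^2, free-space amplifier (Table 1)
EPS_MP = 0.0013e-12   # J/bit/m^4, multipath amplifier (Table 1)
D0 = math.sqrt(EPS_FS / EPS_MP)  # assumed crossover distance, about 87.7 m


def e_tx(bits: float, d: float) -> float:
    """Transmit energy E_T(x, d), equation (8)."""
    amp = EPS_FS * d ** 2 if d < D0 else EPS_MP * d ** 4
    return (E0 + amp) * bits


def e_rx(bits: float) -> float:
    """Receive energy E_R(x), equation (9)."""
    return E0 * bits


def link_energy(bits: float, d: float) -> float:
    """P(x_ab), equation (10): transmission plus reception on one link."""
    return e_tx(bits, d) + e_rx(bits)


print(link_energy(1, 10))  # about 1.01e-07 J
```

At short range the circuitry term $E_{0}$ dominates, so beyond $d_{0}$ the $d^{4}$ term makes single long hops rapidly more expensive than several short ones.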

Thus, $P_{C_{i}}$ can be computed once $\bar{E_{i}}$ is determined. Since hybrid-CS-based transmission does not preserve flow conservation at aggregators, constructing the optimal $T_{i} (V_{i}, \bar{E_{i}})$ differs from the conventional case. Let $y_{a} \in \{0, 1\}$ indicate whether sensor $s_{a}$ is an aggregator, and $z_{ab} \in \{0, 1\}$ indicate whether link $l_{ab}$ is a tree edge. Suppose each sensor node stores λ bits of data to send. The constraints for constructing $T_{i}$ can be formulated as follows

\sum_{b : l_{ab} \in E_{i}} x_{ab} - \sum_{c : l_{ca} \in E_{i}} x_{ca} + (| C_{i} | λ - M_{i}) y_{a} \geq λ, \forall s_{a} \in C_{i}

(11)

\sum_{b : l_{ab} \in E_{i}} x_{ab} - (M_{i} - λ) y_{a} \geq λ, \forall s_{a} \in C_{i}

(12)

\sum_{b : l_{ab} \in E_{i}} {\bar{x}}_{ab} - \sum_{c : l_{ca} \in E_{i}} {\bar{x}}_{ca} \geq \frac{1}{| C_{i} |}, \forall s_{a} \in C_{i}

(13)

z_{ab} - {\bar{x}}_{ab} \geq 0, \forall l_{ab} \in E_{i}

(14)

\sum_{l_{ab} \in E_{i}} z_{ab} = | C_{i} |

(15)

x_{ab} - z_{ab} \geq 0, \forall l_{ab} \in E_{i}

(16)

z_{ab} - M_{i}^{- 1} x_{ab} \geq 0, \forall l_{ab} \in E_{i}

(17)

\sum_{b : l_{c_{i} b} \in E_{i}} x_{c_{i} b} = 0

(18)

x_{ab} \in \mathbb{Z}^{+} \cup \{0\}, \quad {\bar{x}}_{ab} \geq 0, \quad \forall l_{ab} \in E_{i}

(19)

\bar{E_{i}} = \{ l_{ab} \mid z_{ab} = 1 \}

(20)

where $| C_{i} |$ is the number of sensor nodes in $C_{i}$, and $x_{ab}$ and $x_{ca}$ are the link-flow assignments on links $l_{ab}$ and $l_{ca}$, respectively. ${\bar{x}}_{ab}$ and ${\bar{x}}_{ca}$ are virtual link-flow assignments used only to specify the connectivity of links $l_{ab}$ and $l_{ca}$. Equations (11) and (12) extend the flow conservation constraint to every sensor node of $C_{i}$. Equations (13)–(15) guarantee that the set of links carrying positive virtual flows forms a spanning tree. Equations (16) and (17) guarantee that a link is indicated as a tree edge if and only if it carries a positive flow. The idea of this formulation comes from Xiang et al.,¹⁹ where each sensor has 1 bit of data to send and there is no fixed CH as the root of a spanning tree. We generalize it so that each sensor sends λ bits of data and each cluster forms a spanning tree rooted at its CH. The DGIU problem can then be formulated as follows

\begin{aligned} \text{minimize} \quad & P_{A} = \sum_{i = 1}^{m} P_{C_{i}} \\ \text{subject to} \quad & \text{equations (4)–(20)} \end{aligned}
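
To make the objective concrete, the sketch below evaluates $P_{C_{i}}$ and $P_{A}$ for clusters whose trees are already given, using the link energy of equation (10). It is a hypothetical helper, not a solver for the mixed-integer program. Link flows are assumed to follow the hybrid-CS rule in which a node sends the raw data of its subtree until that would exceed $M_{i}$ bits, and $M_{i}$ bits afterwards. As before, $d_{0} = \sqrt{ε_{fs} / ε_{mp}}$ is an assumption.

```python
import math

E0, EPS_FS, EPS_MP = 50e-9, 10e-12, 0.0013e-12  # Table 1, in joules
D0 = math.sqrt(EPS_FS / EPS_MP)                  # assumed crossover distance


def link_power(bits, d):
    """P(x_ab) from equations (8)-(10)."""
    amp = EPS_FS * d ** 2 if d < D0 else EPS_MP * d ** 4
    return (E0 + amp) * bits + E0 * bits


def cluster_power(parent, pos, ch_pos, M_i, lam):
    """P_{C_i}, equation (7), for one cluster whose tree is given.

    parent[v]: index of v's next hop, or -1 when the next hop is the CH.
    pos[v]: coordinates of sensor v; ch_pos: coordinates of the CH.
    Assumed hybrid-CS flows: x = min(lam * subtree size, M_i) bits per link.
    """
    n = len(parent)
    subtree = [1] * n
    for v in range(n):
        u = parent[v]
        while u != -1:
            subtree[u] += 1
            u = parent[u]
    total = 0.0
    for v in range(n):
        nxt = ch_pos if parent[v] == -1 else pos[parent[v]]
        total += link_power(min(lam * subtree[v], M_i), math.dist(pos[v], nxt))
    return total


def total_power(clusters):
    """P_A, equation (6): clusters is a list of argument tuples for cluster_power."""
    return sum(cluster_power(*c) for c in clusters)


# Two sensors in a line toward a CH at the origin, 1 bit each, M_i = 1 bit
one = ([-1, 0], [(10, 0), (20, 0)], (0, 0), 1, 1)
print(total_power([one, one]))  # about 4.04e-07 J
```

The hard part of DGIU is choosing the clusters and trees; once they are fixed, evaluating the objective is this cheap.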

Proposition 1

The DGIU problem is NP-hard.

Proof

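With a single CH ($m = 1$), every sensor node belongs to the same cluster, so the DGIU problem reduces to constructing $T_{1}$, the minimum-energy spanning tree for hybrid-CS-based data gathering, which has been proved to be NP-hard.¹⁹ Since this NP-hard problem is a special case of DGIU, the DGIU problem is NP-hard.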

Solution techniques

Considering the complexity of the DGIU problem, in this section, we give efficient heuristic clustering algorithms for data gathering in UAS-aided networks.

Analysis

The data gathering process can be divided into two steps: clustering and spanning tree construction. To solve the DGIU problem, we need the optimal clustering scheme and the optimal spanning tree construction scheme such that the energy consumption of hybrid-CS-based data gathering is minimized. A fixed cluster corresponds to a fixed optimal spanning tree and a fixed compression ratio. From equations (7)–(10), the power consumption of the sensor nodes in a cluster can be calculated once the spanning tree and the compression ratio of the cluster are given. However, there are $m^{n}$ different clustering schemes for m CHs and n sensor nodes, and constructing the optimal spanning tree is NP-hard, so both the clustering problem and the spanning tree construction problem are complicated. To improve efficiency, we develop heuristic techniques to address the DGIU problem.

Aside from the spanning tree construction, the power consumption of the sensor nodes in a cluster is affected by two factors: the compression ratio and the distances between the sensor nodes and the CH. The lower the compression ratio, the more measurements are needed, which results in more data to transmit. The longer the distance between a sensor node and the CH, the more power or relay hops are needed to complete the transmission. To reduce the energy consumption of sensor nodes, each cluster should have a high compression ratio, and its sensor nodes should be close to the CH. Unfortunately, since sparsity is not distributed regularly over the field, these two factors usually do not align within a cluster. Thus, a trade-off between them is needed to minimize $P_{A}$.

DGS

Denote the set of sensor nodes that have not joined any cluster as $S_{f}$, and let $N_{f}$ be the number of sensor nodes in $S_{f}$; initially, $N_{f} = n$. Each sensor node in $S_{f}$ can join any one of the m clusters, and joining a cluster changes the sparsity of the cluster's data. Thus, the number of measurements required to recover the original data changes. The sparsity of the data can be obtained by the DCT, and the number of measurements can be calculated by formula (2). Thus, the change in the number of measurements caused by $s_{i} \in S_{f}$ joining $C_{j}$ $(j \in \{1, 2, \dots, m\})$ can be calculated; denote it by $dif (i, j)$. To find the sensor node and cluster combination that minimizes the increase in measurements, we inspect the change in measurements for all sensor node and cluster combinations. Since there are $N_{f}$ sensor nodes that need to join m clusters, we obtain a matrix $D_{s}$ with $N_{f}$ rows and m columns, whose elements are calculated as follows
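
The text does not spell out how sparsity is read off the DCT, so the sketch below assumes one common convention: K is the smallest number of DCT coefficients that together hold 99% of the signal energy, and a cluster's signal is the concatenation of its members' readings. With K in hand, M follows from formula (2) (natural log and c = 1, both assumed), and $dif (i, j)$ is the change in M when a node's readings are appended. The DCT is written as an explicit orthonormal matrix to avoid extra dependencies; a fast transform would be used in practice.

```python
import math
import numpy as np


def dct_ortho(x):
    """Orthonormal DCT-II of a 1-D array (O(N^2) matrix form)."""
    N = len(x)
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    C[0, :] /= np.sqrt(2.0)
    return C @ x


def sparsity(x, energy=0.99):
    """Assumed sparsity measure: fewest DCT coefficients holding `energy` of the total energy."""
    c2 = np.sort(dct_ortho(np.asarray(x, dtype=float)) ** 2)[::-1]  # largest first
    total = c2.sum()
    if total == 0:
        return 1
    return int(np.searchsorted(np.cumsum(c2), energy * total)) + 1


def measurements(x, c=1.0):
    """Measurements required for signal x by formula (2), at least 1 and at most len(x)."""
    N = len(x)
    if N == 0:
        return 0
    return max(1, min(N, math.ceil(c * sparsity(x) * math.log(N))))


def dif(cluster_data, node_data, c=1.0):
    """Change in required measurements when node_data joins the cluster's data."""
    joined = np.concatenate([cluster_data, node_data])
    return measurements(joined, c) - measurements(cluster_data, c)


smooth = np.ones(8)  # constant readings: a single non-zero DCT coefficient
print(sparsity(smooth), measurements(smooth), dif(smooth, np.ones(8)))  # 1 3 0
```

A node whose readings resemble the cluster's data adds few or no measurements, which is what makes it a good candidate for that cluster.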

D_{s} (i, j) = \frac{dif (i, j)}{\max {| dif (:, :) |}}

(21)

where ${| dif (:, :) |}$ is the set of absolute values of $dif (i, j)$ over all possible i and j. If $D_{s} (i, j)$ is the minimum element of $D_{s}$, then the increase in measurements caused by $s_{i}$ joining $C_{j}$ is the smallest among all sensor node and cluster combinations.

Since either a long transmission distance or more hops can increase energy consumption, a sensor node tends to join the nearest cluster to reduce transmission energy. A matrix $D_{c}$ that records the distance between each sensor node of $S_{f}$ and each CH is constructed. $D_{c}$ has the same size as $D_{s}$, and its elements are calculated as follows

D_{c} (i, j) = \frac{d_{i, j}}{\max \{ d_{a, b} : a \in S, b \in H \}}

(22)

where $d_{i, j}$ is the distance between $s_{i} \in S_{f}$ and the CH $c_{j}$ $(j \in \{1, 2, \dots, m\})$, and $\max \{ d_{a, b} : a \in S, b \in H \}$ is the maximum distance between any sensor node and any CH.

To trade off the compression ratio against the distances between sensor nodes and the CH in each cluster, we construct a new matrix D by combining $D_{s}$ and $D_{c}$ as follows

D = α * D_{s} + (1 - α) * D_{c}

(23)

where $α$ is a weight coefficient that adjusts the relative weights of sparsity and transmission distance in the clustering scheme. As $α$ increases, sensor nodes tend to join the cluster that leads to a smaller number of measurements; as $α$ decreases, they tend to join the nearest cluster. The optimal value of $α$ can be obtained by simulation, as shown in section “Simulation.” Once D has been built, we find its minimum element $D (i, j)$, let $s_{i}$ join $C_{j}$, and remove $s_{i}$ from $S_{f}$.

Figure 3 shows an example of how matrix D is constructed. There are two CHs, $c_{1}$ and $c_{2}$, with some sensor nodes already in their clusters. Three sensor nodes, $s_{1}$, $s_{2}$, and $s_{3}$, are still waiting to join a cluster. With the sensed data and the coordinates of the sensor nodes and CHs, the UAS can construct two $3 \times 2$ matrices, $D_{s}$ and $D_{c}$, whose elements are computed by formulae (21) and (22). Assume they are as follows

D_{s} = \begin{pmatrix} 1 & 1 \\ 0.5 & 1 \\ 1 & 0.5 \end{pmatrix}, \qquad D_{c} = \begin{pmatrix} 0.5 & 0.8 \\ 0.6 & 0.3 \\ 0.5 & 1 \end{pmatrix}

Let $α = 0.7$ , then

D = α * D_{s} + (1 - α) * D_{c} = \begin{pmatrix} 0.85 & 0.94 \\ 0.53 & 0.79 \\ 0.85 & 0.65 \end{pmatrix}

Figure 3.

An example of constructing matrix D.

The minimum element of D is $D (2, 1)$, so $s_{2}$ joins the cluster of $c_{1}$, even though $s_{2}$ is closer to $c_{2}$.
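
The arithmetic of this example can be checked in a few lines of NumPy. The matrices are exactly the assumed values of Figure 3; row and column indices are shifted to 1-based names in the printout.

```python
import numpy as np

Ds = np.array([[1.0, 1.0], [0.5, 1.0], [1.0, 0.5]])  # sparsity factor, formula (21)
Dc = np.array([[0.5, 0.8], [0.6, 0.3], [0.5, 1.0]])  # distance factor, formula (22)
alpha = 0.7
D = alpha * Ds + (1 - alpha) * Dc                     # formula (23)
i, j = np.unravel_index(np.argmin(D), D.shape)        # position of the minimum element
print(np.round(D, 2))
print(f"s{i + 1} joins c{j + 1}")  # s2 joins c1
```

Setting alpha to 0 in the same snippet makes $s_{2}$ pick $c_{2}$ instead, since only the distance factor then counts.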

This process is repeated until $S_{f}$ is empty, at which point every sensor node has joined a cluster. Then, a spanning tree is constructed for each cluster by algorithm MECDA_GREEDY from Xiang et al.,¹⁹ which has been shown to construct efficient spanning trees for hybrid-CS-based data gathering. Finally, data are transmitted to each CH along its spanning tree using the hybrid-CS scheme.

Note that to construct $D_{s}$ and $D_{c}$, we need the data of all sensor nodes and the positions of all sensor nodes and CHs. To obtain this information, the UAS collects the first-round data and the position information from the sensor field using a conventional clustering scheme (e.g. low-energy adaptive clustering hierarchy (LEACH)) and a traditional routing strategy without CS (e.g. the minimum spanning tree). Based on the collected information, the UAS obtains a feasible solution of the DGIU problem with the proposed method. The UAS then sends the clustering and spanning tree scheme to each CH, and each CH informs its sensor nodes which cluster to join and how to form the spanning tree. The UAS can gather data with the new clustering scheme for a period of time. Since the sparsity of the data varies in the temporal domain, the UAS can periodically adjust the clustering scheme according to newly received data.

We generate the measurement matrix with the method introduced in Haupt et al.²² and Zheng et al.,² in which each sensor node locally draws its random coefficients using its network address as the seed of a pseudo-random number generator. Given the seed values and the addresses of the nodes in the network, the UAS can easily reconstruct the random coefficients of each sensor node. Since the UAS gathers the address of each sensor node in the first-round data collection, it can independently reconstruct the random coefficients needed for data retrieval in the hybrid-CS scheme.
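
The address-seeded generation of measurement coefficients can be sketched as follows. This is a minimal illustration assuming NumPy's default_rng as the pseudo-random number generator and Gaussian coefficients; in a real deployment the nodes and the UAS would have to share the same generator implementation. The addresses and readings are hypothetical.

```python
import numpy as np


def node_coefficients(address: int, M: int) -> np.ndarray:
    """Column phi_i of node i: M pseudo-random coefficients seeded by its network address."""
    rng = np.random.default_rng(address)
    return rng.standard_normal(M) / np.sqrt(M)


def measurement_matrix(addresses, M: int) -> np.ndarray:
    """The sink rebuilds Phi column by column from the node addresses alone."""
    return np.column_stack([node_coefficients(a, M) for a in addresses])


addresses = [17, 42, 99]                 # hypothetical network addresses
readings = np.array([21.5, 22.0, 20.8])  # hypothetical sensed values
M = 2
# In-network: each node adds reading * phi_i to the measurement vector it forwards
y = np.zeros(M)
for a, x in zip(addresses, readings):
    y += x * node_coefficients(a, M)
# At the UAS: regenerate Phi and check that y = Phi x, as in equation (1)
Phi = measurement_matrix(addresses, M)
print(np.allclose(Phi @ readings, y))  # True
```

No coefficients travel over the air: only the addresses collected in the first round are needed at the UAS.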

Algorithm 1 shows the pseudo-code of this DGS.

Algorithm 1. Data gathering scheme (DGS).
input: the first round data in sensor field and the position information of the network
1: initialize $S_{f}$ as the set of all sensor nodes
2: calculate the maximum value of the distances from all sensor nodes to all CHs
3: while $S_{f}$ is not empty do
4: for each node $s_{i}$ in $S_{f}$ do
5: for each fixed CH $c_{j}$ do
6: calculate $dif (i, j)$
7: calculate $d_{i, j}$
8: end for
9: end for
10: construct $D_{s}$ by (21)
11: construct $D_{c}$ by (22)
12: construct D by (23)
13: find the minimum value of D: $D (i, j)$
14: let $s_{i}$ join in $C_{j}$
15: remove $s_{i}$ from $S_{f}$
16: end while
17: for each cluster $C_{j} \cup \{c_{j}\}$ do
18: construct $T_{j}$ by algorithm MECDA_GREEDY
19: end for
output: $T_{j}$ , $j = 1, 2, \dots, m$
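
For readers who want to experiment, lines 1–16 of Algorithm 1 can be sketched in Python as below. The sketch is hypothetical: it assumes an energy-based DCT sparsity measure (fewest coefficients holding 99% of the energy), the natural log and c = 1 in formula (2), and cluster signals formed by concatenating members' readings. It returns only the cluster assignment and leaves tree construction to MECDA_GREEDY, which is not reproduced here.

```python
import math
import numpy as np


def dct_ortho(x):
    """Orthonormal DCT-II of a 1-D array (O(N^2) matrix form)."""
    N = len(x)
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    C[0, :] /= np.sqrt(2.0)
    return C @ x


def measurements(x, c=1.0, energy=0.99):
    """Required M for signal x: assumed energy-based DCT sparsity plus formula (2)."""
    N = len(x)
    if N == 0:
        return 0
    c2 = np.sort(dct_ortho(x) ** 2)[::-1]
    K = 1 if c2.sum() == 0 else int(np.searchsorted(np.cumsum(c2), energy * c2.sum())) + 1
    return max(1, min(N, math.ceil(c * K * math.log(N))))


def dgs_clustering(data, node_pos, ch_pos, alpha=0.7):
    """Greedy clustering of Algorithm 1 (lines 1-16); returns the CH index of each sensor.

    data: (n, lam) first-round readings; node_pos: (n, 2); ch_pos: (m, 2).
    """
    n, m = len(data), len(ch_pos)
    dist = np.linalg.norm(node_pos[:, None, :] - ch_pos[None, :, :], axis=2)  # (n, m)
    d_max = dist.max() or 1.0                    # line 2: over all sensors and CHs
    members = [[] for _ in range(m)]
    assign = [-1] * n
    free = list(range(n))                        # S_f
    while free:
        dif = np.zeros((len(free), m))
        for r, i in enumerate(free):
            for j in range(m):
                cur = data[members[j]].ravel()   # cluster signal = members' readings
                new = np.concatenate([cur, data[i]])
                dif[r, j] = measurements(new) - measurements(cur)
        Ds = dif / (np.abs(dif).max() or 1.0)    # formula (21)
        Dc = dist[free] / d_max                  # formula (22)
        D = alpha * Ds + (1 - alpha) * Dc        # formula (23)
        r, j = np.unravel_index(np.argmin(D), D.shape)
        i = free.pop(int(r))
        members[j].append(i)
        assign[i] = int(j)
    return assign


rng = np.random.default_rng(1)
node_pos = np.array([[0.0, 1.0], [0.0, 2.0], [10.0, 1.0], [10.0, 2.0]])
ch_pos = np.array([[0.0, 0.0], [10.0, 0.0]])
data = 20 + rng.standard_normal((4, 8))          # hypothetical readings, lam = 8
print(dgs_clustering(data, node_pos, ch_pos, alpha=0.0))  # distance only: [0, 0, 1, 1]
print(dgs_clustering(data, node_pos, ch_pos))
```

Recomputing every $dif (i, j)$ in each iteration, as Algorithm 1 does, is what drives the cost analysed next.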

The main computational cost of DGS comes from the repeated computation of $D_{s}$ and $D_{c}$. The time complexity of the DCT is $O (k \log k)$, where k is the data size. So, treating λ as a constant, computing one $D_{s}$ has a complexity of $O (N_{f} \sum_{j = 1}^{m} λ | C_{j} | \log λ | C_{j} |) = O (N_{f} (n - N_{f}) \log (n - N_{f}))$, where n is the number of sensor nodes. The complexity of computing one $D_{c}$ is $O (m N_{f})$, where m is the number of CHs. As $S_{f}$ shrinks from S to the empty set, $N_{f}$ decreases from n to 0. Therefore, the total complexity over all iterations is $O (\sum_{N_{f} = 0}^{n} (N_{f} (n - N_{f}) \log (n - N_{f}) + m N_{f})) = O (n^{2} m + n^{3} \log n)$.
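
The final bound can be checked term by term. The first summand vanishes at $N_{f} = 0$ and is taken as zero at $N_{f} = n$ (where $n - N_{f} = 0$); for the remaining terms $\log (n - N_{f}) \leq \log n$, and the two sums have closed forms:

\sum_{N_{f} = 1}^{n - 1} N_{f} (n - N_{f}) \log (n - N_{f}) \leq \log n \sum_{N_{f} = 1}^{n - 1} N_{f} (n - N_{f}) = \frac{n (n^{2} - 1)}{6} \log n = O (n^{3} \log n), \qquad \sum_{N_{f} = 0}^{n} m N_{f} = \frac{m n (n + 1)}{2} = O (n^{2} m)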

IDGS

In DGS, the sparsity of the data in each cluster must be recalculated whenever cluster membership changes, which is the main cause of its high complexity. Reducing the number of sparsity calculations therefore reduces the complexity. If a sensor node is very close to or very far from a CH, the distance factor should play the more important role in deciding whether it joins that cluster. However, if a sensor node lies in the middle area between two or more CHs, the sparsity factor should play the more important role in deciding which cluster it joins. Therefore, the sparsity recalculation should focus on this case. We now give a scheme that reduces the complexity of DGS based on this property.

With the first-round data (including the sensed data and the coordinates of all sensor nodes and CHs) collected by the UAS, the UAS can compute the distance between each sensor node and each CH. For each sensor $s_{i}$, $i = 1, \dots, n$, record a sequence $\mathrm{seq}_{s_{i}}$ of all CHs $c_{j}$, $j = 1, \dots, m$, in ascending order of their distance to $s_{i}$. Select the first CH of $\mathrm{seq}_{s_{i}}$ and let $s_{i}$ join its cluster. This process forms m initial clusters. For each cluster j, let $\bar{d_{j}}$ be the maximum distance between a cluster member and $c_{j}$. Divide all sensor nodes of this cluster into h layers, where the kth layer consists of the sensor nodes whose distances to $c_{j}$ are greater than or equal to $\frac{h - k}{h} \bar{d_{j}}$ and less than $\frac{h - k + 1}{h} \bar{d_{j}}$ (the farthest member, at distance exactly $\bar{d_{j}}$, is placed in the first layer). Thus, sensor nodes in the kth layer are closer to the CH when k is larger.
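
A minimal sketch of this initialization (lines 1–8 of Algorithm 2) is given below in plain Python. The function name, the coordinates, and the parameter values in the example are hypothetical. A cluster whose maximum member distance is zero (all members on top of the CH) is assumed to place its members in layer h.

```python
import math


def idgs_init(node_pos, ch_pos, h, mu):
    """Initial nearest-CH clusters, CH sequences, layers and S_f for IDGS.

    Returns (seq, home, layer, free): seq[i] lists CH indices by increasing distance
    from sensor i, home[i] = seq[i][0], layer[i] in 1..h (1 = outermost), and free
    holds the sensors in the first mu layers.
    """
    n, m = len(node_pos), len(ch_pos)
    dist = [[math.dist(p, q) for q in ch_pos] for p in node_pos]
    seq = [sorted(range(m), key=lambda j: dist[i][j]) for i in range(n)]
    home = [s[0] for s in seq]
    # d_bar[j]: largest distance from c_j to one of its initial members
    d_bar = [max((dist[i][j] for i in range(n) if home[i] == j), default=0.0)
             for j in range(m)]
    layer = []
    for i in range(n):
        j = home[i]
        if d_bar[j] == 0:
            layer.append(h)
            continue
        # layer k covers [(h-k)/h * d_bar, (h-k+1)/h * d_bar); d = d_bar goes to layer 1
        k = h - math.floor(dist[i][j] * h / d_bar[j])
        layer.append(max(1, k))
    free = [i for i in range(n) if layer[i] <= mu]
    return seq, home, layer, free


ch_pos = [(0, 0), (100, 0)]
node_pos = [(10, 0), (40, 0), (45, 0), (90, 0)]
seq, home, layer, free = idgs_init(node_pos, ch_pos, h=10, mu=2)
print(home, layer, free)  # [0, 0, 0, 1] [8, 2, 1, 1] [1, 2, 3]
```

Only the nodes in free are then reallocated, each considering just the first ν entries of its sequence.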

Let $S_{f}$ be the set of sensor nodes in the first $μ$ layers of all clusters, $0 \leq μ \leq h$. The sensor nodes in $S_{f}$ are reallocated by algorithm DGS, where the candidate CHs of each sensor $s_{i}$ are restricted to the first $ν$ members of $\mathrm{seq}_{s_{i}}$, $1 \leq ν \leq m$. The parameters $μ$ and $ν$ trade off algorithm complexity against energy efficiency. When $μ = h$ and $ν = m$, this algorithm is identical to DGS. However, appropriate choices of $μ$ and $ν$ can reduce the complexity at little energy cost, as we show in section “Simulation.”

When all sensor nodes in $S_{f}$ have been reallocated, the new cluster partition is formed. Then, the spanning tree of each cluster is constructed by algorithm MECDA_GREEDY,¹⁹ and each sensor node transmits data to its CH using the hybrid-CS scheme. Algorithm 2 shows the pseudo-code of this scheme.

Algorithm 2. Improved data gathering scheme (IDGS).
input: the first round data in sensor field, the position information of the network, h, $μ$ and $ν$
1: for each sensor node $s_{i}$ do
2: construct $\mathrm{seq}_{s_{i}}$
3: select the first CH of $\mathrm{seq}_{s_{i}}$ and join its cluster
4: end for
5: for each cluster do
6: divide its members into h layers
7: end for
8: initialize $S_{f}$ as the sensor nodes in the first $μ$ layers of all clusters
9: while $S_{f}$ is not empty do
10: construct D as DGS does, except that the m CHs are replaced by the first $ν$ CHs in $\mathrm{seq}_{s_{i}}$
11: find the minimum value of D: $D (i, j)$
12: let $s_{i}$ join in $C_{j}$
13: remove $s_{i}$ from $S_{f}$
14: end while
15: for each cluster $C_{j} \cup \{c_{j}\}$ do
16: construct $T_{j}$ by algorithm MECDA_GREEDY
17: end for
output: $T_{j}$ , $j = 1, 2, \dots, m$

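Assuming sensor nodes are roughly uniformly distributed around each CH, the complexity of IDGS is similar to that of DGS, except that n is reduced to about $(1 - (h - μ)^{2} / h^{2}) n$, the expected number of nodes in the outer $μ$ layers, and m is reduced to $ν$.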

Simulation

In this section, we evaluate the performance of the proposed clustering scheme on real data traces from Sensorscope: Sensor Networks for Environmental Monitoring.²¹ These real sensor readings are dense signals but are sparse in the DCT domain. The performance of DGS is also compared with a CS scheme based on random walk. All simulations are conducted in MATLAB.

Simulation setup

The simulated network consists of 72 sensor nodes. Both the coordinates and the sensed data of each sensor can be downloaded from Sensorscope.²¹ To construct a UAS-aided network, we assume that m virtual CHs are deployed randomly in the sensing field, and a virtual UAS, in charge of gathering data and broadcasting configuration information, glides over the sensor field. Since we focus only on the energy efficiency of the sensor nodes, the virtual CHs and the virtual UAS do not affect the validity of the results. Since each sensed data trace is tied to the coordinates of its sensor node, we cannot change the deployment of the sensor nodes. To obtain more convincing results, we average the results over 50 random deployments of CHs.

We evaluate the performance in two respects: energy consumption and computational complexity. Each sensor node stores 50 readings. We compute the total energy consumption and the running time required to gather all these data from the 72 sensor nodes under the clustering scheme produced by each algorithm. The simulation parameters are listed in Table 1; the parameters of the radio model are the same as those in Heinzelman et al.²⁶

Table 1.

Parameter values used in the simulation.

Parameter	Description	Value
n	Number of sensor nodes	72
$E_{0}$	Energy to run the transmitter/receiver circuitry	50 nJ/bit
$ε_{fs}$	Free-space amplifier parameter	10 pJ/bit/m²
$ε_{mp}$	Multipath amplifier parameter	0.0013 pJ/bit/m⁴
$λ$	Number of readings stored per node	50
m	Number of CHs	1, 2, …, 20
h	Number of layers per cluster	10

CH: cluster head.
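Formulae (6)–(8) are not reproduced in this section. For reference, the first-order radio model of Heinzelman et al.,²⁶ from which the values in Table 1 are taken, charges E_0 per bit for the electronics and adds a free-space (d²) or multipath (d⁴) amplifier term depending on whether the distance d is below the crossover distance d_0 = sqrt(ε_fs/ε_mp). The sketch below implements that standard model with the Table 1 values; how the paper combines these terms into per-cluster energy is not assumed here, and the function names are illustrative.

```python
import math

E0 = 50e-9            # J/bit, circuitry energy (Table 1)
EPS_FS = 10e-12       # J/bit/m^2, free-space amplifier (Table 1)
EPS_MP = 0.0013e-12   # J/bit/m^4, multipath amplifier (Table 1)
D0 = math.sqrt(EPS_FS / EPS_MP)  # crossover distance, about 87.7 m


def tx_energy(bits: int, d: float) -> float:
    """Energy (J) to transmit `bits` over d metres (first-order radio model)."""
    if d < D0:
        return bits * (E0 + EPS_FS * d ** 2)
    return bits * (E0 + EPS_MP * d ** 4)


def rx_energy(bits: int) -> float:
    """Energy (J) to receive `bits`."""
    return bits * E0
```

For example, sending one 4-byte reading over 50 m costs 32 × (50 nJ + 10 pJ × 50²) = 2.4 µJ under this model.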

We assume that all sensed data and the coordinates of all sensor nodes and CHs are known. With this information, both DGS and IDGS can easily compute D and return a clustering scheme. For each cluster, a graph $G_{i}(V_{i}, E_{i})$ is formed according to the transmission range of each sensor node. Given $G_{i}(V_{i}, E_{i})$, the number of measurements, and the position of $c_{i}$, the MECDA_GREEDY algorithm generates a spanning tree rooted at $c_{i}$ and an aggregator set $A_{i}$ for hybrid-CS transmission. Data are transmitted in IEEE 802.15.4 frames, in which the payload of a frame is at most 127 bytes. Each frame also carries 7 bytes of control information: a 5-byte frame header and a 2-byte frame footer. Since each reading is a 4-byte floating-point number, one frame can carry ⌊127/4⌋ = 31 readings. The power consumption of each cluster can then be computed by formulae (6)–(8), and the total power consumption by formula (9).
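To make the link load concrete, the sketch below counts, for a given spanning tree, how many values each sensor node forwards under hybrid CS, and how many bits those values occupy with 31 readings per frame plus 7 bytes of overhead per frame. It follows the usual hybrid-CS rule (a node whose subtree holds fewer than M readings forwards them raw; otherwise it acts as an aggregator and sends M measurements). The parent-map tree representation, the single-snapshot view (one reading per node), and the helper names are assumptions; the tree itself would come from MECDA_GREEDY, which is not reproduced here.

```python
import math
from typing import Dict, Hashable, List

READINGS_PER_FRAME = 31  # 4-byte floats per frame, as stated in the text
OVERHEAD_BYTES = 7       # 5-byte header + 2-byte footer per frame


def hybrid_cs_loads(parent: Dict[Hashable, Hashable], M: int) -> Dict[Hashable, int]:
    """Number of values each sensor node sends to its parent under hybrid CS.

    `parent` maps every sensor node to its parent; the CH is the root, and
    each sensor node contributes one reading to the snapshot (an assumption).
    """
    children: Dict[Hashable, List[Hashable]] = {}
    for v, p in parent.items():
        children.setdefault(p, []).append(v)

    subtree: Dict[Hashable, int] = {}

    def size(v: Hashable) -> int:
        # Readings held in the subtree rooted at v (v itself included).
        if v not in subtree:
            subtree[v] = 1 + sum(size(c) for c in children.get(v, []))
        return subtree[v]

    # Below M readings, forwarding raw data is cheaper; otherwise the node
    # acts as an aggregator and sends M random projections.
    return {v: min(size(v), M) for v in parent}


def frame_bits(values: int) -> int:
    """Bits on air for `values` floats, packed 31 per frame plus overhead."""
    frames = math.ceil(values / READINGS_PER_FRAME)
    return values * 4 * 8 + frames * OVERHEAD_BYTES * 8
```

For example, a node whose subtree holds 40 readings with M = 20 sends 20 values, which fit in one frame: 20 × 32 + 56 = 696 bits.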

The performance of DGS and IDGS

Existing clustering schemes for heterogeneous WSNs mainly focus on the CH selection strategy.^9,10 Once CHs are selected, sensor nodes join clusters based only on their distances to the CHs, ignoring the data features. Although some works integrate CS and clustering,^16,18 they assume that there is no sparsity variation across the sensing field, so sensor nodes still join clusters based on distance. Therefore, to evaluate our proposed scheme, we compare it with two baseline clustering schemes: a clustering scheme without CS (No_CS), and a clustering scheme in which CS uses the same compressive ratio in every cluster (Same_CR). In the No_CS scheme, each sensor node joins the cluster of its closest CH, and all sensor nodes transmit their original data without compression during data gathering. The Same_CR scheme uses the same clustering as No_CS, but all sensor nodes transmit data using the hybrid-CS scheme with the same compressive ratio in all clusters.
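The two baselines differ from DGS only in how clusters are formed and in how many values each cluster must deliver. The sketch below assigns each node to its nearest CH (the clustering shared by No_CS and Same_CR) and returns the number of values per cluster: the raw cluster size for No_CS, and ⌈r·|C_j|⌉ measurements for Same_CR with one common ratio r = M_j/|C_j|. This convention for r, the default value of r, and the function names are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def nearest_ch_clusters(nodes: Sequence[Point], chs: Sequence[Point]) -> List[int]:
    """Index of the closest CH for every node (clustering of No_CS and Same_CR)."""
    return [min(range(len(chs)), key=lambda j: math.dist(p, chs[j])) for p in nodes]


def values_per_cluster(labels: List[int], scheme: str, r: float = 0.5) -> Dict[int, int]:
    """Values each cluster delivers per snapshot under a baseline scheme."""
    sizes = Counter(labels)
    if scheme == "No_CS":
        return dict(sizes)  # every reading is sent uncompressed
    if scheme == "Same_CR":
        # Same ratio r in every cluster (assumed convention r = M_j / |C_j|).
        return {j: math.ceil(r * n) for j, n in sizes.items()}
    raise ValueError(f"unknown scheme: {scheme}")
```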

Figure 4 shows how the energy consumption varies with $α$ when $m = 5$. We can see that the compressive ratio plays a more important role than the distance between sensor nodes and CHs in the clustering scheme; in this case, the optimal value of $α$ is 0.7.

Figure 4.

The power consumption versus $α$.

Figure 5 shows the power consumption versus m for No_CS, Same_CR, DGS, and IDGS. $α$ is set to 0.7 for DGS and IDGS, and $μ$ and $ν$ are set to 6 and 3, respectively. DGS and IDGS consistently outperform the two baseline schemes: DGS saves up to 50% of the energy consumed by Same_CR and up to 60% of that consumed by No_CS. As the number of CHs increases, the average distance between sensor nodes and CHs decreases, and more sensor nodes can communicate with the CHs directly without compression. Thus, the energy consumption decreases, and the gaps between the schemes narrow. Since IDGS requires $ν \leq m$, no IDGS results are shown when the number of CHs is less than $ν$.

Figure 5.

The power consumption versus m.

Although DGS and IDGS consume almost the same energy in Figure 5, their running times differ greatly, as shown in Figure 6. The running time of DGS grows with m, because a larger m means that more elements of the matrix D must be computed. The running time of IDGS is always lower than that of DGS and shows no obvious dependence on the number of CHs. Since the No_CS and Same_CR schemes both cluster nodes by distance alone, they share the same low running time.

Figure 6.

The running time versus m.

The trade-off in IDGS

In IDGS, the values of $μ$ and $ν$ govern a trade-off between power consumption and time complexity. Figures 7 and 8 show, respectively, the power consumption and the running time of IDGS for different values of $μ$ and $ν$. Larger $μ$ and $ν$ lead to lower power consumption but higher time complexity. The reasons are as follows: a larger $μ$ means that more sensor nodes participate in the CS-based clustering, and a larger $ν$ means that each sensor node has more candidate CHs to choose from; both increase the time complexity. Meanwhile, larger $μ$ and $ν$ produce more candidate clustering schemes, so a clustering scheme with lower power consumption can be selected. We can also see that when $μ \geq 6$ and $ν \geq 3$, the power consumption shows no obvious variation, while the running time changes sharply. The reason is that a larger $μ$ brings more sensor nodes that are close to the CHs into the CS-based clustering, which may be of little use, since distance dominates the clustering decision once a sensor node is close enough to a CH. From Figures 5 and 6, when $μ = 6$ and $ν = 3$, IDGS consumes about as much energy as DGS, while its running time is only 31% of that of DGS. In practice, appropriate values of $μ$ and $ν$ can be determined by simulation, which can be carried out by the UAS.
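The parameter choice described above can be automated with a small grid search. The sketch assumes hypothetical callables `energy(mu, nu)` and `runtime(mu, nu)` that run IDGS (for example, in the simulation test carried out by the UAS) and return the measured energy and running time; it returns the pair with the smallest running time whose energy does not exceed a given budget, which is the selection rule stated in the Conclusion.

```python
from typing import Callable, Optional, Sequence, Tuple


def choose_mu_nu(
    mus: Sequence[int],
    nus: Sequence[int],
    energy: Callable[[int, int], float],   # hypothetical IDGS energy probe
    runtime: Callable[[int, int], float],  # hypothetical IDGS timing probe
    energy_budget: float,
) -> Optional[Tuple[int, int]]:
    """Fastest (mu, nu) pair whose IDGS energy does not exceed energy_budget."""
    feasible = [
        (runtime(mu, nu), mu, nu)
        for mu in mus
        for nu in nus
        if energy(mu, nu) <= energy_budget
    ]
    if not feasible:
        return None  # no setting meets the budget
    _, mu, nu = min(feasible)  # ties go to the smaller mu, then the smaller nu
    return mu, nu
```

Swapping the roles of the two callables gives the dual rule (lowest energy under a running-time budget).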

Figure 7.

The power consumption of IDGS for different values of $μ$ and $ν$ ($α = 0.7$, $m = 5$).

Figure 8.

Running time of IDGS for different values of $μ$ and $ν$ ($α = 0.7$, $m = 5$).

DGS versus random walk–based CS scheme

In this section, we compare the performance of DGS with a random walk–based CS scheme. Several works^24,25 have proposed random walk–based CS schemes. In such schemes, random measurements are collected along multiple random paths, and the sink can reconstruct the original data once enough measurements have been collected. Thus, all data can be reconstructed without clustering or specific routing.

We implement the random walk–based CS scheme in our simulation as follows. At the beginning, m sensor nodes are selected randomly (some nodes may be selected several times) to initialize m independent random walks of fixed length t; in this subsection, m denotes the number of random walks. Each sensor node computes its measurement as a random linear combination of its stored data (30 readings). At each step of each walk, the current node randomly chooses one of its neighbors, and the walk's measurement is linearly combined with that neighbor's measurement. At the end of the random walks, m random projections have been generated, and they are sent to the nearest CH via shortest-path routing (the number of CHs is set to 5). Once the UAS has received all projections from the CHs, it can reconstruct all the original data using CS recovery.
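A simplified version of the random-walk measurement generation can be sketched as follows. For readability, it treats one snapshot with a single reading per node rather than the per-node combination of stored data described above, draws Gaussian coefficients, and assumes that the sink can regenerate the same coefficients from a shared seed; the adjacency-list input and the function name are illustrative. Each walk yields one row of the sensing matrix Φ, nonzero only on the nodes it visits, and one projection y_k = Σ_v Φ_{k,v} x_v.

```python
from typing import Sequence, Tuple

import numpy as np


def random_walk_projections(
    neighbors: Sequence[Sequence[int]],
    x: np.ndarray,
    walks: int,
    t: int,
    seed: int = 0,
) -> Tuple[np.ndarray, np.ndarray]:
    """Return (phi, y) with y = phi @ x, one row per random walk of length t."""
    rng = np.random.default_rng(seed)  # assumption: sink shares this seed
    n = len(x)
    phi = np.zeros((walks, n))
    y = np.zeros(walks)
    for k in range(walks):
        v = int(rng.integers(n))  # start nodes are drawn with replacement
        for step in range(t + 1):  # the start node plus t hops
            coef = rng.standard_normal()
            phi[k, v] += coef      # revisiting a node accumulates its weight
            y[k] += coef * x[v]    # running measurement carried by the walk
            if step < t:
                v = int(rng.choice(neighbors[v]))
        # y[k] would now be routed to the nearest CH by shortest-path routing
    return phi, y
```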

The power consumption and the reconstruction error are used for comparison. The reconstruction error is defined as $ε = \| \hat{x} - x \|_{2}^{2} / \| x \|_{2}^{2}$, where $\hat{x}$ and $x$ denote the reconstructed and the original data, respectively. In DGS, the data of each cluster are reconstructed independently, since each cluster has its own sparsity. The required number of measurements for each cluster is determined using the DCT. The overall reconstruction error of DGS is 0.0368, and its power consumption is 0.0255 J, the same value as in Figure 5.
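The error metric can be computed as below. When clusters are reconstructed independently, as in DGS, the sketch takes the overall error over the concatenation of all clusters' data, which is one natural reading of "overall reconstruction error"; that pooling choice and the function names are assumptions.

```python
from typing import Sequence

import numpy as np


def reconstruction_error(x_hat: np.ndarray, x: np.ndarray) -> float:
    """epsilon = ||x_hat - x||_2^2 / ||x||_2^2."""
    return float(np.sum((x_hat - x) ** 2) / np.sum(x ** 2))


def overall_error(x_hats: Sequence[np.ndarray], xs: Sequence[np.ndarray]) -> float:
    """Error over all clusters, pooling their data before normalising (assumed)."""
    return reconstruction_error(np.concatenate(x_hats), np.concatenate(xs))
```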

Figures 9 and 10 show, respectively, the power consumption and the reconstruction error versus the number of random walks, for walk lengths $t = 30, 90, 150$ and for DGS. The power consumption of DGS is lower than that of the random walk scheme in most cases, and the reconstruction error of DGS is always lower. A possible explanation is as follows. Since sensor nodes are clustered so that each cluster has a high compression ratio, the data within each cluster may be sparser than the data of the whole network. Because DGS reconstructs the data of each cluster independently, it has a higher probability of successful reconstruction than the random walk scheme. Moreover, a high compression ratio corresponds to a small number of required measurements, so each sensor node transmits fewer measurements to its CH in DGS, which leads to low power consumption.
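The argument that per-cluster sparsity helps reconstruction can be illustrated with a small experiment. The sketch recovers one cluster's readings, assumed K-sparse in the DCT domain, from M Gaussian measurements using orthogonal matching pursuit (OMP). OMP is a standard CS recovery algorithm swapped in here for illustration, since the recovery strategy used in the paper is not specified in this section; the sizes N, M, K and the seed are arbitrary.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix: coeffs = C @ x and x = C.T @ coeffs."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    c[0, :] /= np.sqrt(2.0)
    return c


def omp(a: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Greedy k-sparse solution of a @ s = y (k >= 1)."""
    support, residual = [], y.copy()
    s = np.zeros(a.shape[1])
    for _ in range(k):
        # Pick the column most correlated with the residual, then refit.
        support.append(int(np.argmax(np.abs(a.T @ residual))))
        coef, *_ = np.linalg.lstsq(a[:, support], y, rcond=None)
        residual = y - a[:, support] @ coef
    s[support] = coef
    return s


rng = np.random.default_rng(1)
N, M, K = 64, 40, 3
s_true = np.zeros(N)
s_true[rng.choice(N, K, replace=False)] = rng.standard_normal(K) + 3.0
psi = dct_matrix(N).T                       # x = psi @ s (inverse DCT)
x = psi @ s_true                            # dense readings of one cluster
phi = rng.standard_normal((M, N)) / np.sqrt(M)
y = phi @ x                                 # M compressive measurements
x_hat = psi @ omp(phi @ psi, y, K)
err = np.sum((x_hat - x) ** 2) / np.sum(x ** 2)
print(f"reconstruction error: {err:.2e}")
```

Lowering K (a sparser cluster) or raising M makes exact recovery more likely, which is the intuition behind reconstructing each cluster separately.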

Figure 9.

The power consumption of DGS versus random walk.

Figure 10.

The reconstruction error of DGS versus random walk.

Conclusion

This article investigates the problem of DGIUs. We first formulate this problem as a mixed-integer programming problem. Considering its NP-hard complexity, we propose two heuristic algorithms to solve it. The first algorithm, DGS, is a greedy algorithm that minimizes the increase in the network's energy consumption each time a sensor node joins a cluster. Although DGS reduces the energy consumption, it incurs high computational complexity. To reduce this complexity, we propose the second algorithm, IDGS, which provides a trade-off between energy consumption and computational complexity. Thus, given a threshold on the energy consumption, we can find the parameters that minimize the computational complexity, and vice versa. Simulations based on real data traces demonstrate the high efficiency of the proposed algorithms.

Footnotes

Academic Editor: Wei Yu

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (Grant Nos 61672282, 61602238, and 61373131), the Basic Research Program of Jiangsu Province (Grant Nos BK20160805 and BK20161491), the China Postdoctoral Science Foundation (Grant No. 2016M590451), PAPD, and CICAEET.

References

1. Anastasi, Conti, Di Francesco, et al. Energy conservation in wireless sensor networks: a survey. Ad Hoc Netw 2009; 7(3): 537–568.

2. Xie, Wang. Construction of tree network with limited delivery latency in homogeneous wireless sensor networks. Wireless Pers Commun 2014; 78(1): 231–246.

3. Shen, Tan, Wang, et al. A novel routing protocol providing good transmission reliability in underwater sensor networks. J Internet Technol 2015; 16(1): 171–178.

4. Zhang, Sun, Wang. Efficient algorithm for k-barrier coverage based on integer linear programming. China Commun 2016; 13(7): 16–23.

5. Lin. Designing efficient routing protocol for heterogeneous sensor networks. In: 24th IEEE international performance, computing, and communications conference (IPCCC), Phoenix, AZ, 7–9 April 2005, pp.51–58. New York: IEEE.

6. Bai, et al. A dynamic geographic hash table for data-centric storage in sensor networks. In: Wireless communications and networking conference (WCNC), Las Vegas, NV, 3–6 April 2006, pp.2168–2174. New York: IEEE.

7. Xuan, et al. Query aggregation for providing efficient data services in sensor networks. In: IEEE international conference on mobile ad-hoc and sensor systems (MASS), Fort Lauderdale, FL, 25–27 October 2004, pp.31–40. New York: IEEE.

8. Abdulla AEAA, Fadlullah, Nishiyama, et al. An optimal data collection technique for improved utility in UAS-aided networks. In: Proceedings of the IEEE INFOCOM, Toronto, ON, Canada, 27 April–2 May 2014, pp.736–744. New York: IEEE.

9. Abbasi, Younis. A survey on clustering algorithms for wireless sensor networks. Comput Commun 2007; 30(14): 2826–2841.

10. Katiyar, Chand, Soni. Clustering algorithms for heterogeneous wireless sensor network: a survey. Int J Appl Eng Res 2010; 1(2): 273–287.

11. Donoho DL. Compressed sensing. IEEE T Inform Theory 2006; 52(4): 1289–1306.

12. Candès, Wakin MB. An introduction to compressive sampling. IEEE Signal Proc Mag 2008; 25(2): 21–30.

13. Zhao, Y-C, W-Z, et al. Throughput optimization in cognitive radio networks ensembling physical layer measurement. J Comput Sci Technol 2015; 30(6): 1290–1305.

14. Luo, Xiang, Rosenberg. Does compressed sensing improve the throughput of wireless sensor networks? In: IEEE international conference on communications (ICC), Cape Town, South Africa, 23–27 May 2010, pp.1–6. New York: IEEE.

15. Xiang, Luo, Vasilakos. Compressed data aggregation for energy efficient wireless sensor networks. In: 8th annual IEEE communications society conference on sensor, mesh and ad hoc communications and networks (SECON), Salt Lake City, UT, 27–30 June 2011, pp.46–54. New York: IEEE.

16. Xie, Jia. Transmission-efficient clustering method for wireless sensor networks using compressive sensing. IEEE T Parall Distr 2014; 25(3): 806–815.

17. Wang, Tang, Yin, et al. Data gathering in wireless sensor networks through intelligent compressive sensing. In: Proceedings IEEE INFOCOM, Orlando, FL, 25–30 March 2012, pp.603–611. New York: IEEE.

18. Nguyen, Teague KA. Compressive sensing based data gathering in clustered wireless sensor networks. In: IEEE international conference on distributed computing in sensor systems (DCOSS), Marina Del Rey, CA, 26–28 May 2014, pp.187–192. New York: IEEE.

19. Xiang, Luo, Rosenberg. Compressed data aggregation: energy-efficient and high-fidelity data collection. IEEE/ACM Trans Netw 2013; 21(6): 1722–1735.

20. Zou, Zhang, et al. WSNs data acquisition by combining hierarchical routing method and compressive sensing. Sensors 2014; 14(9): 16766–16784.

21. http://lcav.epfl.ch/sensorscope-en

22. Haupt, Bajwa, Rabbat, et al. Compressed sensing for networked data. IEEE Signal Proc Mag 2008; 25(2): 92–101.

23. Luo, Sun, et al. Efficient measurement generation and pervasive sparsity for compressive data gathering. IEEE T Wirel Commun 2010; 9(12): 3728–3738.

24. Sartipi, Fletcher. Energy-efficient data acquisition in wireless sensor networks using compressed sensing. In: Data compression conference (DCC), Snowbird, UT, 29–31 March 2011, pp.223–232. New York: IEEE.

25. Zheng, Yang, Tian, et al. Data gathering with compressive sensing in wireless sensor networks: a random walk based approach. IEEE T Parall Distr 2015; 26(1): 35–44.

26. Heinzelman, Chandrakasan, Balakrishnan. An application-specific protocol architecture for wireless microsensor networks. IEEE T Wirel Commun 2002; 1(4): 660–670.
