Abstract
Data gathering is one of the most important operations in many wireless sensor networks (WSNs) applications. In order to implement data gathering, a tree structure rooted at the sink is usually defined. In most wireless sensor networks, nodes are powered by batteries with limited energy. Prolonging network lifetime is a critical issue for WSNs. As a technique for signal processing, compressed sensing (CS) is being increasingly applied to wireless sensor networks for saving energy. Compressive sensing can reduce the number of data transmissions and balance the traffic load throughout networks. In this paper, we investigate data gathering in wireless sensor networks using CS and aim at constructing a maximum-lifetime data-gathering tree. The lifetime of the network is defined as the number of data-gathering rounds until the first node depletes its energy. Based on the hybrid-CS data-gathering model, we first construct an arbitrary data-gathering tree and then use the random switching decision and optimal parent node selecting strategy to adjust the load of the bottleneck node and prolong the network lifetime. Simulation results show that the proposed algorithm outperforms several existing approaches in terms of network lifetime.
1. Introduction
Wireless sensor networks (WSNs) consist of a great number of nodes that sense the environment and collaboratively work to process and route the sensory data. They have a great application value in the fields of habitat monitoring, medical care, battlefield monitoring, and so on. Data gathering is a basic operation in WSNs, where sensors are responsible for collecting all sensory data and delivering them to the sink node [1]. Typically, sensors have strictly limited communication abilities and energy resources; therefore, how to reduce energy consumption in data gathering so as to prolong network lifetime is an important issue [2].
The problem of energy efficient data gathering in WSNs has been extensively investigated. Data-gathering methods based on cluster and tree are proposed in many literatures [3, 4]. The goal of such methods is to construct a network topology in order to use energy resources of sensor node effectively. However, these methods cannot overcome the “hot pot” problem; that is, nodes close to the sink would suffer from heavy loads and their energy would be exhausted soon, which would shorten the network lifetime. Data aggregation [5] and source coding technique [6] are two efficient methods to overcome the “hot pot” problem. However, data aggregation adopts a simple aggregation function that only extracts certain statistical information from the sensory data. The sink node cannot recover all sensory data. Hence, this technique can only be applied to particular applications that require partial information from WSNs. The distributed source coding technique performs noncollaborative data compression at the source nodes; it is not exactly practical in WSNs either due to the lack of prior knowledge of the data correlation or because of resulting in high communication load.
Compressed sensing is a signal processing technique for efficiently acquiring and reconstructing a signal, by finding solutions to underdetermined linear systems [7]. This technique promises to successfully recover the original signal from far fewer measurements than its original dimension, as long as the signal is sparse or compressible in some domain. Recently, the growing interest in CS has inspired many works in WSNs [8–11]. Many researches show that the CS technique suits well data gathering in WSNs because the sensory data is generally quite sparse in nature [8]. The authors in [12] made the first effort to speculate the potential of using CS theory for data gathering in multihop WSNs. However, they have not proposed a real scheme based on this initial idea. The authors in [8] firstly presented a complete framework for data gathering based on CS in large-scale WSNs where a large number of sensor nodes are densely deployed and sensor readings are spatially correlated. In Compressive Data Gathering (CDG) proposed in [8], the sensory data are transmitted as weighted sums on a routing path so as to reduce global communication cost without introducing intensive computation or complicated transmission control. However, CDG can result in many unnecessary data transmissions during data-gathering procedure.
In [13], the authors introduced two data-gathering models, that is, plain-CS and hybrid-CS. The plain-CS model which is the same as the CDG scheme uses CS encoding for all nodes in the network. The hybrid-CS model applies CS only to relay nodes that are overloaded. The hybrid-CS model marries the merits of nonaggregation data gathering and plain-CS scheme. Recently, Xiang et al. in [14] introduced a Minimum Energy Compressed Data Aggregation (MECDA) algorithm to implement the data gathering in WSNs based on hybrid-CS technique. The aim of MECDA is to allocate the traffic load properly in a given data-gathering round, through joint routing and aggregator assignment, such that the total energy consumption is minimized. Unfortunately, MECDA only takes into account the overall energy consumption of sensor network and does not consider balancing the energy consumption of each node; therefore, the energy of some sensor nodes may be exhausted quickly, which will affect the network lifetime eventually. Therefore, it is significant for the researchers to design a maximum-lifetime data-gathering method which balances the energy consumption of sensor nodes by using CS theory in designing of underlying routing.
In this paper, we propose a method of constructing maximum-lifetime data-gathering tree (MLDGT) in WSNs based on the hybrid-CS data-gathering model. The contributions of this paper are as follows: We first define the problem of maximum-lifetime data-gathering tree in WSNs based on compressed sensing, prove that it is NP-complete, and then analyze the characterizations of the optimal solutions. We propose a novel algorithm, which used random switching and optimal parent node selecting strategy to keep transferring descendants of bottleneck node to realize load balancing and extending the lifetime of WSNs. Extensive experiments have been conducted to verify that our algorithm achieves higher lifetime than existing approaches.
The rest of the paper is organized as follows. In Section 2, we give a brief overview of applying compressed sensing in data gathering of WSNs. Section 3 formally defines the problem of maximum-lifetime data-gathering tree and analyzes the complexity of the problem. The proposed algorithms are detailed and analyzed in Section 4. Simulation results are presented in Section 5 and conclusions are drawn in Section 6.
2. Preliminaries
2.1. Compressed Sensing Basics
In CS framework, the real-valued signal
According to CS theory, we can recover
2.2. Overview of Data Gathering Based on Compressed Sensing in WSNs
For the traditional data-gathering method of WSNs, nodes around the sink need to transmit more raw data and consume more energy than the downstream nodes of the sensor network. The unbalanced energy consumption has a major impact on network lifetime. In [8], Luo et al. proposed CDG for data gathering in large-scale WSNs which is an efficient way to alleviate the bottleneck.
To illustrate the idea of data gathering based on CS, we rewrite the CS coding as
However, the plain-CS model may lead to unnecessary higher traffic at the earlier stage of data gathering. Therefore, Luo et al. in [13] proposed the hybrid-CS data-gathering model. The main idea of hybrid-CS model is that the non-CS scheme is applied in the earlier stages of data gathering, and the CS-based compression is only applied at node whose incoming traffic intensity becomes larger than M or equal to M. The data-gathering process based on hybrid-CS model is depicted in Figure 1 through a simple chain-type topology. There are N sensor nodes

Hybrid-CS data gathering in a chain-type topology.
In this paper, we focus on maximizing the lifetime of data-gathering trees based on hybrid-CS model. As can be seen from the above analysis, the energy consumption characteristic of sensor node brings about a new challenge for constructing a maximum-lifetime data-gathering tree based on hybrid-CS model.
3. System Model and Problem Formulation
3.1. System Model
We consider a static wireless sensor network with N sensor nodes
The architecture for hybrid-CS data gathering that we refer to in [13, 14, 17] is shown in Figure 2. A spanning tree rooted at sink is adopted to gather the readings of the whole network. In Figure 2, the arrowed line represents the transmission link, and the data on the left side of each line is ready for transmitting. For leaf nodes, such as nodes When the number of data received from node v's children is less than M, where M is the dimension of column vector When the number of data received from node v's children nodes is more than M, node v performs the CS coding according to (3) and transmits exactly M encoded data corresponding to the aggregated column vector. We call node v the aggregating node. In Figure 2, node

The architecture for hybrid-CS data gathering.
Based on the above analysis, in a given data-gathering tree T, the outgoing traffic of node v is
3.2. Problem Formulation
In this paper, we define a data-gathering round as the process of gathering all data from nodes to the sink, regardless of how much time it takes [18].
The lifetime of a node v in tree T is defined as the number of data-gathering rounds since it began to work until its death:
This problem that is similar to the maximum-lifetime problem proposed in [18–20] is an NP-complete problem. The unique property of the problem is the energy consumption model of the node in data-gathering tree based on hybrid-CS model, which brings about new challenges for constructing data-gathering tree. Since the problem is NP-complete, we try to find an approximate solution to solve the problem. Details of the solution techniques are shown in the next section.
4. Solution Techniques
4.1. Basic Ideal of Solution Techniques
To facilitate the analysis, we first provide some relevant assumptions and definitions in this section. We denote by
Formula (9) can be transformed into an equivalent form:
Inspired by [19], in order to solve the problem of formula (13), we need to find a spanning tree in which the difference between the maximum path load and the minimum path load of leaf node is minimal. Therefore, the MLDGT problem can be described as follows:
We can enumerate all spanning trees of graph G and find a spanning tree which minimizes the maximum path load of leaf nodes. However, the number of the spanning trees of graph G can be an exponential function of the number of nodes N. Therefore, we try to find an approximate solution to minimize the maximum path load of leaf node in the data-gathering tree. Details of the algorithm are described in the next section.
4.2. Algorithm Description
In this section, we design an approximate algorithm to deal with the problem raised by formula (13). Our algorithm uses randomized tree transformation and switching method and optimal parent node selecting strategy that make full use of the property of energy consumption in data gathering based on CS to improve convergence speed of the algorithm.
Our algorithm starts from an arbitrary tree T and then keeps transferring descendants of bottleneck node to nodes with smaller path load iteratively. In order to make the algorithm converge, we define a terminate parameter σ as the maximum allowable difference in the path loads among all leaf nodes. If the current spanning tree
However, if the current spanning tree

A part of the spanning tree
In Figure 3, suppose that node
In the random switching decision, whether a node is selected for switching depends on its switching probability. For a considered node, we generated a random number, 0 or 1, according to its switching probability. If the random number is 1, the corresponding node is selected to switch; otherwise, the node is not selected. It is more likely to generate random number 1 when the switching probability is higher. The initial switching probability for each node is
In Figure 3, if node
We have used a random switching decision to select a switching node. We will design an optimal parent node selecting strategy for assigning an optimal parent node to the selected switching node. Intuitively, in graph G, each neighbor of a selected switching node is likely to be the parent node of the switching node. However, in fact, only a part of the neighbor nodes are qualified to be the parent eventually. Because the path loads of some neighbor nodes are high originally, the node switching may increase the path load of such nodes, and the oscillation may happen. Therefore, we should choose some neighbor nodes as the potential parent nodes. Suppose that
In neighbor nodes set
We need to choose a node v from the set
4.3. Details of the Algorithm
The core of our algorithm is shown in Algorithm 1. In our algorithm,
(1) Set (2) Let (3) (4) (5) Return T; (6) (7) Set (8) (9) Remove node (10) (11) (12) (13) (14) (15) (16) Update T by switching (17) (18) break; (19) (20) (21) (22) (23) (24) (25) Select (26) (27) Return T
4.4. Analysis of Algorithm
In this section, we present the time complexity of our algorithm. The calculation of
5. Performance Evaluation
In this section, we evaluate the performance of our algorithm through simulation experiments.
5.1. Simulation Setup
We evaluate the performance of our algorithm using Matlab simulations. In our simulation, we randomly deploy sensor nodes in a square region of size 100 m × 100 m. The number of sensors is varied from 100 to 300. We consider three scenarios associated with the different location of the sink in the network. In Scenario 1, the sink was placed at the corner of the deployment area, that is, at the (0 m, 0 m) coordinate. In Scenario 2, the sink was placed at one side of the deployment area, specifically at the (50 m, 0 m) coordinate. In Scenario 3, the sink was placed in the middle of the deployment area, namely, at the (50 m, 50 m) coordinate. Each sensor is randomly assigned an initial energy between 0.5 and 1 joule (J). We assumed that all nodes have the same energy consumption model. The radio transmission range of each node is set to 25 m. Similar to [18], the energy required to receive data is 50 nJ/bit and the energy required to transmit data is 100 nJ/bit. The size of one-unit data is 16 bytes. The energy consumption for one-unit data transmission and reception is 6400 nJ and 12800 nJ, respectively; that is
The choice of the terminate parameter σ governs convergence speed of the algorithm and lifetime of the tree produced at converged point. However, it is not possible to know the optimal value of σ for a random connectivity graph. Intuitively, it may appear that choosing σ as low as possible results in a higher lifetime tree. When σ is too low, the algorithm will perform more switching to find a spanning tree to meet the terminate parameter. However, such spanning tree may not topologically exist in a given connectivity graph representing the sensor network. The oscillations may occur and diminish the lifetime of the final tree. Therefore, we need to learn the appropriate ranges of σ for different network configurations. To solve this problem, we generate 100 random deployed networks for each scale sensor network as training inputs to our algorithm. Then, we measure their lifetimes with varying σ values and select the optimal σ value for simulation experiments. The maximum allowed switching time
5.2. Simulation Results
In this section, we first compare our algorithm with the MITT [18], RaSMaLai [19], and MECDA [14] data-gathering schemes for Scenario 2. The lifetime performance is compared as a function of the number of nodes. The sampling rate is set to 10% and 20%.
It can be seen from Figure 4 that our algorithm outperforms the compared algorithms. Furthermore, the lower the sampling rate, the better the lifetime performance of our algorithm. When the sampling rate is low, more nodes in the network can become the aggregating nodes, so as to save energy. The MITT algorithm is based on a min-max-weight spanning tree. The algorithm improves the lifetime of a given tree by switching the parent of the node under consideration. The MECDA algorithm uses hybrid-CS technology, and the objective of the algorithm is to minimize the total energy consumption in a data-gathering tree. It does not consider the balance of energy consumption and energy consumption of each node. The nodes near the sink may have higher energy consumption, which affects the lifetime of the network. The RaSMaLai algorithm uses randomized switching scheme to maximize the lifetime of data collection trees based on the concept of the bounded balanced trees. However, because the algorithm collects the raw data directly, the nodes close to the sink need to relay data of all the downstream nodes on the path, so that the energy consumption is higher, and the network lifetime is shorter. Our algorithm uses both the hybrid-CS strategy and the random switching method to adjust the load of bottleneck node, thus obtaining the best lifetime performance.

Lifetime of different algorithms.
Figure 5 shows the lifetime of different algorithms when the sampling rate is varied from 10% to 30%, and the number of nodes is 200 and 300, respectively. As shown in Figure 5, with the increase of the number of nodes, the nodes near the sink need to forward more data, the energy consumption increases, and the lifetime is reduced. With the increase of sampling rate, the lifetime for our algorithm and MECDA decreases. This is because when the sampling rate increases, the maximum data length M is increased, and the intermediate nodes need to gather more data to become aggregating nodes, the energy consumption of such nodes increases, and the network lifetime is reduced. However, the lifetime of RaSMaLai does not change with the varying of sampling rate, because it does not use the CS strategy. When the sampling rate is over 30%, the performance of our algorithm is more and more close to that of the RaSMaLai algorithm. The reason is that when the sampling rate is large, few nodes can become the aggregating node, and the characteristic of energy consumption is consistent with the RaSMaLai algorithm.

Lifetime with different sampling rates.
As shown in Figure 6, we compare our algorithm with MECDA in different scenarios. No matter which scenario is used, our algorithm outperforms the MECDA algorithm. In Scenario 1, the performance of our algorithm is more superior, because when the sink is placed at (0, 0) coordinate, the nodes around the sink need to forward more data of downstream nodes and have heavy loads, and thus our algorithm can make the load of such nodes become more balanced by switching. However, in Scenario 3, the performance improvement of our algorithm is relatively small; this happens since the sink located at the middle naturally gives better balancing for each algorithm.

Lifetime in different scenarios.
6. Conclusion
In this paper, we have studied the problem of constructing a maximum-lifetime data-gathering tree. We used hybrid-CS theory to data gathering in wireless sensor networks. We first constructed an arbitrary data-gathering tree, used a random switching decision to select a bottleneck node's child as switching node, and then designed an optimal parent node selecting strategy for assigning an optimal parent to the selected switching node. We kept transferring descendants of the bottleneck node to realize load balancing until the terminate parameter or maximum allowed switching time is satisfied. Simulation results show that the proposed algorithms can significantly increase the lifetime of WSNs and outperform several existing approaches in terms of network lifetime.
Footnotes
Competing Interests
The authors declare that they have no competing interests.
Acknowledgments
This research is supported by the Natural Science Foundation of Jiangsu Province under Grants nos. BK20130096 and BK2011754, the National Natural Science Foundation of China under Grants nos. 61272084, 61202004, 61202353, 61300240, and 61302158, the Postdoctoral Science Foundation of China under Grant no. 2015M581794, the Project of Natural Science Research of Jiangsu University under Grants nos. 15KJB520027 and 14KJB520031, the Postdoctoral Science Foundation of Jiangsu Province under Grant no. 1501023C, the Key Project of Natural Science Research of Jiangsu University under Grant no. 11KJA520002, and the Natural Science Foundation of Anhui Province (no. 1608085MF127).
