Abstract
The existing compressed sensing (CS) based data gathering (CSDG) methods in wireless sensor networks (WSNs) usually assume that the sensed data are sparse or compressible. However, the sparsity of raw sensed data in some case is not straightforward. In this paper, we present reshuffling cluster compressed sensing based data gathering (RCCSDG) method to achieve both energy efficiency and reconstruction accuracy in WSNs. By incorporating CS into the cluster protocol, RCCSDG is able to reduce the energy consumption and support larger networks. Moreover, the sparsity of raw sensed data can be greatly improved by reshuffling pretreatment. A theoretical analysis to energy consumption of cluster head is performed, and the cost of the pretreatment is small enough to be neglected. Based on these natures, the raw sensed data can be recovered from fewer samples. Also, considering the sensed data to be of excellent temporal stability in a short time, we reshuffle them just one time in this stable period to further reduce the energy consumption of WSNs. In addition, the delay of RCCSDG is analyzed based on TDMA2 scheduling scheme. We carry out simulations on real sensor datasets. The results show that the RCCSDG can effectively compress the data transmission and decrease energy consumption of WSNs while ensuring the reconstruction accuracy.
1. Introduction
With the development of wireless sensor networks (WSNs), a wide range of applications of WSNs are being used in many areas, such as climate monitoring, forest fire detection, and habitat and infrastructure monitoring [1]. Data gathering [2] is one of the most essential functions provided by WSNs, where the sensor nodes periodically collect the information of the monitoring area and transmit them to the sink. However, each sensor node of WSNs, being a microelectronic device, can only be equipped with a battery-powered source and cannot be recharged in most cases. The network lifetime is limited by the capacity of battery. A primary challenge of designing data gathering schemes lies in prolonging the network lifetime and not sacrificing the data accuracy.
Because the main energy consumption of sensor nodes is contributed to the data transmission, data compression can extend the network lifetime effectively. Data aggregation techniques [3] deal with large volumes of raw sensed data by some algorithms, and only a small amount of meaningful results is transmitted to the sink. Consequently, it can reduce the data transmission and prolong the lifetime of WSNs. Up to now, many data aggregation techniques have been heavily investigated to reduce the quantity of data to be transmitted. Madden et al. [4] adopt simple data aggregation methods, such as averaging, maximizing, or minimizing, to extract some statistics characteristics of sensed data and get rid of other unnecessary information. This method above is just suitable for the condition of low accuracy. In [5], Ciancio et al. proposed a data compression scheme based on distributed Wavelet transform. After transformation, only a fewer significant coefficients are needed to transmit to the sink and other insignificant coefficients are discarded. But it is not quite suitable to distributed processing because of its computational complexity. The data gathering on the basis of the distributed source coding scheme was put forward in [6, 7]. In this process, each node just needs to code separately and send the compressed information to the sink along the shortest path. However, it requires nodes to know the global correlation structure which is difficult to be obtained in large-scale WSNs.
The emergence of compressed sensing (CS) theory [8–10] has opened up a new research approach for the in-network data aggregation [11–15]. According to CS theory, a sparse signal can be precisely recovered from far fewer samples than Nyquist criterion. This technique provides an effective way for reducing the data transmission by a simple compression at nodes in WSNs. At present, the applications of CS based data gathering (CSDG) in WSNs are mainly concentrated in planar route (linear structure) [16, 17]. And the basic idea of CSDG is illustrated in Figure 1. Using random measurement matrix, every node expands one sampling point into m-dimensional information and sends the product to its parent node. The parent node also expands a sampling point into m-dimensional vector, getting a new m-dimensional vector by adding the m-dimensional information from all its children nodes, and forwards the new vector to the next parent node. If the size of network nodes is n, the load of the entire network is

Compressed data gathering through multihop.
Hierarchical route (cluster) is able to solve the problem of limited scale of planar routing network, so it is available to support larger networks. CS based cluster method [20] integrates adjacent nodes to form a cluster, and then the cluster head (CH) will compress all the sensed data within cluster by linear compressed projection. Because the nodes in cluster only send the reading itself without the requirement of linearly expanding data, the CS based cluster can further compress the data. Existing CS based cluster methods are built on the hypothesis that the raw sensed data are sparse or compressible in some domains such as DCT, infinite difference, or Wavelet. However, such assumption is not entirely tenable for real sensed data. In many practical cases, the sparsity of raw sensed data is not straightforward, which will make the CS based cluster less practical.
In this paper, we propose an efficient reshuffling cluster compressed sensing based data gathering (RCCSDG) scheme to solve the above challenges. Firstly, the LEACH (Low Energy Adaptive Clustering Hierarchy) [21] protocol is adopted to randomly select the cluster heads (CHs) among the whole network and balanced energy consumption for each sensor node. Secondly, we find that the sparsity of data can be greatly improved by the use of a simple pretreatment (reshuffling) on the raw sensed data. Here, the cost of pretreatment is small enough to be neglected. After receiving all the raw sensed data of the cluster, the CHs reshuffle them into ascending order and compress the preprocessing signals by linear compressed projection, and the compressed information will be transmitted to the sink, because the CHs have a certain computational capacity, and the cost of linear compressed projection and pretreatment is not high. So the RCCSDG method can improve the compression ratio dramatically just sacrificing little computational resource. Considering most sensor signals to be of excellent temporal stability in a short time [19], we reshuffle the sensor data only one time and keep the order in this stable period. By this operation, the proposed RCCSDG method can further reduce the energy consumption but ensure the reconstruction accuracy. The main contributions of this paper are listed as follows:
We present an efficient data gathering scheme by introducing the CS theory on the basis of the clustering structure, which can substantially reduce communication overhead and balance the nodes energy consumption. The algorithm based on reshuffling, especially, is capable of improving the sparsity of the data and further reducing the amount of data transmission. Due to the fact that many sensor signals such as temperature and humidity will not change dramatically in a short time period, in this period, the data will be reshuffled just one time and keep the order. This method can reduce the computation burden. We also carried out a theoretical analysis in regard to energy consumption and delay of RCCSDG. RCCSDG is verified by utilizing the real sensed data. The results of simulation show that RCCSDG can effectively reduce the energy consumption and achieve better reconstruction accuracy.
The rest of this paper is organized as follows. In Section 2, we present system model and motivation. Section 3 describes the details of the RCCSDG method we proposed and Section 4 presents the theoretical analysis on the energy consumption of CH and delay of the RCCSDG. The simulation results on both energy consumption and the accuracy of reconstruction are presented in Section 5. Finally, we conclude this paper and discuss future work in Section 6.
2. System Model and Motivations
In the RCCSDG model, we assume that N nodes have been randomly distributed in the sensing area and the proposed scheme implements a two-hop WSN (see Figure 2) that monitors a given physical scalar magnitude (e.g., temperature or humidity) [22]. In practical application, the topology of the WSNs can be abstracted as a weighted undirected graph This is a static network with high density. When the deployment of wireless sensor network is finished, all the sensor nodes and the sink are assumed to be stationary, unless the nodes fail or die. Sensor network is a homogeneous network. In addition to the base station, the other nodes are considered to have equal status and initial energy. Here, it is worthy to notice that the energy of nodes cannot be added in the process of data gathering. All nodes have some storage space and certain capability of data fusion and can take turns to be cluster heads.

System model of RCCSDG.
The system structure of the proposed RCCSDG method is depicted in Figure 2. In first phase, suppose that the W nodes of whole networks would form I clusters according to the clustering mechanisms, each cluster including one CH and (
Under such design mode, all intracluster non-CH nodes only transmit their readings to their CH, and each CH transmits
It is well known that CSDG is able to recover the raw sensed data with high probability from few measurements when the data are sparse or compressible in some certain domains such as DCT, infinite difference, or Wavelet. However, the sparsity of most sensed signals in real world is not perfect. In some actual situations, the sensed data of adjacent nodes are not very uniform or vary greatly although they are close to each other on the physical position. When the sensed signals are not smooth enough, the sparsity of the data is not straightforward and even not sparse enough in transformed domains. As an example, Figure 3(a) shows a 72-dimensional out-of-order signal which has a great deal of volatility and worse smoothness. Obviously, it is not sparse itself. Figures 3(b), 3(c), and 3(d) give the corresponding sparse representation in three transformed domains, respectively. We set a threshold

An out-of-order signal in transformed domains.
Excitingly, we find that the signals are very smooth and have sparsest representation in TV domain when sorted into ascending order by their amplitudes; the results are shown in Figure 4. Figure 4(a) is the results by sorting the same original data of Figure 3(a) into ascending order; Figure 4(b) plots the sparse coefficients of this new signal in TV domain. One can see that most sparse coefficients are close to zero and only 5 values are relatively large; thus its sparsity is approximately 5. To check whether the other sensor data after reshuffling also have good sparsity, we compute the sparsity of light data, humidity data, and voltage data in different domains. In our tests, every type of sensor has 60 groups and every group contains 72 data items. We compute the sparsity of every group and the average sparsity of all groups as a result. The statistical results are presented in Table 1. We found that the sparsity of sensor data after reshuffling is always lower than the TV, DCT, and DWT in all the scenarios under investigation. These results indicate that the sensor data after sorting have a good sparsity in TV domain. Motivated by this investigation, we can improve the sparsity of the data signal through the use of a simple pretreatment on the raw sensed data. When the CH receives the raw sensed data
Statistical average sparsity of sensor data in different domains.

An ascending order signal in TV domain.
For the monitoring applications, the change of sensed data usually varies slowly within short time intervals. In other words, the sensed data have excellent temporal stability in a short time. So, in this short period, it can be assumed that the sparsity of this signal will not change and the sensed data can be arranged in the same order. Utilizing this feature, we just reorder data periodically according to empirical knowledge, which can further reduce energy consumption of the network.
3. Reshuffling Clustering Compressed Sensing Based Data Gathering Method
In this section, we will describe the details of our proposed RCCSDG scheme and its implementation. It consists of three parts:
3.1. Sensing Part
As we know, the cluster route is capable of supporting the large-scale WSNs. In this subsection, we choose the LEACH to solve the problem of limited network scale of planar route. Here, the LEACH is a self-adaptive clustering algorithm whose execution process is cyclical, and each cycle is divided into two stages, namely, the establishment of cluster and data communication.
According to LEACH, the whole network is divided into I clusters and each cluster has one cluster head. The non-CH nodes will independently join the corresponding cluster according to distance and then send the joined message to the CH. When the cluster is set up, all non-CH nodes send the readings to their own CH. And the non-CH nodes only communicate with their own CH directly.
The topology of cluster is useful to the application of distributed algorithms, because it is suitable for the large-scale network. In addition, the clustering algorithm that uses periodic selection of cluster head can effectively balance the network energy consumption and prolong the lifetime of the network.
3.2. Data Compression on CH
Due to all the sensed data in the cluster being collected by the CH, these received signals can be compressed by the CH using CS theory. The premise of CS theory is that the signal is K-sparse in a certain domain, and the amount of measurements M is proportional to sparsity K. CS can turn an N-dimensional signal into M-dimensional (
3.2.1. Reshuffling Algorithm
Assume that
collect the intra-cluster readings
output ascending sequence
3.2.2. Linear Compressed Projection
Compressing the signals of sensors is the principal aim of data aggregation for WSNs. Each CH of the network utilizes the linear compressed projection to realize data compression. After getting the reshuffling preprocessing data
Through the data aggregation, each CH just needs to transmit the measurement vector
Since many sensed signals such as temperature and humidity have excellent temporal stability in a short time and the data sorted into ascending order is more sparse than the original order, as a result, the sensed data collected at next interval can also be considered to be sparse when organized in the same order. But as we know, with the monitoring time growing, the signals will change and the sparsity of them may degrade. When sparsity is poor, more measurements will be needed for accurate reconstruction; otherwise the reconstruction will fail. Meanwhile, reordering the elements of data signal every time to obtain optimal measurements for exact reconstruction will increase extra complexity and energy consumption. To cope with this situation, we update the ordering periodically. We set an updated cycle T according to the prior knowledge firstly. During the updated cycle T, the sensed data will be reshuffled just one time and keep the order. When the time interval between the current acquisition and first order is integer times of T, the CH will update the ordering of
collect the intra-cluster readings using Reshuffling Algorithm get sequence collect readings according to the data structure of linear projection
3.3. Data Recovery
CS theory points out that when the number of measurements M satisfies (3), a K-sparse signal may be exactly reconstructed:
The sink gathers all measurements
Step 1. Step 2. update assign Step 3. Step 4. repeat Step 1, Step 2, Step 3, until
4. Energy Consumption and Delay Analysis of RCCSDG Method
4.1. Energy Consumption Analysis of RCCSDG
The previous section has described how to collect and recover the sensed data in RCCSDG scheme, and this section will investigate the energy consumption of RCCSDG. In the process of data gathering, non-CH nodes transmit their sensed data to the CH that they belong to, and the CH is responsible for aggregating the data and sending the results to the sink. Since the CH nodes are the main contribution to energy consumption in WSNs, thus here we only analyze the energy consumption of the CH. The total energy consumption of the CH is comprised of two main components: data processing energy consumption-
Therefore, the energy consumption of the CH can be formed as
4.1.1. Analysis of
The energy consumption of CPU is determined by the number of operations for signal processing. In other words, energy consumption of the data processing is scaled with the operation during the process of signal processing. In RCCSDG scheme that we proposed, all sensed data within cluster are sorted into ascending order through reshuffling algorithm first and then acquire m measurements through linear compressed projections on CH. So the energy consumption of CH for data processing also includes reshuffling cost (
For reshuffling algorithm, it requires no more than
4.1.2. Analysis of
In the process of data communication, the CH receives the sensed data from all non-CH nodes and transmits the compression information to the sink. Thus the energy consumption of
From (10), we can conclude that the energy consumption for transmission,
4.2. Delay Analysis of RCCSDG
In this section, we analyze the delay of RCCSDG. The analysis can be done in a way similar to [29]. Recall that the proposed model implements two-hop WSNs based on TDMA2 scheduling scheme. In this scheme, each node within the same cluster sends data to cluster by TDMA-1 and the cluster headers compress the signals and transmit the random projections to sink by TDMA-2. The processing schedule in one round is shown in Figure 5.

The pipeline TDMA2 scheduling scheme.
To make the delay of data gathering as short as possible, we adopt a pipeline TDMA2 scheduling scheme composed of three phases.
First Phase. The sink finds cluster
Second Phase. Each node of cluster sends the data to the cluster head by TDMA scheduling method. After receiving the data of all nodes in this cluster, the cluster head compresses these data by random projection and the cluster head of
Third Phase. After the cluster head of the
Definition 1. The delay of data gathering D is the time when the last random measurement reaches the sink. The delay of RCCSDG based on pipeline TDMA2 scheduling scheme is
From (12), we can conclude that the worst delay case is partitioned to one cluster only and the best delay case is divided into I uniform clusters. So the delay D of RCCSDG is
5. Simulation Results
In the following subsection, we first verify the efficiency of the RCCSDG scheme by energy consumption simulation and then evaluate reconstruction accuracy over real sensed data. The numerical results show that the RCCSDG method is effective in reducing energy consumption in deed. Furthermore, the reconstruction results also demonstrate the better performance of the RCCSDG method.
5.1. Energy Consumption Simulation
The RCCSDG method can effectively reduce the energy consumption by sacrificing a small amount of computing resource. And during the data gathering, the CH, which plays an important role in data processing and data transmission, is the main aspect of network energy consumption. In this subsection, we will give the simulation results of CH energy consumption and verify the efficiency of the RCCSDG method. Table 2 lists the main parameters used in the simulations.
Simulation parameters.
To reduce the energy consumption of CH, we need to know the factors which influence the energy consumption of CH. Here, we have conducted a numerical analysis to the factors that may influence energy consumption of the CH in terms of data transmission and data processing, and the results are shown in Figure 6. From (10), it indicates that the energy consumption of data transmission is determined by the transmission distance and the number of measurements when the size of cluster is fixed. And this is also shown in Figure 6(a). The energy consumption of data transmission is proportional to the amount of measurements when the distance is constant. And the closer the distance between node and CH is, the faster the energy is consumed. In particular, when the distance is greater than threshold

Transmission energy consumption and computation energy consumption of CH.
By using a pretreatment on original data at CH, the RCCSDG we proposed can decrease the measurements to be transmitted, but it introduces some computation cost. To validate the validity of this scheme, we need to demonstrate that the added computation is far smaller than the reduced transmission. For this, we plot in Figure 7 the energy consumption of transmission and computation of RCCSDG and conventional CS scheme when we set

Energy consumption of RCCSDG and conventional CS scheme.
Since most signals have excellent temporal stability in a short time, the ordering of data is refreshed periodically. Note that ordering process takes place at CH and only affects the energy consumption of computation. The blue line in Figure 7 represents the average computation dissipation when refreshing the ordering every T (here
The results of simulation above show that the energy consumption of CH is mainly contributed to the data transmission, and the energy consumption for signal processing is so small that it can be neglected. So the RCCSDG method can be used to reduce the energy consumption although it increases a little computational complexity.
5.2. Reconstruction Performance Simulation
In order to verify the reconstruction performance of the RCCSDG method we proposed, we use the following two evaluation criterions (RMSE and PSNR) in this paper. Formally, the RMSE (root-mean-square error) is defined as (14), and the PSNR (peak signal-to-noise ratio) is defined as (15):
In our paper, we carry out simulation on real sensed datasets. The datasets contain 2666 humidity readings from 31 sensor nodes and they are collected by the Department of Information Engineering (DEI) of the University of Padova on March 24th, 2009. According to CS theory, the original data can be recovered at the sink by some algorithms.
Firstly, we compare the recovery performance of RCCSDG with other data collection schemes at moment

Comparing reconstruction RMSE of RCCSDG, CSDG-OOMP, and CSDG-TV at moment

Comparing reconstruction PSNR of RCCSDG, CSDG-OOMP, and CSDG-TV at moment
Since the humidity data are not varied much in a short time and smooth when collected in the same order at next collecting moment, as shown in Figure 10, the sparsity of data can be regarded as not changed at this moment when data are arranged in the same order. But as monitoring time increases, the smoothness of the data will become worse, thus impacting the precision of data reconstruction. To handle this problem, the RCCSDG method rearranges the data ordering periodically based on the a priori knowledge. During this period, sensor data are reshuffled just one time and keep the order. By reordering data periodically, the RCCSDG method can make the sparsity of data always stay in a proper range, which can ensure achievement of accurate reconstruction with a small number of measurements. This is depicted by the simulation results in Figure 11. It shows the PSNR of data recovery with compression ratio

Sensor readings first ordered at time t and sensor readings at

The comparison of reconstruction results in a period of time at ratio = 0.4.
6. Conclusion
This paper describes an energy-efficient data gathering scheme for WSNs by reshuffling cluster compressed sensing as described. We have found that the sensed data arranged into ascending order have better sparsity. Based on this principle, the cluster heads just adopt a simple preprocessing on original data to reshuffle the data into ascending order, which can greatly improve the sparsity and effectively minimize the amount of data transmission. We have investigated that the additional computation for preprocessing is small enough to be neglected. Besides, most sensor signals have excellent temporal stability in WSNs; we only update the ascending order of data periodically. By incorporating the temporal correlation, the energy consumption can be further reduced while guaranteeing the data reconstruction accuracy. Also we have demonstrated the theoretical analysis of the energy consumption and delay in detail when adopting the RCCSDG scheme. The simulation results based on real sensor data that validate the energy efficacy and reconstruction accuracy of the RCCSDG scheme have been proposed. Considering the fact that the sensor data also contains low-rank structure information, our future work will investigate matrix completion to further improve the reconstruction accuracy and reduce the computational complexity so as to conserve the energy and ulteriorly extend the lifetime of network.
Footnotes
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
The work was supported by the National Nature Science Foundation of China (no. 61162015 and no. 31101081) and the Science and Technology Supported Project of Jiangxi Provincial (no. 20151BBE50095).
