Abstract
In multisink wireless sensor networks, synchronized data collection among multiple sinks is a significant and challenging task. In this paper, we propose an unbalanced threshold based distributed data collection scheme to reconstruct the synchronized sensed data of the whole sensor network in all sinks. The proposed scheme includes the unbalanced threshold based distributed top-
1. Introduction
Wireless sensor network (WSN) is an autonomous wireless network consisting of a large number of tiny, inexpensive, and spatially distributed wireless sensor nodes. It has emerged as one of the most promising technologies for the future [1]. The typical applications of WSN include environment monitoring, industrial process control, intelligent transportation, military surveillance, and health care monitoring [2].
However, wireless sensor nodes have severe resource constraints which include limited computational, memory, communication capacities and nonrenewable energy supply. Therefore, data collection schemes in wireless sensor network should be light weight and energy efficient. Two representative types of data collection schemes in wireless sensor networks are spatial-temporal correlation based data predication schemes [3–5] and distributed source coding schemes [6–8]. In the first type of data collection scheme, a series of spatial-temporal correlation based data prediction algorithms are adopted to prolong the system lifetime by enabling the sink to predict the sensed data based on some historical samples. In the second type of data collection scheme, distributed source coding techniques, such as Slepian-Wolf coding and Wyner-Ziv coding, are designed to compress the multiple correlated sensor data without the need of communication among sensor nodes. Nevertheless, the computational and communication cost of the above two types of data collection schemes are still high. Meanwhile, the parameters of predication model and the distributed coded data are still required to be computed and transmitted within the network frequently.
Recently, compressive sensing (CS) theory has attracted considerable attention in areas of signal processing, computer science, and applied mathematics [9]. It is based on the principle that a sparse signal can be reconstructed from far fewer samples than that required by the classic Shannon-Nyquist sampling theory. Through nonlinear optimization, we can recover many natural signals which can be represented by a few nonzero coefficients under a suitable basis [10]. The inherent characteristics, such as compressibility, robustness, versatility, and stability, have made compressive sensing theory widely applied to communication [11], photography [12], magnetic resonance imaging [13], astronomy [14], and so forth.
As for the data collection problem in wireless sensor networks, a number of compressive sensing based schemes have been proposed in the literature. In 2009, Luo et al. proposed the first complete compressive sensing based data gathering (CDG) scheme for large scale wireless sensor networks [15]. By adapting a weighted sum of all the readings along the routing tree, CDG scheme can achieve substantial communication cost reduction and balanced energy consumption simultaneously. In order to achieve both energy efficiency and recovery fidelity, Xiang et al. proposed the Compressive Data Aggregation (CDA) scheme in WSN [16]. After abstracting the minimum energy compressed data aggregation problem as a NP-completeness problem, they solve it using a mixed integer programming formulation method. Moreover, they also proposed the Dual-lEvel Compressed Aggregation (DECA) framework to recover the in-network aggregated data in the widespread monitoring area by utilizing both the low-rank nature of real-world events and the redundancy in sensed data [17]. By exploiting the spatial and temporal correlation among sensed data, Xu et al. proposed the Spatio-Temporal Hierarchical Data Aggregation Scheme Using Compressive Sensing (ST-HDACS) in [18]. After collecting the sensed data from a subset of randomly selected sensor nodes through a hierarchical routing structure, the fusion center recovers the whole sensed data by solving a matrix completing problem. Furthermore, Tang et al. proposed a robust compressive data gathering scheme to identify outlying sensor readings and derive the corresponding accurate values and further infer broken links in the sensor network [19]. Motivated by the fact that the variance among all the solutions of the blind iterative hard thresholding algorithm with different sparse level can indicate the best sparse level, Xiong and Tang proposed the blind 1-bit compressive sensing reconstruction algorithm for wireless sensor network [20]. Similar research works include the sparse random scheduling based data gathering scheme [21], the Resource Efficient Data Gathering (REDG) scheme [22], the Compressive Sensing Based Data Aggregation Scheme (CSDAS) [23], and the distributed multichain based compressive sensing scheme [24].
Although the above schemes can achieve energy-efficient data collection based on compressive sensing theory, they can only be applied to single-sink wireless sensor networks in a centralized manner. In the scenario of multisink wireless sensor networks, compressed data should be transmitted to one specific sink before executing the data reconstruction algorithm. Then, the recovered data should be distributed back to other sinks through unicast or broadcast messages. Obviously, this kind of centralized pattern is less energy efficient and time consuming. In this paper, we focus on designing compressive sensing based distributed data collection scheme for multisink wireless sensor networks. The proposed scheme includes two correlated algorithms: the unbalanced threshold based distributed top-
The contributions of this paper are threefold.
(1) We propose an unbalanced threshold based distributed data collection scheme in multisink wireless sensor networks based on compressive sensing theory. By decomposing the task of sparse data reconstruction into several correlated parts, all sinks can obtain the synchronized sensed data of the whole network in a distributed and cooperative manner.
(2) We propose an unbalanced threshold based distributed top-
(3) We analyze the data reconstruction performance, communication complexity, and computational complexity of the proposed scheme through experiments. Furthermore, we also compare the performance of the proposed scheme with existing compressive sensing based data collection schemes.
The rest of this paper is organized as follows. Section 2 introduces the basic principles of compressive sensing theory and the iterative hard thresholding algorithm. Sections 3 and 4 describe the unbalanced threshold based distributed top-
2. Preliminaries
Compressive sensing theory asserts that a sparse signal can be recovered from fewer linear and nonadaptive measurements than the number of samples required by Shannon-Nyquist Sampling Theorem. The compressive sampling process can be presented as
One of the most important properties of sensing matrix is the Restrict Isometry Property (RIP). Formally, a matrix A is said to satisfy the RIP of order K with a Restrict Isometry Constant (RIC)
The Basis Pursuit De-Noise (BPDN) [27] and the Least Absolute Shrinkage and Selection Operator (LASSO) [28] are two typical signal recovery frameworks which are wildly used in compressive sensing theory. In BPDN framework, the
A number of data reconstruction algorithms have been proposed in the literature. In general, they can be classified into two categories: optimization algorithms and greedy algorithms. In the optimization algorithms, a number of convex optimization methods have been adopted to search the optimized solution of (6). Basis Pursuit (BP) algorithm [27], Interior Point (IP) algorithm [29], and homotopy algorithm [30] are representatives of this type of reconstruction algorithm. On the contrary, greedy algorithms iteratively select the support sets or the elements of the reconstructed signal based on certain greedy selection strategy. Orthogonal Matching Pursuit (OMP) algorithm [31], Regularized Orthogonal Matching Pursuit (ROMP) algorithm [32], Compressive Sampling Matching Pursuit (CoSmMP) algorithm [33], Subspace Matching Pursuit (SP) algorithm [34], iterative hard thresholding (IHT) algorithm [25], and Iterative Soft Thresholding (IST) algorithm [35] are representatives of greedy algorithm.
Among all proposed data reconstruction algorithms, the iterative hard thresholding algorithm is very easy to implement and can be relatively fast. Meanwhile, it also possesses strong theoretical recovery performance guarantees as convex optimization based algorithms. Essentially, it can be regarded as an iterative method
3. The Unbalanced Threshold Based Distributed Top-
Query Algorithm
In recent years, top-K query, also known as ranking aware query, has received much attention in the areas of relational database system, content distribution network, multimedia retrieval system, and so forth. However, only monotonic aggregation functions can be applied to the top-K query. The largest K absolute elements query in the hard thresholding operator is nonmonotonic. Therefore, the proposed top-K query algorithms, such as thresholding algorithm (TA) [36] and three-phase uniform threshold (TPUT) algorithm [37], cannot be applied to the distributed IHT algorithm directly. We will describe our proposed unbalanced threshold based distributed top-
3.1. Algorithm Design
We assume there are P nodes in the distributed system and each node is assigned with an ID. The node with the lowest ID is designated as the administrator node and other nodes are designated as the member nodes. Each node i maintains a descending ordered list
The core idea of the unbalanced threshold based distributed top-
In the unbalanced threshold computing phase, each member node sends its first K positive elements and last K negative elements to the administrator node. Then, the administrator node can compute the partial sum
In the candidate set computing phase, each member node sends unsent object-value pairs
In the top-
The pseudocode of the unbalanced threshold based distributed top-
Phase I: The unbalanced thresholds computing phase
Computes partial sums Let Computes weights Sends thresholds
Sends
Phase II: The candidate set computing phase
Computes partial sums Let Computes upper bounds Sends candidate set
Sends unsent
Phase III: The top-
Computes whole sums Let
Sends unsent
Lemma 1.
The unbalanced thresholds computed in phase I can guarantee that the true top-
Proof.
Assume that
Lemma 2.
The candidate set computed in phase II can guarantee that the true top-
Proof.
Assume that
Theorem 3.
The unbalanced threshold based distributed top-
Proof.
Through Lemmas 1 and 2, we can prove that true top-
3.2. Example
We will illustrate the unbalanced threshold based distributed top-
An example with 3 nodes.
In the unbalanced thresholds computing phase, member node 1 and member node 2 send
In the candidate set computing phase, member node 1 and member node 2 send
In the top-
The total number of transmitted object-value pairs in our proposed unbalanced threshold based distributed top-
4. The Distributed Iterative Hard Thresholding Algorithm
4.1. Network Model
We assume that P sinks and N sensor nodes are deployed randomly and uniformly in the surveillance area in the multisink wireless sensor network. Each sink is assigned with an ID ranging from 1 to P and each sensor node is assigned with an ID ranging from 1 to N. The sink with the lowest ID 1 is designated as administrator sink and other sinks are designated as member sinks. We assume that an effective multisink routing protocol, such as the dynamic traffic-aware routing protocol [40] or the scalable gradient-based routing protocol [41], is deployed in the network. Each sensor node can transmit its sensed data to the nearest sink in a multihop manner. The network model is shown in Figure 1.

The network model.
In wireless sensor networks, the spatial correlation among sensed data in the surveillance area implies that any sensed data
4.2. Algorithm Design
After setting the initial value of the sensed data
After that, each sink computes the intermediate result
The pseudocode of the distributed iterative hard thresholding algorithm is presented in Algorithm 2.
Computing step size μ according to (16);
All sinks compute All sinks run Algorithm 1 and the administrator sink obtains The administrator sink distributes
Since the distributed top-
5. Experiment and Analysis
We evaluate the performance of our proposed unbalanced threshold based distributed data collection scheme in this section. The simulation is carried out using Matlab R2014b and Wislab simulator on a MacBook Pro laptop computer with dual core i7 CPU and 8 G memory. There are 50~200 wireless sensor nodes and 4~6 sinks randomly deployed in a 400 × 400 m2 surveillance area. Multiple uncorrelated two-dimensional Gaussian distributions have been superposed to simulate the spatial correlated data sources. Note that the Gaussian random matrix is used as the measurement matrix.
We will investigate the data reconstruction accuracy, communication complexity, and computational complexity of our proposed scheme. The signal-to-noise ratio (SNR)
5.1. Performance Analysis
Figures 2, 3, and 4 show the SNR of reconstructed data, the number of transmitted data, and the number of computed data, respectively, at different data sampling rates. We can conclude that the data reconstruction performance improves with the increase on the data sampling rate. Meanwhile, the number of transmitted data and that of computed data increase too. The reason behind this rule lies in the fact that the increase on the data sampling rate would cause more data to be transmitted and computed. Therefore, the recovered sensed data will be more accurate.

The SNR of reconstructed data at different data sampling rates.

The number of transmitted data at different data sampling rates.

The number of computed data at different data sampling rates.
Figures 5, 6, and 7 show the SNR of reconstructed data, the number of transmitted data, and the number of computed data, respectively, in different surveillance scenarios. The number of uncorrelated two-dimensional Gaussian data sources ranges from 2 to 4. We can conclude that the data reconstruction performance degrades with the increase on the number of uncorrelated data sources. Meanwhile, the number of transmitted data and that of computed data increase at the same time. The reason behind this rule lies in the fact that the increase on the number of uncorrelated data sources leads to the increase on the sparsity of sensed data within the network. Therefore, the data reconstruction performance degrades and more data are required to be transmitted and computed.

The SNR of reconstructed data at different number of data sources.

The number of transmitted data at different number of data sources.

The number of computed data at different number of data sources.
Figures 8, 9, and 10 show the SNR of reconstructed data, the number of transmitted data, and the number of computed data, respectively, at different number of sinks. We can conclude that there are no remarkable differences among the data reconstruction performance with different number of sinks. However, the number of transmitted data and that of computed data increase with the increase of the number of sinks. The reason behind this rule lies in the fact that the increase on the number of sinks has no influence on the data reconstruction performance and only requires more data to be transmitted within the network and computed in the sinks.

The SNR of reconstructed data at different number of sinks.

The number of transmitted data at different number of sinks.

The number of computed data at different number of sinks.
5.2. Performance Comparison
We will compare the performance of our proposed scheme with the original iterative hard thresholding algorithm [25], the modified thresholding algorithm base iterative hard thresholding scheme [38], and the modified three-phase uniform threshold based iterative hard thresholding scheme [39] in this subsection. Here, the last two schemes are referred to as the modified TA based DIHT scheme and the modified TPUT based DIHT scheme, respectively.
Figure 11 shows the comparison of the SNR of these four schemes when the data sampling rate is 50%, the number of data sources is 3, and the number of sinks is 5. We can conclude that there are no remarkable differences among the data reconstruction performance of these schemes. In other words, they can provide the similar data recovery accuracy regardless of the centralized or distributed reconstruction pattern.

The SNR of reconstructed data of different schemes.
Figures 12 and 13 show the number of transmitted data and interaction times of these four schemes in the same experiment settings as in Figure 11. Since the iterative hard thresholding algorithm is a centralized data reconstruction algorithm, all data in the member sinks should be transmitted to the administrator sink for reconstruction and then the recovered data would be resent back to the member sinks for synchronization. Therefore, the number of transmitted data of the IHT algorithm is much higher than that of other three schemes although only one interaction is required.

The number of transmitted data of different schemes.

The iteration times of different schemes.
The interaction times of the modified TA based DIHT scheme dominate that of other three algorithms since only one object-value pair is queried in each interaction during the top-
Our proposed unbalanced threshold based distributed data collection scheme requires the fewest data transmission among these four schemes. By adopting unbalanced thresholds, it can adjust the query thresholds adaptively and avoid redundant transmissions between administrator sink and member sinks. Sometimes, all elements in the candidate set S can be obtained in advance in phase II instead of in phase III since reasonable thresholds are computed in the unbalanced threshold based distributed top-
6. Conclusion
In this paper, we proposed the unbalanced threshold based distributed data collection scheme designed for multisink wireless sensor networks. All sinks can obtain the synchronized sensed data of the whole network through distributed sparse signal reconstruction. Meanwhile, we also designed a distributed top-
The data reconstruction accuracy, communication complexity, and computational complexity of the proposed scheme were analyzed in detail through experiments. Furthermore, we also compare the performance of the proposed scheme with existing centralized and distributed data collection schemes in wireless sensor networks. In the future, we would like to design efficient data collection schemes by utilizing the spatial and temporal correlations among sensed data simultaneously.
Footnotes
Conflict of Interests
The authors declare that they have no competing interests.
Acknowledgments
The work in this paper has been supported by the National Natural Science Foundation of China (Grant nos. 61402094, 61300195), the China Scholarship Council Foundation (Grant no. N201408130105), the Research Fund for the Doctoral Program of Higher Education of China (Grant no. 20120042120009), the Natural Science Foundation of Hebei Province (Grant no. F2014501078), and the Science and Technology Support Program of Northeastern University at Qinhuangdao (Grant no. XNK201401). The authors would also like to thank Professor Wenjing Lou for her kindly supports during Dr. Guorui Li's visiting in Virginia Polytechnic Institute and State University.
