Energy-Balanced Data Gathering and Aggregating in WSNs: A Compressed Sensing Scheme

Abstract

Compressed sensing (CS) is an emerging sampling technique in which data sampling and aggregation are performed simultaneously; it can be applied to many fields, including data processing in wireless sensor networks (WSNs). In WSNs, data aggregation can reduce data transmission cost and improve energy efficiency. Existing CS-based data gathering work in WSNs uses a centralized method in which a sink node processes the data, which causes load imbalance, “coverage hole” problems, and so forth. In this paper, we propose an energy-balanced data gathering and aggregating (EDGA) scheme that integrates a hierarchical clustering structure with CS to optimize and balance the amount of data transmitted. We also design a data reconstruction algorithm based on orthogonal matching pursuit, which helps to reconstruct the original data accurately and effectively at the sink node. The proposed scheme is compared with other state-of-the-art methods on the metrics of data recovery ratio and energy efficiency. We implement our scheme on a simulation platform using a real dataset from the Intel lab. Simulation results demonstrate that the proposed data gathering and aggregating scheme reconstructs data accurately and significantly improves energy efficiency compared with existing methods.

1. Introduction

Wireless sensor networks (WSNs) have attracted considerable research attention because of their wide range of applications, including battlefield surveillance, environmental monitoring, medical diagnosis, and wildlife exploration [1]. Due to limitations of size and cost, sensors are usually equipped with a limited and nonrechargeable power source. Therefore, saving energy to prolong network lifetime is an important issue in WSNs.

Data aggregation is considered as an effective method for WSNs to improve energy efficiency [2, 3]. It is the process of gathering and reconstructing the data from multiple sensors to eliminate redundant data and provide fused data to sink node [4]. Many data fusion approaches are proposed to solve this problem for WSNs. The cluster-based data aggregation is one of the most important kinds for its advantages on energy and load balance [5–7]. For cluster-based approaches, sensor nodes are grouped into some clusters and each cluster consists of a cluster-head (CH) and members. Every CH node gathers and aggregates the sensed data of its members and, then, sends the fused data to sink node directly or by the multihop route of CH node. Figure 1(a) shows a WSN where sensor nodes are densely deployed in the monitoring area to detect the environmental phenomena based on clustering architecture.

Figure 1

An example of data gathering and aggregation in WSNs: (a) data gathering and transmission in a cluster-based network architecture, and (b) data aggregation from CH₁ to the sink.

Compressed sensing (CS) [8–11] is a collection of recently proposed sampling methods in information theory. The promise of compressed sensing is that it can obtain a sufficiently accurate approximation of an unknown data field by using a small number of generalized measurements, which are known as projections in the compressed sensing literature. The objective of data compression is twofold: to compress sensor readings to reduce global data traffic and to distribute energy consumption evenly to prolong network lifetime. Figure 1(b) shows one round of the CS measurement gathering process from the CH₁ node to the sink under a multihop topology. Each sensor sends data packets to its next-hop sensor.

When compressed sensing is applied to in-network data compression, it will bring a wealth of similar benefits as distributed source coding including simple encoding process, saving of internode data exchange, and decoupling of compression from routing. Furthermore, compressed sensing has two additional advantages. First, it can deal with abnormal sensor readings gracefully. Second, data reconstruction is not sensitive to packet losses. In compressed sensing, all messages received by sink node are equally important. However, in distributed source coding, received data are predefined as main or side information. Losing main information causes fatal errors to the decoder. All these desired merits make compressed sensing a promising solution to the data gathering and aggregation problem in large-scale wireless sensor networks.

In order to improve the energy efficiency of wireless sensor networks, we proposed an energy-balanced data gathering and aggregating scheme based on CS in [12]. In this paper, we improve this scheme using a cluster-based method to minimize and balance data traffic. The two important issues for our scheme are to divide the clusters with a lower traffic cost and to reconstruct the original data with a lower data loss probability. So, we first propose a clustering method to minimize the intracluster data traffic. Then, we use CS to design a distributed intercluster data gathering and aggregating scheme. Our scheme includes three steps.

(1) Member nodes send data directly to the CH node using time division multiple access, as the intracluster data traffic is very low.

(2) Once it receives the data collected by its cluster, the CH node runs the necessary data aggregation tasks using CS.

(3) The CH node routes the processed data toward the sink node over relay nodes (other CH nodes).

In summary, this paper makes the following major contributions.

(i) We study the network clustering problem for improving and balancing the energy efficiency of WSNs, which provides a new insight into how a distributed data gathering method can be obtained.

(ii) We propose an energy-balanced data gathering and aggregating scheme (called EDGA) based on clustering and compressed sensing theories. The data is gathered within clusters and aggregated between clusters, which reduces the traffic cost of the network.

(iii) We design a data reconstruction algorithm to accurately recover the missing data in an incomplete dataset.

(iv) We evaluate our scheme via simulations and a real dataset collected in the Intel indoor experiment. The results demonstrate that our scheme performs significantly better than existing methods in terms of energy efficiency and data recovery ratio.

The remainder of this paper is organized as follows. Section 2 surveys related work. Section 3 explains the preliminaries and models and shows how to solve this problem exactly under the given constraints. In Section 4, we design an energy-balanced data gathering and aggregating scheme based on compressed sensing theory. The performance of the proposed scheme is evaluated and compared with existing methods in Section 5. We conclude this paper in Section 6.

2. Related Work

With the emergence of compressed sampling theory [10, 13, 14], we have seen a new avenue of research in the field of in-network data compression. Compressed wireless sensing (CWS) [3] appears to be able to reduce the latency of data gathering in a single-hop network by delivering linear projections of sensor readings through synchronized amplitude modulated analog transmissions. Due to the difficulties in analog synchronization, CWS is less practical for large-scale sensor networks. Multichannel singular spectrum analysis (MSSA) [15] is a nonparametric and adaptive method based on the lag-covariance matrix. It is used to produce geographic data reconstruction.

Ji et al. [16] proposed a Bayesian compressed sensing (BCS) algorithm, motivated by the fact that dealing with images is a high-dimensional problem. Chou et al. proposed an energy efficient information collection (EEIC) algorithm [14] to collect information from sensor networks, taking into account both the energy consumption and the amount of information in the sensing data. In an overview paper, Haupt et al. [17] also speculated on the potential of using compressed sampling theory for data aggregation in a multihop WSN. However, no real scheme has been reported based on this initial idea. Xu et al. [18] proposed a compressed sparse functions approach to collect data through the use of CS techniques.

To improve energy efficiency, some cluster-based methods have been proposed to balance energy dissipation. Xie and Jia [7] studied the relationship between the size of clusters and the number of transmissions and proposed a centralized clustering algorithm to aggregate the sensed readings. Liu et al. [19] presented a new data aggregation method, based on CS and clustering, that reduces communication cost by collecting a small number of samples at a data gathering point. The CH node processes the data based on a nonpersistent CSMA (Carrier-Sense Multiple Access) MAC protocol. This improved clustering method is based on LEACH (Low-Energy Adaptive Clustering Hierarchy) [20], a clustering-based protocol that utilizes randomized rotation of local cluster base stations (CH) to evenly distribute the energy load among the sensors in the network.

Existing CS-based data reconstruction methods are not fully suitable for service-oriented wireless sensor networks for two reasons: (1) most computations take place at the sink node rather than at the sensors, which causes load imbalance and “coverage hole” problems; (2) CS-based methods usually require the data to have inherent structures. Features that are extracted from internet traffic [21] are not applicable. To solve the above challenges, an effective data recovery method is required to improve the system performance in terms of data reconstruction accuracy, load balance, and network lifetime.

3. Preliminaries and Models

In this section, we first present the objectives of the energy-balanced data gathering and aggregating scheme. Then, we state the related terminology and the problem definition.

3.1. Objectives

We assume the network has the following properties.

(i) The network is modeled as a communication graph $G = (V, E)$ , where V is the set of sensor nodes, $1,2, \dots, N \in V$ , and E is the set of edges, where an edge exists between two sensor nodes if they are within the communication range of each other. The sensor network is homogeneous; that is, all the sensor nodes have the same sensing range and the same communication range.

(ii) The sensing range of every sensor node is modeled by the directional sensing model shown in Figure 2. That is, the sensing range is a sector whose center is the location of the sensor, whose maximal radius is $r_{s}$ , and whose sensing angle at the center is α [22].

(iii) The network is divided into cluster grids ( $C_{1}, C_{2}, \dots, C_{k}$ ), and one sensor is selected as cluster-head in each cluster grid.

(iv) The sensors are deployed randomly and uniformly, and the whole network is time synchronized.

Figure 2

Directional sensing model.

In physical environmental data reconstruction system, sensors are deployed in the monitoring area. They sense and send data to sink node via CH nodes periodically over a given time slot. For simplicity, we suppose total $N$ sensors are deployed.

The timeline of the system consists of several time rounds, as shown in Figure 3. In the working phase, the nodes gather the target or environmental data and aggregate it toward the sink node. Each working phase is divided into t time slots. After the sensors are deployed, each sensor is required to report its sensory data once per time slot through multihop wireless communications. $x (i, j)$ denotes the sensory data of sensor $S_{i}$ at time slot j, where $i = 1,2, \dots, N$ and $j = 1,2, \dots, t$ .

Figure 3

Timeline of network.

Some mathematical notations that are used throughout this paper are given in Notations.

3.2. Definitions

To describe our problem clearly, we give some definitions in this subsection.

Definition 1 (sensing model).

Sensor's sensing model is defined as a quadruple $〈 O, r_{s}, α, β 〉$ as shown in Figure 2, where O means the location of sensor corresponding to its coordinates in 2-dimensional surface; $r_{s}$ is the sensing radius of sensor; α is the sensing angle of squint; $2 α$ denotes the sensing view angle, β is the direction parameter of sensor location ( $0 \leq β < 2 π$ ).

Definition 2 (coverage area (SA)).

Sensor's coverage area is defined by

\begin{matrix} {SA}_{i} = \{x \in Ω : ‖x - S_{i}‖ \leq r_{s}, |θ_{i} (x)| < α\}, \end{matrix}

(1)

where $‖x - S_{i}‖$ is the Euclidean distance between any point x and sensor $S_{i}$ in Figure 2.
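The membership test behind (1) can be sketched in a few lines. This is only an illustrative sketch: the function name `covers` is hypothetical, and it assumes that $θ_{i} (x)$ is the signed angle between the sector's bisector (direction β) and the vector from $S_{i}$ to x, which the text does not state explicitly.

```python
import math

def covers(sensor_xy, beta, r_s, alpha, point_xy):
    """Return True if point_xy lies inside the sensing sector of a sensor.

    sensor_xy: sensor location O; beta: direction of the sector bisector in
    radians (an assumption about the role of beta); r_s: sensing radius;
    alpha: half of the view angle, so the sector spans 2 * alpha.
    """
    dx = point_xy[0] - sensor_xy[0]
    dy = point_xy[1] - sensor_xy[1]
    dist = math.hypot(dx, dy)
    if dist > r_s:
        return False
    if dist == 0.0:
        return True  # the sensor's own location is treated as covered
    # Signed angular offset from the bisector, wrapped into [-pi, pi).
    theta = math.atan2(dy, dx) - beta
    theta = (theta + math.pi) % (2 * math.pi) - math.pi
    return abs(theta) < alpha
```

For example, with $r_{s} = 50$ and $α = π / 4$ , a sensor at the origin facing along the x-axis covers the point (10, 0) but not (0, 10) or (60, 0).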

Definition 3 (measurement matrix (MM)).

The measurement matrix is a mathematical description of the dynamic environmental data sensed by the sensors. MM is defined by

\begin{matrix} X = {(x (i, j))}_{n \times t} . \end{matrix}

(2)

So, MM is a matrix consisting of n rows and t columns. Each data element in the matrix is collected validly.

Definition 4 (reconstructed matrix (RM)).

Reconstructed matrix is generated by using the proposed data reconstruction algorithm (DR-OMP) to approximate MM. RM is defined by

\begin{matrix} X^{'} = {(x^{'} (i^{'}, j^{'}))}_{n \times t} . \end{matrix}

(3)

3.3. Problem Definition

In WSNs, in order to improve communication efficiency, the sensed data is sent to sink node after compression using the traditional methods, which causes a larger data error between the reconstructed data and the original data. Based on the above definitions, our goal is to find an optimal $X^{'}$ , which approximates the original X as closely as possible. That is,

\begin{matrix} Objective: \min_{x \in R^{N}} {‖X - X^{'}‖}_{l_{1}}, \end{matrix}

(4)

where $‖\cdot‖$ is the Euclidean norm used to measure the error between X and $X^{'}$ , and

\begin{matrix} ‖X‖ = \sqrt{\sum_{i, j}^{} {(x (i, j))}^{2}} . \end{matrix}

(5)

3.4. Energy Consumption Model

To estimate the transmission energy cost, we use the standard transmission model proposed by Heinzelman et al. [20]. This model assumes that the energy per bit for transmission over a wireless link is a function of the distance between the transmitter and the receiver. Let $E_{T x} (L, d)$ and $E_{R x} (L)$ be the energy consumption for transmitting and receiving an L-bit message over distance d, respectively. Consider the following:

\begin{matrix} E_{T x} (L, d) = E_{T -elec} \times L + ϵ_{amp} \times L \times d^{2}, \\ E_{R x} (L) = E_{R -elec} \times L, \end{matrix}

(6)

where $E_{T -elec}$ and $E_{R -elec}$ are the energy consumption for transmitting and receiving a one-bit message, respectively, and $ϵ_{amp}$ is the energy coefficient of the transmission amplifier.
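As a quick numerical illustration of (6), the sketch below evaluates the model with the parameter values listed in Table 1 (50 nJ/bit for both electronics terms, 10 pJ/bit/m² for the amplifier, 80 B packets). The function and constant names are illustrative choices, not part of the original scheme.

```python
# Parameter values taken from Table 1; names are illustrative.
E_T_ELEC = 50e-9    # J/bit, transmitter electronics
E_R_ELEC = 50e-9    # J/bit, receiver electronics
EPS_AMP = 10e-12    # J/bit/m^2, transmission amplifier
PACKET_BITS = 80 * 8  # one 80-byte data packet

def e_tx(bits, distance_m):
    """Energy (J) to transmit `bits` over `distance_m` meters, per (6)."""
    return E_T_ELEC * bits + EPS_AMP * bits * distance_m ** 2

def e_rx(bits):
    """Energy (J) to receive `bits`, per (6)."""
    return E_R_ELEC * bits

if __name__ == "__main__":
    print("tx over 50 m:", e_tx(PACKET_BITS, 50.0), "J")
    print("rx:", e_rx(PACKET_BITS), "J")
```

Under these values, sending one packet over 50 m costs about 48 μJ (32 μJ for the electronics plus 16 μJ for the amplifier), and receiving it costs 32 μJ.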

4. Energy-Balanced Data Gathering and Aggregating Scheme

We design an energy-balanced data gathering and aggregating scheme in this section, EDGA for short. We first introduce network clustering for data gathering and process based on CS theory. Then, a data reconstruction algorithm based on Orthogonal Matching Pursuit is presented in detail, which is used to recover the missing data in the incomplete dataset accurately.

4.1. Calculating Coverage Area of Nodes for Clustering

Under the requirements of the monitoring tasks, the whole monitoring area is divided into virtual monitoring grids ( $C_{1}, C_{2}, \dots, C_{k}$ ) based on the location and communication range of the nodes. Each node's communication range covers the coverage areas of its adjacent nodes, so two adjacent nodes can communicate with each other directly. Any node can obtain the location of the whole monitoring area through the routing algorithm. A node determines which cluster it is assigned to by calculating its coverage area. The nodes that are assigned to the same monitoring grid form a cluster. In order to improve the performance of clustering, the following factors need to be considered in the clustering procedure.

(i) Key targets or points: each key monitoring target or point should be covered by at least one cluster.

(ii) Node distribution: the number of nodes should meet the network's requirements on coverage and connectivity [23]. Each cluster consists of several sensor nodes. We compare the network performance under different node distributions, that is, sparse, mediated, and dense networks.

(iii) Network coverage: as the number of clusters decreases proportionately with the effective sensing radius of sensors, the cluster's size should be set lower when the effective sensing radius is larger.

(iv) Network communication: any two nodes in the same cluster, and the CH nodes in adjacent clusters, can communicate reliably.

For any one sensor node $S_{i}$ , we calculate its coverage area $C A_{i}^{j}$ in its virtual monitoring grid $C_{j}$ . The coverage area is shown as in Figure 4. Then, node $S_{i}$ determines which virtual monitoring grid $C_{j}$ it belongs to, on the basis of its maximal coverage area $C A_{i}^{j}$ . Finally, all clusters are formed following each virtual monitoring grid ( $C_{1}, C_{2}, \dots, C_{k}$ ).
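The assignment rule in this paragraph can be approximated numerically. The sketch below is one possible way to do it, under several assumptions that are not in the original text: grid cells are axis-aligned squares of side `cell` indexed by (column, row) from the origin, β is the direction of the sector bisector, and the coverage area per cell is estimated by summing small polar patches of the sector. The function name `assign_grid` is hypothetical.

```python
import math
from collections import defaultdict

def assign_grid(sensor_xy, beta, r_s, alpha, cell, n_r=50, n_a=50):
    """Estimate a sensor's coverage area per grid cell.

    Returns (best_cell, areas), where best_cell is the (col, row) index of
    the cell holding the largest share of the sensing sector.
    """
    areas = defaultdict(float)
    dr = r_s / n_r          # radial step
    da = 2 * alpha / n_a    # angular step across the sector
    for a in range(n_r):
        r = (a + 0.5) * dr
        for b in range(n_a):
            ang = beta - alpha + (b + 0.5) * da
            x = sensor_xy[0] + r * math.cos(ang)
            y = sensor_xy[1] + r * math.sin(ang)
            # Each polar patch has area r * dr * da.
            areas[(int(x // cell), int(y // cell))] += r * dr * da
    best = max(areas, key=areas.get)
    return best, dict(areas)

if __name__ == "__main__":
    best, areas = assign_grid((95.0, 75.0), 0.0, 20.0, math.pi / 4, 50.0)
    print("assigned cell:", best)
    print("area per cell:", areas)
```

A sensor near a grid boundary whose sector points across the boundary is therefore assigned to the neighboring cell, which matches the maximal-coverage-area rule described above.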

Figure 4

An illustration of node's coverage area.

4.2. Cluster-Head Election

In a cluster-based sensor network, a CH node consumes more energy than the member nodes. If a node takes the role of CH over multiple time rounds, its energy will be depleted over time, which easily leaves part of the monitoring area uncovered by any node, that is, a “coverage hole.” Therefore, a CH must be elected in each time round. A node with more residual energy should be preferred as a CH candidate. The CH election mechanism is based on a distributed algorithm: each node stands for election as CH by sending election messages with a given probability p. To balance the energy consumption of the network, the probability $P_{ch}^{i}$ that node $S_{i}$ becomes cluster-head is calculated by the following [24]:

\begin{matrix} P_{ch}^{i} = \max (\frac{k}{N} \times \frac{E_{r}^{i}}{E_{ini}}, \frac{k}{N} \times \frac{E_{\min}}{E_{ini}}), \end{matrix}

(7)

where k and N denote the total number of clusters and nodes, respectively; $E_{r}^{i}$ is the residual energy of node $S_{i}$ at the current time; $E_{ini}$ is the initial energy; and $E_{\min}$ is the energy threshold of a cluster-head. If the residual energy $E_{r}^{i}$ of a node is less than $E_{\min}$ , this node does not stand for election as cluster-head. The value of $E_{\min}$ is determined by the total energy that a cluster-head consumes in gathering, compressing, and transmitting one data packet.
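A minimal sketch of the election probability in (7), combined with the threshold rule stated above, might look as follows. The function names and the example value of $E_{\min}$ (0.5 J) are assumptions made for illustration; the paper does not give a numerical threshold.

```python
import random

def ch_probability(e_res, e_ini, e_min, k, n):
    """Election probability of one node, following (7).

    A node whose residual energy is below the threshold e_min does not
    stand for election, so its probability is returned as 0.0.
    """
    if e_res < e_min:
        return 0.0
    return max(k / n * e_res / e_ini, k / n * e_min / e_ini)

def stands_for_election(p, rng):
    """Draw once: True if the node sends a "vote" message this round."""
    return rng.random() < p

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    for e_res in (5.0, 2.5, 0.4):
        p = ch_probability(e_res, e_ini=5.0, e_min=0.5, k=16, n=100)
        print(e_res, p, stands_for_election(p, rng))
```

With k = 16 clusters and N = 100 nodes, a fully charged node stands for election with probability 0.16, and the probability falls linearly as its residual energy is consumed.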

In order to balance and save the energy consumption of the system, time round and state transition mechanisms are used [25]. The lifetime of the network is divided into time rounds, as shown in Figure 3. Each round consists of an initial phase and a working phase. In the initial phase, node $S_{i}$ sends a “vote” message to stand for election as CH with probability $P_{ch}^{i}$ . If the message transmission is not successful, node $S_{i}$ switches to the “listening” state. A new election round is run if no “vote” message is transmitted successfully within a short time slice (the time threshold). If the message transmission of node $S_{i}$ is successful, it sends an “invitation” message $S_{M}^{i}$ ( $S_{M}^{i} = (E_{r}^{i}, P_{ch}^{i})$ ) to its neighbor nodes in the cluster to indicate that $S_{i}$ is the CH. The other nodes join this cluster after receiving the message $S_{M}^{i}$ . Once the CH is determined, it allocates channel time to its member nodes using a TDMA mechanism based on the number of member nodes. This avoids communication collisions within the cluster. In the working phase, the CH node first processes the readings received from its member nodes and obtains the target or environmental information. Then, the CH node aggregates this information with the data received from other CH nodes and sends it to the sink node via other CH nodes if it is not directly adjacent to the sink. The detailed data gathering and aggregating process is described below.

4.3. Data Gather and Aggregation

In the process of data gathering, the readings are transmitted from the CH nodes to the sink node as illustrated in Figure 1(b). Each reading is accumulated into a weighted sum at each node along its communication route. First, CH₁ sends $ϕ_{11} d_{1}$ to CH₂, where $ϕ_{11}$ is a random number; then CH₂ sends $ϕ_{11} d_{1} + ϕ_{12} d_{2}$ to CH₃, and so on until the sink node receives the last aggregated value. The ith weighted sum is denoted by

\begin{matrix} y_{i} = \sum_{j = 1}^{N} ϕ_{i j} d_{j} . \end{matrix}

(8)

Finally, the sink node obtains M weighted sums $\{y_{i}\}$ , $i = 1,2, \dots, M$ . Mathematically, we can obtain

\begin{matrix} \begin{bmatrix} y_{1} \\ y_{2} \\ ⋮ \\ y_{M} \end{bmatrix} = \begin{bmatrix} ϕ_{11} & ϕ_{12} & \dots & ϕ_{1 N} \\ ϕ_{21} & ϕ_{22} & \dots & ϕ_{2 N} \\ ⋮ & ⋮ & ⋮ & ⋮ \\ ϕ_{M 1} & ϕ_{M 2} & \dots & ϕ_{M N} \end{bmatrix} \begin{bmatrix} d_{1} \\ d_{2} \\ ⋮ \\ d_{N} \end{bmatrix} . \end{matrix}

(9)

Based on CS theory, the original data of the N nodes can be reconstructed from the M weighted sums. The total number of transmissions is $M \times N$ in this way. As $M ≪ N$ , the CS-based data compression method reduces the amount of data transmission compared with the traditional way. Moreover, as the energy consumption of the network is balanced in each cluster, this approach decreases the total energy consumption of the network and optimizes the load balance. Also, because of the spatial and temporal correlations of measurement data [26], there exists a transform domain in which the signal is sparse. Based on this assumption, we explain in the next subsection whether the set of linear equations is solvable and how these equations can be solved.
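The hop-by-hop accumulation described in this subsection can be checked against the matrix form (9) with a short simulation. This is a sketch under stated assumptions: the random coefficients are drawn from a Gaussian distribution, the N nodes form a single relay chain, and the "traditional" baseline is taken to be uncompressed relaying along that chain, in which the node at position i forwards i readings. None of these specifics are given in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 100, 20                   # number of nodes and of weighted sums
d = rng.normal(size=N)           # stand-in for the node readings d_1..d_N
Phi = rng.normal(size=(M, N))    # random coefficients phi_ij (assumed Gaussian)

# Hop-by-hop aggregation: each node adds its own weighted reading to the
# M partial sums it received and forwards the M updated values.
y_hop = np.zeros(M)
for i in range(N):
    y_hop += Phi[:, i] * d[i]

# Matrix form (9): y = Phi d.
y = Phi @ d

transmissions_cs = M * N                 # every node sends M values
transmissions_chain = N * (N + 1) // 2   # assumed uncompressed chain relay

if __name__ == "__main__":
    print("hop-by-hop equals matrix form:", np.allclose(y_hop, y))
    print("CS transmissions:", transmissions_cs)
    print("uncompressed chain transmissions:", transmissions_chain)
```

The two computations agree, which is why each relay only needs to know its own column of coefficients rather than the whole matrix.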

4.4. Spatial Correlated Data Recovery

According to compressed sensing theory, a sparse signal can be reconstructed from a small number of measurements with a probability close to one. An N-dimensional signal is considered an M-sparse signal if there exists a domain in which this signal can be represented by M ( $M ≪ N$ ) nonzero coefficients. Because of the signal correlation, the signal can be described more compactly in transform domains. So, we use the discrete cosine transform (DCT) for sparse transformation in this paper because it has a strong “energy compaction” property [27]: most of the signal information tends to be concentrated in a few low-frequency components of the DCT. The DCT is defined as follows:

\begin{matrix} f_{m} = \sum_{i = 1}^{n} x [i] \cos [\frac{π}{n} m (i + \frac{1}{2})], m = 1,2, \dots, n . \end{matrix}

(10)
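To make the energy compaction property concrete, the following sketch builds a DCT matrix and applies it to a smooth synthetic signal. It assumes the standard orthonormal DCT-II with zero-based indices, which differs from (10) in indexing and scaling, and the signal is a made-up combination of three cosine components rather than real sensor data.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that f = C @ x and x = C.T @ f."""
    i = np.arange(n)
    k = i.reshape(-1, 1)
    C = np.cos(np.pi / n * k * (i + 0.5))
    C[0, :] *= np.sqrt(1.0 / n)   # scaling of the constant (DC) row
    C[1:, :] *= np.sqrt(2.0 / n)  # scaling of the remaining rows
    return C

n = 64
t = np.arange(n)
C = dct_matrix(n)

# Synthetic smooth signal: an offset plus two slowly varying components.
x = (20.0
     + 3.0 * np.cos(np.pi / n * 2 * (t + 0.5))
     + 1.0 * np.cos(np.pi / n * 5 * (t + 0.5)))

f = C @ x                                      # DCT coefficients
significant = np.flatnonzero(np.abs(f) > 1e-8)

if __name__ == "__main__":
    print("significant coefficient indices:", significant)
    print("reconstruction ok:", np.allclose(C.T @ f, x))
```

Here the 64-sample signal is described by only three nonzero coefficients, which is the kind of sparsity the CS reconstruction relies on; real readings are only approximately sparse.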

With a sufficient number of measurements, the sink node can reconstruct the sensor readings by solving the $l_{1}$ optimization problem in (4). The $l_{1}$ optimization problem can be solved with linear programming (LP) techniques [28]. Although the reconstruction complexity of an LP-based decoder is polynomial, it becomes very high when N is large. While there is a large body of ongoing work looking for low-complexity reconstruction techniques [29, 30], this topic is beyond the scope of this paper. With the current LP-based decoder, we suggest that N should not exceed 1000.

4.5. Data Reconstruction Algorithm Design

We propose a data reconstruction algorithm based on Orthogonal Matching Pursuit to recover the original data, DR-OMP for short. The pseudocode of DR-OMP is shown in Algorithm 1.

Algorithm 1: DR-OMP for data reconstruction.

Input: An $N \times t$ measurement matrix Φ; an N-dimensional data vector v; the sparsity level m of the ideal data.

Output: An estimate $s^{'}$ in $R^{t}$ for the ideal data; an N-dimensional approximation $a_{m}$ of the data vector v;

an N-dimensional residual $r_{m} = v - a_{m}$ ; a set $Λ_{m}$ containing m elements from ( $1,2, \dots, t$ )

(1) $r_{0} \leftarrow v$ , $Λ_{0} \leftarrow \emptyset$ , $i \leftarrow 1$ ; // Initialization

(2) while $i \leq m$ do

(3) $λ_{i} = \arg\max_{j = 1,2, \dots, t} | 〈 r_{i - 1}, φ_{j} 〉 |$ ; // Find the column most correlated with the residual.

(4) $Λ_{i} \leftarrow Λ_{i - 1} \cup \{λ_{i}\}$ ;

(5) Determine the orthogonal projector $P_{i}$ onto span $\{φ_{λ} : λ \in Λ_{i} \}$ ;

(6) $a_{i} = P_{i} v$ ; // Calculate new approximation and residual.

(7) $r_{i} = v - a_{i}$ ;

(8) $i = i + 1$ ;

(9) end

(10) $a_{m} = \sum_{λ \in Λ_{m}} s_{λ}^{'} φ_{λ}$ ; // Estimate $s^{'}$ from the coefficients of $a_{m}$ .

Our DR-OMP sparse approximation algorithm is used for data reconstruction. To identify the ideal data s, we need to determine which columns of Φ participate in the measurement vector v. The idea behind the algorithm is to pick columns in a greedy fashion. At each iteration, we choose the column of Φ that is most strongly correlated with the remaining part of v. Then, we subtract its contribution from v and iterate on the residual. One hopes that, after m iterations, the algorithm will have identified the correct set of columns.

We now analyze the complexity of Algorithm 1. The key operation is the procedure for computing the inverse matrix, which provides the best approximation. This procedure is completed by a matrix multiplication. Therefore, its time complexity is $O (N t)$ . As the algorithm repeats this step m times, the total complexity is $O (m N t)$ .
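For readers who want to experiment, a generic Orthogonal Matching Pursuit loop in the spirit of Algorithm 1 is sketched below. It is not the authors' DR-OMP implementation: the projection step is carried out with a least-squares solve (`numpy.linalg.lstsq`), the residual is initialized to v, and the demo uses a synthetic Gaussian matrix with made-up sizes.

```python
import numpy as np

def omp(Phi, v, m):
    """Greedy sparse recovery: returns (s_hat, support, residual)."""
    n_cols = Phi.shape[1]
    residual = np.asarray(v, dtype=float).copy()   # r_0 = v
    support = []
    coef = np.zeros(0)
    for _ in range(m):
        # Pick the column most correlated with the current residual.
        corr = np.abs(Phi.T @ residual)
        if support:
            corr[support] = 0.0   # never select the same column twice
        support.append(int(np.argmax(corr)))
        # Project v onto the span of the selected columns (least squares).
        coef, *_ = np.linalg.lstsq(Phi[:, support], v, rcond=None)
        residual = v - Phi[:, support] @ coef
    s_hat = np.zeros(n_cols)
    s_hat[support] = coef
    return s_hat, support, residual

# Synthetic demo: a 3-sparse vector observed through 40 random projections.
demo_rng = np.random.default_rng(0)
demo_Phi = demo_rng.normal(size=(40, 100))
demo_s = np.zeros(100)
demo_s[[5, 37, 80]] = [2.0, -1.5, 1.0]
demo_v = demo_Phi @ demo_s
demo_s_hat, demo_support, demo_residual = omp(demo_Phi, demo_v, 3)

if __name__ == "__main__":
    print("selected columns:", sorted(demo_support))
    print("residual norm:", np.linalg.norm(demo_residual))
```

Solving the least-squares problem over all selected columns at every iteration is what distinguishes the orthogonal variant from plain matching pursuit: the residual stays orthogonal to every column chosen so far.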

5. Performance Evaluations

In this section, simulations and experiments based on real dataset are conducted to evaluate our design under different network settings and reveal insights of system performance.

5.1. Simulation Settings and Dataset

We first conduct simulations using OMNET++ 4.2 [31] and MATLAB to evaluate our framework and algorithm. The sensors are deployed randomly and uniformly over a 200 m × 200 m area ( $N = 100,150,200$ ). The whole monitoring area is divided into 16 virtual grids, and each grid area is 50 m × 50 m. The other simulation parameters are shown in Table 1.

Table 1

Simulation parameters.

Parameter description	Value
Monitoring area size	200 m × 200 m
Sensing range ( $r_{s}$ )	50 m
Communication range	100 m
Number of nodes ( $N$ )	100, 150, 200
Sensing angle of node (α)	$π / 4$
Initial energy of node ( $E_{ini}$ )	5 J
Size of data packet	80 B
Energy parameter ( $E_{T -elec}$ )	50 nJ/bit
Energy parameter ( $E_{R -elec}$ )	50 nJ/bit
Energy parameter ( $ϵ_{amp}$ )	10 pJ/bit/m²

The second experiment is based on a real WSN system from the Intel Berkeley Research lab [32]. The real data are gathered by Intel indoor experiment built on TinyOS platform. Mica2Dot sensors with weather boards collected timestamped topology information, along with temperature, humidity, and light once every 31 seconds.

To verify the performance of the EDGA scheme, we compare it with other approaches, namely, compressed sensing (CS) [33], multichannel singular spectrum analysis (MSSA) [15], and energy efficient information collection (EEIC) [14].

The performance of proposed EDGA scheme is evaluated with respect to different system parameters, for example, cluster number (k), network lifetime, and data loss probability. Through these experiments, comparison between different benchmarks and our proposed EDGA scheme is used to demonstrate the performance of our design.

5.2. Experimental Results and Analysis

In the first set of simulations, we compare our method with two methods, MSSA and EEIC, in terms of network lifetime, the number of clusters (k), and residual energy. Table 2 lists the cluster numbers and their corresponding sizes. The three node distributions represent sparse, mediated, and dense networks, with cluster numbers $k = 16,25$ and 40, respectively.

Table 2

Three types of cluster size.

Total cluster number	Cluster size
$k = 40$	50 m × 20 m
$k = 25$	40 m × 40 m
$k = 16$	50 m × 50 m

Figures 5(a)–5(c) show the network lifetime under different node distributions. In the sparse network setting, the proposed EDGA scheme increases the network lifetime by $15.9 %$ and $30.6 %$ compared with MSSA and EEIC, respectively. In the dense network setting, the increases are $25.1 %$ and $76.6 %$ , and in the mediated network setting they are $21.1 %$ and $54.2 %$ , compared with MSSA and EEIC, respectively. Comparing the lifetime rounds of the three algorithms, the EDGA scheme performs best on the metric of network lifetime. The reason is that EDGA cuts the data communication volume by data aggregation, and the energy of the network is balanced by network clustering. Moreover, we compare the residual energy of nodes when the number of dead nodes is 10, as shown in Figure 6. Each item represents an average of 20 experimental values based on random selection.

Figure 5

Comparison of the three algorithms in terms of network lifetime.

Figure 6

Comparison on the residual energy of nodes ( $N = 100$ ).

Figure 7 shows the effect of the cluster size on the network lifetime. When the cluster number is $k = 16$ , the network obtains the maximal lifetime compared with $k = 25$ and $k = 40$ . As the cluster number increases, the correlation of the data decreases, which degrades the performance of data aggregation. Therefore, setting an appropriate k value helps the network to obtain optimal performance.

Figure 7

Comparison of network lifetimes under different cluster sizes ( $N = 100$ ).

In the second set of simulations, we compare our algorithm with three methods, CS, MSSA, and EEIC, in terms of the error ratio of data reconstruction. The error ratio (η) is defined as a metric for measuring the reconstruction error after interpolation:

\begin{matrix} η = \frac{\sqrt{\sum_{i, j}^{} {(x (i, j) - x^{'} (i^{'}, j^{'}))}^{2}}}{\sqrt{\sum_{i, j}^{} {(x (i, j))}^{2}}} . \end{matrix}

(11)
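The metric in (11) is the Frobenius norm of the reconstruction error divided by the Frobenius norm of the original matrix, so it can be computed directly with NumPy. The function name `error_ratio` and the small example matrices are illustrative only.

```python
import numpy as np

def error_ratio(X, X_rec):
    """Error ratio eta of (11): ||X - X'|| / ||X|| with Frobenius norms."""
    X = np.asarray(X, dtype=float)
    X_rec = np.asarray(X_rec, dtype=float)
    return np.linalg.norm(X - X_rec) / np.linalg.norm(X)

if __name__ == "__main__":
    X = [[3.0, 4.0], [0.0, 0.0]]
    X_rec = [[3.0, 0.0], [0.0, 0.0]]   # one reading reconstructed as zero
    print(error_ratio(X, X_rec))
```

In this toy case the original matrix has norm 5 and the error has norm 4, giving η = 0.8; a perfect reconstruction gives η = 0.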

Figures 8(a)–8(c) plot the error ratio on indoor temperature, humidity, and light. The data loss rate ranges from $10 %$ to 90%. A high loss rate means that most of the data are lost. From Figure 8(a), we can see that the EDGA scheme obtains the best performance compared with the other three algorithms. Even when 90% of the data has been lost, EDGA can still reconstruct the temperature data with $η < 10 %$ , while the error ratio of CS is close to 20% and those of EEIC and MSSA are about 40% and 60%, respectively. The error ratio of data reconstruction rises as the data loss probability increases.

Figure 8

Comparison of the four algorithms under different data loss probabilities.

Figures 8(b) and 8(c) show the error ratio on humidity and light. The performance of the EDGA scheme on humidity and light is similar to that on temperature. The EDGA scheme achieves the best performance on environmental humidity and light data reconstruction among the four algorithms. However, the advantage of EDGA on humidity and light is less significant than that on temperature data reconstruction because the fluctuations of humidity and light are not large over a short interval.

Therefore, EDGA scheme achieves a better performance compared with the other three algorithms in terms of network lifetime and error ratio of data recovery.

6. Conclusion

In this paper, our intention was to demonstrate a new way of data gathering and aggregation in wireless sensor networks. First, we studied the network clustering issue for optimizing and balancing the energy efficiency of the network, which provides a new insight into how a distributed data gathering method can be obtained. Second, we proposed an energy-balanced data gathering and aggregating scheme to improve data reconstruction accuracy based on compressed sensing theory, where observed correlations between nodes are used to reconstruct the original data and reduce the traffic cost of the network accordingly. Third, we evaluated the proposed scheme via both simulations and a real dataset to validate its effectiveness and efficiency. The results demonstrated that our scheme achieved better reconstruction accuracy, with less than 20% error in the face of a 90% data loss probability.

This work leaves at least two open issues in the future. One issue is to exploit the space and time correlations between data factors for improving the accuracy of data reconstruction. Another issue is how to optimize the accuracy and complexity on data reconstruction. This may help to meet both QoS and energy efficiency requirements in wireless sensor networks.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported by the China Postdoctoral Science Foundation under Grant no. 2014M562153, the Guangzhou Education Bureau Science Foundation under Grant no. 1201430560, the Key Program of NSFC-Guangdong Union Foundation under Grant no. U1135002, and the Key Program of Guangdong Technology Innovation under Grant no. CXZD1144.
