Zero-effort projection for sensory data reconstruction in wireless sensor networks

Abstract

Compressive sensing is a promising technique for data gathering in large-scale wireless sensor networks. Existing compressive sensing–based data gathering techniques still follow the sampling-then-compression paradigm. In this article, we propose a random sampling zero-encoding data gathering scheme for wireless sensor networks, which exploits a virtual Gaussian energy diffusion model to sample and compress simultaneously. The proposed model not only performs sampling and compression at the same time but also does not need to assign a projection matrix to each sensor node. The scheme addresses two types of data gathering problems in sensor networks: recovering missing sensory data and extending the monitoring field using incomplete random sampling. Extensive experimental results show that the proposed random sampling zero-encoding data gathering model performs well in reconstructing the sensory data in wireless sensor networks.

Keywords

Wireless sensor networks, compressive sensing, Gaussian energy diffusion, zero-encoding, random sampling

Introduction

Wireless sensor networks (WSNs) are regarded as a bridge connecting human society and the physical world, and they are widely deployed for monitoring and disseminating information about various phenomena of interest.^1,2 A WSN can contain a large number of sensor nodes, and it has its own features: (1) sensor nodes are resource-constrained in energy, computation, storage, and bandwidth; (2) sensor networks are error-prone, suffering from packet loss, transmission errors, and abnormal readings; and (3) the sensory data of the same monitoring field often have strong spatial–temporal correlation. One of the major challenges in designing sensor networks is to minimize the number of samples and the communication cost while still obtaining high-fidelity information at the sink.

In-network compression is an essential technique to reduce communication costs. Traditional compression techniques, such as transform-based compression and joint entropy compression, require sensor nodes with strong computational power and need to exchange side information among sensor nodes, which makes them unsuitable for data compression in WSNs. Compressive sensing (CS) is a new sampling and compression paradigm based on the fact that a relatively small number of linear combinations of a compressible or sparse signal can contain most of its salient information. To the best of our knowledge, existing periodical compressive data gathering techniques based on CS still separate the processes of sampling and compression.^3–6 Such sampling-then-compression CS data gathering techniques bring several problems. First, the sampling ratio is relatively high because many sensor nodes must take part in gathering each measurement. Second, the gathered data can be easily damaged because the sensor network is error-prone. Third, the number of CS measurements is difficult to control adaptively because the sink lacks trusted sensory data for comparison.

In this article, we present a random sampling zero-encoding data gathering model to reconstruct the sensory data for WSNs. The model aims to perform compression and sampling simultaneously, which improves the robustness of CS measurements and reduces the sampling rate. Our main contributions are as follows:

We present a random sampling zero-encoding data gathering model based on a virtual Gaussian energy diffusion model, which performs sampling and compression simultaneously and does not need to assign a projection matrix to sensor nodes

We show that the orthogonal Gaussian energy diffusion basis has good compression performance for spatially correlated signals. Meanwhile, we also prove that the sampling matrix and the orthogonal Gaussian energy diffusion basis satisfy the restricted isometry property (RIP) condition with probability tending to 1

Based on our proposed random sampling zero-encoding data gathering model, we propose an efficient missing sensory data recovery scheme, which can significantly reduce the number of sampling sensors

The rest of this article is organized as follows. In section “Related works,” we present the related work. The foundations of CS are introduced in section “Basic of CS.” In section “Problem statement,” we give the problem statement and present random sampling zero-encoding data gathering model. In section “Random sampling zero-encoding sensory data reconstruction,” we propose two types of applications based on random sampling zero-encoding model in detail. Section “Experimental results” reports our experimental results, and the conclusions are given in section “Conclusion.”

Related works

Compressive data gathering

In recent years, many compressive data gathering techniques have been proposed. D Baron et al.⁷ proposed distributed CS, which enables new distributed coding algorithms for multi-signal ensembles that exploit both intra- and inter-signal correlation structures, and also gave three joint sparsity models. Rabbat and colleagues^8,9 applied CS theory to single-hop data gathering in WSNs to obtain efficient compression of network data. S Lee et al.^10,11 proposed joint optimization of transport cost and reconstruction for spatially localized CS in multi-hop sensor networks. Luo et al.^5,6 proposed compressive data gathering based on CS theory to effectively reduce communication costs and prolong network lifetime in large-scale monitoring sensor networks. In the previous studies,^3,4,12 the authors extended CS data gathering to dual-layer compressed aggregation and adapted the number of measurements during data gathering. To the best of our knowledge, existing research still separates the processes of sampling and compression. Such sampling-then-compression data gathering techniques incur a high data transmission cost.

Incomplete sensory data recovery

There are two kinds of incomplete sensory data recovery: recovering missing sensory data from correctly received sensory data, and reconstructing the sensory data of the monitoring field from incomplete sampling. The missing sensory data can be obtained by retransmission techniques or reconstructed by spatial interpolation techniques.¹³ However, retransmission increases the network burden, which can lead to more packet loss and more energy consumption. To reconstruct the entire monitoring field using incomplete sampling, spatial correlation interpolation or transform domain interpolation can be used to reconstruct the missing samples.^3,14–16 Sheikhhasan¹⁴ and Umer et al.¹⁵ presented distance-weighted interpolation techniques that reconstruct the missing samples based on the spatial correlation of the sensory data. Guo et al.¹⁶ proposed a sparsity-based spatial interpolation algorithm that solves an $ℓ_{1}$ norm minimization to recover the missing samples in the spatial domain. Xiang et al.³ applied a matrix completion scheme to extend the reconstructed sensory data to the missing samples. These works^3,16 exploit the idea of CS to reconstruct sensory data. Interpolation, however, is a data-smoothing operation and blurs the reconstructed data.

Basic of CS

CS is a new kind of compression and sampling paradigm. It asserts that a small number of linear projections of sparse or compressible signals can contain sufficient information for reconstruction of the signals.^17–19 While Shannon–Nyquist sampling theory stated that the sampling rate must be at least twice the maximum frequency to avoid losing information when capturing a signal, CS theory breaks through the bottleneck of Shannon–Nyquist’s sampling theory for sparse or compressible signal and makes simultaneous sampling and compression possible.

We assume that $x = [x_{1}, x_{2}, \dots, x_{N}]^{T}$ is the sparse or compressible signal, where $x \in R^{N}$ , $x_{i} \in R (i = 1, 2, \dots, N)$ , $Ψ = [ψ_{1}, ψ_{2}, \dots, ψ_{N}]$ are orthogonal sparse bases of $x$ ; $ψ_{i}$ is the $i th$ column of $Ψ$ ; and T denotes transposition. Then, the signal $x$ can be expressed as equation (1)

x = Ψ s = \sum_{i = 1}^{N} s_{i} ψ_{i}

(1)

where $s$ is the coefficient vector corresponding to sparse basis $Ψ$ , that is, $s_{i} = < x, ψ_{i} >$ , and <·,·> denotes the inner product. If the vector $s$ has only k nonzero coefficients, or if its $(N - k)$ smallest coefficients can be ignored, $s$ is called k-sparse. The conventional compression technique is to store or transmit the k largest coefficients and their corresponding indices in order to achieve the goal of compression. CS theory exploits signal projections to enable sampling and compression simultaneously. It assumes the projection matrix is an $M \times N (M << N)$ matrix $Φ = [ϕ_{1}, ϕ_{2}, \dots, ϕ_{N}]$ , where $ϕ_{i}$ is the $i th$ column of $Φ$ , and the measurements of $x$ can be expressed as equation (2)

y = Φ x = Φ Ψ s = Θ s

(2)

where $y = [y_{1}, y_{2}, \dots, y_{M}]^{T}$ is the measurement vector and $Θ = Φ Ψ$ is the measurement matrix. To recover the encoding signals, two problems need to be solved: (1) How to design the measurement matrix $Θ$ such that the salient information in any k-sparse or compressible signal can be obtained? (2) How to design the reconstruction algorithm to recover $x$ from M measurements? For the first problem, the measurement matrix $Θ$ must satisfy the RIP.¹⁸

Definition 1 (RIP)

Let $Θ$ be an $M \times N$ measurement matrix and let $Θ_{T}$ be the $M \times | T |$ submatrix obtained by extracting the columns of $Θ$ corresponding to the indices in T ( $T \subset \{1, \dots, N\}$ , $| T | \leq k$ , where |·| denotes cardinality). The k-restricted isometry constant $δ_{k}$ of matrix $Θ$ is the smallest constant that satisfies the following

(1 - δ_{k}) ‖ x ‖_{2}^{2} \leq ‖ Θ_{T} x ‖_{2}^{2} \leq (1 + δ_{k}) ‖ x ‖_{2}^{2}

(3)

for all such index sets T and all vectors $x \in R^{| T |}$ , which is equivalent to requiring the bound for all k-sparse vectors in $R^{N}$ . The matrix $Θ$ is said to satisfy the RIP of order k when $δ_{k}$ is sufficiently small.
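Definition 1 cannot be verified exhaustively for matrices of realistic size, because the number of index sets T grows combinatorially. As a rough illustration only, the sketch below samples random k-sparse vectors and records how far the ratio in inequality (3) strays from 1, which yields a lower bound on $δ_{k}$ . The Gaussian test matrix, the function name `empirical_rip_constant`, and all parameter values are assumptions made for this example; they are not the matrix studied in this article.

```python
import numpy as np

def empirical_rip_constant(theta, k, trials=2000, seed=0):
    """Lower-bound the k-restricted isometry constant of `theta` by sampling.

    Draws random k-sparse vectors v and records how far
    ||theta v||^2 / ||v||^2 strays from 1. The true delta_k is a maximum
    over all k-sparse vectors, so this estimate can only underestimate it.
    """
    rng = np.random.default_rng(seed)
    n = theta.shape[1]
    worst = 0.0
    for _ in range(trials):
        support = rng.choice(n, size=k, replace=False)
        v = np.zeros(n)
        v[support] = rng.standard_normal(k)
        ratio = np.sum((theta @ v) ** 2) / np.sum(v ** 2)
        worst = max(worst, abs(ratio - 1.0))
    return worst

# Hypothetical test matrix: i.i.d. Gaussian entries with variance 1/M.
rng = np.random.default_rng(1)
M, N, k = 128, 256, 5
theta = rng.standard_normal((M, N)) / np.sqrt(M)
delta_est = empirical_rip_constant(theta, k)
print(f"empirical lower bound on delta_{k}: {delta_est:.3f}")
```

A small estimate does not prove the RIP, since only a finite sample of supports is tested; a large estimate, however, is enough to rule it out for the chosen k.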

For the second problem, the reconstruction $\hat{x}$ can be calculated by $\hat{x} = Ψ \hat{s}$ , where $\hat{s}$ is the solution to the following problem

\hat{s} = \underset{s}{\arg \min} ‖ s ‖_{1} \quad \text{s.t.} \quad y = Θ s

(4)

If the measurement vector $y$ contains noise, then the reconstructed signal can be expressed as

\hat{s} = \underset{s}{\arg \min} ‖ s ‖_{1} \quad \text{s.t.} \quad ‖ Θ s - y ‖_{2}^{2} \leq ε

(5)

where $ε$ bounds the amount of noise in the data. Problem (4) is a convex optimization problem that conveniently reduces to a linear program. Problem (5) is a second-order cone program. Many efficient algorithms exist for recovering $s$ in problems (4) and (5), including greedy methods such as orthogonal matching pursuit (OMP)²⁰ and compressive sampling matching pursuit (CoSaMP).²¹
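To make the recovery step concrete, the following is a minimal sketch of OMP applied to the noiseless setting of equation (2). OMP is a greedy method: it does not solve the $ℓ_{1}$ program (4) itself but searches directly for a k-sparse $s$ consistent with $y = Θ s$ . The Gaussian sensing matrix, the problem sizes, and the function name `omp` are assumptions chosen for illustration.

```python
import numpy as np

def omp(theta, y, k):
    """Orthogonal matching pursuit: greedy k-sparse approximation of y = theta s."""
    n = theta.shape[1]
    residual = np.asarray(y, dtype=float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        correlations = np.abs(theta.T @ residual)
        if support:
            correlations[support] = 0.0  # never reselect a column
        support.append(int(np.argmax(correlations)))
        # Least-squares fit on the selected columns, then update the residual.
        coef, *_ = np.linalg.lstsq(theta[:, support], y, rcond=None)
        residual = y - theta[:, support] @ coef
    s_hat = np.zeros(n)
    s_hat[np.array(support, dtype=int)] = coef
    return s_hat

# Hypothetical noiseless test problem with a Gaussian sensing matrix.
rng = np.random.default_rng(0)
M, N, k = 128, 256, 4
theta = rng.standard_normal((M, N)) / np.sqrt(M)
s = np.zeros(N)
true_support = rng.choice(N, size=k, replace=False)
s[true_support] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)
y = theta @ s
s_hat = omp(theta, y, k)
print("relative error:", np.linalg.norm(s - s_hat) / np.linalg.norm(s))
```

The per-iteration cost is dominated by one matrix-vector product and one small least-squares solve, which is why greedy methods of this kind are attractive at the sink.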

From the framework of CS theory, we can see that CS codec scheme shifts the complexity from encoder to decoder and makes encoder become very simple.

Problem statement

In this article, we focus on periodical compressive data gathering in large-scale WSNs. According to CS theory, recovering a compressible signal means first recovering its corresponding sparse signal and then recovering the original signal by the inverse sparse basis transformation. Suppose we regard a spatially correlated signal as the energy diffusion of multiple virtual sources. Then recovering the spatially correlated signal is equivalent to identifying the locations and amplitudes of the corresponding virtual sources. Moreover, if the energy diffusion model of the virtual sources satisfies a certain property, each sample value can be considered as a CS measurement. If these conditions hold, every sensory value is both a sampled value and a compressed value, which achieves sampling and compression simultaneously.

In this section, we shall establish an energy diffusion model to meet the above conditions and to prove that a single sample value can be considered as a CS measurement.

Basic assumption

To simplify the problem statement, we make the following reasonable assumptions for periodical compressive data gathering in WSNs:

The monitoring field contains M sensors and a sink. Each sensor samples a real value periodically, and the sink is responsible for gathering and recovering the sensory data

All sensors sample once in a given periodical time interval and each periodical time is called a round data gathering

The monitoring field is partitioned into N grid cells which indicate the resolution offered by the sensor network. The distance between neighboring grid cells is considered as one unit. M sensor nodes are randomly deployed in N grid cells, and each grid cell contains at most one sensor node

For each round data gathering, the monitoring data of N grid cells are denoted by $x = [x_{1}, x_{2}, \dots, x_{N}]^{T}$ , and M samples are denoted by $x_{r} = [x_{r_{1}}, x_{r_{2}}, \dots, x_{r_{M}}]^{T}$

Problem formulation

To transform spatially correlated signals to energy diffusion sources, we need to establish an energy diffusion model which does not need to meet the real energy diffusion model because energy sources are virtual. For the same spatially correlated signal, when different energy diffusion models are selected, it means different energy sources distribute in the monitoring field. Without loss of generality, we define the energy diffusion model as Gaussian model

E_{ij} = \frac{1}{\sqrt{2 π} σ} e^{\frac{- d_{ij}^{2}}{2 σ^{2}}}

(6)

where $E_{ij}$ represents the energy amplitude in the $i th$ grid cell due to a unit energy source in the $j th$ grid cell, $d_{ij}$ is the distance between the $i th$ grid cell and the $j th$ grid cell, and $σ$ is a constant that controls the energy attenuation. We also assume that the energy amplitude of each grid cell is the linear superposition of all energy sources in the monitoring field. According to the above assumptions, the unit energy diffusion matrix G with N grid cells can be expressed as

G = \frac{1}{\sqrt{2 π} σ} \begin{bmatrix} e^{\frac{- d_{11}^{2}}{2 σ^{2}}} & e^{\frac{- d_{12}^{2}}{2 σ^{2}}} & \dots & e^{\frac{- d_{1 N}^{2}}{2 σ^{2}}} \\ e^{\frac{- d_{21}^{2}}{2 σ^{2}}} & e^{\frac{- d_{22}^{2}}{2 σ^{2}}} & \dots & e^{\frac{- d_{2 N}^{2}}{2 σ^{2}}} \\ \vdots & \vdots & \ddots & \vdots \\ e^{\frac{- d_{N 1}^{2}}{2 σ^{2}}} & e^{\frac{- d_{N 2}^{2}}{2 σ^{2}}} & \dots & e^{\frac{- d_{NN}^{2}}{2 σ^{2}}} \end{bmatrix}

(7)

We denote Gaussian energy sources by $s = [s_{1}, s_{2}, \dots, s_{N}]^{T}$ corresponding to N grid cells. If $s_{i} = 0 (i = 1, 2, \dots, N)$ , it represents the $i th$ grid cell without energy source. Otherwise, $s_{i}$ represents the amplitude of energy source at the $i th$ grid cell which can be either positive or negative because energy source is virtual. The sensory data of the entire grids can be expressed as

x = Gs

(8)
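The construction of $G$ and the superposition $x = Gs$ can be sketched in a few lines. The example assumes a square grid indexed row by row with unit spacing and uses the normalization of equation (6); the grid size, the source positions and amplitudes, and the function name `gaussian_diffusion_matrix` are hypothetical choices for illustration.

```python
import numpy as np

def gaussian_diffusion_matrix(rows, cols, sigma=1.0):
    """Unit energy diffusion matrix with entries E_ij of equation (6).

    Grid cells are indexed row by row and neighboring cells are one unit apart.
    """
    r, c = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    coords = np.column_stack([r.ravel(), c.ravel()]).astype(float)
    diff = coords[:, None, :] - coords[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)  # squared distances d_ij^2
    return np.exp(-d2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)

G = gaussian_diffusion_matrix(8, 8, sigma=1.0)  # N = 64 grid cells
s = np.zeros(64)
s[[10, 45]] = [5.0, -3.0]  # two virtual sources, one positive and one negative
x = G @ s                  # equation (8): sensory data of all grid cells
print(x.reshape(8, 8).round(3))
```

Because $d_{ij} = d_{ji}$ , the matrix is symmetric, and each column is the field produced by a single unit source.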

Since there exist N grid cells and M sensors in the monitoring field, M samples can be expressed as

x_{r} = Φ x

(9)

where $Φ$ is an $M \times N$ matrix called sampling matrix, and $Φ_{ij}$ is defined as

Φ_{ij} = \begin{cases} 1, & j = r_{i} \\ 0, & \text{otherwise} \end{cases} \quad i = 1, 2, \dots, M; \; j = 1, 2, \dots, N

where $r_{i}$ represents the grid index of the $i th$ sensor nodes. In each row of $Φ$ , only one entry is equal to $1$ , and the other entries are all equal to $0$ . $G$ is a symmetrical square matrix based on its definition, so it can be orthogonalized. The orthogonalized matrix of $G$ is denoted by $Ψ$ , then $G$ can be expressed as $G = Ψ P$ ( $P$ is an $N \times N$ nonsingular matrix). Based on the above symbols, $x$ can be expressed as

x = Gs = Ψ Ps = Ψ u

(10)

where $u = Ps$ , and $x_{r}$ can be expressed as

x_{r} = Φ x = Φ Gs = Φ Ψ Ps = Θ u

(11)
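Equations (8)–(11) can be checked numerically. The sketch below assumes that the orthogonalization of $G$ is carried out by a QR factorization (Gram–Schmidt on the columns), which yields an orthogonal $Ψ$ and a nonsingular upper-triangular $P$ ; the text does not fix the method, and an eigendecomposition of the symmetric $G$ would be another valid choice. The one-dimensional grid, the sensor positions, and the source amplitudes are likewise assumptions for illustration.

```python
import numpy as np

# 1-D line of N grid cells for brevity (an assumption; the field may be 2-D).
N, M, sigma = 64, 24, 1.0
idx = np.arange(N, dtype=float)
d2 = (idx[:, None] - idx[None, :]) ** 2
G = np.exp(-d2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)

# Orthogonalize G. QR is one possible choice: G = Psi @ P with Psi orthogonal
# and P upper triangular, hence nonsingular because G is nonsingular.
Psi, P = np.linalg.qr(G)

rng = np.random.default_rng(0)
r = np.sort(rng.choice(N, size=M, replace=False))  # grid index r_i of each sensor
Phi = np.zeros((M, N))
Phi[np.arange(M), r] = 1.0  # exactly one 1 per row, as in the definition of Phi

s = np.zeros(N)
s[[12, 40]] = [4.0, -2.0]   # hypothetical virtual sources
x = G @ s                   # equation (8)
u = P @ s                   # coefficients under the orthogonal basis
Theta = Phi @ Psi           # sensing matrix of equation (11)
x_r = Phi @ x               # the M raw samples, equation (9)

print(np.allclose(x_r, Theta @ u))  # the samples are linear measurements of u
```

No arithmetic is performed at the sensors: applying $Φ$ only selects entries of $x$ , so each raw reading already is one row of $Θ u$ .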

If $x_{r}$ is to serve as the measurement vector for recovering $x$ based on CS theory, the recovery process can be expressed as

\hat{u} = \underset{u}{\arg \min} ‖ u ‖_{1} \quad \text{s.t.} \quad ‖ Θ u - x_{r} ‖_{2}^{2} \leq ε

(12)

and

\hat{x} = Ψ \hat{u}

where $ε$ bounds the amount of noise in $x_{r}$ , and $\hat{u}$ and $\hat{x}$ are the recovered signals corresponding to $u$ and $x$ , respectively; the second expression follows from $x = Ψ u$ in equation (10). For this recovery to succeed, the following two conditions should be satisfied: (1) $u$ should be sufficiently sparse and (2) $Θ$ should obey the RIP. In other words, $Ψ$ must act as a sparse basis for $x$ , and $Θ = Φ Ψ$ must satisfy the RIP.

Does $u$ satisfy sparsity?

In this subsection, we show that $u$ exhibits the usual sparse property. Although we cannot prove that $u$ is sparse for a spatially correlated signal $x$ , we illustrate that $u$ is highly compressible through the statistical characteristics of $P$ , and we compare its compression performance with the discrete cosine transform (DCT) basis and wavelet basis in the experimental section.

For any spatially correlated signal $x$ , the coefficients under the orthogonalized energy diffusion basis can be expressed as $u = Ps (P = Ψ^{- 1} G)$ . The element mean and element variance of every row of $P$ with $σ^{2} = 1$ and $N = 64$ , $256$ , and $1024$ are shown in Figure 1. The figure illustrates that the element mean of almost every row of $P$ is close to zero, and the rows with nonzero means are concentrated at the front of $P$ . Meanwhile, the element variance is nonzero only for a small number of rows, which are also concentrated at the front of $P$ . This means that the majority of the row elements of $P$ are close to zero. If the signal $s$ has a certain structure, then $u$ can be considered a sparse signal based on the statistical characteristics of $P$ .

Figure 1.

The element mean and element variance of every row of $P$ with $σ^{2} = 1$ and different N values: (a) and (b) are the element mean and element variance of every row of $P$ with $N = 64$ ; (c) and (d) are the element mean and element variance of every row of $P$ with $N = 256$ ; and (e) and (f) are the element mean and element variance of every row of $P$ with $N = 1024$ .
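The row statistics plotted in Figure 1 can be computed along the following lines. This is only a sketch: it assumes a square grid with unit spacing and a QR-based orthogonalization of $G$ , neither of which is specified in the text, so the resulting numbers need not match the figure. The function name `row_stats_of_P` is hypothetical.

```python
import numpy as np

def row_stats_of_P(side, sigma=1.0):
    """Row-wise mean and variance of P for a side x side grid, with G = Psi P via QR."""
    r, c = np.meshgrid(np.arange(side), np.arange(side), indexing="ij")
    coords = np.column_stack([r.ravel(), c.ravel()]).astype(float)
    d2 = np.sum((coords[:, None, :] - coords[None, :, :]) ** 2, axis=-1)
    G = np.exp(-d2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)
    _, P = np.linalg.qr(G)  # assumed orthogonalization method
    return P.mean(axis=1), P.var(axis=1)

row_mean, row_var = row_stats_of_P(8)  # N = 64
print(np.round(row_mean[:5], 4), np.round(row_var[:5], 4))
```

Calling `row_stats_of_P(16)` or `row_stats_of_P(32)` gives the corresponding statistics for $N = 256$ and $N = 1024$ .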

Figure 2 shows the compression results for two types of signals. In Figure 2(a), the signal is a block of pixel values from “Lena,” which can be considered a strongly spatially correlated signal. In Figure 2(c), the signal comes from GreenOrbs²² within the same round of data gathering, sorted by sensor node Mote_ID, which can be considered a weakly spatially correlated signal. Figure 2(b) and (d) illustrates that the energy of the transformed coefficients under the orthogonal Gaussian energy diffusion basis is mainly concentrated in a few elements. Based on the above analysis, we conclude that the orthogonal energy diffusion basis compresses spatially correlated signals well.

Figure 2.

The transformed coefficient signal $u$ under different real signals: (a) 16 × 16 block pixels of “Lena,” (b) transformed coefficients of (a), (c) 256 sensory data from GreenOrbs, and (d) transformed coefficients of (c).

Does $Θ$ obey RIP?

In this subsection, we first present the statistical properties of $Ψ$ and then prove that $Θ$ satisfies the RIP with high probability. The element mean and element variance of every row of $Ψ$ with $N = 64$ , $256$ , and $1024$ are shown in Figure 3, which illustrates that the element mean and element variance of the rows of $Ψ$ are almost all zero and one, respectively, for the different values of N. Based on the law of large numbers, we can consider that every row of $Ψ$ obeys the same distribution whose mean is zero and variance is one. In fact, we can consider the elements of each row of $Ψ$ as a random sequence generated by a single random variable, so that $Ψ$ is generated by N random variables denoted by $ξ_{1}, ξ_{2}, \dots, ξ_{N}$ . The random variables $ξ_{1}, ξ_{2}, \dots, ξ_{N}$ have the same distribution, and $E (ξ_{i}) = 0$ , $Var (ξ_{i}) = E (ξ_{i}^{2}) = 1 (i = 1, 2, \dots, N)$ . For brevity, we collect $ξ_{1}, ξ_{2}, \dots, ξ_{N}$ into the vector $ξ_{x} = [ξ_{1}, ξ_{2}, \dots, ξ_{N}]^{T}$ .

Figure 3.

The element mean and element variance of every row of $Ψ$ with $σ^{2} = 1$ and different N values: (a) and (b) are the element mean and element variance of every row of $Ψ$ with $N = 64$ , (c) and (d) are the element mean and element variance of every row of $Ψ$ with $N = 256$ , and (e) and (f) are the element mean and element variance of every row of $Ψ$ with $N = 1024$ .

In the following, we prove that $Θ$ satisfies the RIP based on these statistical properties. Since $Θ = Φ Ψ$ and the positions of the nonzero elements in the rows of $Φ$ are independent of each other, each row of $Θ$ can be considered as independently and randomly selected from the rows of $Ψ$ . Consequently, $Θ$ can be considered as generated by independent and identically distributed (i.i.d.) random variables $ξ_{r_{1}}, ξ_{r_{2}}, \dots, ξ_{r_{M}}$ . Before the proof, we give the definition of a sub-Gaussian random variable and a related corollary.

Definition 2 (sub-Gaussian)

A random variable $ξ$ is called sub-Gaussian if there exists a constant $c > 0$ such that²³

E (e^{λ ξ}) \leq e^{c^{2} λ^{2} / 2}

(13)

holds for all $λ \in R$ . We use the notation $ξ ~ Sub (c^{2})$ to denote that $ξ$ satisfies the above inequality.
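As a quick illustration of Definition 2 (a standard textbook example, not taken from this article), consider a Rademacher variable $ξ$ that takes the values $+1$ and $-1$ with equal probability. Using $(2n)! \geq 2^{n} n!$ term by term in the power series gives

```latex
E (e^{λ ξ}) = \frac{e^{λ} + e^{- λ}}{2} = \cosh λ = \sum_{n = 0}^{\infty} \frac{λ^{2 n}}{(2 n)!} \leq \sum_{n = 0}^{\infty} \frac{λ^{2 n}}{2^{n} n!} = e^{λ^{2} / 2}
```

so inequality (13) holds with $c = 1$ , that is, $ξ ~ Sub (1)$ .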

Corollary 1

If the $i th$ row of $Θ$ is considered as a generated sequence by random variable $ξ_{r_{i}} (i = 1, 2, \dots, M)$ , then random variable $ξ_{r_{i}} ~ Sub (2)$ .²⁴

Theorem 1

Fix $δ \in (0, 1)$ and suppose that each row of $Θ$ satisfies $Sub (2)$ . If the number of measurements is $M = O (k \log (N / k))$ , then the probability that $Θ$ satisfies²⁴

(1 - δ) \leq \frac{‖ Θ v ‖_{2}^{2}}{‖ v ‖_{2}^{2}} \leq (1 + δ)

(14)

for all N-dimensional k-sparse signals $v$ tends to 1.

According to Theorem 1, we know that $Ψ$ can sparsify our sensory data and satisfy the RIP with $Φ$ as well.

Random sampling zero-encoding sensory data reconstruction

In the above section, we have analyzed that each sample can be considered as a CS measurement under Gaussian energy diffusion model. In this section, we present a random sampling zero-encoding data gathering scheme according to the above theory discussion. Our proposed compressive data gathering model can be applied to two types of practical applications: (1) recover the missing sensory data due to packet loss, transmission error, abnormal reading, and so on and (2) extend the monitoring field using incomplete random sampling.

Efficiently recovering missing sensory data

Because a sensor network is error-prone, packet loss, transmission errors, and abnormal readings are very common, especially in large-scale WSNs. We first analyzed the sensory data of two real sensing systems to obtain statistics on missing sensory data. We selected 45 rounds of sensory data from GreenOrbs,²² which contains 330 sensor nodes, and from the Intel Berkeley Research lab,²⁵ which contains 54 sensor nodes. In this subsection, missing sensory data refer only to lost packets and abnormal readings. Figure 4(a) displays the missing sensory data ratio of GreenOrbs over 45 consecutive rounds starting from 18 December 2010, 00:00, with a 10-min interval for each round. Figure 4(b) shows the missing sensory data ratio of the Intel Berkeley Research lab over 45 consecutive rounds from epoch 6486 to 6528. Figure 4 illustrates that missing sensory data are very common and that the missing ratio can exceed 50%, as in the Intel Lab Data. Missing sensory data at this level seriously affect the overall monitoring results. Retransmission techniques are commonly used to handle packet loss. They can effectively resolve packet loss when the load on the corresponding sensor node and communication link is light; otherwise, packet retransmission can lead to more packet loss.

Figure 4.

The missing sensory data ratio of two real sensor networks: (a) missing sensory data ratio over 45 consecutive rounds of GreenOrbs data with a 10-min interval for each round and (b) missing sensory data ratio over 45 consecutive rounds of Intel Lab Data from epoch 6486 to 6528.

In order to recover the missing sensory data, we designed a post-processing sensory data recovery scheme based on our proposed random sampling zero-encoding data gathering model. Moreover, lost packets need not be retransmitted in our recovery scheme, which reduces the load on the sensor network. Our proposed missing sensory data recovery scheme includes the following steps:

Partition the monitoring field into N grid cells and map the sensors to grid cells;

Record the missing sensory data;

Reconstruct all sensory data based on our proposed random sampling zero-encoding data gathering scheme using correctly received sensory data;

Extract the missing sensory data from the reconstructed N sensory data

In this data recovery scheme, the number of grid cells N is equal to the number of sampling sensors M, and each grid cell has a sensor. We assume that the sink obtains $M'$ correct sensory data in each round of data gathering. To map the sampling sensors to the grid cells when each sensor has a localization function, we can sort the sensory data by their location information, which keeps the original spatial correlation. Otherwise, we can reorder the sensors by reshuffling techniques to improve the spatial correlation of the sensory data. When recording the missing sensory data, abnormal sensory data should be removed in addition to lost sensory data, because an abnormal reading is treated as a lost one. Once the sink has finished gathering the sensory data, the recovery process is carried out. We denote the correctly received sensory data and the missing sensory data by $x_{c}$ and $x_{m}$ , respectively. The detailed algorithm for recovering the missing sensory data (RMSD) is described in Algorithm 1.

Algorithm 1. function RMSD ( $Ψ$ , $x_{c}$ )

Input:

• $N \times N$ -dimensional orthogonal Gaussian energy diffusion basis $Ψ$ ;

• $M'$ -dimensional correctly received sensory data $x_{c}$

Output:

• $(N - M')$ -dimensional recovered sensory data ${\hat{x}}_{m}$

$Φ \leftarrow 0$ ; // initialize the $M' \times N$ sampling matrix

$Ω_{c} \leftarrow \{i \mid x_{i} \text{ is correctly received}\}$ ; // record the sampling sensor indices

$Ω_{m} \leftarrow \{1, 2, \dots, N\} \setminus Ω_{c}$ ; // record the missing sensor indices

$j = 1$ ;

for $i = 1$ to $N$ do

if $i \in Ω_{c}$ then

$Φ_{ji} = 1$ ;

$j = j + 1$ ;

end if

end for

$Θ = Φ Ψ$ ; // generate the sensing matrix

$\hat{u} = \underset{u}{\arg \min} ‖ u ‖_{1} \quad \text{s.t.} \quad ‖ Θ u - x_{c} ‖_{2}^{2} \leq ε$ ; // recover the sparse coefficients using the CoSaMP algorithm

$\hat{x} = Ψ \hat{u}$ ; // transform back to the signal domain, since $x = Ψ u$

${\hat{x}}_{m} = {\hat{x}}_{Ω_{m}}$ ; // extract the entries at the missing indices

return ${\hat{x}}_{m}$ ;
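The steps of Algorithm 1 can be sketched in Python as follows. Two substitutions are made and should be read as assumptions: a plain OMP routine stands in for CoSaMP, and the basis $Ψ$ is obtained from a QR factorization of a one-dimensional Gaussian diffusion matrix. The names `omp` and `recover_missing`, the sparsity level `k`, and the synthetic temperature field are hypothetical; indices are 0-based here, unlike the 1-based indices in the algorithm.

```python
import numpy as np

def omp(theta, y, k):
    """Greedy k-sparse solver used here in place of CoSaMP (an assumption)."""
    residual = np.asarray(y, dtype=float).copy()
    support, coef = [], np.zeros(0)
    for _ in range(k):
        correlations = np.abs(theta.T @ residual)
        if support:
            correlations[support] = 0.0
        support.append(int(np.argmax(correlations)))
        coef, *_ = np.linalg.lstsq(theta[:, support], y, rcond=None)
        residual = y - theta[:, support] @ coef
    u_hat = np.zeros(theta.shape[1])
    u_hat[np.array(support, dtype=int)] = coef
    return u_hat

def recover_missing(psi, x_c, omega_c, k):
    """Sketch of Algorithm 1: estimate the readings at the missing grid indices.

    psi     : N x N orthogonal basis
    x_c     : correctly received readings, ordered like omega_c
    omega_c : sorted grid indices of the correctly received readings
    k       : assumed sparsity level handed to the greedy solver
    """
    n = psi.shape[0]
    omega_c = np.asarray(omega_c, dtype=int)
    omega_m = np.setdiff1d(np.arange(n), omega_c)  # missing sensor indices
    phi = np.zeros((len(omega_c), n))
    phi[np.arange(len(omega_c)), omega_c] = 1.0    # sampling matrix
    theta = phi @ psi                              # sensing matrix
    u_hat = omp(theta, np.asarray(x_c, dtype=float), k)
    x_hat = psi @ u_hat                            # back to the signal domain
    return x_hat[omega_m]

# Hypothetical example: 64 cells on a line, 16 readings lost at random.
N = 64
idx = np.arange(N, dtype=float)
G = np.exp(-(idx[:, None] - idx[None, :]) ** 2 / 2.0) / np.sqrt(2.0 * np.pi)
Psi, _ = np.linalg.qr(G)
rng = np.random.default_rng(0)
x = 20.0 + 2.0 * np.sin(2.0 * np.pi * idx / N)     # synthetic smooth field
omega_c = np.sort(rng.choice(N, size=48, replace=False))
x_m_hat = recover_missing(Psi, x[omega_c], omega_c, k=8)
print(x_m_hat.shape)
```

The quality of the estimate depends on how well the chosen basis sparsifies the data and on the sparsity level passed to the solver, so the sketch makes no claim about accuracy.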

If the missing sensory data can be considered as randomly distributed among all sensory data, then the correctly received sensory data can also be considered as randomly distributed, and they play the role of the random samples $x_{r}$ (that is, $x_{r} = x_{c}$ ). We can then use the $ℓ_{1}$ norm minimization program to recover the missing sensory data. The recovery process can be expressed as

\hat{u} = \underset{u}{\arg \min} ‖ u ‖_{1} \quad \text{s.t.} \quad ‖ Θ u - x_{r} ‖_{2}^{2} \leq ε

(15)

and

\hat{x} = Ψ \hat{u}

where $ε$ is the error bound, $\hat{u}$ represents the recovered sparse signal, and $\hat{x}$ represents all the recovered sensory data, which can also be expressed as

\hat{x} = {\hat{x}}_{r} + {\hat{x}}_{m}

(16)

where ${\hat{x}}_{r}$ and ${\hat{x}}_{m}$ are the recovered values at the correctly received and missing positions, respectively, each padded with zeros at the other positions.

If the missing sensory data cannot be considered as randomly distributed, for example when consecutive sensory data are lost over a spatial block of the sensory field, we can transform this case into a randomly distributed one by reshuffling techniques, which can also improve the compressibility of the original sensory data. In the following, we give a sorting method based on the Gaussian energy diffusion model, which is called Gaussian sort (GS). Figure 5 shows the GS order of N sensory data. Because the original sensory data are rearranged, the loss can be treated as randomly distributed even if a continuous block of sensory data is lost. In the experimental section, we show that the GS order can also significantly improve the compressibility of sensory data.

Figure 5.

Gaussian energy diffusion order—GS order.

Extend monitoring field using incomplete random sampling

Obtaining monitoring data over a larger field with a limited number of sensor nodes is also an important research issue for sensor networks. For example, to obtain the sensory data of the entire monitoring field as an “image,” a sensor would have to be deployed at each “pixel” of the monitoring field, which requires deploying too many sensors. This may be infeasible given the scale of the sensor network and the constraints of sensor node deployment. In this subsection, we propose a scheme for extending the monitoring field using incomplete random sampling, based on our random sampling zero-encoding data gathering model.

To implement our proposed scheme, we assume that the monitoring field is a two-dimensional (2D) plane and that the sink knows the location of each sensor node. To simplify the reconstruction scheme, we also assume that the WSN is ideal and that no sensory data are missing. The process of extending the monitoring field using incomplete random sampling contains the following steps:

Divide the entire monitoring field into N grid cells which must be based on spatial location information;

Assign the M sensor nodes to the N grid cells;

The sink gathers M sensory data within each periodical time interval;

The sink reconstructs the monitoring data of N grid cells based on our proposed random sampling zero-encoding data gathering model after each round data gathering

For step 1, we first divide the entire monitoring field into N grid cells based on spatial location information; N represents the resolution of the monitoring field. The M sensor nodes are deployed independently and at random in the monitoring field, and each grid cell can contain at most one sensor node. For step 2, the M sensor nodes are mapped to the N grid cells based on their locations. In step 3, each sensor node samples once in a given periodical time interval and sends its sensory data to the sink. Step 4 is carried out after data gathering. We denote the data of the M samples and the N grid cells by $x_{r}$ and $x$ , respectively. Then, the reconstruction process can be expressed as

\hat{u} = \underset{u}{\arg \min} ‖ u ‖_{1} \quad \text{s.t.} \quad ‖ Θ u - x_{r} ‖_{2}^{2} \leq ε

(17)

and

\hat{x} = Ψ \hat{u}

where $ε$ is the error bounded, $\hat{u}$ represents the recovered sparse signal, and $\hat{x}$ represents the recovered sensory data of N grid cells.
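Steps 1 and 2 of the scheme above, partitioning the field and mapping sensors to grid cells, determine the sampling matrix $Φ$ used in the reconstruction. A minimal sketch follows, assuming a rectangular field split into equal cells indexed row by row; the function names, the field size, and the sensor coordinates are hypothetical.

```python
import numpy as np

def assign_sensors_to_grid(positions, field_size, grid_shape):
    """Map sensor (x, y) positions to row-major grid-cell indices.

    positions  : (M, 2) array of coordinates inside [0, W) x [0, H)
    field_size : (W, H) extent of the monitoring field
    grid_shape : (n_cols, n_rows) number of cells along x and y
    Raises ValueError if two sensors fall in the same cell, since the model
    allows at most one sensor per grid cell.
    """
    positions = np.asarray(positions, dtype=float)
    cell_w = field_size[0] / grid_shape[0]
    cell_h = field_size[1] / grid_shape[1]
    col = np.minimum((positions[:, 0] // cell_w).astype(int), grid_shape[0] - 1)
    row = np.minimum((positions[:, 1] // cell_h).astype(int), grid_shape[1] - 1)
    cells = row * grid_shape[0] + col
    if len(np.unique(cells)) != len(cells):
        raise ValueError("two sensors share a grid cell")
    return cells

def sampling_matrix(cells, n_cells):
    """Build the M x N selection matrix with a single 1 per row."""
    phi = np.zeros((len(cells), n_cells))
    phi[np.arange(len(cells)), cells] = 1.0
    return phi

positions = [(0.5, 0.5), (3.2, 1.7), (2.9, 3.1)]  # hypothetical sensor coordinates
cells = assign_sensors_to_grid(positions, field_size=(4.0, 4.0), grid_shape=(4, 4))
Phi = sampling_matrix(cells, n_cells=16)
print(cells)
```

With $Φ$ in hand, the sink forms $Θ = Φ Ψ$ and proceeds with the reconstruction in equation (17).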

Experimental results

In this section, we conduct extensive experiments to evaluate the performance of our proposed zero-encoding sensory data gathering model in two respects: the performance of missing data recovery and the performance of extending the monitoring field using incomplete sensory data. Before the experiments, we describe the experimental data sets and the performance metrics. To obtain experimental results on a variety of real data, we use the following real data sets:

Intel Lab Data. Real temperature data sets obtained from a network of 54 sensor nodes deployed in the Intel Berkeley Research lab.²⁵

GreenOrbs data. Real temperature data set obtained from a forest monitoring WSN which contains 330 sensor nodes.²²

Weather temperature data. Temperature distribution data provided by WorldClim.²⁶

In our experiments, we use the CoSaMP²¹ algorithm to solve the $ℓ_{1}$ norm minimization program. To evaluate the performance of compression, we use sparsity as the metric, where $‖ s ‖_{0}$ denotes the sparsity of signal $s$ . We adopt the mean absolute error (MAE) as the signal recovery performance metric, which is defined in this article as the relative $ℓ_{2}$ error

MAE = \frac{{‖ x - \hat{x} ‖}_{2}}{{‖ x ‖}_{2}}

(18)

where $x$ is the original signal, and $\hat{x}$ is the recovery signal.
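For reference, the metric of equation (18) amounts to one line of NumPy. The function name `relative_error` is a hypothetical choice; the computation follows the equation as written.

```python
import numpy as np

def relative_error(x, x_hat):
    """Metric of equation (18): ||x - x_hat||_2 / ||x||_2."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return np.linalg.norm(x - x_hat) / np.linalg.norm(x)

print(relative_error([3.0, 4.0], [3.0, 3.0]))
```

The metric is undefined for an all-zero original signal, which does not arise for the temperature data used here.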

Real sensed data recovery

To evaluate the performance of our proposed missing sensory data recovery scheme, the experimental data sets are selected from the Intel Lab Data²⁵ and GreenOrbs data.²² CS technique cannot directly apply to single round data gathering because the Intel Lab Data only contains 54 sensor nodes. In our experiment, we exploit multi-round sensory data as single signal to carry out our proposed missing data recovery scheme. In the following, we carry out our experiment from three types of sensory data under Mote_ID order and GS order, and the three types of sensory data are as follows:

Six continuous rounds of temperature sensory data (epochs 6520–6525) from the Intel Lab Data, containing 208 correctly received readings and 116 missing readings

One round of temperature sensory data (19 December 2010, 14:30–14:40) from the GreenOrbs data, containing 274 correctly received readings and 40 missing readings

Randomly selected missing readings from 256 correctly received temperature readings of the same GreenOrbs round (19 December 2010, 14:30–14:40)
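The first data set can be arranged for recovery roughly as follows. This is a minimal sketch under stated assumptions: the readings are synthetic stand-ins rather than the Intel Lab Data, the missing positions are drawn uniformly at random, and names such as `rounds`, `received`, and `Phi` are hypothetical. It only shows how six rounds of 54 readings are stacked into one signal of length 324 and how the correctly received positions define a row-selection sampling matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_rounds, n_missing = 54, 6, 116

# Synthetic stand-in readings (degrees C), one row per round.
rounds = 20.0 + rng.normal(0.0, 0.5, size=(n_rounds, n_nodes))

# Stack the rounds into a single signal of length 54 * 6 = 324.
x = rounds.reshape(-1)
N = x.size

# Assumed loss pattern: 116 positions missing, chosen uniformly at random.
missing = rng.choice(N, size=n_missing, replace=False)
received = np.setdiff1d(np.arange(N), missing)

# Row-selection sampling matrix: keeps only the received entries of x.
Phi = np.eye(N)[received]
y = Phi @ x  # the 208 measurements available at the sink
```

In this arrangement the sensor nodes perform no encoding at all: the sampling matrix is determined solely by which readings reach the sink.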

Figure 6 shows the original six continuous rounds of temperature sensory data from the Intel Lab Data and the recovered sensory data under Mote_ID order and GS order. Figure 6(a) contains 324 temperature readings, of which 208 were correctly received and 116 are missing; each missing temperature value is set to $10^{\circ} C$. Figure 6(b) and (c) show the recovered sensory data under Mote_ID order and GS order, respectively. Figure 6(d) shows the correctly received sensory data, and Figure 6(e) and (f) show the recovered sensory data under Mote_ID order and GS order, respectively, corresponding to the correctly received sensory data. Figure 6(e) illustrates that the recovered sensory data are close to the correctly received sensory data: the MAE is 0.0328 under the 27.9% missing ratio, mainly because the data have good spatial correlation. Under GS order, the recovery performance improves significantly, with the MAE falling to 0.0081. This illustrates that GS ordering improves the compressibility of sensory data under the orthogonal Gaussian energy diffusion basis.

Figure 6.

(a) Six continuous rounds of sensory data from the Intel Lab Data sorted by Mote_ID; (b) and (c) the recovered sensory data using Mote_ID order and GS order, respectively; (d) the 208 correctly received sensory data; and (e) and (f) the recovered sensory data corresponding to the correctly received sensory data using Mote_ID order and GS order, respectively.

Because the Mote_ID-ordered sensory data of the Intel Lab Data form a signal with strong spatial correlation, they achieve good recovery performance. We next use the GreenOrbs data to evaluate our proposed missing sensory data recovery scheme. Figure 7 shows one round of temperature sensory data from GreenOrbs and the recovered sensory data under Mote_ID order and GS order. Figure 7(a) contains 274 correctly received readings and 40 missing readings, which are denoted by $0^{\circ} C$. Figure 7(b) and (c) show the recovered sensory data under Mote_ID order and GS order, respectively. Figure 7(d) shows the correctly received sensory data, and Figure 7(e) and (f) show the recovered sensory data under Mote_ID order and GS order, respectively, corresponding to the correctly received sensory data. Figure 7(e) illustrates that the recovery performance under Mote_ID order is poor, with an MAE of 0.1306. Under GS order, however, the MAE falls to 0.0504, as shown in Figure 7(f).

Figure 7.

(a) One round of sensory data from the GreenOrbs data sorted by Mote_ID; (b) and (c) the recovered sensory data using Mote_ID order and GS order, respectively; (d) the 274 correctly received sensory data; and (e) and (f) the recovered sensory data corresponding to the correctly received sensory data using Mote_ID order and GS order, respectively.

To better compare the original and recovered sensory data, we randomly select missing sensory data from 256 correctly received sensory data of the GreenOrbs data set to evaluate our proposed missing data recovery scheme. Figure 8 illustrates that the recovery performance under GS order significantly outperforms that under Mote_ID order for different numbers of CS measurements. Our proposed random sampling zero-encoding data gathering model can be used not only for missing sensory data recovery but also to reduce the number of samples in WSN data gathering; that is, we can randomly select a subset of sensors to take part in sampling during each round of data gathering.

Figure 8.

The MAE comparison between Mote_ID order and GS order of sensory data under different numbers of measurements.
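To make the dependence of recovery error on the number of measurements concrete, the sketch below gives a minimal NumPy version of the standard CoSaMP iteration²¹ and sweeps the number of measurements on a synthetic signal. It is not the authors' implementation and does not reproduce Figure 8: the signal is exactly sparse and synthetic, the measurement matrix is dense Gaussian rather than random sampling combined with the orthogonal Gaussian energy diffusion basis, and the function names and parameter values are assumptions chosen for illustration.

```python
import numpy as np


def cosamp(A, y, k, max_iter=50, tol=1e-9):
    """Minimal CoSaMP: estimate a k-sparse u from y = A @ u."""
    n = A.shape[1]
    u = np.zeros(n)
    residual = y.copy()
    for _ in range(max_iter):
        # Identify the 2k columns most correlated with the residual.
        proxy = A.T @ residual
        candidates = np.argsort(np.abs(proxy))[-2 * k:]
        # Merge with the current support and solve least squares on it.
        support = np.union1d(candidates, np.flatnonzero(u))
        coeffs = np.linalg.lstsq(A[:, support], y, rcond=None)[0]
        # Prune to the k largest coefficients.
        keep = np.argsort(np.abs(coeffs))[-k:]
        u = np.zeros(n)
        u[support[keep]] = coeffs[keep]
        residual = y - A @ u
        if np.linalg.norm(residual) <= tol * np.linalg.norm(y):
            break
    return u


def error_vs_measurements(n=256, k=8, m_values=(40, 80, 120), seed=0):
    """Relative l2 recovery error for several measurement counts m."""
    rng = np.random.default_rng(seed)
    u_true = np.zeros(n)
    u_true[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)
    errors = {}
    for m in m_values:
        A = rng.normal(size=(m, n)) / np.sqrt(m)  # assumed Gaussian matrix
        u_hat = cosamp(A, A @ u_true, k)
        errors[m] = float(np.linalg.norm(u_true - u_hat) / np.linalg.norm(u_true))
    return errors
```

Calling `error_vs_measurements()` returns a dictionary mapping each measurement count to its relative error; in this noiseless synthetic setting, one would expect the error to become negligible once the number of measurements is several times the sparsity.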

Meteorological data reconstruction

In this subsection, we evaluate the performance of our proposed scheme for extending the monitoring field using incomplete random sampling over a set of temperature distribution data provided by WorldClim.²⁶ The temperature data set partitions the global surface into $4320 \times 1800$ grids,²⁷ and a snapshot of the mean monthly surface temperature in September over global land areas, excluding Antarctica, is shown in Figure 9. We use a $64 \times 64$ grid of temperatures to validate the scheme. The $64 \times 64$ grid of mean monthly surface temperatures in September is selected from rows 3200 to 4263 and columns 500 to 563, as shown in Figure 10(a). In Figure 10, the displayed temperature is 10 times the real temperature.

Figure 9.

A snapshot of mean monthly surface temperature in September over global land areas, excluding Antarctica.

Figure 10.

(a) Original temperature data; (b), (c), (d), (e), and (f) are the reconstructed temperature data corresponding to $M = 120$, $140$, $160$, $180$, and $200$ and $k = 33$, $35$, $38$, $53$, and $55$, respectively.

The $64 \times 64$ grids can be considered as a monitoring field, and our proposed incomplete random sampling scheme can be used to reconstruct the temperature over the entire field. If all $64 \times 64$ sensory data were treated as a single signal to be reconstructed with the $ℓ_{1}$ norm minimization program, the computational complexity would be very high. We therefore divide the monitoring field into sub-blocks of $16 \times 16$ grids and apply our proposed incomplete random sampling scheme to each sub-block to reconstruct the monitoring data. Different numbers of samples are used as CS measurements, as shown in Figure 10: 120, 140, 160, 180, and 200 samples per sub-block correspond to Figure 10(b)–(f), respectively. We denote the number of samples and the sparsity of the reconstructed signal by $M$ and $k$, respectively. Figure 10(b) shows that the reconstructed data already reflect the overall distribution of the monitoring data across the field. In Figure 10(f), the monitoring data are reconstructed perfectly. Figure 11 shows the reconstruction errors corresponding to Figure 10(b)–(f).
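The block-wise setup can be sketched as follows. This is a minimal illustration under stated assumptions: the field is a synthetic stand-in rather than the WorldClim data, the helper names `split_blocks`, `merge_blocks`, and `sample_block` are hypothetical, and the per-block reconstruction step itself is omitted. It shows only how a $64 \times 64$ field splits into sixteen $16 \times 16$ sub-blocks of 256 cells each and how $M$ cells are drawn at random from a sub-block.

```python
import numpy as np


def split_blocks(field, b):
    """Split an (h, w) field into non-overlapping b x b blocks, row-major."""
    h, w = field.shape
    return field.reshape(h // b, b, w // b, b).swapaxes(1, 2).reshape(-1, b, b)


def merge_blocks(blocks, h, w):
    """Inverse of split_blocks: reassemble the blocks into an (h, w) field."""
    b = blocks.shape[1]
    return blocks.reshape(h // b, w // b, b, b).swapaxes(1, 2).reshape(h, w)


def sample_block(block, m, rng):
    """Pick m distinct cells of a block at random; return indices and values."""
    idx = np.sort(rng.choice(block.size, size=m, replace=False))
    return idx, block.reshape(-1)[idx]


rng = np.random.default_rng(0)
field = rng.normal(size=(64, 64))   # synthetic stand-in for the temperature grid
blocks = split_blocks(field, 16)    # 16 sub-blocks of 256 cells each
samples = [sample_block(blk, 120, rng) for blk in blocks]

# Sampling ratio per sub-block for each measurement count in Figure 10(b)-(f).
ratios = {m: m / 256 for m in (120, 140, 160, 180, 200)}
```

With 256 cells per sub-block, the measurement counts used in Figure 10 correspond to sampling ratios from 120/256 (about 46.9%) to 200/256 (about 78.1%).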

Figure 11.

The reconstruction errors corresponding to Figure 10(b)–(f).

Conclusion

This article investigates the problem of sensory data reconstruction in WSNs based on CS. We first proposed a random sampling compressive data gathering model based on a virtual Gaussian energy diffusion model. We then showed that the orthogonal Gaussian energy diffusion basis compresses sensory data well and proved that the product of the sampling matrix and the orthogonal Gaussian energy diffusion basis satisfies the RIP of CS. Our proposed model makes simultaneous sampling and compression possible without assigning a projection matrix to each sensor node. The experimental results show that our proposed random sampling zero-encoding data gathering model performs well in recovering missing sensory data and in extending the monitoring field.

Footnotes

Academic Editor: Weifa Liang

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work presented in this paper was supported by the National Natural Science Foundation of China (grant nos 61303209, 61402009, 61302179, and 61572366) and the research funds of the university (grant nos WXZR1524 and WXZR1503).

References

1. Chong, Kumar SP. Sensor networks: evolution, opportunities, and challenges. Proc IEEE 2003; 91: 1247–1256.

2. Yick, Mukherjee, Ghosal. Wireless sensor network survey. Comput Netw 2008; 52: 2292–2330.

3. Xiang, Luo, Deng. Dual-level compressed aggregation: recovering fields of physical quantities from incomplete sensory data. 2011, https://arxiv.org/abs/1107.4873

4. Wang, Tang, Yin. Data gathering in wireless sensor networks through intelligent compressive sensing. In: Proceedings IEEE INFOCOM, 2012, Orlando, FL, 25–30 March 2012, pp.603–611. New York: IEEE.

5. Luo, Sun. Efficient measurement generation and pervasive sparsity for compressive data gathering. IEEE T Wirel Commun 2010; 9: 3728–3738.

6. Luo, Sun. Compressive data gathering for large-scale wireless sensor networks. In: Proceedings of the 15th annual international conference on mobile computing and networking, Beijing, 20–25 September 2009, pp.145–156. New York: ACM.

7. Baron, Duarte, Wakin. Distributed compressive sensing. 2009, https://arxiv.org/pdf/0901.3403.pdf

8. Rabbat, Haupt, Singh. Decentralized compression and predistribution via randomized gossiping. In: Proceedings of the 5th international conference on information processing in sensor networks, Nashville, TN, 19–21 April 2006, pp.51–59. New York: ACM.

9. Haupt, Bajwa, Rabbat. Compressed sensing for networked data. IEEE Signal Proc Mag 2008; 25: 92–101.

10. Lee, Pattem, Sathiamoorthy. Compressed sensing and routing in multi-hop networks. Technical report, University of Southern California, Los Angeles, CA, 2009.

11. Lee, Ortega. Joint optimization of transport cost and reconstruction for spatially-localized compressed sensing in multi-hop sensor networks. In: Proceedings of the Asia Pacific signal and information processing association summit (APSIPA), Singapore, 14–17 December 2010.

12. Xiang, Luo, Vasilakos. Compressed data aggregation for energy efficient wireless sensor networks. In: Proceedings of the 2011 8th annual IEEE communications society conference on sensor, mesh and ad hoc communications and networks (SECON), Salt Lake City, UT, 27–30 June 2011, pp.46–54. New York: IEEE.

13. Lee, Jung IB. Speedy routing recovery protocol for large failure tolerance in wireless sensor networks. Sensors 2010; 10: 3389–3410.

14. Sheikhhasan. A comparison of interpolation techniques for spatial data prediction. 2006, https://staff.fnwi.uva.nl/a.s.z.belloum/MSctheses/Hamzeh_C.pdf

15. Umer, Kulik, Tanin. Kriging for localized spatial interpolation in sensor networks. In: Ludäscher B, Mamoulis (eds) Scientific and statistical database management. Berlin, Heidelberg: Springer, 2008, pp.525–532.

16. Guo, Huang. Sparsity-based spatial interpolation in wireless sensor networks. Sensors 2011; 11: 2385–2407.

17. Candès. Compressive sampling. In: Proceedings of the international congress of mathematicians, Madrid, 22–30 August 2006, pp.1433–1452. Zurich: European Mathematical Society.

18. Donoho DL. Compressed sensing. IEEE T Inform Theory 2006; 52: 1289–1306.

19. Baraniuk RG. Compressive sensing. IEEE Signal Proc Mag 2007; 24: 118–121.

20. Tropp, Gilbert AC. Signal recovery from random measurements via orthogonal matching pursuit. IEEE T Inform Theory 2007; 53: 4655–4666.

21. Needell, Tropp JA. CoSaMP: iterative signal recovery from incomplete and inaccurate samples. Appl Comput Harmon A 2009; 26: 301–321.

22. GreenOrbs, http://www.greenorbs.org/all/greenorbs.htm

23. Buldygin, Kozachenko IUV. Metric characterization of random variables and random processes, vol. 188. Providence, RI: American Mathematical Society, 2000.

24.

Yang