Research on Reliability-Oriented Data Fusaggregation Algorithm in Large-Scale Probabilistic Wireless Sensor Networks

Abstract

Many existing studies focus on either data aggregation or data fusion alone, which hinders thorough analysis of the sensed data and leaves the aggregation results underused. Moreover, real networks always contain lossy links, yet many available aggregation algorithms assume ideal network models and perform no further analysis or fusion of the aggregation results. We therefore propose the concept of data fusaggregation, which supports processing sensed data while it is being transmitted in large-scale probabilistic wireless sensor networks (WSNs), and a reliability-oriented data fusaggregation algorithm (RODFA) that helps users obtain monitoring information from the monitored geographic environment and measure the reliability of that information. RODFA also helps the network administrator improve the system sensing performance of large-scale probabilistic WSNs. In RODFA, we define and calculate the parameter η, which intuitively reflects the reliability of the aggregation result and helps users process the aggregation result further. Simulation results verify the validity of RODFA, and we analyze the influence of network size and network performance on data fusaggregation.

1. Introduction

A wireless sensor network (WSN) is a class of wireless ad hoc network that consists of thousands of sensor nodes (SNs). Recent advances in microelectronics, wireless communications, and sensor technologies have made the development of low-cost, low-power, multifunctional sensor nodes possible [1]. Owing to their capability for pervasive surveillance, sensor networks have attracted significant attention from many application domains, such as habitat monitoring [2, 3], object tracking [4, 5], environment monitoring [6–8], military [9], traffic management [10], disaster management [11], and smart environments. In these applications, the aggregation and fusion of sensed data are very important for users to get summary information from the monitored areas. Many works on data aggregation and data fusion for WSNs have emerged, which further extend the application domains of WSNs [12–14]. However, four issues remain. Firstly, existing studies rarely take into consideration that SNs are resource constrained because of their limited battery supply. Secondly, most research studies the network capacity issue under an ideal network model that does not match actual networks. Thirdly, data aggregation and data fusion each have a single, different focus: data aggregation focuses on data transmission, while data fusion emphasizes analyzing the sensed data, yet both transmitting and analyzing data are important to WSN applications. Lastly, there has been little in-depth discussion of how network size and network performance influence the data fusaggregation result (data fusaggregation is introduced later in this section). We study these four issues in the following sections.

Firstly, the energy resources of SNs are usually battery cells that cannot be recharged while the WSN is operating, so SNs are restricted by their low battery supply. Consequently, in this paper, energy-saving technology is taken into consideration to make the network model more practical. For better energy utilization, we choose cluster-based WSNs [15, 16]. In cluster-based WSNs, SNs residing in a nearby area form a cluster and select one of the nodes in the cluster to be their cluster-head node (CH). The CH organizes the data pieces received from SNs into an aggregated result and then forwards the result to the base station (sink node) along the regular routing paths.

Secondly, to the best of our knowledge, many of the studies mentioned above are based on the ideal deterministic network model (DNM), where any pair of nodes in a network is either connected or disconnected. If two nodes are connected, that is, there is a deterministic link between them, then a successful data transmission is guaranteed as long as there is no collision. Otherwise, if two nodes are disconnected, direct communication between them is assumed to be impossible. However, for real applications, this DNM assumption is too ideal and not practical on account of the “transitional region phenomenon” [17, 18]. Because of the transitional region phenomenon, a large number of network links (probably more than 90%) become unreliable; these are called lossy links [17]. Even without collisions, data transmission over a lossy link succeeds only with a certain probability rather than being completely guaranteed. Therefore, a more practical network model for WSNs is the probabilistic network model (PNM) [17], in which data communication over a link succeeds with a certain probability rather than always succeeding or always failing. For convenience, the WSNs considered under the DNM/PNM are called deterministic/probabilistic WSNs. Hence, to bring our research closer to real networks and give it more practical value than existing studies, we study probabilistic WSNs, using the network model described in Section 2.

Thirdly, in order to truly process sensed data while it is transmitted among the network, users, and the network administrator, we first present the concept of data fusaggregation, which is based on the definitions of data aggregation and data fusion. In many applications, WSNs are mainly used for gathering data acquired from the physical environment to an external base station [19]. During a data gathering process, if the raw data can be aggregated and only an aggregation value is transmitted to the sink node, the process is called data aggregation [20]. Data fusion is a widely adopted signal processing technique that can improve the system sensing performance by jointly considering the measurements of multiple sensors [21]. As data aggregation and data fusion both play important roles in many applications of WSNs and have different focuses, processing sensed data while transmitting it has become important for WSNs. Thus, data fusaggregation is presented in our research as follows.

Definition 1 (data fusaggregation).

The raw data can be aggregated; only one aggregation value will be transmitted to the sink node. And through analyzing the aggregation value, the sink node will get some other information which could measure the performance of network or be useful to users.

Lastly, on the basis of the concept of data fusaggregation, a data fusaggregation algorithm is proposed to comprehensively analyze and use sensed data so as to improve system sensing performance. This algorithm is named the reliability-oriented data fusaggregation algorithm (RODFA). In this algorithm, firstly, the parameter η (the lower limit value of reliability) is defined to measure the reliability of the aggregation result. Then we obtain the formula for calculating the value of η through theoretical derivation. Lastly, the aggregation result and the parameter η are sent to users through the sink, where η serves as a reference for how users handle the information they get. In our experiment part, we first study the influence of network size and network performance on the RODFA result.

The rest of this paper is organized as follows. In Section 2, the network model is discussed; it includes the assumptions of this model and the problem definitions. In Section 3, we describe the mathematic foundations of RODFA. The calculation of η for one $\hat{Sum} (S_{t})$ is given in Section 4. Section 5 shows the validity of RODFA. And we discuss factors affecting the RODFA (network size and network performance) in Section 6. In the last section, we conclude the paper.

2. Network Model

2.1. Assumptions of the Model

In this paper, we consider a large-scale probabilistic WSN that consists of N sensors. Let $N_{t}$ be the number of active SNs in the monitored area at time t. $N_{t}$ varies with time and is unknown to the sink. Let $s_{t i}$ $(1 \leq i \leq N_{t})$ be the sensed data of active sensor node i at time t, and let $S_{t} = \{s_{t 1}, s_{t 2}, \dots, s_{t N_{t}}\}$ be the set of all the sensed data in the monitored area at time t. $s_{t i}$ is stored in the active sensor node i for $1 \leq i \leq N_{t}$ . Since all sensed data are bounded, we use $\sup (S_{t})$ to denote the upper bound of all sensed data.

On the basis of the analysis in Section 1, for better energy utilization, we apply the dynamic minimal spanning tree routing protocol (DMSTRP) to our network model, and our radio power model is similar to that in [22]. DMSTRP is a cluster-based routing protocol for large-scale wireless sensor networks proposed in [23]. According to [23], in our network model, the monitored geographical area is divided into a number of clusters, which are formed in a way similar to [24], and the clusters are disjoint from each other. Here, assume that the monitored geographical area is fully covered by active sensors and is divided into n clusters. Let ω be the maximum number of network layers at time t, and let $m_{t 1}, m_{t 2}, m_{t 3}, \dots, m_{t ω}$ be the numbers of nodes in layer 1, layer 2, layer 3, …, and layer ω at time t, respectively. Let $S_{1 t} = \{s_{1 t 1}, s_{1 t 2}, s_{1 t 3}, \dots, s_{1 t m_{t 1}}\}$ be the set of all sensed data in layer 1 at time t, and let $S_{2 t}, S_{3 t}, \dots, S_{ω t}$ be the sets of all sensed data in layer 2, layer 3, …, layer ω at time t, respectively.

When the sink node initiates a data collection at time t, let d be the distance of every lossy link, so that a transmission over each lossy link succeeds with a certain probability q, and let $B^{(q)}$ be the set of all sensed data that are sent successfully to the sink from layer 1 at time t. Thus, the sensed data of layer 2, layer 3, …, layer ω are sent successfully to the sink with probability $q^{2}, q^{3}, \dots, q^{ω}$ , respectively, and we let $B^{(q^{2})}, B^{(q^{3})}, \dots, B^{(q^{ω})}$ be the sets of all sensed data that are sent successfully to the sink node from layer 2, layer 3, …, layer ω at time t, respectively. The functional relationship between q and d is presented in Section 3.

2.2. Problem Definition

In this paper, when studying data fusaggregation, we take the SUM operation as an example and propose RODFA to comprehensively analyze and use sensed data, with the aim of further extending the application domains of WSNs. The exact sum of the monitored area at time t is defined as $Sum (S_{t}) = \sum_{i = 1}^{m_{t 1}} S_{1 t i} + \sum_{i = 1}^{m_{t 2}} S_{2 t i} + \sum_{i = 1}^{m_{t 3}} S_{3 t i} + \dots + \sum_{i = 1}^{m_{t ω}} S_{ω t i}$ . Before introducing the steps of RODFA, we briefly introduce some related definitions of reliability.

Definition 2 (ε-estimate).

${\hat{I}}_{t}$ is called an ε-estimate of $I_{t}$ if $| ({\hat{I}}_{t} - I_{t}) / I_{t} | \leq ε$ for a given ε ( $ε \geq 0$ ).

Definition 3 (reliability).

For a given network and ε ( $ε \geq 0$ ), reliability is the probability that ${\hat{I}}_{t}$ is an ε-estimate of $I_{t}$ .

Definition 4 (η).

For a given network and ε ( $ε \geq 0$ ), η ( $0 \leq η \leq 1$ ) is a lower limit on the probability that ${\hat{I}}_{t}$ is an ε-estimate of $I_{t}$ ; that is, $η \leq Pr (| ({\hat{I}}_{t} - I_{t}) / I_{t} | \leq ε)$ . We call η the lower limit value of reliability.

η is an important parameter that measures the reliability of aggregation results. It is sent to users to inform how they handle the aggregation result: whether to base a decision on it, treat it only as a reference point, or discard it outright. The calculation of η is completed in RODFA. The main steps of RODFA are shown in Figure 1 and are described in detail as follows.

Figure 1

The main steps of RODFA.

Step 1.

The sink node launches a data collection and sends the data acquisition command to every cluster head, and the cluster heads retransmit this command to their own child nodes.

Step 2.

According to [22], every node sends its sensed data to its cluster head. After analyzing and processing the sensed data, the cluster head retransmits the processing result to the sink node.

Step 3.

Sink node will get an approximate sum which will be discussed in Section 3: $\hat{Sum} (S_{t}) = (1 / q) \sum_{S_{1 t i} \in B^{(q)}} S_{1 t i} + (1 / q^{2}) \sum_{S_{2 t i} \in B^{(q^{2})}} S_{2 t i} + \dots + (1 / q^{ω}) \sum_{S_{ω t i} \in B^{(q^{ω})}} S_{ω t i}$ .

Step 4.

According to the formula of η, $ϕ_{δ / 2}^{2} \leq q^{ω} \inf (N_{t}) \inf (S_{t}) ε^{2} / ((1 - q^{ω}) \sup (S_{t}))$ (where $δ = 1 - η$ ; the derivation of this formula is given in Section 4), the sink node calculates the value of η and returns it to users together with $\hat{Sum} (S_{t})$ , or returns it to the network administrator to help the administrator improve the system sensing performance. The algorithm then stops.

Combined with the above steps of RODFA, the key of RODFA is to obtain the value of η of $\hat{Sum} (S_{t})$ . And the problem of computing the value of η is defined as follows.

Input:

(1) $S_{1 t} = \{s_{1 t 1}, s_{1 t 2}, s_{1 t 3}, \dots, s_{1 t m_{t 1}}\}$ , $S_{2 t}, S_{3 t}, \dots, S_{ω t}$ ;

(2) ε ( $ε \geq 0$ ), ω, and q;

(3) aggregation operator sum.

Output:

The value of η of the mathematic estimator of sum: $\hat{Sum} (S_{t})$ .

For facilitating the later analysis, we will give the other two definitions of unbiased estimate and delay.

Definition 5 (unbiased estimate).

${\hat{I}}_{t}$ is an unbiased estimate of $I_{t}$ if the mathematical expectation of ${\hat{I}}_{t}$ is equal to $I_{t}$ , that is, $E ({\hat{I}}_{t}) = I_{t}$ ; otherwise ${\hat{I}}_{t}$ is a biased estimator of $I_{t}$ .

Following [23], DMSTRP connects the nodes in each cluster by minimal spanning trees (MSTs). In each cluster, all nodes including the CH are connected by an MST, and the CH acts as the leader that collects data from the whole tree. All CHs are connected by another MST that routes toward the sink. Data fusion is performed along the tree route. As Figure 2 shows, using MSTs reduces the average transmission distance of each node, and thus the energy dissipated in transmitting data.
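To illustrate why an MST shortens links compared with every node sending directly to the CH, the following is a minimal sketch using Prim's algorithm on 2-D coordinates. It is not the DMSTRP implementation from [23]; the function names and the six node coordinates are hypothetical and chosen only for illustration.

```python
import math

def prim_mst(points, root=0):
    """Return MST edges (child, parent) over 2-D points, grown from `root` with Prim's algorithm."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n      # cheapest known link from each node into the tree
    best_parent = [None] * n
    best_dist[root] = 0.0
    edges = []
    for _ in range(n):
        # pick the node outside the tree that is closest to the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_parent[u] is not None:
            edges.append((u, best_parent[u]))
        # relax the links from the newly added node
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_parent[v] = u
    return edges

def total_length(points, edges):
    """Sum of the Euclidean lengths of the given links."""
    return sum(math.dist(points[a], points[b]) for a, b in edges)

# Hypothetical cluster: index 0 plays the role of the CH (coordinates in metres).
cluster = [(0, 0), (10, 0), (20, 0), (30, 5), (0, 15), (5, 30)]
mst_edges = prim_mst(cluster, root=0)
star_edges = [(i, 0) for i in range(1, len(cluster))]   # every node sends directly to the CH
mst_length = total_length(cluster, mst_edges)
star_length = total_length(cluster, star_edges)
```

Because an MST has minimal total length among all spanning trees, `mst_length` can never exceed `star_length`; how much is saved depends on the node layout.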

Figure 2

A MST in DMSTRP.

In Figure 2, node 1, node 2, and node 6 are leaf nodes; node 3 and node 5 are father nodes; node 4 is the cluster head node. Because node 1 and node 2 have the same father, only node 1 can transmit in the first period. Thus the first period transmitting queue is $\{1, 6\}$ ; this means node 1 and node 6 can transmit their data at the same time. We can get the transmitting queues in the following periods as $\{2, 5\}$ and $\{3\}$ . Based on the above analysis, we define delay as follows.

Definition 6 (delay).

The number of periods from the moment the sink node initiates a data collection command until all sensed data from the SNs have been sent to the sink node, no matter how long this takes. This number of periods is called one delay.

According to the concept of delay, the delay of the MST in Figure 2 is 3, because its transmitting queues span three periods.
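The transmitting queues and the delay can be reproduced with a short sketch. The tree below is an assumption inferred from the queues given in the text (nodes 1 and 2 under node 3, node 6 under node 5, nodes 3 and 5 under cluster head 4), since the figure itself is not reproduced here. The scheduling rule is also an interpretation: a node may transmit once all of its children have transmitted, and a father receives from at most one child per period.

```python
def transmitting_queues(parent):
    """parent maps each non-CH node to its father; returns one queue per period.

    The input must describe a tree (no cycles), otherwise the loop cannot finish.
    """
    children = {}
    for node, father in parent.items():
        children.setdefault(father, []).append(node)
    pending = set(parent)               # nodes that still have to transmit
    queues = []
    while pending:
        busy_fathers = set()            # fathers already receiving in this period
        queue = []
        for node in sorted(pending):
            # a node is ready once all of its own children have transmitted
            ready = all(c not in pending for c in children.get(node, []))
            if ready and parent[node] not in busy_fathers:
                queue.append(node)
                busy_fathers.add(parent[node])
        pending -= set(queue)
        queues.append(queue)
    return queues

# Assumed structure of the MST in Figure 2 (hypothetical, see the lead-in).
figure2_parent = {1: 3, 2: 3, 3: 4, 5: 4, 6: 5}
queues = transmitting_queues(figure2_parent)
delay = len(queues)                     # number of periods, i.e. one delay
```

Under these assumptions the sketch yields the queues described in the text, and the delay is the number of queues.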

Figure 3

Verifying the validity of RODFA for a network of size 5000 m × 5000 m. For the red forked line, when all relevant information in the network is fixed, the values of q and ω are constant, and the value of η of $\hat{Sum} (S_{t})$ increases with the relative error bound ε. Thus, we only show the calculation results for η, on a scale of 0 to 1, over the range $0 \leq ε \leq 0.1$ .

3. Mathematic Foundations

The calculation of the probability q is the foundation of RODFA. Existing studies show that q is a function of the distance d between a pair of connected nodes. Next, we briefly describe the derivation of this function and then study the estimator of sum.

3.1. Relationship between q and Distance d

Let r be the receiving signal-to-noise ratio (SNR) and let f be the length of one frame, and then the functional relationship between q and r can be described as [18]

\begin{matrix} q = {(1 - 0.5 \times e^{(- r / 2) (1 / 0.64)})}^{8 f} . \end{matrix}

(1)

In this paper, the propagation model is the lognormal shadowing model [25]; the relationship between r and the distance d is derived next.

Let $P L (d)$ be the path loss at a given location, $d_{0}$ the reference distance, β the path loss exponent, and $X_{σ}$ a zero-mean Gaussian random variable. Then the relationship between $P L (d)$ and d can be described by formula (2), in units of dBm:

\begin{matrix} P L (d) = P L (d_{0}) + 10 β \log_{10} (\frac{d}{d_{0}}) + X_{σ} . \end{matrix}

(2)

On the basis of formula (2), we assume that $P_{trans}$ is the output power of the sender and $P_{recv}$ is the power that the receiver received. Then $P_{recv}$ can be obtained from the following:

\begin{matrix} P_{recv} = P_{trans} - P L (d) . \end{matrix}

(3)

Let $P_{n}$ be the platform noise. Combining this with formula (3), the relationship between the SNR and the power received by the receiver is

\begin{matrix} SNR = P_{recv} - P_{n} . \end{matrix}

(4)

Once the model is determined, all of the parameters in the above formulas are fixed. As we set the propagation model to be the lognormal shadowing model, the parameter settings for our experiment are similar to [26]. Combining formulas (1)–(4), with r equal to the SNR, we obtain the relationship between q and d:

\begin{matrix} q = {(1 - 0.5 \times e^{- (P_{trans} - P L (d_{0}) - 10 β \log_{10} (d / d_{0}) - X_{σ} - P_{n}) / 1.28})}^{8 f} . \end{matrix}

(5)
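As a rough sketch of how formulas (1)–(4) chain together, the function below computes q from d. All default parameter values (transmit power, reference path loss, path loss exponent, noise floor, frame length) are hypothetical placeholders, not the settings of [26], and the early return for a per-bit error of 1 or more is a guard added here, not part of the formulas.

```python
import math

def link_success_probability(d, p_trans=0.0, pl_d0=55.0, d0=1.0, beta=4.0,
                             x_sigma=0.0, p_noise=-105.0, frame_len=50):
    """Probability q that a frame crosses a link of length d (all defaults are assumed values)."""
    if d <= 0:
        raise ValueError("distance must be positive")
    path_loss = pl_d0 + 10 * beta * math.log10(d / d0) + x_sigma   # formula (2)
    p_recv = p_trans - path_loss                                   # formula (3)
    snr = p_recv - p_noise                                         # formula (4)
    exponent = -snr / 1.28                                         # (-r / 2) * (1 / 0.64)
    if exponent >= math.log(2):
        # 0.5 * exp(exponent) would be >= 1; treat the link as unusable (added guard)
        return 0.0
    bit_error = 0.5 * math.exp(exponent)
    return (1 - bit_error) ** (8 * frame_len)                      # formula (1)

q_near = link_success_probability(5.0)
q_mid = link_success_probability(10.0)
q_far = link_success_probability(20.0)
```

With these placeholder values, q falls from nearly 1 to 0 over a short range of distances, which is the transitional region behaviour described in Section 1.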

3.2. Estimator of Sum

Let $S_{1 t}, S_{2 t}, S_{3 t}, \dots, S_{ω t}$ be the sets of sensed data in layer 1, layer 2, layer 3, …, layer ω at time t, respectively, and let $B^{(q)}, B^{(q^{2})}, B^{(q^{3})}, \dots, B^{(q^{ω})}$ be the sets of all sensed data which are sensed successfully to sink node from the layer 1, layer 2, layer 3, …, layer ω at time t, respectively. The mathematic estimator of sum is denoted by $\hat{Sum} (S_{t})$ , and the $\hat{Sum} (S_{t})$ can be computed by

\begin{array}{l} \hat{Sum} (S_{t}) = \frac{1}{q} \sum_{S_{1 t i} \in B^{(q)}} S_{1 t i} + \frac{1}{q^{2}} \sum_{S_{2 t i} \in B^{(q^{2})}} S_{2 t i} \\ + \dots + \frac{1}{q^{ω}} \sum_{S_{ω t i} \in B^{(q^{ω})}} S_{ω t i}, \end{array}

(6)

where q is the probability with which a transmission over a lossy link succeeds at time t. According to the above definition of the unbiased estimate, Theorem 7 shows that $\hat{Sum} (S_{t})$ is an unbiased estimator of the exact sum $Sum (S_{t})$ .

Theorem 7.

Let $E (\hat{Sum} (S_{t}))$ be the mathematical expectation of $\hat{Sum} (S_{t})$ and let $Var (\hat{Sum} (S_{t}))$ be the variance. Then,

\begin{matrix} E (\hat{Sum} (S_{t})) = Sum (S_{t}), \\ Var (\hat{Sum} (S_{t})) \leq \sup (S_{t}) Sum (S_{t}) \frac{1 - q^{ω}}{q^{ω}} . \end{matrix}

(7)

Proof.

For any $i (1 \leq i \leq m_{t k})$ , let the random variables $X_{1 i}, X_{2 i}, X_{3 i}, \dots, X_{ω i}$ be defined as follows:

\begin{array}{l} X_{1 i} = \begin{cases} 1 & \text{if } S_{1 t i} \in B^{(q)} \\ 0 & \text{if } S_{1 t i} \notin B^{(q)} \end{cases} \\ ⋮ \\ X_{ω i} = \begin{cases} 1 & \text{if } S_{ω t i} \in B^{(q^{ω})} \\ 0 & \text{if } S_{ω t i} \notin B^{(q^{ω})} \end{cases} . \end{array}

(8)

Clearly, $Pr (X_{1 i} = 1) = q$ , $Pr (X_{1 i} = 0) = 1 - q$ , $Pr (X_{2 i} = 1) = q^{2}$ , $Pr (X_{2 i} = 0) = 1 - q^{2}$ , …, $Pr (X_{ω i} = 1) = q^{ω}$ , and $Pr (X_{ω i} = 0) = 1 - q^{ω}$ . Meanwhile, according to the lognormal shadowing model, for $1 \leq i \neq j \leq m_{t k}$ , the random variables $X_{k i}$ and $X_{k j}$ are independent of each other ( $1 \leq k \leq ω$ ). Furthermore, according to the distribution of $X_{k i} (1 \leq k \leq ω)$ , there are $E (X_{1 i}) = q$ , $E (X_{2 i}) = q^{2}, \dots, E (X_{ω i}) = q^{ω}$ , $Var (X_{1 i}) = q (1 - q)$ , $Var (X_{2 i}) = q^{2} (1 - q^{2}), \dots, Var (X_{ω i}) = q^{ω} (1 - q^{ω})$ .

Combining (6), we can get

\begin{array}{l} \hat{Sum} (S_{t}) = \frac{1}{q} \sum_{i = 1}^{m_{t 1}} S_{1 t i} X_{1 i} + \frac{1}{q^{2}} \sum_{i = 1}^{m_{t 2}} S_{2 t i} X_{2 i} \\ + \dots + \frac{1}{q^{ω}} \sum_{i = 1}^{m_{t ω}} S_{ω t i} X_{ω i} . \end{array}

(9)

As $E (X_{1 i}) = q$ , $E (X_{2 i}) = q^{2}, \dots, E (X_{ω i}) = q^{ω}$ ; there is

\begin{matrix} E (\hat{Sum} (S_{t})) = \sum_{i = 1}^{m_{t 1}} S_{1 t i} + \sum_{i = 1}^{m_{t 2}} S_{2 t i} + \dots + \sum_{i = 1}^{m_{t ω}} S_{ω t i} = Sum (S_{t}) . \end{matrix}

(10)

For two independent random variables X and Y with variances $D (X)$ and $D (Y)$ , respectively, $D (X + Y) = D (X) + D (Y)$ . Combining this with $Var (X_{1 i}) = q (1 - q), \dots, Var (X_{ω i}) = q^{ω} (1 - q^{ω})$ gives

\begin{array}{l} Var (\hat{Sum} (S_{t})) \\ = \frac{(1 - q)}{q} \sum_{i = 1}^{m_{t 1}} S_{1 t i}^{2} + \frac{(1 - q^{2})}{q^{2}} \sum_{i = 1}^{m_{t 2}} S_{2 t i}^{2} + \dots + \frac{(1 - q^{ω})}{q^{ω}} \sum_{i = 1}^{m_{t ω}} S_{ω t i}^{2} \\ \leq \frac{(1 - q)}{q} Sup (S_{t}) \sum_{i = 1}^{m_{t 1}} S_{1 t i} + \frac{(1 - q^{2})}{q^{2}} Sup (S_{t}) \sum_{i = 1}^{m_{t 2}} S_{2 t i} \\ + \dots + \frac{(1 - q^{ω})}{q^{ω}} Sup (S_{t}) \sum_{i = 1}^{m_{t ω}} S_{ω t i} \\ = \frac{(1 - q^{ω})}{q^{ω}} Sup (S_{t}) \times [\frac{q^{ω - 1}}{(1 + q + q^{2} + \dots + q^{ω - 1})} \sum_{i = 1}^{m_{t 1}} S_{1 t i} \\ + \frac{q^{ω - 2}}{(1 + q^{2} + q^{4} + \dots + q^{ω - 2})} \sum_{i = 1}^{m_{t 2}} S_{2 t i} \\ + \frac{q^{ω - 3}}{(1 + q^{3} + q^{6} + \dots + q^{ω - 3})} \sum_{i = 1}^{m_{t 3}} S_{3 t i} \\ + \dots + \sum_{i = 1}^{m_{t ω}} S_{ω t i}] . \end{array}

(11)

According to formula (11), as $q^{ω - 1} / (1 + q + q^{2} + \dots + q^{ω - 1}) \leq 1$ , $q^{ω - 2} / (1 + q^{2} + q^{4} + \dots + q^{ω - 2}) \leq 1$ , $q^{ω - 3} / (1 + q^{3} + q^{6} + \dots + q^{ω - 3}) \leq 1, \dots$ , then,

\begin{array}{l} Var (\hat{Sum} (S_{t})) \leq \frac{(1 - q^{ω})}{q^{ω}} Sup (S_{t}) \\ \times [\sum_{i = 1}^{m_{t 1}} S_{1 t i} + \sum_{i = 1}^{m_{t 2}} S_{2 t i} + \sum_{i = 1}^{m_{t 3}} S_{3 t i} + \dots + \sum_{i = 1}^{m_{t ω}} S_{ω t i}] \\ = \frac{(1 - q^{ω})}{q^{ω}} Sup (S_{t}) \times Sum (S_{t}) . \end{array}

(12)

Theorem 7 shows that the mathematic estimator of sum $\hat{Sum} (S_{t})$ is an unbiased estimator of the exact sum $Sum (S_{t})$ . The upper limit of the variance $Var (\hat{Sum} (S_{t}))$ is proportional to $(1 - q^{ω}) / q^{ω}$ , so it decreases as the probability q increases and tends to 0 as q approaches 1. Thus, referring to [27], with the increase of q, the relative error between $\hat{Sum} (S_{t})$ and $Sum (S_{t})$ gradually decreases, and if q is sufficiently large, this relative error can be arbitrarily small.
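Theorem 7 can be checked numerically with a small Monte Carlo sketch. The layer sizes, readings, q, seed, and number of draws below are hypothetical; the sketch assumes non-negative readings and independent delivery of each reading, as in the proof, and compares the empirical variance with the bound in formula (12).

```python
import random

def estimate_sum(layers, q, rng):
    """One draw of the estimator in formula (6); layers[k-1] holds the layer-k readings."""
    total = 0.0
    for k, readings in enumerate(layers, start=1):
        p = q ** k                                        # delivery probability for layer k
        received = sum(s for s in readings if rng.random() < p)
        total += received / p                             # rescale by 1 / q^k
    return total

def variance_bound(layers, q):
    """Right-hand side of formula (12); assumes non-negative readings."""
    omega = len(layers)
    sup_s = max(max(r) for r in layers)
    exact = sum(sum(r) for r in layers)
    return (1 - q ** omega) / q ** omega * sup_s * exact

rng = random.Random(7)                                    # fixed seed for repeatability
layers = [[20 + i % 5 for i in range(n)] for n in (200, 150, 100)]   # hypothetical data
exact_sum = sum(sum(r) for r in layers)
draws = [estimate_sum(layers, 0.9, rng) for _ in range(2000)]
mean = sum(draws) / len(draws)
variance = sum((x - mean) ** 2 for x in draws) / (len(draws) - 1)
bound = variance_bound(layers, 0.9)
```

In this setup the sample mean should sit close to `exact_sum` and the sample variance should stay below `bound`; with q = 1 every reading arrives and the estimator returns the exact sum.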

4. Calculating η of $\hat{Sum} (S_{t})$

According to the steps of RODFA in Section 2.2, the key step is to calculate η of $\hat{Sum} (S_{t})$ . Thus, we will research on this issue next.

The steps of calculating the value of η are (1) proving that $\hat{Sum} (S_{t})$ obeys a normal distribution; (2) transforming the normal distribution into the standard normal distribution; and (3) using the characteristics of the standard normal distribution to calculate the value of η.

For any i, let the variable $Y_{k i}$ ( $1 \leq k \leq ω$ ) be as follows:

\begin{array}{l} Y_{1 i} = \begin{cases} S_{1 t i} & \text{if } S_{1 t i} \in B^{(q)} \\ 0 & \text{if } S_{1 t i} \notin B^{(q)} \end{cases} \\ ⋮ \\ Y_{ω i} = \begin{cases} S_{ω t i} & \text{if } S_{ω t i} \in B^{(q^{ω})} \\ 0 & \text{if } S_{ω t i} \notin B^{(q^{ω})} \end{cases} . \end{array}

(13)

There is $\hat{Sum} (S_{t}) = (1 / q) \sum_{S_{1 t i} \in B^{(q)}} S_{1 t i} + (1 / q^{2}) \sum_{S_{2 t i} \in B^{(q^{2})}} S_{2 t i} + \dots + (1 / q^{ω}) \sum_{S_{ω t i} \in B^{(q^{ω})}} S_{ω t i}$ . Firstly, we need to prove that $\hat{Sum} (S_{t})$ obeys a normal distribution. Because a linear combination of independent normally distributed variables is still normally distributed, it suffices to prove that the sum of each layer's data obeys a normal distribution. Reference [28] shows that, if each layer's data satisfy the Lyapunov condition, the sum of each layer's data meets the application conditions of the central limit theorem; that is, the sum of each layer's data obeys a normal distribution. Theorem 8 proves that the data in layer 1, layer 2, …, layer ω satisfy the Lyapunov condition, respectively.

Theorem 8.

The ω sequences of random variables $Y_{k i}$ ( $1 \leq k \leq ω$ ) satisfy the Lyapunov condition; that is, there exists $ξ_{k} > 0$ that satisfies the following formula:

\begin{matrix} \lim_{m_{tk} \to \infty} \frac{1}{s_{m_{t k}}^{2 + ξ_{k}}} \sum_{i = 1}^{m_{t k}} E ({| Y_{k i} - μ_{k i} |}^{2 + ξ_{k}}) = 0 . \end{matrix}

(14)

Here, $1 \leq k \leq ω$ , $m_{t k}$ is the number of data in layer k at time t, $s_{m_{t k}}^{2} = \sum_{i = 1}^{m_{t k}} σ_{k i}$ , and for $\forall i (1 \leq i \leq m_{t k})$ , there are $μ_{k i} = E (Y_{k i})$ and $σ_{k i} = Var (Y_{k i})$ .

Proof.

Combining Section 2, the sensed data of the layer 1, layer 2, layer 3,…, layer ω will be sent successfully to the sink with a certain probability $q, q^{2}, q^{3}, \dots, q^{ω}$ , respectively. $S_{1 t}, S_{2 t}, S_{3 t}, \dots, S_{ω t}$ are the set of all sensed data in layer 1, layer 2, layer 3, …, layer ω at time t, respectively, and $B^{(q)}, B^{(q^{2})}, B^{(q^{3})}, \dots, B^{(q^{ω})}$ are the sets of all sensed data which are sent successfully to sink node from layer 1, layer 2, layer 3, …, layer ω at time t, respectively.

For Layer 1 $(k = 1)$ . There is $μ_{1 i} = E (Y_{1 i}) = q s_{1 t i}$ , and $σ_{1 i} = Var (Y_{1 i}) = s_{1 t i}^{2} q (1 - q)$ .

Let $ξ_{1} = 1$ . According to the above analysis, for $\forall i (1 \leq i \leq m_{t 1})$ , there is

\begin{array}{l} E ({| Y_{1 i} - μ_{1 i} |}^{2 + ξ_{1}}) = E ({| Y_{1 i} - μ_{1 i} |}^{3}) \\ = q {(s_{1 t i} - q s_{1 t i})}^{3} + (1 - q) {(q s_{1 t i})}^{3} \\ = s_{1 t i}^{3} q (1 - q) (1 - 2 q + 2 q^{2}) . \end{array}

(15)

Meanwhile, we have

\begin{matrix} s_{m_{t 1}}^{2 + ξ_{1}} = s_{m_{t 1}}^{3} = \sum_{i = 1}^{m_{t 1}} s_{1 t i}^{2} q (1 - q) \sqrt{\sum_{i = 1}^{m_{t 1}} s_{1 t i}^{2} q (1 - q)} . \end{matrix}

(16)

Combine formula (15) with formula (16), there is

\begin{array}{l} \lim_{m_{t 1} \to \infty} \frac{1}{s_{m_{t 1}}^{3}} \sum_{i = 1}^{m_{t 1}} E ({| Y_{1 i} - μ_{1 i} |}^{3}) \\ = \frac{1 - 2 q + 2 q^{2}}{\sqrt{(q (1 - q))}} \lim_{m_{t 1} \to \infty} \frac{\sum_{i = 1}^{m_{t 1}} s_{1 t i}^{3}}{\sum_{i = 1}^{m_{t 1}} s_{1 t i}^{2} \sqrt{\sum_{i = 1}^{m_{t 1}} s_{1 t i}^{2}}} \\ \leq \frac{1 - 2 q + 2 q^{2}}{\sqrt{(q (1 - q))}} \frac{\sup {(S_{1 t})}^{3}}{\inf {(S_{1 t})}^{3}} \lim_{m_{t 1} \to \infty} \frac{1}{\sqrt{m_{t 1}}} . \end{array}

(17)

Here, $\inf (S_{1 t})$ and $\sup (S_{1 t})$ denote the lower limit and upper limit of the sensed data in layer 1, respectively. Since $| \inf (S_{1 t}) |$ and $| \sup (S_{1 t}) |$ are finite, the right-hand side of formula (17) tends to 0; hence $\lim_{m_{t 1} \to \infty} (1 / s_{m_{t 1}}^{3}) \sum_{i = 1}^{m_{t 1}} E ({| Y_{1 i} - μ_{1 i} |}^{3}) \leq 0$ . Meanwhile, $s_{m_{t 1}}^{3} \geq 0$ and $E ({| Y_{1 i} - μ_{1 i} |}^{3}) \geq 0$ ; therefore $\lim_{m_{t 1} \to \infty} (1 / s_{m_{t 1}}^{3}) \sum_{i = 1}^{m_{t 1}} E ({| Y_{1 i} - μ_{1 i} |}^{3}) \geq 0$ . In conclusion, $\lim_{m_{t 1} \to \infty} (1 / s_{m_{t 1}}^{3}) \sum_{i = 1}^{m_{t 1}} E ({| Y_{1 i} - μ_{1 i} |}^{3}) = 0$ ; that is, when $k = 1$ , $ξ_{1} = 1$ satisfies formula (14) in Theorem 8, and $Y_{1 i}$ satisfies the Lyapunov condition. According to [28], $\hat{Sum} (S_{1 t}) = \sum_{i = 1}^{m_{t 1}} Y_{1 i}$ meets the application conditions of the central limit theorem; that is, $\hat{Sum} (S_{1 t})$ obeys a normal distribution.

Omitting the analysis of other layers (the researching method for other layers is same as $k = 1$ ), we will then describe the analysis of layer ω; that is, $k = ω$ .

For Layer ω $(k = ω)$ . As in the analysis of $k = 1$ , when $k = ω$ , there are $μ_{ω i} = E (Y_{ω i}) = q^{ω} S_{ω t i}$ and $σ_{ω i} = Var (Y_{ω i}) = q^{ω} (1 - q^{ω}) S_{ω t i}^{2}$ . For $\forall i (1 \leq i \leq m_{t ω})$ , if we let $ξ_{ω} = 1$ , then there will be

\begin{array}{l} E ({| Y_{ω i} - μ_{ω i} |}^{2 + ξ_{ω}}) = E ({| Y_{ω i} - μ_{ω i} |}^{3}) \\ = q^{ω} {(S_{ω t i} - q^{ω} S_{ω t i})}^{3} \\ + (1 - q^{ω}) {| 0 - q^{ω} S_{ω t i} |}^{3} \\ = q^{ω} S_{ω t i}^{3} (1 - q^{ω}) (1 - 2 q^{ω} + 2 q^{2 ω}), \end{array}

(18)

\begin{matrix} s_{m_{t ω}}^{2 + ξ_{ω}} = s_{m_{t ω}}^{3} = \sum_{i = 1}^{m_{t ω}} S_{ω t i}^{2} q^{ω} (1 - q^{ω}) \sqrt{\sum_{i = 1}^{m_{t ω}} S_{ω t i}^{2} q^{ω} (1 - q^{ω})} . \end{matrix}

(19)

Combining formula (18) with formula (19), there is

\begin{array}{l} \lim_{m_{t ω} \to \infty} \frac{1}{s_{m_{t ω}}^{3}} \sum_{i = 1}^{m_{t ω}} E ({| Y_{ω i} - μ_{ω i} |}^{3}) \\ = \frac{(1 - 2 q^{ω} + 2 q^{2 ω})}{\sqrt{q^{ω} (1 - q^{ω})}} \lim_{m_{t ω} \to \infty} \frac{\sum_{i = 1}^{m_{t ω}} S_{ω t i}^{3}}{\sum_{i = 1}^{m_{t ω}} S_{ω t i}^{2} \sqrt{\sum_{i = 1}^{m_{t ω}} S_{ω t i}^{2}}} \\ \leq \frac{(1 - 2 q^{ω} + 2 q^{2 ω})}{\sqrt{q^{ω} (1 - q^{ω})}} \frac{\sup {(S_{ω t})}^{3}}{\inf {(S_{ω t})}^{3}} \lim_{m_{t ω} \to \infty} \frac{1}{\sqrt{m_{t ω}}} . \end{array}

(20)

That is, for layer ω, $ξ_{ω} = 1$ satisfies formula (14) in Theorem 8, so that $\lim_{m_{t ω} \to \infty} (1 / s_{m_{t ω}}^{3}) \sum_{i = 1}^{m_{t ω}} E ({| Y_{ω i} - μ_{ω i} |}^{3}) = 0$ , and $Y_{ω i}$ also satisfies the Lyapunov condition. According to [28], $\hat{Sum} (S_{ω t}) = \sum_{i = 1}^{m_{t ω}} Y_{ω i}$ meets the application conditions of the central limit theorem; that is, $\hat{Sum} (S_{ω t})$ obeys a normal distribution.
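The decay of the Lyapunov ratio can be observed numerically. The sketch below evaluates the ratio from formula (17) for a layer with delivery probability p (p = q^k for layer k), using hypothetical readings and layer sizes; it is an illustration of the limit, not part of the proof.

```python
import math

def lyapunov_ratio(readings, p):
    """Third-moment Lyapunov ratio for Y_i = s_i * Bernoulli(p), following formula (17)."""
    prefactor = (1 - 2 * p + 2 * p * p) / math.sqrt(p * (1 - p))
    sum_sq = sum(s * s for s in readings)
    sum_cube = sum(s ** 3 for s in readings)
    return prefactor * sum_cube / (sum_sq * math.sqrt(sum_sq))

q, k = 0.9, 3                     # hypothetical link probability and layer index
p = q ** k
ratios = {m: lyapunov_ratio([20 + i % 5 for i in range(m)], p)
          for m in (100, 1000, 10000)}

# For identical readings the ratio reduces exactly to prefactor / sqrt(m).
constant_ratio = lyapunov_ratio([22.0] * 400, p)
expected_constant = (1 - 2 * p + 2 * p * p) / math.sqrt(p * (1 - p)) / math.sqrt(400)
```

The ratio shrinks roughly like one over the square root of the layer size, which is the behaviour the bound in the proof relies on.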

Theorem 8 shows that the ω sequences of random variables $Y_{k i}$ ( $1 \leq k \leq ω$ ) satisfy the Lyapunov condition. That is, the sum of each layer's data $\hat{Sum} (S_{k t}) = \sum_{i = 1}^{m_{t k}} Y_{k i}$ obeys a normal distribution. Because whether the sensed data in one layer are sent successfully to the sink node is independent of the other layers, $\hat{Sum} (S_{t})$ is a linear combination of these ω independent normally distributed variables $\hat{Sum} (S_{k t})$ . Thus $\hat{Sum} (S_{t})$ obeys a normal distribution. For a given relative error limit ε, Theorem 9 describes the calculation of η of $\hat{Sum} (S_{t})$ .

Theorem 9.

Supposing $δ = 1 - η$ , $ϕ_{δ / 2}$ is the $δ / 2$ quantile of standardized normal distribution, if $ϕ_{δ / 2}$ satisfies the following formula:

\begin{matrix} ϕ_{δ / 2}^{2} \leq \frac{q^{ω} \inf (N_{t}) \inf (S_{t}) ε^{2}}{(1 - q^{ω}) \sup (S_{t})} . \end{matrix}

(21)

Then, the probability that the relative error between $\hat{Sum} (S_{t})$ and $Sum (S_{t})$ satisfies the given error limit ε will be equal to or greater than η; that is,

\begin{matrix} Pr (| \frac{\hat{Sum} (S_{t}) - Sum (S_{t})}{Sum (S_{t})} | \leq ε) \geq η . \end{matrix}

(22)

Proof.

From formula (21), there is $\inf (N_{t}) \inf (S_{t}) ε^{2} \geq ϕ_{δ / 2}^{2} \sup (S_{t}) ((1 - q^{ω}) / q^{ω})$ . As $\inf (N_{t})$ and $\inf (S_{t})$ are, respectively, the lower limit of $N_{t}$ and the lower limit of the value of all sensed data, so there is $Sum (S_{t}) = \sum_{i = 1}^{N_{t}} s_{t i} \geq \inf (N_{t}) \inf (S_{t})$ . Hence,

\begin{matrix} ε^{2} Sum (S_{t}) \geq ϕ_{δ / 2}^{2} \sup (S_{t}) \frac{(1 - q^{ω})}{q^{ω}} . \end{matrix}

(23)

Theorem 7 shows that $Var (\hat{Sum} (S_{t})) \leq ((1 - q^{ω}) / q^{ω}) Sup (S_{t}) Sum (S_{t})$ , $E (\hat{Sum} (S_{t})) = Sum (S_{t})$ , and as $\hat{Sum} (S_{t})$ obeys normal distribution, from formula (23), there is

\begin{matrix} Pr \left\{ \frac{| \hat{Sum} (S_{t}) - Sum (S_{t}) |}{ϕ_{δ / 2} \sqrt{Var (\hat{Sum} (S_{t}))}} \geq 1 \right\} = δ . \end{matrix}

(24)

Combining the properties of the standard normal quantile [29] with (23), (24), and $δ = 1 - η$ , we obtain

\begin{matrix} Pr (| \frac{\hat{Sum} (S_{t}) - Sum (S_{t})}{Sum (S_{t})} | \leq ε) \geq η . \end{matrix}

(25)
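The final step of the proof compresses a short chain of inequalities. A sketch of that chain is given below, using only formula (23), the variance bound of Theorem 7, and formula (24). It assumes $Sum (S_{t}) > 0$ (which holds when $\inf (S_{t}) > 0$), and it writes $| ϕ_{δ / 2} |$ with an absolute value so that the step does not depend on whether the quantile is taken from the lower or the upper tail.

```latex
% Multiply (23) by Sum(S_t) > 0, then apply the variance bound of Theorem 7:
ε^{2} \, Sum (S_{t})^{2}
  \geq ϕ_{δ / 2}^{2} \, \sup (S_{t}) \, \frac{1 - q^{ω}}{q^{ω}} \, Sum (S_{t})
  \geq ϕ_{δ / 2}^{2} \, Var (\hat{Sum} (S_{t}))

% Take square roots on both sides:
ε \, Sum (S_{t}) \geq | ϕ_{δ / 2} | \sqrt{Var (\hat{Sum} (S_{t}))}

% The event bounded by the quantile is contained in the event bounded by ε Sum(S_t),
% and by (24) the former has probability 1 - δ:
Pr ( | \hat{Sum} (S_{t}) - Sum (S_{t}) | \leq ε \, Sum (S_{t}) )
  \geq Pr ( | \hat{Sum} (S_{t}) - Sum (S_{t}) | \leq | ϕ_{δ / 2} | \sqrt{Var (\hat{Sum} (S_{t}))} )
  = 1 - δ = η
```

Dividing the left-hand event by $Sum (S_{t})$ gives exactly the relative-error statement of formula (25).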

5. Validity of RODFA

To evaluate RODFA, we use Matlab to simulate a sensor network of 5000 nodes. The 5000 nodes and 285 cluster-heads are randomly deployed in network areas of 5000 m × 5000 m, 10000 m × 10000 m, and 15000 m × 15000 m, and the DMSTRP protocol runs in the simulated network to connect all nodes and cluster-heads into a whole. For time t in the ready phase, we obtain the maximum number of network layers ω by counting the number of layers of every cluster. Because all nodes are deployed randomly and as uniformly as possible, we can randomly select one cluster from the network and take the distance d of all lossy links to be the mean of the distances between every two linked nodes in this cluster. Plugging this distance d into formula (5) then gives the probability q at time t.
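The two quantities extracted from a cluster in this step can be sketched as follows. This is a minimal illustration, not the Matlab code used in the experiments: the dictionary-based tree representation, the function names, and the reading of "layers" as the largest hop count to the cluster-head are all assumptions made for the example. Formula (5), which maps d to q, is not reproduced here.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


def cluster_layers(parent: Dict[int, Optional[int]]) -> int:
    """Layers of one cluster tree, taken here (an assumption) as the largest
    hop count from any node up to the cluster-head (the node with parent None)."""
    def hops_to_head(node: int) -> int:
        hops = 0
        while parent[node] is not None:
            node = parent[node]
            hops += 1
        return hops

    return max(hops_to_head(node) for node in parent)


def mean_link_distance(parent: Dict[int, Optional[int]],
                       pos: Dict[int, Point]) -> float:
    """Mean Euclidean length d of all child-parent links in one cluster."""
    lengths = [math.dist(pos[child], pos[head])
               for child, head in parent.items() if head is not None]
    return sum(lengths) / len(lengths)


# Hypothetical 4-node cluster; node 0 is the cluster-head.
parent = {0: None, 1: 0, 2: 0, 3: 1}
pos = {0: (0.0, 0.0), 1: (3.0, 4.0), 2: (0.0, 10.0), 3: (3.0, 10.0)}

layers = cluster_layers(parent)          # node 3 is two hops from node 0
d = mean_link_distance(parent, pos)      # (5 + 10 + 6) / 3
print(layers, d)                         # prints: 2 7.0
```

The network-wide ω would then be the maximum of `cluster_layers` over all clusters, and q would follow from passing `d` to formula (5).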

This group of experiments investigates whether RODFA is valid. In probabilistic WSNs, data communication over a link succeeds with a certain probability rather than always succeeding or always failing [17]. Therefore, every time we calculate the reliability of $\hat{Sum} (S_{t})$ , we get a different value even if all relevant information about the network is fixed (i.e., the number of active nodes, the structure of the clusters, the number of clusters, the number of nodes in every cluster, the distance of all links, and the layout of all nodes do not change).

We perform 10000 calculations with all relevant network information fixed, and we let the initial energy of the sink node, cluster-heads, and nodes be large enough that no node dies during the experiment. For each of these 10000 calculations, we compute the relative error between $\hat{Sum} (S_{t})$ and $Sum (S_{t})$ , both obtained from our simulation network. To obtain the value of $Sum (S_{t})$ in this experiment, we assume that the CHs not only retransmit the aggregation result to the sink node but also relay all sensed data that come from other SNs or CHs. We then compute the cumulative probability under different relative error bounds ε, shown as the blue dashed lines in Figures 3, 4, and 5; we call this reliability the statistical reliability. Furthermore, according to the above analysis, the sink node can easily obtain the values of ω and q for time t. After substituting ω and q into formula (21), $ϕ_{δ / 2}^{2} \leq q^{ω} \inf (N_{t}) \inf (S_{t}) ε^{2} / ((1 - q^{ω}) \sup (S_{t}))$ , we calculate the value of η of $\hat{Sum} (S_{t})$ for every relative error bound ε and plot η as the red forked lines in Figures 3, 4, and 5; we call this reliability the RODFA reliability.
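The statistical reliability described above can be approximated with a small Monte Carlo loop. The sketch below is not the simulator used in the experiments. It assumes, for illustration only, that a reading in layer k reaches the sink independently with probability $q^{k}$ and that each received reading is scaled by $1 / q^{k}$, which makes the estimate unbiased. The function names and the synthetic readings are hypothetical.

```python
import random
from typing import List, Tuple

Reading = Tuple[float, int]  # (sensed value, layer of the sending node)


def estimate_sum(readings: List[Reading], q: float, rng: random.Random) -> float:
    """One lossy round: a layer-k reading arrives with probability q**k
    (assumed model) and is scaled by 1 / q**k so the estimate is unbiased."""
    total = 0.0
    for value, layer in readings:
        p = q ** layer
        if rng.random() < p:
            total += value / p
    return total


def statistical_reliability(readings: List[Reading], q: float, eps: float,
                            trials: int = 10000, seed: int = 0) -> float:
    """Fraction of trials whose relative error is at most eps."""
    rng = random.Random(seed)
    true_sum = sum(value for value, _ in readings)
    hits = 0
    for _ in range(trials):
        estimate = estimate_sum(readings, q, rng)
        if abs(estimate - true_sum) <= eps * true_sum:
            hits += 1
    return hits / trials


# Synthetic readings (hypothetical values) spread over layers 1 to 3.
readings = [(20.0 + i % 5, 1 + i % 3) for i in range(200)]
for eps in (0.01, 0.03, 0.05):
    print(eps, statistical_reliability(readings, q=0.9308, eps=eps, trials=2000))
```

Because the seed is fixed, every ε is evaluated on the same sequence of simulated rounds, so the printed reliability is non-decreasing in ε, as a cumulative probability should be.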

Figure 4

Verifying the validity of RODFA for the network of size 10000 m × 10000 m; the red forked line shows the calculation results only for ε from 0 to 0.14.

Figure 5

Verifying the validity of RODFA for the network of size 15000 m × 15000 m; the red forked line shows the calculation results only for ε from 0 to 0.18.

Figures 3, 4, and 5 illustrate the statistical results from the simulation experiments (statistical reliability) and the calculation results from RODFA (RODFA reliability) for network areas of 5000 m × 5000 m, 10000 m × 10000 m, and 15000 m × 15000 m, respectively. In Figure 3, the blue dashed line shows that the statistical reliability for ε = 0.0225 is 0.9999, and the red forked line shows that, when the RODFA reliability η of $\hat{Sum} (S_{t})$ reaches 0.9999, the relative error bound ε between $\hat{Sum} (S_{t})$ and $Sum (S_{t})$ is 0.0907; the difference between 0.0225 and 0.0907 is 0.0682. The blue dashed line also shows that, for the network of size 5000 m × 5000 m, the maximum relative error between $\hat{Sum} (S_{t})$ and $Sum (S_{t})$ in the simulation results is 0.0234. In Figure 4, when the statistical reliability of $\hat{Sum} (S_{t})$ (the blue dashed line) reaches 0.9999, ε is equal to 0.0367, whereas the calculation results of RODFA show that, when the RODFA reliability η reaches 0.9999, the relative error bound ε is 0.1379; the difference between 0.0367 and 0.1379 is 0.1012. In addition, the blue dashed line shows that the maximum relative error between $\hat{Sum} (S_{t})$ and $Sum (S_{t})$ in the simulation results is 0.0413. In Figure 5, when the statistical reliability of $\hat{Sum} (S_{t})$ reaches 0.9999, the relative error bound ε is equal to 0.0508, whereas the red forked line shows that, when the RODFA reliability η of $\hat{Sum} (S_{t})$ reaches 0.9999, the relative error bound ε is 0.1768; the difference between 0.0508 and 0.1768 is 0.126. The blue dashed line also shows that the maximum relative error in the simulation results is 0.0513.
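The ε values read off the red lines at η = 0.9999 correspond to solving formula (21) for ε. A minimal sketch of that inversion is shown below. The function name and the values used for $\inf (N_{t})$, $\inf (S_{t})$, and $\sup (S_{t})$ are hypothetical, so the printed number is not expected to reproduce the figures. Only q = 0.9308 and ω = 3 are taken from the text.

```python
from math import sqrt
from statistics import NormalDist


def epsilon_for_eta(eta: float, q: float, omega: int,
                    n_min: float, s_min: float, s_max: float) -> float:
    """Smallest relative error bound eps for which formula (21) holds
    at reliability eta (requires 0 < eta < 1 and 0 < q < 1)."""
    delta = 1.0 - eta
    # Upper-tail quantile; only its square enters formula (21).
    phi = NormalDist().inv_cdf(1.0 - delta / 2.0)
    p = q ** omega
    return abs(phi) * sqrt((1.0 - p) * s_max / (p * n_min * s_min))


# q and omega as reported for the 5000 m x 5000 m network; the remaining
# arguments (n_min, s_min, s_max) are made-up placeholders.
print(epsilon_for_eta(0.9999, q=0.9308, omega=3,
                      n_min=5000, s_min=10.0, s_max=40.0))
```

A larger target η or a smaller $q^{ω}$ both push the required ε upward, which is the direction of the shift from Figure 3 to Figure 5.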

Having compared the two lines in each figure separately, we now analyze the features that Figures 3, 4, and 5 have in common.

Firstly, the three figures show that the maximum relative errors on the blue dashed lines are all small. These numbers show that the data aggregation method in our paper approximates $Sum (S_{t})$ well, which agrees with the proof that $\hat{Sum} (S_{t})$ is an unbiased estimate of $Sum (S_{t})$ . Secondly, there is a gap between the blue dashed line and the red forked line in each figure, for two reasons: (1) the relaxation used when deriving the upper bound on the variance of $\hat{Sum} (S_{t})$ , namely $Var (\hat{Sum} (S_{t})) \leq ((1 - q^{ω}) / q^{ω}) \sup (S_{t}) \times Sum (S_{t})$ ; (2) the fact that we consider only the distance d and neglect other factors that also affect the value of q between two connected nodes. Thirdly, as the network area increases across the three figures, the statistical reliability goes down (for the same relative error value) and the maximum relative error between $\hat{Sum} (S_{t})$ and $Sum (S_{t})$ grows (from 0.0234 to 0.0513, shown by the blue dashed lines). Lastly, as the network area increases, the RODFA reliability η also falls, and it falls faster than the statistical reliability: the gap in ε at reliability 0.9999 widens ( $0.0682 < 0.1012 < 0.126$ ).

The above analysis indicates that RODFA is valid and that η of $\hat{Sum} (S_{t})$ calculated by RODFA is likely to be the minimum of the reliability of $\hat{Sum} (S_{t})$ .

6. Discussing the Factors Affecting RODFA

The aim of the former part of the experiment is to demonstrate the effectiveness of RODFA. Next, we embed RODFA in our simulation network and investigate the factors affecting RODFA. We use the Matlab simulator to implement a wireless sensor network. To ensure that the simulation results in this paper are correct, we use our simulator to repeat the DMSTRP network-lifetime experiments that were implemented in C/C++ in [23] and obtain approximately the same results. The optimal number of clusters in our simulated network is chosen according to [30].

In this section, the first group of experiments describes how the value of η of $\hat{Sum} (S_{t})$ (calculated by RODFA) changes as the number of network running rounds increases. The experiment results for different network areas are shown in Figure 6.

Figure 6

η of $\hat{Sum} (S_{t})$ in network areas of 5000 m × 5000 m (blue), 10000 m × 10000 m (red), and 15000 m × 15000 m (green). The blue line shows that η is 0 from the 0th to the 9th round; in the 9th round, η leaps to 0.9611 and stays there for 400 rounds. In the 409th round, η suddenly falls to 0.9457 and stays there for 210 rounds, and there is a further sudden drop in the 619th round. η then declines steadily until it reaches 0. The red line shows η of $\hat{Sum} (S_{t})$ in the network area of 10000 m × 10000 m. For the first 11 rounds, η is 0; it then leaps to 0.9133 in the 11th round and stays there for 300 rounds. In the next 200 rounds (from the 311th round to the 511th round), η is 0.8742, and in the 511th round it falls again, to 0.8504. After a sudden drop in the 631st round, η declines steadily until it reaches 0. The green line shows η of $\hat{Sum} (S_{t})$ in the network area of 15000 m × 15000 m. η = 0 for the first 13 rounds; in the 413th round, η drops suddenly, and it then continues to fall until it reaches 0.

Because the three lines show the same type of experimental result, we analyze only the shape of the blue curve in Figure 6. The blue line describes how η changes as the number of network running rounds increases for the network of 5000 m × 5000 m (when $ε = 0.08$ ). The establishment of clusters takes a certain amount of time (how long depends on the network area and the number of active nodes), which is why η is 0 from the 0th round to the 9th round. After the sink node obtains the values of q and ω (ω = 3 and q = 0.9308), RODFA works out the value of η, and η leaps to 0.9611 in the 9th round.

Based on the above analyses, the values of q and ω mainly depend on the network area, the number of active nodes, the internal structure of the clusters, and so on. As each of the three lines describes the simulation results for a network of fixed area, we do not need to investigate the influence of network area on q and ω here. Firstly, the blue line shows that the number of active nodes is 5000 in the first 610 rounds. Near the 400th round, however, the clusters are rebuilt, which changes the network structure and hence the values of q and ω. Thus, from the 409th round, η suddenly falls to 0.9457 and stays there for 210 rounds. Secondly, the number of active nodes begins to decline from the 600th round. As the setup phase and the ready phase cycle continuously in the network, the values of q and ω differ from one time to another. As a result, from the 619th round η declines steadily until it reaches 0. Lastly, η is lower still in the subsequent 123 rounds (from the 811th round to the 934th round). This is because the number of active nodes remains low in these later rounds; when fewer active nodes are deployed over a fixed area, ω becomes larger and q becomes smaller, which makes η small.

The second group of experiments investigates the influence of network size on RODFA. According to formula (21), $ϕ_{δ / 2}^{2} \leq q^{ω} \inf (N_{t}) \inf (S_{t}) ε^{2} / ((1 - q^{ω}) \sup (S_{t}))$ , the value of η is determined by the given ε, q, and ω. Based on the above analysis, the values of q and ω depend on the network size when the routing protocol is fixed. As the value of ε is given by the user's application requirement, η changes with the network area. As shown in Figure 7, when ε is set to 0.01, 0.03, and 0.05, respectively, η decreases as the network size grows.
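The dependence of η on ε, q, and ω can be made concrete by solving formula (21) for the largest admissible η, which gives $η = 2 Φ (\sqrt{B}) - 1$ with B the right-hand side of (21) and Φ the standard normal distribution function. The sketch below uses this rearrangement. The function name, the bounds `n_min`, `s_min`, `s_max`, and the second and third (q, ω) pairs are hypothetical and chosen only so that q falls and ω rises as the area grows; they are not the values behind Figure 7.

```python
from math import sqrt
from statistics import NormalDist


def rodfa_eta(eps: float, q: float, omega: int,
              n_min: float, s_min: float, s_max: float) -> float:
    """Largest eta allowed by formula (21) for a given eps, q, and omega."""
    p = q ** omega
    if p >= 1.0:
        return 1.0  # lossless links: the bound in (21) is unbounded
    bound = p * n_min * s_min * eps ** 2 / ((1.0 - p) * s_max)
    return 2.0 * NormalDist().cdf(sqrt(bound)) - 1.0


# (q, omega) for growing network areas. The first pair is the one reported
# for the 5000 m x 5000 m network; the other two are made-up placeholders.
scenarios = [(0.9308, 3), (0.90, 4), (0.87, 5)]
for eps in (0.01, 0.03, 0.05):
    etas = [rodfa_eta(eps, q, omega, n_min=5000, s_min=10.0, s_max=40.0)
            for q, omega in scenarios]
    print(eps, [round(eta, 4) for eta in etas])
```

For each ε, the printed η values shrink from left to right, which mirrors the qualitative trend described for Figure 7.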

Figure 7

The influence of network size on RODFA, where ε is set to 0.01, 0.03, and 0.05, respectively. The number of nodes is 5000. The values of η for these three values of ε are calculated as the side length of the network area increases from 5000 m to 15000 m.

The last group of experiments analyzes the network delay. According to the definition of delay in Section 2, we obtain the delays of the simulated network models with sizes of 5000 m × 5000 m, 10000 m × 10000 m, and 15000 m × 15000 m and plot them in Figure 8. Figure 8 shows that the network delay fluctuates around 524 when the number of alive nodes is more than 1500; when the number of alive nodes is less than 1500, the network delay decreases until it reaches 0. This is probably because the numbers of links and clusters in the network, as well as the value of ω, decrease as the number of alive nodes decreases, which leads to the trend of network delay shown in Figure 8.

Figure 8

The network delay under different network sizes.

7. Conclusions

Our proposed algorithm, RODFA, is a data fusaggregation algorithm: it aggregates sensed data from large-scale wireless sensor networks and performs fusion analysis of the aggregation results. We choose SUM to design RODFA in this paper, and the main idea of RODFA is to calculate the parameter η of $\hat{Sum} (S_{t})$ through automatic analysis and synthesis of $\hat{Sum} (S_{t})$ . RODFA provides η to users to help them process $\hat{Sum} (S_{t})$ further, and it also helps the network administrator improve the system sensing performance. The experiments in Section 5 indicate that RODFA is valid and that the parameter η calculated by RODFA is likely to be the minimum of the reliability of $\hat{Sum} (S_{t})$ . In Section 6, we first investigate the influences of network size and network performance on data fusaggregation to guide the network administrator in improving the system sensing performance.

Footnotes

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The project is supported by The National Key Technology Research and Development Program of the Ministry of Science and Technology of China (Grant no. 2012BAH82F04).

References

1. Huang, Zomaya A. Y., Delicato F. C., Pires P. F. Long term and large scale time synchronization in wireless sensor networks. Computer Communications, vol. 37, pp. 77–91, 2014. doi:10.1016/j.comcom.2013.10.003

2. Rozyyev, Hasbullah, Subhan. Indoor child tracking in wireless sensor network using fuzzy logic technique. Research Journal of Information Technology, vol. 3, no. 2, pp. 81–92, 2011. Scopus: 2-s2.0-80054787151. doi:10.3923/rjit.2011.81.92

3. Szewczyk, Osterweil, Polastre, Hamilton, Mainwaring, Estrin. Habitat monitoring with sensor networks. Communications of the ACM, vol. 47, no. 6, pp. 34–40, 2004. Scopus: 2-s2.0-4243114087. doi:10.1145/990680.990704

4. Wang Z. B., Lou, Wang J. C., Chen H. L. A hybrid cluster-based target tracking protocol for wireless sensor networks. International Journal of Distributed Sensor Networks, vol. 2013, Article ID 494863, 16 pages, 2013. doi:10.1155/2013/494863

5. Garcia, Quintero, Pierre. A global profile-based algorithm for energy minimization in object tracking sensor networks. Computer Communications, vol. 33, no. 6, pp. 736–744, 2010. Scopus: 2-s2.0-76049090619. doi:10.1016/j.comcom.2009.11.020

6. Zhang, Song G. M., Qiao G. F., Wang A. M. A wireless sensor network system with a jumping node for unfriendly environments. International Journal of Distributed Sensor Networks, vol. 2012, Article ID 568240, 8 pages, 2012. doi:10.1155/2012/568240

7. Sabri, Aljunid S. A., Ahmad, Yahya, Kamaruddin, Salim M. S. Wireless sensor actor network based on fuzzy inference system for greenhouse climate control. Journal of Applied Sciences, vol. 11, no. 17, pp. 3104–3116, 2011. Scopus: 2-s2.0-80052844950. doi:10.3923/jas.2011.3104.3116

8. Kumar. Monitoring forest cover changes using remote sensing and GIS: a global prospective. Research Journal of Environmental Sciences, vol. 5, pp. 105–123, 2011. doi:10.3923/rjes.2011.105.123

9. Bekmezci, Alagöz. Delay sensitive, energy efficient and fault tolerant distributed slot assignment algorithm for wireless sensor networks under convergecast data traffic. International Journal of Distributed Sensor Networks, vol. 5, no. 5, pp. 557–575, 2009. Scopus: 2-s2.0-70449625151. doi:10.1080/15501320802300123

10. Chen Y. L., Lai H. P. A fuzzy logical controller for traffic load parameter with priority-based rate in wireless multimedia sensor networks. Applied Soft Computing, vol. 14, pp. 594–602, 2014. doi:10.1016/j.asoc.2013.08.001

11. Tseng Y.-C., Pan M.-S., Tsai Y.-Y. Wireless sensor networks for emergency navigation. Computer, vol. 39, no. 7, pp. 55–62, 2006. Scopus: 2-s2.0-33746369139. doi:10.1109/MC.2006.248

12. Deligiannakis, Kotidis, Roussopoulos. Processing approximate aggregate queries in wireless sensor networks. Information Systems, vol. 31, no. 8, pp. 770–792, 2006. Scopus: 2-s2.0-33748704250. doi:10.1016/j.is.2005.02.001

13. Fan Y.-C., Chen A. L. P. Efficient and robust sensor data aggregation using linear counting sketches. Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS '08), pp. 1–12, April 2008. Scopus: 2-s2.0-51049105748. doi:10.1109/IPDPS.2008.4536265

14. Tan, Xing G. L., Yuan Z. H., Liu, Yao J. G. System-level calibration for data fusion in wireless sensor networks. ACM Transactions on Sensor Networks 2013 28 93 1 28.

15. Elias. Optimal design of energy-efficient and cost-effective wireless body area networks. Ad Hoc Networks, vol. 13, part B, pp. 560–574, 2014. doi:10.1016/j.adhoc.2013.10.010

16. Chen Y.-L., Lin J.-S. Energy efficiency analysis of a chain-based scheme via intra-grid for wireless sensor networks. Computer Communications, vol. 35, no. 4, pp. 507–516, 2012. Scopus: 2-s2.0-84856415019. doi:10.1016/j.comcom.2011.12.002

17. Liu, Zhang. Opportunity-based topology control in wireless sensor networks. IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 3, pp. 405–416, 2010. Scopus: 2-s2.0-76749154972. doi:10.1109/TPDS.2009.57

18. Zuniga, Krishnamachari. Analyzing the transitional region in low power wireless links. Proceedings of the 1st Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks (IEEE SECON '04), pp. 517–526, October 2004. Scopus: 2-s2.0-20344378689.

19. Wang T. C., Qin X. L., Liu. An energy-efficient and scalable secure data aggregation for wireless sensor networks. International Journal of Distributed Sensor Networks, vol. 2013, Article ID 843485, 11 pages, 2013. doi:10.1155/2013/843485

20. Wan P.-J., Huang S. C.-H., Wang, Wan, Jia. Minimum-latency aggregation scheduling in multihop wireless networks. Proceedings of the 10th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc '09), pp. 185–194, May 2009. Scopus: 2-s2.0-70450177431. doi:10.1145/1530748.1530773

21. Varshney P. K. Distributed Detection and Data Fusion. Springer, 1996.

22. Heinzelman W. R., Chandrakasan, Balakrishnan. Energy-efficient communication protocol for wireless microsensor networks. Proceedings of the 33rd Annual Hawaii International Conference on System Sciences, p. 223, January 2000. Scopus: 2-s2.0-0033877788.

23. Guangyan, Xiaowei, Jing. Dynamic minimal spanning tree routing protocol for large wireless sensor networks. Proceedings of the 1st IEEE Conference on Industrial Electronics and Applications (ICIEA '06), Singapore, pp. 1–5, May 2006. Scopus: 2-s2.0-42749099671. doi:10.1109/ICIEA.2006.257220

24. Muruganathan S. D., D. C. F., Bhasin R. I., Fapojuwo A. O. A centralized energy-efficient routing protocol for wireless sensor networks. IEEE Communications Magazine, vol. 43, no. 3, pp. S8–S13, 2005. Scopus: 2-s2.0-17744376200. doi:10.1109/MCOM.2005.1404592

25. Fazio, de Rango, Sottile. A new interference aware on demand routing protocol for vehicular networks. Proceedings of the International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS '11), pp. 98–103, June 2011. Scopus: 2-s2.0-80052645976.

26. S. M. Research on the Key Technology of Opportunistic Routing in Multi-Radio Multi-Channel Wireless Mesh Network. Hunan University, Hunan, China, 2013.

27. Bernstein. Elements of Statistics II: Inferential Statistics, 1st ed. McGraw-Hill, Columbus, Ohio, USA, 2004.

28. Fischer. A History of the Central Limit Theorem: From Classical to Modern Probability Theorem, 1st ed. Springer, New York, NY, USA, 2011.

29. Shen, Xie S. Q., Pan C. Y. Probability Theory & Mathematical Statistics. Higher Education Press, Beijing, China, 2005.

30. Lindsey, Raghavendra, Sivalingam K. M. Data gathering algorithms in sensor networks using energy metrics. IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 9, pp. 924–935, 2002. Scopus: 2-s2.0-0036766616. doi:10.1109/TPDS.2002.1036066
