Energy-Efficient Node Selection Algorithms with Correlation Optimization in Wireless Sensor Networks

Abstract

The sensing data of nodes is generally correlated in dense wireless sensor networks, and the active node selection problem aims at selecting a minimum number of nodes that provide the required data services within a given error threshold, so as to extend the network lifetime efficiently. In this paper, we first propose a new Cover Sets Balance (CSB) algorithm to choose a set of active nodes with the partially ordered tuple (data coverage range, residual energy). Then, we introduce a new Correlated Node Set Computing (CNSC) algorithm to find the correlated node set for a given node. Finally, we propose a High Residual Energy First (HREF) node selection algorithm to further reduce the number of active nodes. Extensive experiments demonstrate that HREF significantly reduces the number of active nodes, and that CSB and HREF effectively increase the lifetime of wireless sensor networks compared with related works.

1. Introduction

A wireless sensor network consists of spatially distributed sensor nodes which are generally self-organized and connected by wireless communications [1]. Today such networks are used in many industrial and consumer applications, such as traffic data collection, vehicular monitoring and control, security surveillance, and smart homes. Each sensor node is equipped with a sensing device which can detect the environmental condition. The nodes are powered by limited batteries, which are difficult or impossible to replace in some special environments. This is why energy efficiency is always the most important criterion for such networks. One important approach to extending the network lifetime is to reduce the number of required packet transmissions in the network [2–5], for example through clustering [6–11], in-network data aggregation [12–18], and approximate data collection [19, 20]. In these scenarios, all nodes in the network are considered active and the data are gathered from all nodes during the collecting process.

However, collecting all raw data from every node is inefficient in applications that aim to collect information originating from the environment, such as temperature, humidity, and pressure. In these applications, the final collected information is acceptable as long as it is within a given error threshold. The sensing data of each node is generally a noisy version of the observed phenomenon, and the readings deviate from one another due to distance, location, or node sensitivity. Nodes are generally correlated if they observe the same physical phenomenon. Correlations between nodes are described in some simple ways, such as the maximum or minimum value between nodes [21]. In this paper, correlation occurs if the sensing data of a single node can be obtained from the other nodes. Accordingly, a subset of active nodes can be selected to provide the required sensing service within the error threshold, and the remaining nodes can go to sleep to preserve energy. In this way, the active node selection strategy with correlation optimization not only prolongs the network lifetime, but also helps to solve other issues in dense wireless sensor networks [22], such as lower network throughput, serious node conflict, and excessive packet transmissions.

How to describe the correlation among the sensing data quantitatively is the key issue in achieving an efficient active node selection strategy. A distance function is generally considered an important model for formulating the data similarity between nodes, because the sensitivity is sometimes related to the distance between the source and the sensing device. Here we adopt the Manhattan distance between sensing data as the error metric [22]. Based on the observation that the sensing data of nodes are similar if the nodes are close enough, Kotidis [23] proposes the Snapshot query, in which only selected active nodes report their sensing data and the sensing data of one-hop nodes is computed by active nodes. Liu et al. [24] propose an EEDC algorithm which divides the nodes into disjoint cliques based on spatial correlation so that the nodes in the same clique have similar sensing data and can communicate directly with each other. Hung et al. [22] propose a DCglobal algorithm to determine a set of active nodes with high energy levels and wide data coverage ranges.

Figures 1(a) and 1(b) show the nodes selected with EEDC and DCglobal for a given wireless sensor network, where each circle denotes one node (the sensing data value is marked above the circle). An edge between a pair of nodes denotes that they can communicate directly with each other. Here we assume that the Manhattan distance is used as the similarity function and the error threshold is 0.5. The selected active nodes are marked with solid black circles. The selected active node set with EEDC is ${s_{1}, s_{2}, s_{5}, s_{6}, s_{8}, s_{9}, s_{12}}$ , and it is ${s_{1}, s_{5}, s_{10}, s_{13}}$ with DCglobal. According to Figure 1, the number of selected nodes with DCglobal is smaller than with EEDC in this example.

Figure 1

An example to demonstrate different algorithms. (a) EEDC, (b) DCglobal, (c) CSB, and (d) HREF.

The concept of data coverage range was first introduced in [22] to describe the correlation among nodes; it is defined as a node set in which the distance between each element and the given node is within the error threshold. In fact, it is a simple extension of one-hop data coverage [23]. Another issue with [22] is the efficiency of the proposed node selection algorithm. The partially ordered tuple (residual energy, data coverage range) is used to select an active node set, which ensures that the selected nodes always have high reserved energy, but the number of selected active nodes is not minimized.

To address these problems, we introduce several new concepts, namely, cover set, active node, and covered node, and propose a new Cover Sets Balance (CSB) algorithm. CSB chooses a set of active nodes with wide data coverage ranges and high energy levels by using the partially ordered tuple (data coverage range, residual energy), and builds the corresponding cover sets in sequence to ensure that the selected active nodes have high residual energy. In this way, the final set of selected nodes generally has larger residual energy and smaller size, which helps to extend the network lifetime. Figure 1(b) shows the set ${s_{1}, s_{5}, s_{10}, s_{13}}$ generated by DCglobal assuming that the reserved energy is identical for all nodes in the network, in which case its ordering is similar to the partially ordered tuple (data coverage range, residual energy). Figure 1(c) shows the result ${s_{3}, s_{5}, s_{10}, s_{13}}$ obtained with the CSB algorithm proposed in this paper. ${s_{1}, s_{7}, s_{8}}$ is the cover set of node $s_{3}$ , and each node in the set is a feasible candidate regarding $s_{3}$ .

In the following we show that some nodes can be further removed from the active node set selected with CSB. As shown in Figure 1(c), the sensing data of $s_{3}$ , $s_{5}$ , and $s_{10}$ are 35.5, 36.1, and 34.5, respectively, and the average value of $s_{5}$ and $s_{10}$ is 35.3. The Manhattan distances between 35.3 and the sensing data of $s_{1}$ , $s_{3}$ , $s_{7}$ , and $s_{8}$ are 0.1, 0.2, 0.1, and 0.3, respectively, all of which are less than the error threshold 0.5. This means that the sensing data of ${CS}_{3} + {s_{3}}$ can be computed by $s_{5}$ and $s_{10}$ . Accordingly, $s_{3}$ is removed and we obtain a smaller active node set ${s_{5}, s_{10}, s_{13}}$ , as shown in Figure 1(d). Following this observation, we introduce a novel concept, the Correlated Node Set (CNS), and then propose a High Residual Energy First (HREF) node selection algorithm to reduce the number of active nodes. The main contributions of this paper are as follows.

(i) We propose a Cover Sets Balance algorithm (CSB) to select a set of active nodes with wide data coverage ranges and high energy levels. In each active node selection step, we use the partially ordered tuple (data coverage range, residual energy) to find an initial active node set and then balance the size of the cover sets in order to replace low-energy nodes.

(ii) We propose a Correlated Node Set Computing algorithm (CNSC) to calculate, for each node in the sensor network, the correlated node set with minimum set size and maximum geometric mean of residual energy, following the observation that some nodes selected by CSB can be further removed.

(iii) We propose a High Residual Energy First algorithm (HREF) to reduce the number of active nodes selected with CSB by removing nodes which can be computed by correlated node sets.

The rest of this paper is organized as follows. In Section 2, we describe the system model. Section 3 introduces the CSB and HREF algorithms. The theoretical analysis of the algorithms is presented in Section 4. In Section 5, we describe the simulation results and performance analysis. Section 6 presents the related works and Section 7 concludes the paper.

2. System Model

A wireless sensor network generally consists of a set of stationary nodes $V = {s_{1}, s_{2}, \dots, s_{n}}$ , and all nodes in the network have an identical transmission radius r. The network is formulated as an undirected graph $G = (V, E)$ with V as the set of nodes and E as the set of links. Without loss of generality, both $s_{i}$ and i are used to represent one single node in the network. There is a link $(i, j)$ between nodes i and j if they can communicate with each other directly.

The nodes are equipped with irreplaceable or non-rechargeable batteries. The reserved energy of node i at time t is denoted by $e_{i} (t)$ . The data collected from a single node is a noisy version of the actual phenomenon. In the applications considered here, the information collected from the sensor network is acceptable as long as it is within a given error threshold ε.

The notations used in this work are listed as follows:

$V$ : Set of nodes in the network

n: Number of nodes in the network

ε: Given error threshold

$x_{i} (t)$ : Sensing data of i at time t

$d (x_{i} (t), x_{j} (t))$ : Distance between the sensing data of i and j

${Energy}_{i} (t)$ : Residual energy of i at time t

${DCR}_{i}$ : Data coverage range of i

${CS}_{i}$ : Cover set of i

${CNS}_{i}$ : Correlated node set of i

$ANS$ : Active node set

r: Transmission radius

$maxd$ : Maximal number of nodes in a correlated node set

${Event}_{j} (t)$ : Value of event j at time t

Interval: Interval to reselect a new active node set.

The correlation among sensing data, especially in a dense wireless sensor network, is helpful for extending the network lifetime. Some researchers have studied the correlation between nodes and provided several models [25]. Among these models, it is common to adopt a distance function, such as the Manhattan distance $d (x_{i} (t), x_{j} (t))$ , to represent the correlation between sensing data $x_{i} (t)$ and $x_{j} (t)$ at time t [22], which is given by $d (x_{i} (t), x_{j} (t)) = | x_{i} (t) - x_{j} (t) |$ . Without loss of generality, we follow this correlation model in this paper. Note that our algorithms can be adapted to other correlation models with minor modifications.

The sensing data of $s_{j}$ is said to be computed from the sensing data of $s_{i}$ if the two readings are highly correlated, that is, $d (x_{i} (t), x_{j} (t)) \leq ε$ , where ε is the given error threshold. In that case, $s_{j}$ is also in the data coverage range of $s_{i}$ . The definitions are given as follows.

Definition 1 (data coverage range (DCR)).

Given an error threshold ε in the sensor network, the data coverage range ${DCR}_{i}$ of i is a subset of V, in which Manhattan distance from each node to i is no more than ε, and $i \notin {DCR}_{i}$ .

For the example in Figure 1(b), $ε = 0.5$ , ${DCR}_{1} = {s_{3}, s_{6}, s_{7}, s_{8}}$ , ${DCR}_{3} = {s_{1}, s_{4}, s_{7}, s_{8}}$ , and so on.
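As an illustration of Definition 1, the following minimal Python sketch computes data coverage ranges from a dictionary of readings. The function names and the four sample readings are hypothetical; they are not the values of Figure 1.

```python
def manhattan(a, b):
    """Manhattan distance between two scalar readings."""
    return abs(a - b)


def data_coverage_ranges(x, eps):
    """Map each node i to the set of other nodes whose reading is within eps of x[i]."""
    return {
        i: {j for j in x if j != i and manhattan(x[i], x[j]) <= eps}
        for i in x
    }


# Hypothetical readings, chosen only for illustration.
readings = {"a": 35.5, "b": 35.2, "c": 36.1, "d": 36.4}
dcr = data_coverage_ranges(readings, eps=0.5)
print(dcr)
```

Note that a node never appears in its own data coverage range, in line with the condition $i \notin {DCR}_{i}$ .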

Definition 2 (active node set (ANS) and active node).

Given a sensor network $G = (V, E)$ , an active node set (ANS) is a subset of V such that each $i \in V$ belongs either to ANS or to some data coverage range ${DCR}_{j}$ with $j \in ANS$ . Any node in ANS is called an active node.

For the example in Figure 1(c), $ANS = {s_{3}, s_{5}, s_{10}, s_{13}}$ and each node in the set is an active node.
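Definition 2 can be checked mechanically. The sketch below is only an illustration: the helper name `is_active_node_set` and the small coverage map are assumptions made for this example.

```python
def is_active_node_set(ans, dcr):
    """Check Definition 2: every node is in ans or in DCR_j for some j in ans."""
    covered = set(ans)
    for j in ans:
        covered |= dcr[j]
    return covered >= set(dcr)


# Hypothetical data coverage ranges for four nodes.
dcr = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}

print(is_active_node_set({"b", "d"}, dcr))  # b covers a and c; d covers itself
print(is_active_node_set({"a", "d"}, dcr))  # c is left uncovered
```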

Definition 3 (cover set (CS) and covered node).

Given a sensor network $G = (V, E)$ and the corresponding ANS, the cover set ${CS}_{i}$ for any given $i \in ANS$ is a subset of ${DCR}_{i}$ , and ${CS}_{i} \cap {CS}_{j} = \emptyset$ whenever $i \neq j$ . Any node in ${CS}_{i}$ is called a covered node.

For the example in Figure 1(c), ${CS}_{3} = {s_{1}, s_{7}, s_{8}}$ , $s_{3}$ is the active node, and $s_{1}$ , $s_{7}$ , and $s_{8}$ are covered nodes.

Sensor data is affected by the events in monitored region, and the influence of each event on a sensor is inversely proportional to their distance. Here we assume that correlation occurs among all active nodes in the sensor network.

Definition 4 (correlated node set ( $CNS$ )).

Given a sensor network $G = (V, E)$ and its corresponding $ANS$ , the correlated node set ${CNS}_{i}$ for $i \in ANS$ is a subset of $ANS$ , and the arithmetic mean $\bar{s}$ for sensing data of nodes in ${CNS}_{i}$ satisfies the error threshold condition; that is, $d (x_{j} (t), \bar{s}) \leq ε$ , $j \in {CS}_{i} + {i}$ . The sensing data of ${CS}_{i} + {i}$ is said to be computed by ${CNS}_{i}$ .

For the example in Figure 1(d), ${CNS}_{3} = {s_{5}, s_{10}}$ , $\bar{s} = 35.3$ , $d (x_{1}, \bar{s}) = 0.2 \leq ε$ , $d (x_{7}, \bar{s}) = 0.1 \leq ε$ , $d (x_{8}, \bar{s}) = 0.3 \leq ε$ , $d (x_{3}, \bar{s}) = 0.2 \leq ε$ .
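The condition in Definition 4 can be sketched as a small predicate. The readings of $s_{3}$ , $s_{5}$ , and $s_{10}$ below are the ones quoted in the text; the readings of $s_{1}$ , $s_{7}$ , and $s_{8}$ are assumed values chosen to be compatible with the distances listed above, since the figure itself is not reproduced here. The function name `computes` is hypothetical.

```python
from statistics import fmean


def computes(cns, targets, x, eps):
    """True if the mean reading of cns is within eps of every target reading."""
    if not cns:
        return False
    mean = fmean([x[j] for j in cns])
    return all(abs(x[k] - mean) <= eps for k in targets)


x = {
    "s3": 35.5, "s5": 36.1, "s10": 34.5,   # values quoted in the text
    "s1": 35.5, "s7": 35.4, "s8": 35.6,    # assumed values for illustration
}
targets = {"s1", "s3", "s7", "s8"}          # CS_3 + {s_3}

print(computes({"s5", "s10"}, targets, x, eps=0.5))
print(computes({"s5"}, targets, x, eps=0.5))
```

Under these assumed readings, the pair $s_{5}$ , $s_{10}$ satisfies the condition, while $s_{5}$ alone does not because its reading is 0.6 away from that of $s_{3}$ .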

Definition 5 (CNS computing problem).

Given a sensor network $G = (V, E)$ , ANS, sensing data $X = {x_{1}, x_{2}, \dots, x_{n}}$ , cover sets $CS = {{CS}_{1}, {CS}_{2}, \dots, {CS}_{n}}$ , and reserved energy $Energy = {e_{1}, e_{2}, \dots, e_{n}}$ , the CNS computing problem is to find a correlated node set ${CNS}_{i}$ for node $i \in ANS$ such that the size $| {CNS}_{i} |$ is minimized while the geometric mean of residual energy $\hat{e}$ is maximized, where $\hat{e} = \sqrt[n]{e_{1} e_{2} \dots e_{n}}$ .

Note that we adopt the geometric mean of the residual energy in the correlated node set because, for a data set with a fixed arithmetic mean, the geometric mean is higher when the variation among the data values is lower [26].
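A small numeric illustration of this property, using made-up energy values: two sets with the same arithmetic mean are compared, and the more balanced one has the larger geometric mean.

```python
from statistics import fmean, geometric_mean

# Hypothetical residual energies with the same arithmetic mean (5.0).
balanced = [5.0, 5.0]
skewed = [1.0, 9.0]

print(fmean(balanced), geometric_mean(balanced))
print(fmean(skewed), geometric_mean(skewed))
```

Here the geometric means are approximately 5 and 3, so ranking correlated node sets by geometric mean penalizes a set that contains a nearly depleted node.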

Definition 6 (active node selection problem).

Given a sensor network $G = (V, E)$ , the sensing data $X = {x_{1} (t), x_{2} (t), \dots, x_{n} (t)}$ , the reserved energy levels $Energy (t) = {e_{1} (t), e_{2} (t), \dots, e_{n} (t)}$ at time t, and a given threshold ε, the active node selection problem is to select a set of active nodes $ANS (t)$ at time t such that all sensing data in the network can be computed by their corresponding active nodes and the network lifetime is maximized, that is, $\max {t}$ .

The active node selection problem is to find the active node set during each epoch, with the aim of maximizing the network lifetime. The problem is proven to be NP-hard by mapping it to the set covering problem or the minimum dominating set problem [26–28]. In this paper, we design two heuristic algorithms for this problem, namely, CSB and HREF.

3. Heuristic Algorithms

3.1. CSB Algorithm

Most related works use the concept of data coverage range combined with energy to solve the active node selection problem. In this section we illustrate the Cover Sets Balance algorithm (CSB) based on the idea of data coverage range.

In the data collection process, only active nodes are required to provide the perception service, and the remaining nodes are shut down to preserve energy. An intuitive approach to node selection is to use the partially ordered tuple (data coverage range, residual energy) [23]. Another approach is to use the partially ordered tuple (residual energy, data coverage range) to select active nodes with higher residual energy [22]. However, the latter generally selects more nodes than the former, which means that more energy is consumed when providing the perception service during the given epoch. Obviously, we need a balance between the two metrics, that is, the data coverage range and the residual energy.

The basic idea of the Cover Sets Balance (CSB) algorithm is as follows: (1) generate an initial active node set and the corresponding cover sets through the data coverage range priority strategy described above; (2) replace active nodes with high-energy candidates. Note that a candidate must cover all nodes within the same cover set. For example, in Figure 1(b), ${CS}_{1} = {s_{3}, s_{6}, s_{7}, s_{8}}$ , ${CS}_{5} = {s_{2}, s_{4}}$ , ${CS}_{10} = {s_{11}}$ , and ${CS}_{13} = {s_{9}, s_{12}}$ . It is seen that $s_{1}$ covers four nodes, namely, $s_{3}, s_{6}, s_{7}$ , and $s_{8}$ , and thus a candidate node for $s_{1}$ must cover these four nodes too. However, if the cover set is too large, it is possible that no candidate node can be found. The case is similar when the cover set is too small. Obviously, we need a new method to provide more candidate nodes so that the network lifetime is extended in an efficient way.

We adopt a cover set balance strategy to balance the set sizes by moving nodes from larger cover sets to smaller ones. The initial cover sets are arranged in descending order of set size, and then we check the nodes in each cover set and try to move them to another cover set with smaller size. This process continues until all sets are checked, at which point they are balanced. This strategy helps to increase the number of candidate nodes with higher residual energy by cutting down the maximal deviation within each cover set during the balancing process.

The final step of the CSB algorithm is to replace the selected active nodes with candidates by order of reserved energy. In this way, we finally build an active node set with the same size as its initial version but higher residual energy, which is helpful to extend the network lifetime.

The CSB algorithm can be divided into three processes, and the pseudocode is shown in Algorithm 1.

Algorithm 1: Pseudocode for the Cover Sets Balance (CSB) algorithm.

Input: G = (V, E), ε, X = ${x_{1}, x_{2}, \dots, x_{n}}$ , Energy = ${e_{1}, e_{2}, \dots, e_{n}}$ ;

Output: ANS, CS.

(1) //Initialization_Process ( )

(2) Calculate DCR = ${{DCR}_{1}, {DCR}_{2}, \dots, {DCR}_{n}}$ ;

(3) Set the state of all nodes as Un-Covered;

(4) Sort nodes into a sequence with partially ordered tuple $\langle$data coverage range, residual energy$\rangle$ ;

(5) ANS ←∅, CS ←∅;

(6) for one maximal i in the sequence with state as Un-Covered

(7) ANS ← ${i}$ + ANS, and set i as Primary-Covered;

(8) for each j∈ ${DCR}_{i}$ − ANS with state as Un-Covered

(9) ${CS}_{i} \leftarrow {j} + {CS}_{i}$ , and set j as Primary-Covered;

(10) end for

(11) end for

(12) //Cover_Set_Balance_Process ( )

(13) Sort CS into a sequence with decreasing order of the set size;

(14) for each ${CS}_{i}$ in the sequence

(15) Sort nodes in ${CS}_{i}$ with decreasing order of their deviation to i;

(16) for each j in the sequence

(17) find all $k \in ANS$ which satisfy $j \in {DCR}_{k}$ and $| {CS}_{k} | < | {CS}_{i} |$ , and select the k with minimal cover set size;

(18) if such k exists, ${CS}_{i} \leftarrow {CS}_{i} - {j}$ , ${CS}_{k} \leftarrow {CS}_{k} + {j}$ ;

(19) end for

(20) end for

(21) //Node_Replace_Process ( )

(22) for each $i \in ANS$

(23) for each $j \in {CS}_{i}$

(24) if $d (x_{j}, x_{k}) \leq ε$ for every $k \in {CS}_{i} + {i} - {j}$ , then mark j as a candidate of i;

(25) end for

(26) select the node m with maximal residual energy $e_{m}$ among all candidates of i;

(27) ANS ← ANS + ${m} - {i}$ ; ${CS}_{m} = {CS}_{i} + {i} - {m}$ ;

(28) end for

The Initialization_Process builds a primary active node set and the corresponding cover sets. Each node in the network is in one of two states, Primary-Covered or Un-Covered, which mark whether it is already in the active node set or in the cover set of an active node. The states of all nodes are initialized as Un-Covered (Line 3). Then we sort the nodes with the partially ordered tuple (data coverage range, residual energy) and initialize the active node set as the empty set (Lines 4–5). Finally, we check the nodes with state Un-Covered in sequence, add each of them to the active node set, and place the Un-Covered nodes of its data coverage range into its cover set (Lines 6–11).
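The initialization step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the nodes are sorted once by the tuple (size of data coverage range, residual energy), as in Line 4, and the function name, readings, and energy values are hypothetical.

```python
def initialize(x, energy, eps):
    """Build an initial active node set and cover sets (Lines 1-11 of Algorithm 1)."""
    nodes = list(x)
    dcr = {i: {j for j in nodes if j != i and abs(x[i] - x[j]) <= eps} for i in nodes}
    # Partially ordered tuple: wider coverage first, then higher residual energy.
    order = sorted(nodes, key=lambda i: (len(dcr[i]), energy[i]), reverse=True)
    covered = set()            # nodes whose state is Primary-Covered
    ans, cs = [], {}
    for i in order:
        if i in covered:
            continue
        ans.append(i)
        covered.add(i)
        cs[i] = {j for j in dcr[i] if j not in covered}
        covered |= cs[i]
    return ans, cs, dcr


# Hypothetical readings and residual energies.
x = {"a": 10.0, "b": 10.2, "c": 10.4, "d": 12.0, "e": 12.2}
energy = {"a": 5, "b": 3, "c": 4, "d": 2, "e": 6}
ans, cs, dcr = initialize(x, energy, eps=0.25)
print(ans, cs)
```

In this example node b is picked first because it covers two nodes, and node e is preferred over d because the two have equal coverage but e has more residual energy.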

The Cover_Set_Balance_Process aims at balancing the sizes of the cover sets generated by the Initialization_Process. First, the cover sets are ordered and checked according to their set size (Line 13). Second, the nodes in a given cover set ${CS}_{i}$ are sorted in descending order of their deviation from i (Line 15), and they are moved to another cover set with smaller size where possible (Lines 16–19). This process continues until all cover sets are checked (Lines 14–20).
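A possible rendering of the balancing step is sketched below. It assumes that a node j may only move into ${CS}_{k}$ if j lies in ${DCR}_{k}$ , which Definition 3 requires of every cover set; the function name and the sample data are hypothetical.

```python
def balance(cs, dcr, x):
    """Move nodes from larger cover sets to smaller ones (Lines 12-20 of Algorithm 1)."""
    for i in sorted(cs, key=lambda a: len(cs[a]), reverse=True):
        # Visit the nodes that deviate most from the active node i first.
        for j in sorted(cs[i], key=lambda n: abs(x[n] - x[i]), reverse=True):
            smaller = [k for k in cs
                       if k != i and j in dcr[k] and len(cs[k]) < len(cs[i])]
            if smaller:
                k = min(smaller, key=lambda a: len(cs[a]))
                cs[i].remove(j)
                cs[k].add(j)
    return cs


# Hypothetical readings; p and q are active nodes, and c is within range of both.
x = {"p": 10.0, "q": 10.5, "a": 10.1, "b": 10.2, "c": 10.25}
dcr = {"p": {"a", "b", "c"}, "q": {"c"}}
cs = balance({"p": {"a", "b", "c"}, "q": set()}, dcr, x)
print(cs)
```

After balancing, node c, which deviates most from p, has moved to the cover set of q, so the remaining cover set of p is tighter around p.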

The Node_Replace_Process replaces low-energy active nodes with candidates that have higher residual energy. All feasible candidate nodes of i are checked (Lines 23–25), and we select the one (marked as m) with maximal residual energy among them (Line 26). Finally, the active node set is updated, as well as the cover set for node m (Line 27).

The CSB algorithm follows the idea of replacing active nodes with candidates that have higher residual energy. However, it yields the same number of active nodes as the approach which only uses the partially ordered tuple (data coverage range, residual energy). In the following we introduce a new HREF algorithm, based on the CNSC algorithm, to further reduce the number of active nodes.
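The replacement step might look as follows in Python. One assumption goes beyond the pseudocode and is labeled in the code: the current active node i stays in contention, so it is replaced only when a candidate has strictly more residual energy. All names and values are hypothetical.

```python
def replace_active_nodes(ans, cs, x, energy, eps):
    """Swap each active node for its best candidate (Lines 21-28 of Algorithm 1)."""
    new_ans, new_cs = [], {}
    for i in ans:
        group = cs[i] | {i}
        # A candidate must be within eps of every other node in the group.
        candidates = [j for j in cs[i]
                      if all(abs(x[j] - x[k]) <= eps for k in group - {j})]
        # Assumption: i itself competes, and wins ties because it is listed first.
        best = max([i] + candidates, key=lambda n: energy[n])
        new_ans.append(best)
        new_cs[best] = group - {best}
    return new_ans, new_cs


# Hypothetical readings and residual energies.
x = {"p": 10.0, "a": 10.1, "b": 10.2}
energy = {"p": 1, "a": 5, "b": 9}
new_ans, new_cs = replace_active_nodes(["p"], {"p": {"a", "b"}}, x, energy, eps=0.15)
print(new_ans, new_cs)
```

Node b has the most energy but cannot represent p within the threshold, so a becomes the active node and p joins its cover set.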

3.2. HREF Algorithm

We first introduce an algorithm for the CNS computing problem and then propose a High Residual Energy First node selection algorithm (HREF) for the active node selection problem.

3.2.1. CNSC Algorithm

The CNS computing problem is to find one subset ${CNS}_{i}$ for $i \in ANS$ , with the aim of minimizing $| {CNS}_{i} |$ as well as maximizing the geometric mean of residual energy $\hat{e} = \sqrt[n]{e_{1} e_{2} \dots e_{n}}$ . To find the optimal ${CNS}_{i}$ , an intuitive way is to calculate the average value of the sensing data for each subset of ANS and store these values in a sequenced list L. Then, pick out the average values whose deviation is no more than ε, together with the corresponding node sets in the list. Finally, select the set with minimum size and maximum geometric mean of residual energy as the final correlated node set for i. This solution finds the optimal result but has exponential time complexity $O (2^{| ANS |})$ .

To reduce the time complexity, we assume each ${CNS}_{i}$ has at most $maxd$ nodes, where $maxd$ is a given value depending on the network environment. The CNS computing problem is then converted to the problem of selecting at most $maxd$ nodes in ANS within the error threshold. For each selection of $maxd$ nodes, we calculate the average value of every subset of the selected nodes and add it into L with the following iteration process: in the ith iteration, the average value for each subset of ${x_{1}, x_{2}, \dots, x_{i}}$ is calculated based on the average values of the subsets of ${x_{1}, x_{2}, \dots, x_{i - 1}}$ . There are two basic operations in the iteration process, namely, $(L + x)$ and mergeL $(L, L + x)$ . $(L + x)$ represents the new list obtained by adding x into each element of the initial sequence L, as shown in (1), and mergeL $(L, L + x)$ represents the ordered list that combines L and $(L + x)$ :

\begin{equation} L + x = \left\{ \frac{L (i) \times L\_count (i) + x}{L\_count (i) + 1} : i \in L \right\}, \tag{1} \end{equation}

where $L (i)$ denotes the ith data in L, and $L\_count (i)$ denotes the number of nodes from which the average value is calculated.

Here we give an example to illustrate the two basic operations. Let $L = {0, 36.1, 34.5, 35.3}$ and $L\_count = {0, 1, 1, 2}$ . Then, $L + 36.9 = {(0 \times 0 + 36.9) / (0 + 1), (36.1 \times 1 + 36.9) / (1 + 1), (34.5 \times 1 + 36.9) / (1 + 1), (35.3 \times 2 + 36.9) / (2 + 1)} = {36.9, 36.5, 35.7, 35.83}$ . Furthermore, mergeL $(L, L + 36.9) = {0, 36.1, 34.5, 35.3} + {36.9, 36.5, 35.7, 35.83} = {0, 36.1, 34.5, 35.3, 36.9, 36.5, 35.7, 35.83}$ , and $L\_count = {0, 1, 1, 2} + {1, 2, 2, 3} = {0, 1, 1, 2, 1, 2, 2, 3}$ .
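The two list operations can be sketched in Python as below. The sketch assumes that mergeL simply appends $(L + x)$ after L without reordering, so that the element at 0-based index p corresponds to the subset whose bitmask is p; the function names `add_value`, `merge_l`, and `subset_at` are hypothetical.

```python
def add_value(L, counts, x):
    """Return (L + x): fold reading x into every running average in L."""
    new_L = [(v * c + x) / (c + 1) for v, c in zip(L, counts)]
    new_counts = [c + 1 for c in counts]
    return new_L, new_counts


def merge_l(L, counts, x):
    """Return mergeL(L, L + x) as a concatenation that preserves order (assumption)."""
    shifted, shifted_counts = add_value(L, counts, x)
    return L + shifted, counts + shifted_counts


def subset_at(index, nodes):
    """Decode a 0-based list index into the subset it represents (bit k <-> nodes[k])."""
    return {n for k, n in enumerate(nodes) if (index >> k) & 1}


L = [0, 36.1, 34.5, 35.3]
counts = [0, 1, 1, 2]
merged, merged_counts = merge_l(L, counts, 36.9)
print([round(v, 2) for v in merged], merged_counts)
print(subset_at(3, ["s5", "s10", "s13"]))
```

With this ordering, index 3 (the 4th position) holds the average 35.3 and decodes to the subset containing $s_{5}$ and $s_{10}$ .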

In the following we illustrate the CNS computing process for $s_{3}$ in Figure 1(c), assuming that the residual energy is identical for all nodes. The input for the CNS computing problem is ${CS}_{3} + {s_{3}} = {s_{1}, s_{3}, s_{7}, s_{8}}$ , $ANS - {s_{3}} = {s_{5}, s_{10}, s_{13}}$ , and $X = {36.1, 34.5, 36.9}$ . Initially, $L = {0}$ , $L\_count = {0}$ , and the corresponding set list is ${{\emptyset}}$ .

(1) Consider the sensing data 36.1 of $s_{5}$ : $L = {0, 36.1}$ , $L\_count = {0, 1}$ , and the corresponding set list is ${{\emptyset}, {s_{5}}}$ ;

(2) consider the sensing data 34.5 of $s_{10}$ : $L = {0, 36.1} + {34.5, 35.3} = {0, 36.1, 34.5, 35.3}$ , $L\_count = {0, 1, 1, 2}$ , and the corresponding set list is ${{\emptyset}, {s_{5}}, {s_{10}}, {s_{5}, s_{10}}}$ ;

(3) consider the sensing data 36.9 of $s_{13}$ : $L = {0, 36.1, 34.5, 35.3} + {36.9, 36.5, 35.7, 35.83} = {0, 36.1, 34.5, 35.3, 36.9, 36.5, 35.7, 35.83}$ , $L\_count = {0, 1, 1, 2, 1, 2, 2, 3}$ , and the set list is ${{\emptyset}, {s_{5}}, {s_{10}}, {s_{5}, s_{10}}, {s_{13}}, {s_{5}, s_{13}}, {s_{10}, s_{13}}, {s_{5}, s_{10}, s_{13}}}$ .

The deviation between 35.3 and the sensing data of each node in ${CS}_{3} + {s_{3}}$ is no more than 0.5, and the same holds for 35.7. Accordingly, the corresponding correlated node sets are ${s_{5}, s_{10}}$ and ${s_{5}, s_{10}, s_{13}}$ located at the 4th and 7th positions in L. Finally, ${CNS}_{3} = {s_{5}, s_{10}}$ because $| {s_{5}, s_{10}} | < | {s_{5}, s_{10}, s_{13}} |$ .

Algorithm 2 provides the pseudocode for the CNSC algorithm.

Algorithm 2: Pseudocode for the CNSC algorithm.

Input: ANS, ε, CS = ${{CS}_{1}, {CS}_{2}, \dots, {CS}_{n}}$ , $X = {x_{1}, x_{2}, \dots, x_{n}}$ , Energy = ${e_{1}, e_{2}, \dots, e_{n}}$ ;

Output: CNS.

(1) for each i in ANS

(2) ${CNS}_{i} = \emptyset$ ;

(3) for each subset of maxd nodes in ANS − ${i}$

(4) place them in node_vector;

(5) $L = {0}$ ;

(6) for j = 1 to $| node\_vector |$ , L = mergeL $(L, L + x_{node\_vector (j)})$ ;

(7) for each l in L

(8) if $| l - x_{k} | \leq ε$ for every $k \in {CS}_{i} + {i}$ , then temp = L_pos(l), Dset = ∅;

(9) for k = $| node\_vector |$ − 1 to 0

(10) if temp > $2^{k}$ and temp ≤ $2^{(k + 1)}$

(11) Dset = Dset + {the kth node in node_vector}, temp = temp − $2^{k}$ ;

(12) end if

(13) end for

(14) if ${CNS}_{i} = \emptyset$ or ( $| Dset |$ < $| {CNS}_{i} |$ ) or ( $| Dset |$ = $| {CNS}_{i} |$ and ê(Dset) > ê( ${CNS}_{i}$ )), then ${CNS}_{i}$ ← Dset;

(15) end for

(16) end for

(17) end for
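For comparison, the selection rule of the CNS computing problem can also be written without the list L. The sketch below is a simplified stand-in rather than a transcription of Algorithm 2: it enumerates subsets of at most `maxd` other active nodes with `itertools.combinations`, smallest size first, and breaks ties by the geometric mean of residual energy. The readings of $s_{1}$ , $s_{7}$ , $s_{8}$ and all energy values are assumed for illustration.

```python
from itertools import combinations
from statistics import fmean, geometric_mean


def cnsc(i, ans, cs, x, energy, eps, maxd):
    """Return a smallest feasible correlated node set for i, or an empty set."""
    targets = cs[i] | {i}
    others = [j for j in ans if j != i]
    for size in range(1, maxd + 1):
        feasible = []
        for combo in combinations(others, size):
            mean = fmean([x[j] for j in combo])
            if all(abs(x[k] - mean) <= eps for k in targets):
                feasible.append(combo)
        if feasible:
            # Among equally small sets, prefer the highest geometric-mean energy.
            best = max(feasible, key=lambda c: geometric_mean([energy[j] for j in c]))
            return set(best)
    return set()


x = {"s3": 35.5, "s5": 36.1, "s10": 34.5, "s13": 36.9,   # quoted in the text
     "s1": 35.5, "s7": 35.4, "s8": 35.6}                  # assumed values
energy = {"s5": 4.0, "s10": 4.0, "s13": 1.0}              # assumed values
ans = ["s3", "s5", "s10", "s13"]
cs = {"s3": {"s1", "s7", "s8"}}

result = cnsc("s3", ans, cs, x, energy, eps=0.5, maxd=3)
print(result)
```

Because subsets are tried in order of increasing size, the first non-empty feasible group already has minimum size, and only the energy tie-break remains.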

3.2.2. HREF Algorithm

For a given $i \in ANS$ , its sensing data can be computed from the nodes in ${CNS}_{i}$ , which makes it possible to shut i off to preserve energy. The basic idea of the HREF algorithm is as follows: (1) build the active node set ANS with the CSB algorithm; (2) for each $i \in ANS$ , calculate its correlated node set ${CNS}_{i}$ ; (3) remove certain active nodes from ANS. The pseudocode is shown in Algorithm 3.

Algorithm 3: Pseudocode for the HREF algorithm.

Input: G = (V, E), ε, $X = {x_{1}, x_{2}, \dots, x_{n}}$ , Energy = ${e_{1}, e_{2}, \dots, e_{n}}$ ;

Output: ANS.

(1) Run CSB algorithm to obtain the initial ANS and CS;

(2) Run CNSC algorithm to obtain CNS;

(3) Mark all nodes in ANS as Un-Completed;

(4) Sort CNS with increasing order of their set size;

(5) for one minimal ${CNS}_{i}$ in the sequence with ${CNS}_{i} \subseteq ANS$

(6) if ${CNS}_{i} \neq \emptyset$ , and the state of i is Un-Completed, then

(7) ANS ← ANS − ${i}$ ;

(8) for each j∈ ${CNS}_{i}$ , mark j as Completed;

(9) end if

(10) end for

In Lines 1–2, an active node set is generated with respect to the concept of data coverage range, and the corresponding correlated node sets in ANS are computed. Each node in the active node set is in one of two states, Completed or Un-Completed; in Line 3, we mark all active nodes as Un-Completed. In Line 4, we sort CNS in ascending order of set size. In Lines 5–10, we check whether an active node i can be removed from ANS and, if so, mark each node in ${CNS}_{i}$ as Completed.
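The pruning loop of Algorithm 3 (Lines 3–10) can be sketched as follows, assuming the active node set and the correlated node sets have already been computed. The function name `href_prune` is hypothetical, and the input mirrors the running example in which only $s_{3}$ has a non-empty correlated node set.

```python
def href_prune(ans, cns):
    """Remove active nodes whose readings can be computed by their CNS."""
    active = set(ans)
    completed = set()   # nodes that must stay active because others rely on them
    for i in sorted(ans, key=lambda n: len(cns.get(n, ()))):
        members = cns.get(i, set())
        if members and members <= active and i not in completed:
            active.discard(i)
            completed |= members
    return active


ans = ["s3", "s5", "s10", "s13"]
cns = {"s3": {"s5", "s10"}, "s5": set(), "s10": set(), "s13": set()}
print(href_prune(ans, cns))
```

Marking the members of a used correlated node set as Completed prevents two nodes from being removed on the strength of each other.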

4. Theoretical Analysis

Theorem 7.

The CSB and HREF algorithms correctly generate an active node set for a given wireless sensor network even in the presence of message losses.

Proof.

The cases with CSB and HREF are described as follows.

(1) First, we prove that the sink node obtains all sensing data of the nodes in Closed state through the selected active node set. At the beginning of CSB and HREF, all nodes are active. Whether a node is closed depends on whether its sensing data can be fused by the corresponding correlated node set. In these algorithms, a node is removed from the active node set only if this condition is satisfied. Thus all sensing data can be obtained from the nodes in the ANS calculated via CSB and HREF.

(2) Second, we prove that CSB and HREF correctly generate an active node set even in the presence of message losses. Note that our algorithms aim at shutting down certain nodes if they can be fused by other active nodes, which means that these nodes remain active if the above condition is not satisfied. Hence message losses never reduce the number of active nodes, and thus CSB and HREF correctly generate an active node set in case of message losses.

Theorem 8.

The size of the active node set obtained with CSB is at most $(1 + \log n) \times | OPT1 |$ , where $OPT1$ is the optimal active node set with respect to the concept of data coverage range and n is the number of nodes in the sensor network.

Proof.

The active node selection problem with respect to the concept of data coverage range is essentially a set covering problem [27]. We regard the problem of selecting an active node set of smallest size as the problem of selecting a minimum-size collection of subsets in the set covering problem [22]. Similar to the greedy approximation algorithm for the set covering problem, CSB takes a greedy strategy that maximizes the size of the data coverage range of each newly added active node. Let $δ_{1}, δ_{2}, \dots, δ_{| OPT1 |}$ be the active nodes in the optimal set $OPT1$ , and let ${{DCR}_{δ_{1}}, {DCR}_{δ_{2}}, \dots, {DCR}_{δ_{| OPT1 |}}}$ be the corresponding data coverage ranges. For each data coverage range ${DCR}_{δ_{i}}$ , the number of active nodes selected with the above greedy strategy is at most $(1 + \log (| {DCR}_{δ_{i}} |))$ . The total number of selected active nodes therefore satisfies

\begin{equation} \begin{array}{l} N_{1} \leq \sum_{i = 1}^{| OPT1 |} (1 + \log (| {DCR}_{δ_{i}} |)) \\ \leq | OPT1 | \times (1 + \log (max\_DCR)), \end{array} \tag{2} \end{equation}

where $max\_DCR = \max {| {DCR}_{δ_{i}} | ∣ i \in V}$ .

Since $max\_DCR \leq n$ , the size of the active node set with CSB is at most $(1 + \log n) \times | OPT1 |$ .

Theorem 9.

The time complexity of CSB is $O (n^{2})$ .

Proof.

The CSB algorithm is divided into three processes as mentioned.

In the Initialization_Process, the time complexity of obtaining the data coverage ranges of all nodes is $O (n^{2})$ . The time complexity of sorting the nodes with the partially ordered tuple is $O (n \log n)$ . The time complexity of selecting a node with maximal data coverage range in the sequence is $O (n)$ , and this selection runs $O (n)$ times. So the time complexity of the Initialization_Process is $O (n^{2})$ .

In the Cover_Set_Balance_Process, the time for all covered nodes to find a target active node is $(n - | ANS |) \times (| ANS | - 1)$ , where $(n - | ANS |)$ is the number of covered nodes and $| ANS |$ is the number of active nodes. So the time complexity of the process is $O (n^{2})$ .

In the Node_Replace_Process, selecting the best candidate and replacing the low-energy node are carried out simultaneously, and the time complexity is $O (n)$ .

So the time complexity of CSB is $O (n^{2})$ .
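The two dominant costs of the initialization can be sketched as follows. This is an illustration under stated assumptions, not the paper's pseudocode: each node is assumed to hold one scalar reading, two nodes are assumed to cover each other when their readings differ by at most the error threshold, and the tuple (data coverage range, residual energy) is assumed to be compared lexicographically. All names and values are hypothetical.

```python
def data_coverage_ranges(readings, error_threshold):
    """Return, for each node, the set of nodes whose reading is within the threshold.

    Assumption: one scalar reading per node and an absolute-difference test.
    """
    nodes = list(readings)
    dcr = {node: set() for node in nodes}
    # Two nested loops over all nodes: n * n comparisons, hence O(n^2).
    for i in nodes:
        for j in nodes:
            if abs(readings[i] - readings[j]) <= error_threshold:
                dcr[i].add(j)
    return dcr


def order_by_tuple(dcr, residual_energy):
    """Sort nodes by (coverage size, residual energy), largest first: O(n log n)."""
    return sorted(
        dcr,
        key=lambda node: (len(dcr[node]), residual_energy[node]),
        reverse=True,
    )


readings = {"a": 20.0, "b": 20.3, "c": 20.6, "d": 25.0}   # hypothetical values
energy = {"a": 90.0, "b": 80.0, "c": 95.0, "d": 100.0}    # hypothetical values
dcr = data_coverage_ranges(readings, error_threshold=0.5)
order = order_by_tuple(dcr, energy)
print(order)
```

Node b covers three nodes and is ranked first; a and c tie on coverage size, so the node with more residual energy (c) comes next.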

Theorem 10.

The size of the active node set with HREF is at most $(1 + \log ((1 + \log n) \times | OPT 1 |)) \times | OPT 2 |$ , where $OPT 2$ is the optimal active node set and n is the number of nodes in the network.

Proof.

We adopt the greedy strategy HREF to solve the active node selection problem. HREF is divided into two phases: the first phase runs the CSB algorithm, and the second phase further reduces the number of active nodes selected by CSB.

Assume that the size of the active node set obtained by CSB is m. According to [28], the number of active nodes after optimization is upper bounded by $(1 + \log m) \times | OPT 2 |$ , where OPT2 is the optimal node set with respect to the concept of correlated node set and depends on OPT1. According to Theorem 8, the number of active nodes selected by CSB satisfies $m \leq (1 + \log n) \times | OPT 1 |$ . Hence the upper bound for the number of active nodes selected by HREF is $(1 + \log ((1 + \log n) \times | OPT 1 |)) \times | OPT 2 |$ .
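To get a feel for how the two bounds compose, the short sketch below evaluates them numerically. The value n = 200 is the default network size of the simulations; the optimal set sizes 10 and 8 are hypothetical values chosen only to exercise the formulas, and the logarithm is assumed to be natural.

```python
import math


def csb_bound(n, opt1):
    """Theorem 8 bound on the CSB active set size: (1 + log n) * |OPT1|."""
    return (1 + math.log(n)) * opt1


def href_bound(n, opt1, opt2):
    """Theorem 10 bound: (1 + log((1 + log n) * |OPT1|)) * |OPT2|."""
    m = csb_bound(n, opt1)
    return (1 + math.log(m)) * opt2


# |OPT1| = 10 and |OPT2| = 8 are hypothetical, not values from the paper.
m = csb_bound(200, 10)
print(round(m, 2), round(href_bound(200, 10, 8), 2))
```

Both values are worst-case guarantees, so they are expected to be much looser than the active set sizes observed in the experiments.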

Theorem 11.

The time complexity of HREF is $O (n^{2} + m \times (\begin{smallmatrix} m \\ maxd \end{smallmatrix}) \times 2^{maxd} + m^{2})$ , where $m = (1 + \log n) \times | OPT 1 |$ is the maximal number of active nodes selected by CSB.

Proof.

The time complexity of HREF covers three steps: the first step runs the CSB algorithm, the second step runs the CNSC algorithm, and the third step shuts down redundant nodes. The time complexity of the first step is $O (n^{2})$ , as discussed above. In the second step, each node $i \in ANS$ spends time $O ((\begin{smallmatrix} m \\ maxd \end{smallmatrix}) \times 2^{maxd})$ to compute an optimized correlated node set from all its correlated node sets, where $(\begin{smallmatrix} m \\ maxd \end{smallmatrix})$ denotes the number of subsets and $2^{maxd}$ denotes the time complexity of processing the sequence L. So all nodes together spend $O (m \times (\begin{smallmatrix} m \\ maxd \end{smallmatrix}) \times 2^{maxd})$ to calculate their corresponding correlated node sets. In the third step, the process of shutting down redundant active nodes runs $O (m^{2})$ times. Thus, the total time complexity of HREF is $O (n^{2} + m \times (\begin{smallmatrix} m \\ maxd \end{smallmatrix}) \times 2^{maxd} + m^{2})$ .
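The relative weight of the three terms can be checked with a few lines of arithmetic. In the sketch below, n = 200 and maxd = 8 are the default simulation values, while m = 20 is a hypothetical number of active nodes returned by CSB; the function and key names are illustrative only.

```python
import math


def href_cost_terms(n, m, maxd):
    """Evaluate the three terms of O(n^2 + m * C(m, maxd) * 2^maxd + m^2)."""
    return {
        "csb": n ** 2,
        "cnsc": m * math.comb(m, maxd) * 2 ** maxd,
        "shutdown": m ** 2,
    }


# m = 20 is a hypothetical value, not a result reported in the paper.
terms = href_cost_terms(n=200, m=20, maxd=8)
dominant = max(terms, key=terms.get)
print(terms, dominant)
```

For these values the CNSC term dominates by several orders of magnitude, which suggests why the bound maxd on the size of a correlated node set matters in practice.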

5. Simulation Results and Analysis

In this section, we present detailed simulation experiments to evaluate the actual performance of the above algorithms. Note that this paper focuses on the active node selection problem by exploiting correlations among nodes and is not concerned with aggregation operators or probabilistic models. We compare the proposed CSB and HREF algorithms with DClocal, DCglobal [22], EEDC [24], and Snapshot [28] by running them on the same networks with the same environment parameters.

Here we adopt two main metrics for the algorithm performance, namely, the number of active nodes and the network lifetime. The number of active nodes is an important measurement since data coverage basically aims at minimizing the number of active nodes. We compare the related algorithms via this metric for a given data collection epoch. Meanwhile, the active node selection problem aims at maximizing the network lifetime, and thus network lifetime is adopted as the other metric for the performance comparison.

In this section, we first introduce the simulation environment, then compare the algorithms via the number of active nodes with different parameters, such as network size, error threshold, and number of events, and finally we compare them by the metric of network lifetime with different parameters as well as interval for each epoch.

5.1. Simulation Environment Setup

We adopt MATLAB as the simulation platform, which is widely used in the simulation of wireless sensor networks. The network is set up by placing $| V |$ nodes at random. The events are randomly deployed in the monitored region. The cost of information collection is assumed to be 0.1 units during each epoch.

We generate synthetic sensor data over the monitored region. In the synthetic data set, h events $Event = \{{Event}_{1} (t), {Event}_{2} (t), \dots, {Event}_{h} (t)\}$ are generated and randomly deployed in the monitored region. The sensing data of a given node is affected by these events, and the influence of each event is inversely proportional to the square of its distance from the node. The initial data of each event is randomly selected from [20, 40]. The value of an event ${Event}_{i}$ at time t is formulated as ${Event}_{i} (t) = {Event}_{i} (t - interval) + Z$ , where Z is a random variable that follows the normal distribution with mean 0 and variance 0.1, and ${Event}_{i} (0)$ is the initial value of the ith event. The data of node s at time t is computed by Formula (3):

x_{s} (t) = \sum_{i = 1}^{h} \frac{1 / dist (s, {Event}_{i})}{\sum_{j = 1}^{h} (1 / dist (s, {Event}_{j}))} \times {Event}_{i} (t),

(3)

where $dist (s, {Event}_{i})$ denotes the square of the distance between node s and event ${Event}_{i}$ , and h denotes the number of events.
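The data generation described above can be sketched in a few lines. The experiments themselves were run in MATLAB; the Python below is only an illustrative reimplementation under assumptions: positions are drawn uniformly over the area, the seed is arbitrary, the function names are hypothetical, and a small guard is added for a node that coincides with an event.

```python
import math
import random


def step_events(values, rng, variance=0.1):
    """Advance every event by one interval: Event_i(t) = Event_i(t - interval) + Z."""
    sigma = math.sqrt(variance)  # random.gauss expects a standard deviation
    return [v + rng.gauss(0.0, sigma) for v in values]


def node_reading(node_pos, event_pos, event_values):
    """Formula (3): weight each event by the inverse of the squared distance."""
    weights = []
    for ex, ey in event_pos:
        sq_dist = (node_pos[0] - ex) ** 2 + (node_pos[1] - ey) ** 2
        # Guard (an addition of this sketch): avoid division by zero.
        weights.append(1.0 / max(sq_dist, 1e-9))
    total = sum(weights)
    return sum(w / total * v for w, v in zip(weights, event_values))


rng = random.Random(1)            # fixed seed, chosen arbitrarily
h, side = 10, 100.0               # 10 events in a 100 m x 100 m area
event_pos = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(h)]
event_values = [rng.uniform(20, 40) for _ in range(h)]   # initial data in [20, 40]
nodes = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(200)]

event_values = step_events(event_values, rng)
readings = [node_reading(p, event_pos, event_values) for p in nodes]
```

Because the weights are normalized, every reading is a weighted average of the event values, so nearby nodes see similar mixtures and produce correlated data.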

In this paper we focus on the node selection process and its impact on the network lifetime, while routing and path selection are ignored. Readers are referred to other works for details about these issues [29–31]. The default values for the simulation parameters are listed in Table 1.

Table 1

Default values for the simulation parameters.

Parameter description	Default value
Target area size	100 m × 100 m
Network size	200
The location of sink	(50, 50)
Transmission radius	20 m
Number of events	10
Error threshold	0.5
maxd	8
Initial energy of each node	100 units
Energy cost for sensing during each epoch	0.02 units
Energy cost for transmission during each epoch	0.03 units
Fraction of alive nodes	75%
Interval for reselecting a new active node set	80 epochs
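For readers who wish to reproduce the setup, the defaults of Table 1 can be collected in a single configuration object. The sketch below is one possible encoding; the class and field names are hypothetical and not taken from the original simulation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationDefaults:
    """Default simulation parameters transcribed from Table 1."""
    area_side_m: float = 100.0                 # target area is 100 m x 100 m
    network_size: int = 200
    sink_location: tuple = (50.0, 50.0)
    transmission_radius_m: float = 20.0
    num_events: int = 10
    error_threshold: float = 0.5
    maxd: int = 8
    initial_energy: float = 100.0              # units
    sensing_cost_per_epoch: float = 0.02       # units
    transmission_cost_per_epoch: float = 0.03  # units
    alive_fraction_threshold: float = 0.75
    reselection_interval_epochs: int = 80


defaults = SimulationDefaults()
# Sweep used in Section 5.2.1: network size from 100 to 500 in steps of 100.
network_sizes = list(range(100, 501, 100))
```

Freezing the dataclass keeps the defaults immutable, so each experiment can vary one parameter at a time without accidentally changing the baseline.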

5.2. Comparison of Number of Active Nodes

In this part, we compare the performance of our algorithms with related works under various parameters, including network size, error threshold, and the number of events.

5.2.1. Impact of Network Size

The network size varies from 100 to 500 in increments of 100, and the simulation result is shown in Figure 2. The number of selected active nodes increases with the network size when the network size is smaller than 400. However, this trend is not obvious when the network size is large enough $(n = 500)$ . Once the network is dense enough, a roughly fixed number of active nodes suffices to perform the data collection process. This trend demonstrates the importance of active node selection with correlation optimization during the data collection process.

Figure 2

The impact of network size on the number of active nodes.

As we can see from Figure 2, HREF always performs better than CSB. For example, the number of active nodes selected by HREF is only 80.91% of that selected by CSB when the network size is 300. This demonstrates that HREF effectively reduces the number of active nodes by removing nodes whose data can be computed from the corresponding correlated node set with the help of the CNSC algorithm.

In all cases, HREF and CSB perform better than the related algorithms, that is, EEDC, DCglobal, Snapshot, and DClocal. When $n = 300$ , the number of selected nodes is 15.05, 18.6, 20.75, 30.15, 28.35, and 34.65 with HREF, CSB, DCglobal, EEDC, Snapshot, and DClocal, respectively.
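The percentage quoted for HREF relative to CSB follows directly from these averages, as the short check below shows. The dictionary simply restates the numbers reported for n = 300; the variable names are illustrative.

```python
# Average number of active nodes at n = 300, as reported in the text.
active_nodes = {
    "HREF": 15.05,
    "CSB": 18.6,
    "DCglobal": 20.75,
    "EEDC": 30.15,
    "Snapshot": 28.35,
    "DClocal": 34.65,
}

# Size of each algorithm's active set as a percentage of the CSB active set.
relative_to_csb = {
    name: round(100 * count / active_nodes["CSB"], 2)
    for name, count in active_nodes.items()
}
print(relative_to_csb["HREF"])  # 80.91
```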

5.2.2. Impact of Error Threshold

The error threshold varies from 0.1 to 1.15 in increments of 0.15 in the simulations. As shown in Figure 3, the number of active nodes selected by HREF is lower than that of the other algorithms in all cases. As the error threshold increases in the range of [0.1, 0.55], the number of active nodes decreases significantly. However, the decrease is not obvious when the error threshold is larger than 0.7. Hence, the number of active nodes can be reduced if the application can tolerate a larger error threshold.

Figure 3

The impact of error threshold on the number of active nodes.

5.2.3. Impact of the Number of Events

The number of events varies from 5 to 40 in increments of 5, and the simulation result is shown in Figure 4. The number of selected active nodes is independent of the number of events when the sensing data are computed by Formula (3). It can be seen that HREF and CSB perform better than the related algorithms regardless of the number of events.

Figure 4

The impact of the number of events on the number of active nodes.

5.3. Comparison of Network Lifetime

There are several measures of network lifetime [27], such as the first node to die, the number of alive nodes, and the fraction of alive nodes. The first-node-to-die measure is not a good metric in practical applications, especially in densely deployed wireless sensor networks, because the redundancy among correlated nodes helps to mitigate the effect of a single-node failure. The definition based on the fraction of alive nodes regards the network as alive while the fraction of surviving nodes remains above a given threshold [32]. In this paper, the network lifetime is defined as the time period during which the fraction of alive nodes remains above a given threshold and the alive nodes are also connected.
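The lifetime definition adopted here combines two conditions, which the following sketch makes explicit. Several details are assumptions of the sketch rather than statements of the paper: a node counts as alive while its residual energy is positive, "connected" is read as every alive node reaching the sink over hops no longer than the transmission radius, and a fraction exactly equal to the threshold is accepted. The layout is hypothetical.

```python
from collections import deque


def network_is_alive(positions, energy, sink, radius, min_alive_fraction):
    """Check the two conditions: enough nodes alive, and alive nodes connected.

    Assumptions: a node is alive while its energy is positive, and "connected"
    means every alive node can reach the sink over hops of length <= radius.
    """
    alive = [i for i, e in enumerate(energy) if e > 0]
    if len(alive) < min_alive_fraction * len(energy):
        return False

    def in_range(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= radius ** 2

    # Breadth-first search outward from the sink over alive nodes only.
    reached = set()
    frontier = deque(i for i in alive if in_range(positions[i], sink))
    reached.update(frontier)
    while frontier:
        current = frontier.popleft()
        for j in alive:
            if j not in reached and in_range(positions[current], positions[j]):
                reached.add(j)
                frontier.append(j)
    return len(reached) == len(alive)


positions = [(40, 50), (25, 50), (10, 50), (90, 90)]   # hypothetical layout
sink = (50, 50)
chain_intact = network_is_alive(positions, [5, 5, 5, 0], sink, 20, 0.75)
chain_broken = network_is_alive(positions, [5, 0, 5, 0], sink, 20, 0.5)
print(chain_intact, chain_broken)
```

In the second call enough nodes are alive, but the death of the middle relay disconnects the far node from the sink, so the network is considered dead. The network lifetime is then the number of epochs for which such a check keeps succeeding.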

To measure the network lifetime, we have to determine the relay nodes that forward the sensing data from active nodes by constructing a minimum Steiner tree [33]. The nodes selected by the minimum Steiner tree construction step are called Steiner nodes. Note that the relay nodes do not need to sense data. In the following experiments, we compare the network lifetime of our algorithms with that of related algorithms under various environmental parameters.

5.3.1. Impact of Network Size

The network size varies from 100 to 500 in increments of 100, and the simulation result is shown in Figure 5. The network lifetime increases with the network size. This is reasonable because the number of selected active nodes may be largely independent of the network size, as shown in Section 5.2.1; when there is enough redundancy among the sensing data, the additional redundant nodes are used to extend the network lifetime. HREF and CSB perform better than the related algorithms regardless of the network size, and their advantage is more pronounced when the network size is larger than 200.

Figure 5

The impact of network size on the network lifetime.

HREF also significantly improves the network lifetime compared with CSB. For example, the lifetime with HREF is about 18.19% longer than that with CSB when the network size is 300. This is reasonable since HREF adopts not only node reduction but also node replacement strategies, both of which help to extend the network lifetime.

5.3.2. Impact of Error Threshold

The error threshold varies from 0.1 to 1.15 in increments of 0.15 in the simulations. As shown in Figure 6, the network lifetime increases with the error threshold. HREF and CSB perform better than the related algorithms, that is, EEDC and DCglobal. CSB outperforms DCglobal especially when the error threshold is larger than 0.4. The network lifetime of HREF is longer than that of the other algorithms in all cases.

Figure 6

The impact of error threshold on the network lifetime.

5.3.3. Impact of Interval

The interval varies from 20 to 160 in increments of 20 in the simulations. In Figure 7, the network lifetime increases with the interval when the interval is smaller than 80. However, this trend slows down when the interval is larger than 80. This means that the network lifetime can be extended if the application can tolerate a larger interval. In addition, HREF and CSB perform better than the related algorithms regardless of the interval.

Figure 7

The impact of interval on the network lifetime.

5.3.4. Impact of Fraction of Alive Nodes

The fraction of alive nodes varies from 0.6 to 1 in increments of 0.05 in the simulations. In Figure 8, the network lifetime decreases as the required fraction of alive nodes increases. HREF performs better than the related algorithms. The network lifetime of CSB is longer than that of DCglobal when the fraction of alive nodes is smaller than 0.95, but the situation is reversed when the fraction of alive nodes is larger than 0.95. This is because CSB balances the data coverage range priority against the energy priority. As the data coverage range priority prefers nodes with larger data coverage ranges, nodes with lower energy might be selected as well, which results in rapid node failure and a dying network. A similar conclusion is drawn in Section 3.1. However, since the first-node-to-die measure is not a suitable metric for network lifetime evaluation in practical applications, CSB is still preferable to DCglobal in this case.

Figure 8

The impact of the fraction of alive nodes on the network lifetime.

5.3.5. Impact of Number of Events

The number of events varies from 5 to 40 in increments of 5, and the simulation result is shown in Figure 9. The network lifetime is independent of the number of events. Nevertheless, HREF and CSB perform better than the related algorithms regardless of the number of events.

Figure 9

The impact of the number of events on the network lifetime.

6. Related Works

Energy efficiency is a critical design consideration in battery powered and densely deployed wireless sensor networks, which can be achieved by minimizing the number of messages transmitted during the data collection process. Related works include clustering, network coding, in-network data aggregation, and approximate data collection.

Clustering has proven to be an effective approach to provide better data aggregation and scalability for large wireless sensor networks [6–11]. Recently, Aslam et al. [7] proposed a novel multicriterion optimization technique based on an energy-efficient clustering approach. This method takes multiple individual metrics as inputs in the cluster head selection process and simultaneously optimizes the energy efficiency of each individual node as well as the overall system. Karaboga et al. [8] proposed an energy-efficient clustering mechanism based on the artificial bee colony algorithm to prolong the network lifetime. Their simulation results show that the artificial bee colony based clustering approach can be applied to routing protocols successfully. Naeimi et al. [9] classified routing protocols according to their objectives and methods, addressing both the shortcomings and the strengths of the clustering process at each stage (cluster head selection, cluster formation, data aggregation, and data communication), and summarized them into categories. Moreover, Lloret et al. [10] demonstrated that cluster-based mechanisms allow multiple types of network topologies in order to obtain the most efficient network. Lehasini et al. [11] used clusters to improve the network coverage.

In-network data aggregation [12–18] is another approach to reduce the amount of data transmitted by the nodes and prolong the network lifetime. It aggregates data inside the network, along a routing tree, to reduce the amount of data transmission. Complete surveys on distributed database management techniques and data aggregation for wireless sensor networks can be found in [12, 13]. Al-Karaki et al. [14] presented a Grid-based Routing and Aggregator Selection Scheme (GRASS), which achieves low energy dissipation and low latency without sacrificing quality. Tan et al. [15] proposed a localized and energy-efficient data aggregation tree approach called Localized Power-Efficient Data Aggregation Protocols (L-PEDAPs) for sensor networks. Gao et al. [16] jointly adopted cooperative multiple-input-multiple-output and data aggregation techniques to reduce the energy consumption per bit in wireless sensor networks by reducing the amount of data for transmission and making better use of network resources through cooperative communication.

Approximate data collection is also an energy-efficient approach, which is further divided into two subcategories. The first subcategory is approximate data collection via probabilistic models of the sensing data collected from wireless sensor networks [19, 20]. Xua and Choi [19] proposed a new class of Gaussian processes for resource-constrained mobile sensor networks and a distributed algorithm which achieves the field prediction by correctly fusing all observations. Min and Chung [20] presented an approximate data gathering approach which utilizes temporal and spatial correlations in wireless sensor networks and does not transmit the data to the sink if the data are accurately predicted. The second subcategory is approximate data gathering without probabilistic models. Kotidis [23] proposed Snapshot queries for energy-efficient data acquisition in sensor networks. A network Snapshot is formed by selecting a set of active nodes, which is used to provide quick approximate answers to user queries and substantially reduces the energy consumption in wireless sensor networks. Gupta et al. [28] designed techniques that exploit data correlation among nodes to minimize the communication costs incurred during data gathering in a wireless sensor network. They designed distributed algorithms that can be implemented in an asynchronous communication model. They also designed an exponential approximation algorithm that returns a solution within $O (\log n)$ of the optimal size. Liu et al. [24] proposed a data collection approach based on a careful analysis of the sensor data. By exploring the spatial correlation of sensing data, they dynamically divide the nodes into clusters such that the sensors in the same cluster have similar sensing time series and can share the workload of data collection, since their future data are likely to be similar. Hung et al. [22] proposed an algorithm to determine a set of active nodes with high residual energy and wide data coverage ranges. Here, the data coverage range of a node is the set of nodes whose sensor data are very close to those of the particular node. They also developed an algorithm to further reduce the extra cost incurred in message collection and transmission for the selection of active nodes.

In previous work, we studied the minimum-latency data aggregation problem and proposed a new efficient scheme for it [34]. The basic idea is to first build an aggregation tree by ordering nodes into layers and then to apply a scheduling algorithm on the basis of the aggregation tree to determine collision-free transmission time slots for all nodes in the network. We proved that the data aggregation latency of our proposed scheme is bounded by $(15 R + Δ - 15)$ for wireless sensor networks in two-dimensional space, where $Δ$ is the maximum degree and R is the network radius. We also simulated the case of three-dimensional wireless sensor networks and proposed an aggregation tree construction algorithm based on maximum independent set [35]; the height of the spanning tree can be reduced to about 50%.

In previous work, we also studied the node selection problem with guaranteed data accuracy in service-oriented wireless sensor networks [36]. We exploited the spatial correlation between the service data and aimed at selecting a minimum number of nodes to provide services with guaranteed data accuracy. Firstly, we formulated this problem as an integer nonlinear programming problem to illustrate its NP-hardness. Secondly, we proposed two heuristic algorithms, namely, the Separate Selection Algorithm (SSA) and the Combined Selection Algorithm (CSA). The SSA is designed to select nodes for each service separately, and the CSA is designed to select nodes according to their contribution increment.

7. Conclusions

Due to the correlation and redundancy among the sensing data in wireless sensor networks, it is important to develop an energy-efficient active node selection strategy, which not only improves the network lifetime but also helps to solve other problems, such as low network throughput and serious node conflicts in dense wireless sensor networks. In this paper, we address the active node selection issue and provide a formal definition of the problem. We propose the Cover Sets Balance (CSB) algorithm and the High Residual Energy First (HREF) node selection algorithm, which aim at extending the lifetime of wireless sensor networks. We also propose a Correlated Node Set Computing (CNSC) algorithm to find the correlated node set for a given node. Experimental results on synthesized data sets show that HREF can significantly reduce the number of active nodes, and that these algorithms are able to significantly extend the network lifetime compared with related works. In future work, we will further consider the temporal correlation among the sensing data and design an efficient node scheduling scheme that exploits both spatial and temporal correlation.

Footnotes

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported by the National Science Foundation of China under Grant nos. 61370210 and 61103175, Fujian Provincial Natural Science Foundation of China under Grant nos. 2011J01345, 2013J01232, and 2013J01229, and the Development Foundation of Educational Committee of Fujian Province under Grant no. 2012JA12027. It has also been partially supported by the “Ministerio de Ciencia e Innovación,” through the “Plan Nacional de I+D+i 2008–2011” in the “Subprograma de Proyectos de Investigación Fundamental,” Project TEC2011-27516, and by the Polytechnic University of Valencia, through the PAID-15-11 multidisciplinary Projects.

References

1. Yick, Mukherjee, Ghosal. Wireless sensor network survey. Computer Networks, 2008, 52(12): 2292–2330. Scopus: 2-s2.0-46449122114. doi: 10.1016/j.comnet.2008.04.002

2. Sendra, Lloret, García, Toledo J. F. Power saving and energy optimization techniques for wireless sensor networks. Journal of Communications, 2011, 6(6): 439–459. Scopus: 2-s2.0-80052054687. doi: 10.4304/jcm.6.6.439-459

3. Diallo, Rodrigues, Sene, Lloret. Distributed database management techniques for wireless sensor networks. IEEE Transactions on Parallel and Distributed Systems, 2013. doi: 10.1109/TPDS.2013.207

4. Oliveira L. M. L., Rodrigues J. J. P. C., Elias A. G. F., Zarpelão B. B. Ubiquitous monitoring solution for Wireless Sensor Networks with push notifications and end-to-end connectivity. Mobile Information Systems, 2014, 10(1): 19–35.

5. Diallo, Rodrigues J. J. P. C., Sene. Real-time data management on wireless sensor networks: a survey. Journal of Network and Computer Applications, 2012, 35(3): 1013–1021. Scopus: 2-s2.0-84858082558. doi: 10.1016/j.jnca.2011.12.006

6. Boyinbode, Takizawa. A survey on clustering algorithms for wireless sensor networks. International Journal of Space-Based and Situated Computing, 2011, 1(2): 130–136. doi: 10.1504/IJSSC.2011.040339

7. Aslam, Phillips, Robertson, Sivakumar. A multi-criterion optimization technique for energy efficient cluster formation in wireless sensor networks. Information Fusion, 2011, 12(3): 202–212. Scopus: 2-s2.0-79954632322. doi: 10.1016/j.inffus.2009.12.005

8. Karaboga, Okdem, Ozturk. Cluster based wireless sensor network routing using artificial bee colony algorithm. Wireless Networks, 2012, 18(7): 847–860. Scopus: 2-s2.0-84859883022. doi: 10.1007/s11276-012-0438-z

9. Naeimi, Ghafghazi, Chow C. O., Ishii. A survey on the taxonomy of cluster-based routing protocols for homogeneous wireless sensor networks. Sensors, 2012, 12(6): 7350–7409. doi: 10.3390/s120607350

10. Lloret, Garcia, Bri, Diaz J. R. A cluster-based architecture to structure the topology of parallel wireless sensor networks. Sensors, 2009, 9(12): 10513–10544. Scopus: 2-s2.0-77950238209. doi: 10.3390/s91210513

11. Lehasini, Guyennet, Feham. Cluster-based energy-efficient k-coverage for wireless sensor networks. Network Protocols and Algorithms, 2010, 2(2): 89–106.

12. Rajagopalan, Varshney P. K. Data aggregation techniques in sensor networks: a survey. IEEE Communications Surveys, 2006, 6(4): 48–63.

13. Maraiya, Kant, Gupta. Wireless sensor network: a review on data aggregation. International Journal of Scientific & Engineering Research, 2011, 2(4): 1–6.

14. Al-Karaki J. N., Ul-Mustafa, Kamal A. E. Data aggregation and routing in Wireless Sensor Networks: optimal and heuristic algorithms. Computer Networks, 2009, 53(7): 945–960. Scopus: 2-s2.0-62149152392. doi: 10.1016/j.comnet.2008.12.001

15. Tan H. O., Korpeoglu, Stojmenovic. Computing localized power-efficient data aggregation trees for sensor networks. IEEE Transactions on Parallel and Distributed Systems, 2011, 23(3): 489–500. Scopus: 2-s2.0-79551520275. doi: 10.1109/TPDS.2010.68

16. Gao, Zuo, Zhang, Peng X.-H. Improving energy efficiency in a wireless sensor network by combining cooperative MIMO with data aggregation. IEEE Transactions on Vehicular Technology, 2010, 59(8): 3956–3965. Scopus: 2-s2.0-77958098684. doi: 10.1109/TVT.2010.2063719

17. Wei, Ling, Guo, Xiao, Vasilakos A. V. Prediction-based data aggregation in wireless sensor networks: combining grey model and Kalman Filter. Computer Communications, 2011, 34(6): 793–802. Scopus: 2-s2.0-79952038956. doi: 10.1016/j.comcom.2010.10.003

18. Xiang, Luo, Vasilakos. Compressed data aggregation for energy efficient wireless sensor networks. Proceedings of the 8th Annual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks (SECON '11), June 2011, Salt Lake City, Utah, USA: 46–54. Scopus: 2-s2.0-80052807351. doi: 10.1109/SAHCN.2011.5984932

19. Xua, Choi. Spatial prediction with mobile sensor networks using gaussian processes with built-in gaussian markov random fields. Automatica, 2012, 48(8): 1735–1740. doi: 10.1016/j.automatica.2012.05.029

20. Min J.-K., Chung C.-W. EDGES: efficient data gathering in sensor networks using temporal and spatial correlations. Journal of Systems and Software, 2010, 83(2): 271–282. Scopus: 2-s2.0-73349104611. doi: 10.1016/j.jss.2009.08.004

21. Cheng. (ε, δ)-Approximate aggregation algorithms in dynamic sensor networks. IEEE Transactions on Parallel and Distributed Systems, 2012, 23(3): 385–396. Scopus: 2-s2.0-84855744805. doi: 10.1109/TPDS.2011.193

22. Hung C. C., Peng W. C., Lee W. C. Energy-aware set-covering approaches for approximate data collection in wireless sensor networks. IEEE Transactions on Knowledge and Data Engineering, 2012, 24(11): 1993–1200. doi: 10.1109/TKDE.2011.224

23. Kotidis. Snapshot queries: towards data-centric sensor networks. Proceedings of the 21st International Conference on Data Engineering (ICDE '05), April 2005: 131–142. Scopus: 2-s2.0-28444434729

24. Liu, Pei. An energy-efficient data collection framework for wireless sensor networks by exploiting spatiotemporal correlation. IEEE Transactions on Parallel and Distributed Systems, 2007, 18(7): 1010–1023. Scopus: 2-s2.0-34250374122. doi: 10.1109/TPDS.2007.1046

25. Zhang, Wang, Naït-Abdesselam, Khokhar A. A. Distortion analysis for real-time data collection of spatially temporally correlated data fields in wireless sensor networks. IEEE Transactions on Vehicular Technology, 2009, 58(3): 1583–1594. Scopus: 2-s2.0-64049105191. doi: 10.1109/TVT.2008.928906

26. Karasabun, Korpeoglu, Aykanat. Active node determination for correlated data gathering in wireless sensor networks. Computer Networks, 2013, 57(5): 1124–1138. doi: 10.1016/j.comnet.2012.11.018

27. Cormen T. H., Leiserson C. E., Rivest R. L., Stein. Introduction To Algorithms. McGraw Hill, 2001.

28. Gupta, Navda, Das, Chowdhary. Efficient gathering of correlated data in sensor networks. ACM Transactions on Sensor Networks, 2008, 4(1), article 4: 402–413. Scopus: 2-s2.0-41849100337. doi: 10.1145/1325651.1325655

29. Campobello, Leonardi, Palazzo. Improving energy saving and reliability in wireless sensor networks using a simple CRT-based packet-forwarding solution. IEEE/ACM Transactions on Networking, 2012, 20(1): 191–205. Scopus: 2-s2.0-84857359551. doi: 10.1109/TNET.2011.2158442

30. Tseng L. C., Chien F. T., Zhang, Chang R. Y., Chung W. H., Huang C. Y. Network selection in cognitive heterogeneous networks using stochastic learning. IEEE Communications Letters, 2013, 17(12): 2304–2307. doi: 10.1109/LCOMM.2013.102113.131876

31. Rodrigues J. J. P. C., Neves P. A. C. S. A survey on IP-based wireless sensor network solutions. International Journal of Communication Systems, 2010, 23(8): 963–981. Scopus: 2-s2.0-77955395620. doi: 10.1002/dac.1099

32. Aziz A. A., Sekercioglu Y. A., Fitzpatrick, Ivanovich. A survey on distributed topology control techniques for extending the lifetime of battery powered wireless sensor networks. IEEE Communications Surveys and Tutorials, 2012, 15(1): 121–144. Scopus: 2-s2.0-84859224309. doi: 10.1109/SURV.2012.031612.00124

33. Mehlhorn. A faster approximation algorithm for the Steiner problem in graphs. Information Processing Letters, 1988, 27(3): 125–128. Scopus: 2-s2.0-0023983382

34. Hongju, Qin, Xiaohua. Heuristic algorithms for real-time data aggregation in wireless sensor networks. Proceedings of the International Conference on Wireless Communications and Mobile Computing (IWCMC '06), July 2006, Vancouver, Canada: 1123–1128. Scopus: 2-s2.0-34247337330. doi: 10.1145/1143549.1143774

35. Cheng. An efficient scheme for minimum-latency data aggregation in two- and three-dimensional wireless sensor networks. Proceedings of the 2nd International Conference on Cloud and Green Computing (CGC '12), 2012, Xiangtan, China: 252–259.

36. Cheng, Guo, Chen. Node selection algorithms with data accuracy guarantee in service-oriented wireless sensor networks. International Journal of Distributed Sensor Networks, 2013, vol. 2013, Article ID 527965, 14 pages. doi: 10.1155/2013/527965
