Skyline-Based Aggregator Node Selection in Wireless Sensor Networks

Abstract

To use the limited resources of wireless sensor networks (WSNs) evenly, sensor data must be aggregated before it is passed to the base station. In WSNs, aggregator nodes perform the data aggregation process. Careful selection of the aggregator nodes substantially reduces the communication traffic in the WSN. However, network conditions change frequently due to sharing of resources, computation load, and congestion on network nodes and links, which makes the selection of the aggregator nodes difficult. In this paper, we study an aggregator node selection method for WSNs. We formulate the selection process as a top-k query problem, which we solve efficiently by using a modified Sort-Filter-Skyline (SFS) algorithm. The main idea of our approach is to perform a skyline query directly on the sensor nodes in the WSN, which extracts the set of sensor nodes that are potential candidates to become an aggregator node. The experiments show that our method is several times faster than the existing approaches.

1. Introduction

Recently, wireless sensor networks (WSNs) have been used in many applications, such as military target tracking and surveillance [1], meteorological hazards [2], wildlife monitoring [3], natural disaster relief [4], and healthcare [5]. A WSN consists of a sink node, also called a base station, and a group of sensor nodes. Each sensor node has a wireless radio transceiver, a power source, a small microcontroller, and multitype sensors that enable the sensor node to sense and exchange the data with other sensor nodes. On the other hand, the base station is a gateway for the WSN to communicate with the external applications. The base station collects the sensor data from the sensor nodes and combines it into a form requested by the applications.

In a typical WSN, the sensor nodes have limited resources such as battery power, computing capability, and memory. Communication is the dominant source of energy consumption in WSNs [6, 7]. Thus, the general approach is to jointly process the sensor data generated by different sensor nodes while transmitting it to the base station. This process is called data aggregation. By processing, combining, and filtering the sensor data, the data aggregation process reduces the number of data transmissions and improves bandwidth and energy utilization in the WSN.

In WSNs, the aggregator nodes perform the data aggregation process. The aggregator nodes receive the sensor data from neighboring nodes, aggregate it, and forward the filtered data to the base station. Careful selection of the aggregator nodes substantially reduces the communication traffic in the WSN. However, network conditions change frequently due to sharing of resources, computation load, and congestion on network nodes and links, which makes the selection of the aggregator nodes difficult [8, 9].

Several data aggregation protocols have been proposed to solve the aggregator node selection problem. They can be categorized into two types: tree-based data aggregation protocols [10–13] and cluster-based data aggregation protocols [14–18]. In tree-based data aggregation protocols, the aggregator nodes are determined first, and the data paths of the sensor nodes include the determined aggregator nodes. The main issue of tree-based data aggregation protocols is the construction of an energy-efficient data aggregation tree, which is time consuming in a large WSN. In cluster-based data aggregation protocols, on the other hand, sensor nodes are divided into clusters. In each cluster, a cluster head is selected. The cluster head aggregates the sensor data locally and forwards the aggregation result to the base station. However, this approach is also inefficient, because there can be many cluster heads in a large WSN, which leads to a selection problem among the cluster heads.

In this paper, we study an aggregator node selection method for WSNs. We formulate the selection process as a top-k query problem, which we solve efficiently by using a modified Sort-Filter-Skyline (SFS) [19] algorithm. The main idea of our approach is to perform a skyline query directly on the sensor nodes in the WSN, which extracts the set of sensor nodes that are potential candidates to become an aggregator node. Our approach selects a set of aggregator nodes according to their attributes, such as distance from the base station, power consumption, battery life, and communication cost. Thus, we can substantially reduce communication traffic by sending only the aggregated data through the selected aggregator nodes, instead of individual sensor data, to the base station. The experiments show that our method is several times faster than the existing approaches. We also provide an analysis of the major factors that impact the performance of previous approaches.

The remainder of this paper is organized as follows. Section 2 explains the data aggregation process in the WSN. Section 3 discusses the related work. Section 4 describes our proposed approach. Section 5 presents the performance evaluation. Section 6 presents conclusions and future work.

2. Data Aggregation in WSN

In this section, we briefly explain the data aggregation process in the WSNs.

A WSN is a collection of sensor nodes with limited battery power, computing capability, and memory. Since communication is the main source of energy consumption in a WSN, it is preferable to jointly process the sensor data generated by different sensor nodes while forwarding it to the base station. This process is called data aggregation. One of its advantages is that, when the base station initiates a query on the WSN, one of the sensor nodes performs the data aggregation process instead of every sensor node sending its own data to the base station. Thus, the data aggregation process reduces redundant data transmissions and improves the overall lifetime of the WSN.

Figure 1 demonstrates a data aggregation process. A typical data aggregation process involves three types of nodes: the base station, sensor nodes, and aggregator nodes. Sensor nodes sense the data from the target region and send it to the aggregator nodes. The aggregator nodes collect the sensor data from multiple sensor nodes, aggregate it using an aggregation function, and send the aggregated data to the upper aggregator node or to the base station. The base station collects the aggregated data from the WSN and combines it into a form requested by the applications.

Figure 1

A data aggregation process in the WSN.
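As a rough illustration of the flow described above, the following Python sketch models an aggregator node that collects readings from several sensor nodes and forwards a single aggregated value. The function name `aggregate_readings`, the sample readings, and the choice of the average as the aggregation function are assumptions made for this illustration only.

```python
from statistics import mean


def aggregate_readings(readings, aggregation_function=mean):
    """Combine the readings received from several sensor nodes into one value."""
    if not readings:
        raise ValueError("an aggregator node needs at least one reading")
    return aggregation_function(readings)


# Hypothetical temperature readings from four sensor nodes in one region.
readings = [21.5, 22.0, 21.0, 23.5]

# Without aggregation, every reading is forwarded toward the base station.
messages_without_aggregation = len(readings)

# With aggregation, the aggregator node forwards one aggregated value.
aggregated_value = aggregate_readings(readings)
messages_with_aggregation = 1
```

Any other aggregation function that accepts a list of readings, such as `max` or `min`, can be passed in place of the average.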

The simplest way to perform the data aggregation process is to determine data aggregator nodes in the network. Recall from Section 1 that network conditions change frequently due to sharing of resources, computation load, and congestion on network nodes and links, which makes the selection of the aggregator nodes difficult. In this paper, we study an aggregator node selection method in the WSNs.

3. Related Study

3.1. Tree-Based Data Aggregation Protocols

In tree-based data aggregation protocols, the aggregator nodes are determined first, and the data is transmitted to the base station through the determined aggregator nodes.

Madden et al. [10] proposed tiny aggregation (TAG), a service for aggregation in low-power, distributed, wireless environments. TAG has two key features. First, TAG enables users to express simple, declarative queries for data collection and aggregation, borrowing the idea of aggregation operators from database query languages. Second, it distributes and executes aggregation queries in the sensor network in a time- and power-efficient manner and respects important properties of WSNs, such as resource constraints and lossy communication.

Lindsey et al. [11] proposed power-efficient gathering in sensor information systems (PEGASIS), which reduces energy cost to increase the lifetime of the WSN. The authors claim that the method is near optimal in terms of energy cost for data gathering applications in WSNs. The main idea of PEGASIS is to form a chain among the sensor nodes. To distribute the energy usage evenly in the WSN, each sensor node communicates only with a close neighboring node, and the nodes take turns forwarding the data to the base station.

Ding et al. [12] proposed an efficient energy-aware distributed heuristic to generate the aggregation tree, called energy-aware distributed aggregation tree (EADAT). The EADAT algorithm makes no assumption about the local network topology and is based on residual power. It uses two techniques, neighboring broadcast scheduling and distributed competition among neighbors, which make the algorithm efficient. The simulation analysis demonstrates that the EADAT algorithm performs well in terms of network lifetime, energy saving, data delivery ratio, and protocol overhead.

3.2. Cluster-Based Data Aggregation Protocols

In cluster-based data aggregation protocols, sensor nodes are divided into clusters. In each cluster, a cluster head is selected. Cluster head aggregates the sensor data locally and forwards the aggregation result to the base station.

Heinzelman et al. [16] proposed low-energy adaptive clustering hierarchy (LEACH), a protocol architecture for microsensor networks. LEACH operates in two phases: a set-up phase and a steady-state phase. In the set-up phase, the cluster structures are organized. Then, in the steady-state phase, the data is forwarded from the nodes to the cluster head and on to the base station. LEACH uses a randomized rotation of the cluster head in order to distribute energy usage evenly among the sensor nodes. The experimental results demonstrate that LEACH reaches the performance needed under the tight constraints of the wireless channel.

Younis and Fahmy [17] proposed a protocol called hybrid energy-efficient distributed clustering (HEED). HEED selects a cluster head according to both residual energy and node proximity to its neighbors. HEED defines the average minimum reachability power (AMRP) as the mean of the minimum power levels required by all nodes within the cluster range to reach the cluster head. AMRP provides a good estimate of the communication cost in each cluster. In order to select a cluster head, each sensor node sets its probability of becoming a cluster head, which considers the initial percentage of cluster heads, the current residual energy, and the initial energy of the sensor node. This process continues until each node selects its cluster head. The simulation results show that HEED can prolong the network lifetime and support scalable data aggregation.
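The probability mentioned above is commonly written as $CH_{prob} = \max (C_{prob} \cdot E_{residual} / E_{max}, p_{min})$, where $C_{prob}$ is the initial percentage of cluster heads. The following Python sketch evaluates this expression; the function and parameter names, as well as the default lower bound `p_min`, are assumptions made for this illustration, and the sketch covers only this single step of HEED.

```python
def heed_cluster_head_probability(c_prob, e_residual, e_max, p_min=1e-4):
    """Probability that a node announces itself as a cluster head.

    c_prob     -- initial percentage of cluster heads (for example, 0.05)
    e_residual -- current residual energy of the node
    e_max      -- reference energy of a fully charged node
    p_min      -- assumed lower bound that keeps the probability positive
    """
    if e_max <= 0:
        raise ValueError("e_max must be positive")
    return max(c_prob * e_residual / e_max, p_min)


# A node with half of its energy left gets half of the initial probability.
probability = heed_cluster_head_probability(c_prob=0.05, e_residual=1.0, e_max=2.0)
```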

Buttyán and Schaffer [18] proposed position-based aggregator node election in wireless sensor networks (PANEL), which uses the geographical position information of the sensor nodes in order to select an aggregator node. In PANEL, at the beginning of each phase, each node calculates a reference point for its cluster. Once the reference point is calculated, the nodes in the cluster select the sensor node that is closest to the reference point as the aggregator node for the given phase. In each phase, the reference points are recalculated and the aggregator node selection procedure is repeated in order to ensure load balancing, meaning that each sensor node has an equal probability of becoming an aggregator node.
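The selection step of PANEL described above can be sketched in Python as follows. The way PANEL computes the reference point is not modeled here; the reference point is simply passed in. The function name `select_aggregator` and the node positions are hypothetical.

```python
import math


def select_aggregator(node_positions, reference_point):
    """Return the id of the node that is closest to the reference point."""
    return min(
        node_positions,
        key=lambda node_id: math.dist(node_positions[node_id], reference_point),
    )


# Hypothetical positions of three sensor nodes in one cluster.
node_positions = {"s1": (0.0, 0.0), "s2": (3.0, 4.0), "s3": (6.0, 8.0)}

# Reference point of the current phase (given, not computed, in this sketch).
aggregator = select_aggregator(node_positions, reference_point=(2.5, 3.5))
```

Calling the function again with the reference point of the next phase yields the aggregator node for that phase, which is how the rotation among nodes arises.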

4. Proposed Method

This section describes the proposed method in detail. We start with the general assumptions of our approach. Then, we explain the aggregator node selection process using the skyline. Finally, we describe the tree building process for the cluster heads.

4.1. Skyline Sensor Nodes

Consider a set of sensor nodes scattered in a field. In this paper, we make the following assumptions. Each sensor node can perform three functions: sensing, aggregation, and forwarding. Sensor nodes are static and aware of their geographical position, although they are not instrumented with GPS-capable antennae. We further assume that the sensor network is subdivided into clusters. In each cluster, we select a cluster head, in other words, an aggregator node. An aggregator node performs the data aggregation process locally and forwards the aggregated sensor data to the base station. Figure 2 demonstrates an example of a cluster-based data aggregation process.

Figure 2

A cluster-based data aggregation process in the WSNs.

It is important to note that an aggregator node should be selected according to multiple attributes, such as distance from the base station, power consumption, battery life, and communication cost. When the number of sensor nodes in a cluster is large, it may take a long time to evaluate the combination of these attributes and select an optimal aggregator node. However, it is sufficient to look at just the top few results, ranked by the small set of attribute values that define an aggregator node. Thus, we propose to formulate the selection process as a top-k query problem, which we solve efficiently by using a modified SFS algorithm. The main idea of our approach is to perform a skyline query directly on the sensor nodes in the WSN, which extracts the set of sensor nodes that are potential candidates to become an aggregator node. First, we briefly introduce skyline queries, and then we describe how to apply them in our approach.

Given a set of sensor nodes with n attributes, a skyline query chooses those sensor nodes that are not dominated by any other sensor node. A sensor node $s_{i}$ is said to dominate another sensor node $s_{j}$ if $s_{i}$ is as good as or better than $s_{j}$ in every attribute and strictly better in at least one attribute. Thus, a skyline query chooses the aggregator node candidates that are optimal with respect to all attributes. We define dominance relationships between sensor nodes based on their attributes, such as distance from the base station, power consumption, battery life, and communication cost. We use the dominance relationship to select, in each cluster, aggregator nodes that are not dominated by any other sensor node in the same cluster.

Definition 1 (dominance relationship).

Consider a cluster C and two sensor nodes $s_{1}$ and $s_{2}$ that belong to the cluster, $s_{1}, s_{2} \in C$. We say that $s_{1}$ dominates $s_{2}$, denoted as $s_{1} \prec s_{2}$, if and only if $s_{1}$ is better than or equal to $s_{2}$ in all attributes and strictly better in at least one attribute.

Definition 2 (skyline sensor node).

The skyline of C, denoted as $SA (C)$, is the set of sensor nodes in C that are not dominated by any other sensor node of C, that is, $SA (C) = \{ s_{1} \in C \mid \nexists\, s_{2} \in C : s_{2} \prec s_{1} \}$.
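The dominance test of Definition 1 can be sketched in Python as follows. The `SensorNode` type, its field names, and the sample values are hypothetical. The sketch assumes that a smaller value is better for distance, power consumption, and communication cost, and that a larger value is better for battery life, so battery life is negated to obtain a vector in which smaller is always better.

```python
from typing import NamedTuple


class SensorNode(NamedTuple):
    node_id: str
    distance: float            # distance from the base station (lower is better)
    power_consumption: float   # lower is better
    battery_life: float        # higher is better
    communication_cost: float  # lower is better


def cost_vector(node):
    """Map a node to a vector in which every component is 'lower is better'."""
    return (node.distance, node.power_consumption,
            -node.battery_life, node.communication_cost)


def dominates(s1, s2):
    """True if s1 is at least as good as s2 everywhere and strictly better somewhere."""
    a, b = cost_vector(s1), cost_vector(s2)
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


s1 = SensorNode("s1", 10.0, 2.0, 80.0, 1.5)
s2 = SensorNode("s2", 12.0, 2.0, 70.0, 1.5)
s3 = SensorNode("s3", 8.0, 3.0, 60.0, 1.0)
```

With these values, `s1` dominates `s2`, while `s1` and `s3` do not dominate each other because each is better in at least one attribute.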

Figure 3 shows an example of the skyline sensor nodes of a cluster. The sensor nodes are represented as points in a 2-dimensional space, where the coordinates of each point are the values of the sensor node in two attributes, power consumption and communication cost. From the figure, we can observe that $s_{1}$, $s_{2}$, $s_{4}$, and $s_{7}$ are not dominated by other sensor nodes, meaning that no other sensor node offers both lower power consumption and lower communication cost than these nodes.

Figure 3

An example of skyline method on sensor nodes.
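A 2-dimensional skyline of this kind can be reproduced with a naive pairwise comparison, as in the following Python sketch. The coordinates are hypothetical and were not read from Figure 3; they were chosen only so that the nodes named in the text form the skyline. Each tuple is (power consumption, communication cost), and lower values are assumed to be better.

```python
def dominates(a, b):
    """True if a is no worse than b in every attribute and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def naive_skyline(nodes):
    """Return the ids of all nodes that are not dominated by any other node."""
    return [
        node_id
        for node_id, attrs in nodes.items()
        if not any(dominates(other, attrs)
                   for other_id, other in nodes.items() if other_id != node_id)
    ]


# Hypothetical (power consumption, communication cost) values.
nodes = {
    "s1": (1, 9), "s2": (2, 6), "s3": (4, 8), "s4": (4, 3),
    "s5": (6, 5), "s6": (8, 4), "s7": (9, 1),
}

skyline_ids = naive_skyline(nodes)  # ['s1', 's2', 's4', 's7']
```

The naive approach compares every pair of nodes, so its cost grows quadratically with the number of nodes in the cluster.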

4.2. Aggregator Node Selection Algorithm

We assume that the proposed approach is applied to aggregator node selection in a medium-scale sensor network. In a typical medium-scale sensor network, the number of nodes can reach 200–300, of which no more than 30–40 are potential candidates to become an aggregator node, depending on the size of each cluster. However, in a large-scale sensor network, the number of potential candidates can be huge, which means that an index must be built before applying our approach.

Determining the aggregator nodes of a cluster requires pairwise comparisons of the attributes of the candidate aggregator nodes. This process can be expensive in terms of computation time if the number of candidate nodes is large. Several efficient algorithms have been proposed for skyline computation. For the problem considered here, the process of determining the skyline aggregator nodes is independent of any individual base station request or usage context, so it does not need to be conducted online at request time. Therefore, any of the existing methods can be used to determine the skyline aggregator nodes offline, which speeds up the aggregator selection process later at request time. For this purpose, we use the SFS algorithm, which presorts the data points according to their scores obtained by a monotone function f, such that if $f (s_{i}) < f (s_{j})$, then it is guaranteed that $s_{j}$ does not dominate $s_{i}$. In other words, the function corresponds to a topological sort with respect to the dominance criteria. SFS is commonly used as a baseline in benchmarking research; hence, it is a suitable basis for formulating a solution to the skyline query.
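The role of the monotone function can be checked with a small Python sketch. Here the sum of the attribute values is used as one possible monotone function f; this choice, the sample points, and the helper names are assumptions made for this illustration. Lower attribute values are assumed to be better. After sorting by f in ascending order, no point can be dominated by a point that appears later in the order.

```python
from itertools import combinations


def dominates(a, b):
    """True if a is no worse than b in every attribute and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def score(point):
    """One possible monotone scoring function f: the sum of all attributes."""
    return sum(point)


def is_topological_order(ordered):
    """Check that no later point dominates an earlier point."""
    return not any(
        dominates(ordered[j], ordered[i])
        for i, j in combinations(range(len(ordered)), 2)
    )


# Hypothetical 2-attribute points.
points = [(4, 8), (9, 1), (6, 5), (1, 9), (8, 4), (2, 6), (4, 3)]
ordered = sorted(points, key=score)

presorting_is_valid = is_topological_order(ordered)
```

The property holds because a dominating point is no larger in every attribute and strictly smaller in at least one, so its sum is strictly smaller and it is sorted earlier.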

SFS is a skyline algorithm that is based on presorting and uses no index structures. Algorithm 1 [20] describes the steps of the extended SFS. It takes as input an array $C [1, \dots, n]$ of tuples, which is assumed to fit in the main memory, and returns the skyline set $S (C [1, \dots, n])$ as output. SFS maintains an array S holding the skyline tuples and visits all tuples in sorted order (line 3). For each tuple $C [i]$, SFS performs dominance tests with all skyline tuples in S (line 4). If no skyline tuple in S dominates $C [i]$, SFS inserts $C [i]$ into S.

Algorithm 1: The extended Sort-Filter-Skyline algorithm.

Input:

An array $C [1, \dots, n]$ of data points

Output:

The skyline set $S (C [1, \dots, n])$

(1) $S \leftarrow \emptyset$

(2) sort $C [1, \dots, n]$ topologically with respect to the dominance criteria (Definition 1)

(3) for $i = 1$ to n do

(4) if $\forall e \in S$ , e does not dominate $C [i]$ then

(5) insert $C [i]$ into S

(6) end if

(7) end for

(8) return S
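The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal sketch of the basic sort-then-filter idea and not the authors' C++ implementation; the sum of the attributes is assumed as the monotone scoring function, lower attribute values are assumed to be better, and the sample points are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every attribute and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def sfs_skyline(points, score=sum):
    """Sort-Filter-Skyline: presort by a monotone score, then filter."""
    skyline = []                                   # S <- empty set
    for candidate in sorted(points, key=score):    # visit tuples in sorted order
        # The candidate joins the skyline if no skyline tuple dominates it.
        if not any(dominates(e, candidate) for e in skyline):
            skyline.append(candidate)
    return skyline


# Hypothetical (power consumption, communication cost) values.
points = [(4, 8), (9, 1), (6, 5), (1, 9), (8, 4), (2, 6), (4, 3)]
skyline = sfs_skyline(points)
```

Because of the presorting, a tuple that has entered the skyline is never removed later, so each tuple is compared only against the current skyline. The result is also ordered by ascending score, so taking the first k entries gives the k best-scoring skyline nodes under the assumed scoring function.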

4.3. Aggregator Node Traversal Algorithm

In each cluster, we select a cluster head in order to aggregate the sensor data locally and transmit the aggregation result to the base station. However, this approach alone is inefficient in a large WSN, because there can be many cluster heads, which leads to a selection problem among them. Thus, we propose an aggregator node traversal algorithm, in which we form a tree structure to transmit the aggregated data by multihopping through other cluster heads. Algorithm 2 presents the aggregator node traversal algorithm.

Algorithm 2: Aggregator node traversal algorithm.

Input:

Ordered tree T with root r

Output:

List to root r

(1) BuildTree (v, root)

(2) if root is empty then

root $\leftarrow$ v

(3) else

(4) if v $\leq$ value stored in root then

(5) if the left child of root exists then

(6) BuildTree (v, left child of root)

(7) else

(8) insert v as the left child of root

(9) else

(10) if the right child of root exists then

(11) BuildTree (v, right child of root)

(12) else

(13) insert v as the right child of root

(14) end if

(15) end if

(16) end if

(17) AggregatorNodeTraversal (T)

(18) for each child c of r from left to right

(19) $T (c) =$ subtree with c as its root

(20) AggregatorNodeTraversal ( $T (c)$ )

(21) end for

(22) return r

In Algorithm 2, given an ordered tree T with root r, we aim to obtain the list of aggregator nodes in the order in which the tree is traversed. The algorithm starts with the BuildTree function (line 1), which builds a tree of the selected aggregator nodes. In this procedure, the aggregator node that has the best combination of attributes, as calculated in Algorithm 1, is selected as the root node (lines 2 and 3). Then, by checking each child node, the algorithm recurses down the left or right subtree and builds a tree of aggregator nodes (lines 3–16). Once the BuildTree function has built the tree of aggregator nodes, the algorithm calls the aggregator node traversal function, which traverses the tree in postorder. Starting at the root, a postorder traversal first traverses the subtrees rooted at each of the children of a node in postorder and then visits the node itself.

Definition 3 (postorder traversal).

Let T be an ordered rooted tree with root r. If T consists only of root r, then r is the postorder traversal of T. Otherwise, suppose that $T_{1}, T_{2}, \dots, T_{n}$ are the subtrees at r from left to right in T. The postorder traversal begins by traversing $T_{1}$ in postorder, then $T_{2}$ in postorder, and so on until $T_{n}$, and ends by visiting r.
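The two parts of Algorithm 2, tree building and postorder traversal, can be sketched in Python as follows. The `TreeNode` class and the function names are hypothetical, and the integer values stand in for an assumed score of each selected aggregator node; how that score is derived from the attributes is not modeled here.

```python
class TreeNode:
    """A node of the binary tree of aggregator nodes."""

    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def build_tree(v, root):
    """Insert value v below root and return the (possibly new) root."""
    if root is None:                     # empty position: v is stored here
        return TreeNode(v)
    if v <= root.value:                  # smaller or equal values go left
        root.left = build_tree(v, root.left)
    else:                                # larger values go right
        root.right = build_tree(v, root.right)
    return root


def postorder(root):
    """Return the values in postorder: left subtree, right subtree, node."""
    if root is None:
        return []
    return postorder(root.left) + postorder(root.right) + [root.value]


# Hypothetical scores of five selected aggregator nodes; the first is the root.
root = None
for score in [5, 3, 8, 1, 4]:
    root = build_tree(score, root)

order = postorder(root)  # [1, 4, 3, 8, 5]
```

In the resulting list, every node appears after all nodes of its subtrees, and the root appears last.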

It is important to note that Algorithms 1 and 2 are closely related. Algorithm 1 selects a set of aggregator nodes according to their attributes, such as distance from the base station, power consumption, battery life, and communication cost. However, there can be many cluster heads in a large WSN, which leads to a selection problem among the cluster heads. Algorithm 2 solves this problem by forming a tree structure to transmit the aggregated data by multihopping through other cluster heads, which results in significant energy savings.

5. Performance Evaluation

In this section, we present the performance evaluation of our approach. The aim of the experiment is to compare the computation time of the proposed approach with that of a method that does not use the data aggregation process and with that of a clustering method.

5.1. Experiment Results

Experiments were carried out on a 2.4 GHz Pentium processor with 512 MB of RAM running Windows XP Professional. We implemented the proposed approach in C++. The data sets used in our experiments contain 1 K, 10 K, and 100 K data points. The following experiments were carried out.

We compare the node selection time. The graphs in Figures 4(a), 4(b), and 4(c) show this comparison. In all of these figures, the x-axis represents the aggregator node selection time in milliseconds and the y-axis represents the number of dimensions d. The d dimensions can be interpreted as the sensor node attributes, such as distance from the base station, power consumption, battery life, and communication cost. The data sets contain 1 K (a), 10 K (b), and 100 K (c) data points.

Figure 4

A comparison of node selection time with 1 K (a), 10 K (b), and 100 K (c).
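A measurement of this kind can be sketched in Python as follows. This is not the C++ benchmark used in the experiments; the uniformly random data, the fixed seed, the sum as scoring function, and the function names are assumptions made for this illustration, and the measured times depend on the machine.

```python
import random
import time


def dominates(a, b):
    """True if a is no worse than b in every attribute and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def sfs_skyline(points):
    """Sort-Filter-Skyline with the attribute sum as monotone score."""
    skyline = []
    for candidate in sorted(points, key=sum):
        if not any(dominates(e, candidate) for e in skyline):
            skyline.append(candidate)
    return skyline


def time_selection(n, d, seed=0):
    """Return (elapsed milliseconds, skyline size) for n random d-dimensional points."""
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
    start = time.perf_counter()
    skyline = sfs_skyline(points)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, len(skyline)


if __name__ == "__main__":
    for d in (2, 4, 6):
        elapsed_ms, size = time_selection(n=1000, d=d)
        print(f"d={d}: {elapsed_ms:.2f} ms, skyline size {size}")
```

On random data, the skyline tends to grow with the number of dimensions, so the selection time is expected to increase with d.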

From the graphs in Figures 4(a), 4(b), and 4(c), we can observe that the proposed approach outperforms both the method that does not use the data aggregation process and the clustering method by up to two times. This is because our approach performs a skyline query on the sensor nodes in the WSN in order to separate the sensor nodes that are potential candidates for the leading role from those that cannot possibly become an aggregator node. Our approach selects a set of leading aggregator nodes according to their attributes, such as distance from the base station, power consumption, battery life, and communication cost. Thus, we can select aggregator nodes more efficiently. The clustering method, which suffers from load-balancing problems, shows the next best result. The method that does not use the data aggregation process relies on a random aggregator selection algorithm and thus shows the worst performance in selecting aggregator nodes.

6. Conclusion

In this paper, we have studied an aggregator node selection method for WSNs. We have proposed to formulate the selection process as a top-k query problem, which we solve efficiently by using a modified SFS algorithm. Our approach selects a set of aggregator nodes according to their attributes, such as distance from the base station, power consumption, battery life, and communication cost. Thus, we can substantially reduce communication traffic by sending only aggregated data through the selected aggregator nodes, instead of individual sensor data, to the base station. The experiments have shown that our method is several times faster than the existing approaches. We have also provided an analysis of the major factors that impact the performance of previous approaches.

Acknowledgments

This work was supported by the SRC Research Center for Women's Diseases of Sookmyung Women's University (2009).

References

1. Bekmezci and Alagoz, "Energy efficient, delay sensitive, fault tolerant wireless sensor network for military monitoring," International Journal of Distributed Sensor Networks, vol. 5, no. 6, pp. 729–747, 2009. doi:10.1080/15501320902768625

2. Manes, Collodi, Fusco, Gelpi, and Manes, "Continuous remote monitoring in hazardous sites using sensor technologies," International Journal of Distributed Sensor Networks, vol. 2012, Article ID 317020, 13 pages, 2012. doi:10.1155/2012/317020

3. A. J. Garcia-Sanchez, Garcia-Sanchez, Losilla, Kulakowski, Garcia-Haro, Rodríguez, J.-V. López-Bao, and Palomares, "Wireless sensor network deployment for monitoring wildlife passages," Sensors, vol. 10, no. 8, pp. 7236–7262, 2010. doi:10.3390/s100807236

4. Cayirci and Coplu, "SENDROM: sensor networks for disaster relief operations management," Wireless Networks, vol. 13, no. 3, pp. 409–423, 2007. doi:10.1007/s11276-006-5684-5

5. Alemdar and Ersoy, "Wireless sensor networks for healthcare: a survey," Computer Networks, vol. 54, no. 15, pp. 2688–2710, 2010. doi:10.1016/j.comnet.2010.05.003

6. G. J. Pottie and W. J. Kaiser, "Wireless integrated network sensors," Communications of the ACM, vol. 43, no. 5, pp. 51–58, 2000.

7. Y. C. Lee and A. Y. Zomaya, "On effective slack reclamation in task scheduling for energy reduction," Journal of Information Processing Systems, vol. 5, no. 4, pp. 175–186, 2009. doi:10.3745/JIPS.2009.5.4.175

8. Subhlok, Lieu, and Lowekamp, "Automatic node selection for high performance applications on networks," in Proceedings of the 7th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '99), pp. 163–172, Atlanta, Ga, USA, May 1999.

9. Y. H. Park, K. Y. Whang, B. S. Lee, and W. S. Han, "Efficient evaluation of linear path expressions on large-scale heterogeneous XML documents using information retrieval techniques," Journal of Systems and Software, vol. 79, no. 2, pp. 180–190, 2006. doi:10.1016/j.jss.2005.05.009

10. Madden, M. J. Franklin, J. M. Hellerstein, and Hong, "TAG: a tiny aggregation service for ad-hoc sensor networks," in Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI '02), pp. 131–146, December 2002.

11. Lindsey, Raghavendra, and K. M. Sivalingam, "Data gathering algorithms in sensor networks using energy metrics," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 9, pp. 924–935, 2002. doi:10.1109/TPDS.2002.1036066

12. Ding, Cheng, and Xue, "Aggregation tree construction in sensor networks," in Proceedings of the 58th IEEE Vehicular Technology Conference, pp. 2168–2172, October 2003.

13. Yoon, Y. K. Kim, and J. W. Chang, "A new data aggregation scheme to support energy efficiency and privacy preservation for wireless sensor networks," International Journal of Security and Its Applications, vol. 7, no. 1, p. 129, 2013.

14. Sheikhpour and Jabbehdari, "A two-level cluster based routing protocol for wireless sensor networks," International Journal of Advanced Science and Technology, vol. 45, pp. 19–30, 2012.

15. S. B. Pourpeighambar and Sabaei, "A spatial correlation analysis for moving object data aggregation in WSNs," International Journal of Software Engineering and Its Applications, vol. 6, no. 3, p. 89, 2013.

16. W. B. Heinzelman, A. P. Chandrakasan, and Balakrishnan, "An application-specific protocol architecture for wireless microsensor networks," IEEE Transactions on Wireless Communications, vol. 1, no. 4, pp. 660–670, 2002. doi:10.1109/TWC.2002.804190

17. Younis and Fahmy, "HEED: a hybrid, energy-efficient, distributed clustering approach for ad hoc sensor networks," IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 366–379, 2004. doi:10.1109/TMC.2004.41

18. Buttyán and Schaffer, "Position-based aggregator node election in wireless sensor networks," International Journal of Distributed Sensor Networks, vol. 2010, Article ID 679205, 15 pages, 2010. doi:10.1155/2010/679205

19. Chomicki, Godfrey, Gryz, and Liang, "Skyline with presorting," in Proceedings of the 9th International Conference on Data Engineering (ICDE '03), pp. 717–719, Bangalore, India, March 2003. doi:10.1109/ICDE.2003.1260846

20. Park, "Parallel skyline computation on multicore architectures," Information Systems, vol. 36, no. 4, pp. 808–823, 2011. doi:10.1016/j.is.2010.10.005