An Energy-Efficient Sequence-Aware Top- k Monitoring Scheme in Wireless Sensor Networks

Abstract

We focus on top-k monitoring in wireless sensor networks and propose a novel sequence-aware top-k monitoring algorithm called SAT. Top-k monitoring is important to many applications of sensor networks. Conventional top-k monitoring algorithms install a filter at each sensor node and suppress unnecessary sensor updates. However, they have some drawbacks such as the fact that the sensor nodes consume energy extremely to probe sensor reading or to update filters. Our basic idea is to collect readings sequentially by their values. First, sequence-aware data collection is investigated to make sensor nodes to determine their orders for data gathering phase. Next, sensor nodes transmit their sensor readings sequentially to the base station. When the base station collects k-readings, it broadcasts a simple message to stop data gathering phase. Therefore, SAT may minimize the communication cost for processing top-k queries. Moreover, we expand our approach to a cluster-based top-k monitoring to filter out false positives in hierarchical levels. In order to show the superiority of our top-k monitoring approach, we simulate its performance with the conventional filter-based top-k monitoring algorithm. In the results, our approach reduces communication overhead and prolongs the network lifetime largely.

1. Introduction

Wireless sensor networks are widely used in military and civil fields such as environment and habitat monitoring, object tracking, nuclear reactor controlling, fire detection, and traffic controlling [1–5]. Monitoring sensor readings is very important in many applications [6–10]. Sensor nodes collect useful information such as temperature, humidity, and seismic intensity. They transmit sensor readings to the base station for sophisticated processes. Because sensor nodes have limited batteries and consume a lot of energy for communication, energy-efficient methods are required to reduce the network traffic. Recently many works on a top-k query processing have been done to reduce the network traffic. A top-k query retrieves the highest (or lowest) k readings and sensor nodes with them in the wireless sensor networks. Top-k queries are required environment monitoring applications. A top-k query is issued to find out the nodes and their corresponding areas with the highest pollution indexes for the purpose of pollution control or research study. In network management, power supply is critical to the operation of a wireless sensor network. Thus, a top-k query may be issued to continuously monitor the sensor nodes with the least residual energy to extend the network lifetime.

In a centralized approach for processing top-k queries, all sensor readings can be periodically collected by the base station that computes the top-k result set. However, it incurs network traffic in the data collection phase. Madden et al. [11] proposed an in-network aggregation technique, known as TAG, to reduce network traffic. Initially, a routing tree is first established, and the data are then aggregated and collected along the routing tree in the base station. Each sensor node sends its sensor readings to the base station. It compares the readings received from child nodes with its own readings and sends the highest reading to the base station or to the parent. In the result, the base station can just collect the top-k result set. However, all sensor nodes must always send their corresponding sensor readings in every epoch. Therefore, this approach incurs unnecessary updates in the network. To overcome this problem, Wu et al. [12] propose a filter-based approach, known as FILA. It installs a filter at each sensor node to suppress unnecessary sensor updates. Filter setting and query reevaluation upon updates are two fundamental issues to the correctness and energy-efficiency of the FILA approach. Whenever an update occurs in FILA, however, some sensor nodes that filter ranges that contain the update value consume energy extremely to probe their sensor readings or to update the filters. That is, the network consumes more energy to update filters or to reevaluate the top-k result. Especially, it becomes worse as the top-k result changes more frequently.

In this paper, we propose a novel sequence-aware top-k monitoring approach. The basic idea is to collect the readings sequentially by their values. First, the sequence-aware data collection algorithm is investigated to make the sensor nodes determine their orders for the data gathering phase. Next, we perform a simple sequence-aware top-k monitoring algorithm, called SAT. The sensor nodes transmit their sensor readings sequentially to the base station.

When the base station collects k-readings, it broadcasts a simple message to stop the data gathering phase. Therefore, SAT may minimize the communication cost for processing top-k queries. We also expand our approach to a cluster-based top-k monitoring, called SAT-c. The expanded approach filters out many false positives in two levels such as a cluster level and a tree level. Sequence-aware approaches perform better than the existing filter-based approaches, because they avoid filter updates and query reevaluation, and minimize communication cost in networks. However, there are the following fundamental issues. How are sensor readings collected in sequence without centralized control and filters? How do sensor nodes recognize their sequence to transmit their readings to the base station? These issues that are essential to achieve sequence-aware top-k monitoring without filters are investigated in this paper. Our contributions are as follows. (i)

We investigate a simple and intuitive top-k monitoring without filters, which leads to adjusting filter setting(s) and to reevaluating queries in the existing algorithm.

(ii)

We exploit the semantics of TDMA scheduling and propose a sequence-aware approach called SAT for monitoring a top-k query in wireless sensor net-works. Moreover, we extend our approach to SAT-c, which performs better in the hierarchical network topology. To the best of our knowledge, this is the first work that exploits TDMA scheduling to process a top-k query.

(iii)

We develop a data collection algorithm (SAC) that is capable of collecting sensor readings in sequence without any centralized control or filters.

(iv)

We design a lazy k-validation algorithm to avoid incorrect termination in the multihop network and to adapt SAC for top-k monitoring efficiently.

(v)

Extensive experiments are conducted to evaluate the performance of our approaches by using real data traces. The results show that our approaches substantially outperform FILA in terms of the network lifetime and energy consumption under various network configurations. In particular, our approaches are more suitable for the large-scale networks.

The rest of the paper is organized as follows. Section 2 reviews the related work on top-k query processing and presents some problems of the conventional approaches. Section 3 presents our proposed approach, SAT, and discusses how we collect sensor readings in sequence and terminate top-k query processing for each epoch. Also, we describe the extended top-k query processing approach, SAT-c, for the hierarchical network topology. In Section 4, we show the superiority of our proposed compression algorithm through performance evaluation and analysis. Finally, we conclude this paper and present some future works in Section 5.

2. Related Work

Querying the sensor readings is an important and challenging problem in wireless sensor networks. COUGAR [13] and TinyDB [14] are two typical perceptual data management systems. They implement a type of SQL interface to describe readings that users collect and how users want to combine, transform, and summarize readings. In-network processing is an energy-efficient approach for sensor networks [15, 16]. It allows sensor nodes to perform the compression, filtering, and aggregation functions on the collected sensor readings. Several collected readings will be merged into only one single value using aggregation functions, such as MIN, MAX, SUM, AVG, and COUNT.

In-network aggregation reduces the overall energy consumption by performing computation within the network and reducing the size of data propagated upward. Madden et al. [11] proposed an in-network data aggregation technique to reduce network traffic for data collection. Madden et al. established first a routing tree that collects readings and outputs aggregated values over the routing tree in sensor networks. Yoon and Shahabi [17] utilized additional temporal suppression schemes to reduce the communication cost on the TAG. Heinzelman et al. [18] considered both the temporal and spatial correlations among sensor readings for further reduction on aggregation cost.

The clustering algorithm also could be an efficient protocol for data collection and data aggregation in sensor networks as well [19–24]. In the clustering algorithm, sensor nodes elect their own cluster heads and are organized into different clusters. When a node wants to transmit data to the base station, it first transmits the sensed readings to its cluster head, and then the cluster head forwards the data to the base station. In this phase, the cluster heads can apply some aggregate functions to reduce the communication cost (e.g., combining and compressing the readings of the member nodes).

Recent researches focus on not only common queries but also the more complicated queries including a top-k query, a kNN query, a join query, and a skyline query. A top-k query retrieves the highest (or lowest) k readings and sensor nodes with them in the wireless sensor network. Wu et al. [12] exploited the semantics of a top-k query and proposed a filter-based monitoring scheme, called FILA, to maintain the current top-k points by assigning a dedicated filter to each sensor. Silberstein et al. [25] proposed a sampling-based approach to optimize top-k queries. Silberstein et al. considered the top-k query evaluation problem by providing approximate solutions with high probability based on a prediction model that is built on the samples of previous top-k query results. Zeinalipour-Yazti et al. [26] proposed TJA algorithm to find k top-ranked answers under the user-defined similarity. Zeinalipour-Yazti et al. [27] proposed a materialized in-network top-k views algorithm (MINT) to find the fixed top-k ranges in the networks. It has some global information in advance. Mai et al. [28] proposed a linear regression-based filter setting algorithm and a decentralized query reevaluation algorithm to reduce the communication cost for sending probe messages.

Skyline is a set of objects which are not dominated by other objects in the data set. Unlike other aggregation queries, a skyline query requires much comparison among objects. Chen et al. [29] addressed the issues of skyline monitoring in WSN and proposed a threshold-based hierarchy filter algorithm, MINMAX, to reduce network data traffic. Xin et al. [30, 31] proposed the sliding window skyline query in the WSN and methods to effectively reduce the data transmission volume of the sensor nodes, thereby reducing the overall network energy consumption. Shen et al. [32] proposed a location-based skyline query processing approach, called Ring-Skyline. It divides the monitoring area into several rings and adopts in-network query processing to reduce energy consumption. Seong et al. [33] proposed a lazy filtering-based in-network skyline query processing algorithm to improve the filtering scheme.

Another related work is a TDMA scheduling approach. Designing a TDMA schedule is to allocate time slots depending on the topology and the node packet generation rates. A proper schedule not only avoids collisions by silencing the interferers of every receiver in each time slot but also minimizes the number of time slots hence the latency [34].

3. Sequence-Aware Data Collection

In this section, we describe a novel top-k approach to reduce the energy consumption due to the query reevaluation and the filter updates in FILA. Our key idea is to self-sort by the values of sensor readings. A top-k query requires the highest (or lowest) k-readings. If all the sensor readings can be collected in sequence, it is possible to get just k sensor readings which are included in the top-k result. It reduces the number of sensor readings which are collected from the sensor network. First, we describe how sensor readings can be collected in sequence without any centralized control and how the data gathering is finished after collecting k readings. Next, we explain an extension for hierarchical network topology.

3.1. Sequence-Aware Data Collection

Our major concern is how to collect the sensor readings from the highest to the lowest sequentially. If we can collect the sensor readings sequentially, it provides opportunities to terminate data gathering intuitively after collecting the highest (or lowest) k readings. We propose a sequence-aware data collection algorithm (SAC) to collect sensor readings from the highest to the lowest sequentially. SAC uses the equi-width histogram to avoid the reevaluation of a query and the adjustment of filters. The base station diffuses a histogram setting including ${d, v_{\min}, v_{\max}, D}$ to the sensor network. Every sensor node and the base station have to agree on the same bucket partition, which can be specified by the bucket width w and the value range $[v_{\min}, v_{\max}]$ . The value range $[v_{\min}, v_{\max}]$ can be predefined by an operator. Otherwise, the base station gets the minimum and maximum data values in the sensor network to calculate the value range. The histogram is a partitioning of both the data space and the TDMA schedule, as shown in Figure 1. Let L be the width of $[v_{\min}, v_{\max}]$ defined as $L = v_{\max} - v_{\min}$ . Sensors divide the value range $[v_{\min}, v_{\max}]$ into d uniform buckets with width $(= ⌈ L / d ⌉)$ . That is, the ith bucket is defined by $[l_{i}, u_{i}] = [v_{\min} + (i - 1) \cdot w, v_{\min} + i \cdot w]$ . D denotes time duration for a single-round of top-k collection. Sensors divide the time duration D into d uniform TDMA frames with duration $⌈ D / d ⌉ \cdot t_{s}$ denotes the start time in the time duration. The ith TDMA frame is defined by $[t_{s} + (i - 1) \cdot ⌈ D / d ⌉, t_{s} + i \cdot ⌈ D / d ⌉]$ . Each TDMA frame consists of time slots, in which sensor readings are sent to the base station. The number of time slots is equal to the number of sensor nodes. Whenever the sensor is the same as the nodes measured new readings, they determine the corresponding buckets and TDMA frames themselves without any centralized controls. The sensor nodes keep the sleep-mode. They are usually activated during their corresponding time slot and send their readings to the base station.

Figure 1

A histogram-mapped TDMA schedule.

Figure 2 shows an example of gathering the lowest k readings sequentially. We assume that there are five time slots in each TDMA frame and ten TDMA frames per round. All the sensor nodes measure temperature from 0 to 100 and transmit sensor readings directly to the base station. Sensor nodes ${A, B, C, D, E}$ measure temperature and determine their corresponding frames to ${# 2, # 2, # 2, # 3, # 2}$ . In the result, the lower the sensor reading is, the earlier it is collected. Each sensor node can transmit its readings to the base station in ascending order.

Figure 2

An example of data gathering with SAC.

3.2. Lazy k-Validation to Terminate Data Collection

We propose a lazy k-validation algorithm to avoid incorrect termination in the multihop network and adapt a sequence-aware data gathering algorithm to top-k monitoring. Data gathering must be finished to minimize energy consumption after collecting kth sensor reading. A simple implementation of collecting only k-readings is to count the number of the collected readings. If the base station collects k-readings, then it broadcasts a stop signal into the sensor network and terminates the data collection. However, it may lead to incorrect results or termination in the multi-hop network. We assume that in the multi-hop network, sensor nodes B and G pick the same bucket and TDMA frame, but the depth of each sensor node and time durations that transfer readings to the base station are different. G measures higher sensor reading and transmits its sensor reading earlier than B, as shown in Figure 3. If time slots are assigned into a TDMA frame as many as the number of sensor nodes, each sensor node can transfer its readings to the base station without interference. Otherwise, it cannot be guaranteed that the sensor readings of G will arrive in the base station earlier than the sensor readings of B. The multi-hop network can generally suffer from packet order inversion. As shown in Figure 3, for example, if the readings of B $(= 51)$ are collected earlier than the readings of G $(= 56)$ due to the communication latency, the base station increases the number of collected readings and tries to terminate data gathering for the top-1 query. To overcome this incorrectness, we use the TDMA scheduling scheme like it is shown in Figure 4. Each TDMA frame has to guarantee enough time for transmission from leaf nodes to the base station. The base station validates whether or not the count value becomes more than k after each TDMA frame is finished.

Figure 3

An example of communication latency (k = 1).

Figure 4

TDMA schedule for data gathering.

Algorithm 2 presents the pseudocode to terminate data gathering for each epoch. We assume that a routing tree is constructed by the conventional algorithm like TAG. All sensor nodes determine their own buckets and TDMA frames by Algorithm 1. In SAC, the sensor nodes transmit their sensor readings to the base station. The base station counts the number of received readings. When each TDMA frame is finished, the base station validates whether the count value is more than k or not. If so, the base station broadcasts the beacon message into the sensor network. Otherwise, the validation for the next TDMA frame is continued to collect more sensor readings. The beacon message is a signal to stop gathering sensor readings, and it is significantly smaller than other messages. We exploit an intuitive approach for processing continuous top-k queries. Our proposed algorithm repeats ad hoc top-k processing at each epoch. Although this manner is not sophisticated, it is suitable for sensor networks, which have low computation performance.

Algorithm 1: Find the corresponding bucket for each epoch.

Require:

${d, v_{\min}, v_{\max}, D}$ from the base station

$t_{s}$ = {start time of the current epoch}

${b u c k e t s}_{i = [1 d]} = [l_{i}, u_{i}] = [v_{\min} + (i - 1) \cdot ⌈ \frac{L}{d} ⌉, v_{\min} + i \cdot ⌈ \frac{L}{d} ⌉]$

${F r a m e}_{j = [1 d]} = [t_{s} + (i - 1) \cdot ⌈ \frac{D}{d} ⌉, t_{s} + i \cdot ⌈ \frac{D}{d} ⌉]$

(1) sleep mode {turn off communication circuit}

(2) v←current sensor reading

(3) selectedFrame←nil

(4) for $(i = 1; i \leq d; i + +)$ do

(5) if (sensor reading v falling in ${b u c k e t}_{i} [l_{i}, u_{i}]$ ) then

(6) selectedFrame← ${F r a m e}_{i}$

(7) end if

(8) end for

(9) for $(i = 1; i \leq d; i + +)$ do

(10) if $s e l e c t e d F r a m e = {F r a m e}_{i}$ then

(11) active_mode (turn on communication circuit)

(12) transmit v

(13) sleep_mode (turn off communication circuit)

(14) else

(15) waiting for next TDMA Frame

(16) end if

(17) end for

(18) active_mode (turn on communication circuit)

(19) waiting for stop signal

Algorithm 2: Termination of data gathering for each epoch.

Require:

k = the number of final result

${d, v_{\min}, v_{\max}, D}$ from the base station

$t_{s}$ = {start time of the current epoch}

${b u c k e t s}_{i = [1 d]} = [l_{i}, u_{i}] = [v_{\min} + (i - 1) \cdot ⌈ \frac{L}{d} ⌉, v_{\min} + i \cdot ⌈ \frac{L}{d} ⌉]$

${F r a m e}_{j = [1 d]} = [t_{s} + (i - 1) \cdot ⌈ \frac{D}{d} ⌉, t_{s} + i \cdot ⌈ \frac{D}{d} ⌉]$

(1) broadcast ${d, v_{\min}, v_{\max}, D, Q}$

(2) top-k candidates = Ø

(3) count = 0

(4) for $(i = 1; i \leq d; i + +)$ do

(5) v = receive sensor readings for $F r a m e_{i}$

(6) if $v \neq Ø$ then

(7) count = count + ${t h e # o f v}$

(8) top-k candidates = top-k candidates $\cup v$

(9) if count ≥ k then

(10) break

(11) end if

(12) end if

(13) end for

(14) broadcast stop-signal

(15) top-k = compute final top-k result from top-k candidates

(16) return top-k

3.3. Extension for Hierarchical Network Topology

We propose an extended sequence-aware top-k monitoring which operates in three phases: (i) establishing a routing tree, (ii) top-k query processing in the cluster level, and (iii) top-k query processing in the tree level. The expanded algorithm establishes a routing tree once. Next, it repeats ad-hoc top-k query processing in both cluster and tree levels.

(1) Establishing a Routing Tree. The sensor nodes are clustered using clustering schemes like LEACH and HEED. After the formation of the clustered network, the routing tree is organized by connecting cluster heads, as shown in Figure 5. A cluster head directly communicates with a distant observer.

Figure 5

Establishing a routing tree with clusters.

(2) Top-k Processing in Cluster Level. The sensor nodes transmit their readings to their cluster heads over sequence-aware data routing in the cluster level. The incorrectness due to multi-hop routing can be ignored, because cluster members can communicate with each other in a single hop. In the cluster level, each cluster head elects k or less candidates for top-k results. Figure 6 represents an example of top-2 query processing. The current TDMA frame is mapped to a bucket which has the range $[90, 100]$ . The sensor nodes which are assigned into the TDMA frame are activated and transmit their readings to their cluster heads. The cluster heads collect the sensor readings ${93}$ , ${95}$ , and ${96, 92}$ in the clusters A, B, and C, respectively. Then, the cluster head in the cluster C broadcasts a beacon message to stop collecting sensor readings from their members, because two readings are already collected for the top-k result in the cluster. Finally, the cluster heads determine ${93}$ , ${95}$ , and ${96, 92}$ as the top-k candidates.

Figure 6

Top-k query processing in the cluster level.

(3) Top-k Processing in the Tree Level. The tree level is composed of cluster heads. Top-k candidates which are determined in the cluster level are transmitted from the leaf nodes to the root node along the routing tree, like TAG. Parent nodes collect top-k candidates from their child nodes, filter out unnecessary readings, and transmit top-k candidates to their parents repeatedly. The root node tries to determine the final top-k results and transmits top-k results to the base station. If the number of readings is over k in the root node, the root node broadcasts a beacon message into the sensor network to stop data gathering. Otherwise, the next frame is performed. As shown in Figure 7, for example, the candidates ${95}$ and ${96, 92}$ are determined in the clusters B and C. They are transmitted to the head of the cluster A. Then, the head determines ${96}$ and ${95}$ as top-2 candidates. Finally, the root transmits ${96, 95}$ as the final result to the base station and broadcasts a beacon message into the sensor network to stop the top-k processing.

Figure 7

Top-k query processing in the tree level.

4. Performance Evaluation

4.1. Simulation Setup

We have developed a simulator based on Java to evaluate the proposed two algorithms, sequence-aware top-k monitoring (SAT), and an extended SAT (SAT-c). The energy consumption for sending a message is determined by a cost function, $S \cdot (C_{t} + C_{a} \cdot D^{2})$ , where S is the message size, $C_{t}$ is the transmission cost, $C_{a}$ is the amplitude cost, and D is the distance of message transmission. As in [8, 9], we set $C_{t} = 50$ nJ/b and $C_{a} = 100$ pJ/b/m² in the simulation. The energy consumption for sending a message is determined by a cost function ( ${S \cdot C}_{r}$ ), where S is the message size and $C_{r}$ is the transmission cost. We set $C_{r} = 50$ nJ/b in the simulation. Wireless transmission laws dictate that power attenuation is proportional to the square of the covered distance. If the distances are small (up to hundreds of meters), then the power attenuation can be assumed to be linear with the transmission radius. Practically, other factors such as noise or physical obstacles may also affect the energy consumption. For simplicity, we assume the absence of these factors in our experiment, and therefore use the distance between nodes to account for the required transmission power level among them. The detailed simulation parameters are listed in Table 1. We assume that both a sensor identity and a sensor reading take 4 bytes, and a filtering window takes 8 bytes. The initial energy sets 0.1 Joule at each sensor node. The default value of k in a top-k query is set to 3. The variation rate (τ) means the rate of the number of top-k results that are changed in the current epoch over top-k results in the previous epoch. The default variation rate is set to 0.02. We simulated a multi-hop network over the tree-based network topology, as shown in Figure 8. We assume that the TAG routing tree is employed for the base station to communicate with the sensor nodes as FILA.

Table 1

Simulation parameters.

Parameters	Values
Number of distributed sensor nodes ( $E A$ )	48–440
Initial energy of sensor nodes (joule)	0.1
Size of sensor identifier (bytes)	4
Size of sensor reading (bytes)	4
Default value of k in a top-k query	3
Default variation rates	0.02
Size of filter window (bytes)	8

Figure 8

Network layout.

Figure 9 shows the representative segments of the TEMP data traces in the simulation. The sensor readings are simulated using the real traces provided by the Live from Earth and Mars (LEM) project [35] at the University of Washington. We extracted many subtraces starting at different times. We assigned each subtrace to the readings of each sensor. We controlled the variance of readings with the starting times of subtraces.

Figure 9

Sample of real data traces.

We first compare our proposed algorithms, SAT and SAT-c, with the different number of buckets. We then compare our approaches with FILA in terms of (i) the value k, (ii) the scale of sensor networks (the number of sensor nodes), (iii) variation rate. The following metrics are employed for performance comparisons. (i)

Network Lifetime. The network lifetime is defined as the time duration before the first sensor node runs out of power. It serves as the primary metric in the performance evaluation.

(ii)

Residual Energy. We measure the minimum (Min), maximum (Max), and average (Avg) residual energy from sensor nodes for a specific round (=1000).

4.2. Query Reevaluation and Filter Updates in FILA

Figure 10 shows the number of sensor nodes that perform query reevaluation and filter update(s) as k values change from 1 to 10. The variation rate and the number of sensor nodes are set to 0.02 and 120, respectively. The number of top-k candidates is usually under k because the filters prune unnecessary updates effectively. However, k or more sensor nodes must perform query evaluation and filter updates, therefore consuming much energy for these executions. Similar performance trends are obtained as the number of sensor nodes changes from 48 to 440, as shown in Figure 11. Although top-k candidates are constant, usually as the k value changes, query reevaluation and filter updates still increase. This is because the number of sensor nodes that fall in the value range bounded for top-k candidates increases with the number of the deployed sensor nodes. When the number of sensor nodes becomes over 224, the number of sensor nodes that must perform the filter update is constant with the top-k candidate. Figure 12 shows the number of sensor nodes that perform query reevaluation and filter update(s) as the variation rate changes from 0.01 to 0.1. Query reevaluation and filter updates are indispensable to the variation of top-k results. These executions increase linearly with the variation rate.

Figure 10

Query Reevaluation and Filter Update(s) ( $d = 70$ , $τ = 0.02$ , 120 sensor nodes).

Figure 11

Query Reevaluation and Filter Update(s) (k = 3, $d = 70$ , and $τ = 0.02$ ).

Figure 12

Query Reevaluation and Filter Update(s) (k = 3, $d = 70$ , 120 sensor nodes).

4.3. The Number of Buckets (=d)

Figure 13 shows the network lifetime as the number of buckets d changes from 10 to 100. The k value and the variation rate are set to 3 and 0.02, respectively. As shown in Figure 13, the network lifetime generally increases with the number of buckets in both schemes. This is mainly because the number of collecting and routing false positives is reduced. SAT-c shows better performance than that of SAT because it effectively prunes the false positives during routing them in the network. Though the number of buckets is an important factor that affects the network lifetime, finding optimal value without prior knowledge may be difficult because it depends on unknown underlying probability distribution of sensor readings. It can be predefined by some machine learning algorithms in the base station. For simplicity, we assume the absence of finding optimal d and set the default number of buckets to 70 in our next experiments.

Figure 13

Network lifetime (k = 3, $τ = 0.02$ , 120 sensor nodes).

4.4. The Value k

Figures 14 and 15 show the network lifetime and the residual energy as the value k changes from 1 to 10. The variation rate and the number of buckets are set to 0.02 and 70, respectively. The network lifetime linearly decreases when the values k increase in FILA and our approaches. While the communication cost is increased significantly for adjusting filter setting(s) and reevaluating the query with the scale of the value k in FILA, the probability of collecting the false positives is increased in our approaches. Although these factors increase energy consumption simultaneously, our approaches narrow the gap between the maximum and the minimum and are more energy-efficient to process top-k queries than FILA.

Figure 14

Network lifetime ( $d = 70$ , $τ = 0.02$ , 120 sensor nodes).

Figure 15

Residual energy ( $d = 70$ , $τ = 0.02$ , 120 sensor nodes).

4.5. The Number of Sensor Nodes

Figures 16 and 17 show the network lifetime and the residual energy as the number of sensor nodes is changed from 48 to 440. The number of buckets, variation rate, and the value k are set to 70, 0.02, and 3, respectively. FILA performs better than our approaches when the number of sensor nodes is less than 80. Our approaches collect k readings periodically regardless of the variation of the top-k result, while FILA suppresses unnecessary sensor updates. As the number of sensor nodes is increased to more than 80, our approaches outperform FILA. FILA consumes much energy for adjusting filter setting(s) and reevaluating the query over our approaches. As mentioned above, this is because the number of sensor nodes that fall in the value range bounded for top-k candidates is increased with the number of the deployed sensor nodes. Our approaches are more suitable for the large-scale networks. SAT-c that suppresses false-positives in the cluster level is more energy-efficient than SAT.

Figure 16

Network lifetime (k = 3, $d = 70$ , and $τ = 0.02$ ).

Figure 17

Residual energy (k = 3, $d = 70$ , and $τ = 0.02$ ).

4.6. Variation Rate

Figure 18 shows the network lifetime as the variation rate changes from 0.01 to 1. The number of buckets and the value k are set to 70 and 3, respectively. FILA shows that the network lifetime decreases as the variation rate increases. This is because filter updates and query reevaluation are increased simultaneously. On the contrary, our approaches are independent of the variation rate and prolong the network lifetime. This is mainly because the variation of top-k helps to distribute the communication overhead of the sensor nodes that send their readings to the base station for the top-k result, as shown in Figure 19. If the top-k result is constant, specific k nodes send their readings to the base station periodically for each round and consume energy intensively. It is also interesting to observe that the network lifetime can be extended with the variation rate in the result. If the variation rate can be measured, it may be possible to improve the energy-efficiency by adjusting the collection period adaptively.

Figure 18

Network lifetime (k = 3, $d = 70$ , 120 sensor nodes).

Figure 19

Residual energy (k = 3, $d = 70$ , 120 sensor nodes).

5. Conclusion

In this paper, we have proposed a novel top-k monitoring algorithm, called SAT. SAT is based on a sequence-aware data collection method that collects readings sequentially by their values. The sensor nodes transmit their sensor readings sequentially to the base station. When the base station collects k-readings, it broadcasts a simple message to stop the data gathering phase. Moreover, we extend our approach to a cluster-based top-k monitoring to filter out false positives in hierarchical levels such as the cluster level and the tree level. It has been shown through various experiments that our approaches reduce the energy consumption efficiently with no filters and are more suitable for the large-scale networks over the existing approach. In our future work, we will study approximate top-k query approach based on SAT. We plan to extend the sequence-aware data collection to other aggregate functions such as skyline, MIN, MAX, and kNN. We are also interested in adjusting the collection period adaptively to improve the energy-efficiency.

Footnotes

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This research was supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (NIPA-2013-H0301-13-4009) supervised by the NIPA (National IT Industry Promotion Agency), by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the CPRC (Communications Policy Research Center) support program supervised by the KCA (Korea Communications Agency) (KCA-2013-003) and the Korea Institute of Science and Technology Information (K-13-L01-C02).

References

Singh

Lobiyal

A novel energy-aware cluster head selection based on particle swarm optimization for wireless sensor networks

Human-Centric Computing and Information Sciences 2012 2012 2 1 18

Shiha

Lobiyal

D. K.

Performance evaluation of data aggregation for cluster-based wireless sensor network

Human-Centric Computing and Information Sciences 2013 2013 3 1 17

Yoon

Kim

Chang

An energy-efficient routing protocol using message success rate in wireless sensor networks

Journal of Convergence 2013 4 1 15 22

Lee

Chung

Jhang

A study of wireless sensor network routing protocols for maintenance access Hatch condition surveillance

Journal of Information Processing Systems 2013 9 2 237 276

Sumathi

Srinivas

M. G.

A survey of QoS based routing protocols for wireless sensor networks

Journal of Information Processing Systems 2012 8 4 589 602

Considine

Kollios

Byers

Approximate aggregation techniques for sensor databases

Proceedings of the 20th International Conference on Data Engineering (ICDE '04)

April 2004

449 460

2-s2.0-2442576849

10.1109/ICDE.2004.1320018

Deligiannakis

Kotidis

Roussopoulos

Hierarchical in-network data aggregation with quality guarantees

Proceedings of the International Conference on Extending Database Technology (EDBT '04)

2004

Mouratidis

Papadias

Bakiras

Tao

A threshold-based algorithm for continuous monitoring of k nearest neighbors

IEEE Transactions on Knowledge and Data Engineering 2005 17 11 1451 1464

2-s2.0-28244469516

10.1109/TKDE.2005.172

Sharaf

M. A.

Beaver

Labrinidis

Chrysanthis

P. K.

Balancing energy efficiency and quality of aggregate data in sensor networks

Journal of Very Large Data Bases 2004 13 4 374 403

2-s2.0-10644243702

10.1007/s00778-004-0138-0

10.

Shrivastava

Buragohain

Agrawal

Suri

Medians and beyond: new aggregation techniques for sensor networks

Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems (SenSys '04)

November 2004

239 249

2-s2.0-27644511275

11.

Madden

S. R.

Franklin

M. J.

Hong

TAG: a tiny aggregation service for Ad-Hoc sensor network

Proceedings of the 5th Annual Symposium on Operating Systems Design and Implementation

2002

131 146

12.

Tang

Lee

W.-C.

Top-k monitoring in wireless sensor networks

IEEE Transactions on Knowledge and Data Engineering 2007 19 7 962 976

2-s2.0-34249805447

10.1109/TKDE.2007.1038

13.

Cougar, http://www.cs.cornell.edu/bigreddata/cougar/index.php

14.

TinyDB, http://telegraph.cs.berkeley.edu/tinydb/

15.

Sharaf

M. A.

Beaver

Labrinidis

Chrysanthis

P. K.

Balancing energy efficiency and quality of aggregate data in sensor networks

Journal of Very Large Data Bases 2004 13 4 384 403

2-s2.0-10644243702

10.1007/s00778-004-0138-0

16.

Madden

Franklin

M. J.

Hellerstein

J. M.

Hong

The Design of an Acquisitional Query Processor for Sensor Networks

Proceedings of the ACM SIGMOD International Conference on Management of Data

June 2003

491 502

2-s2.0-1142291591

17.

Yoon

Shahabi

The Clustered AGgregation (CAG) technique leveraging spatial and temporal correlations in wireless sensor networks

ACM Transactions on Sensor Networks 2007 3 1

2-s2.0-34047198289

10.1145/1210699.1210672

1210672

18.

Heinzelman

W. R.

Chandrakasan

Balakrishnan

Energy-efficient communication protocol for wireless microsensor networks

Proceedings of the 33rd Annual Hawaii International Conference on System Siences (HICSS '33)

January 2000

908 918

2-s2.0-0033877788

19.

Heinzelman

W. B.

Chandrakasan

A. P.

Balakrishnan

An application-specific protocol architecture for wireless microsensor networks

IEEE Transactions on Wireless Communications 2002 1 4 660 670

2-s2.0-33646589837

10.1109/TWC.2002.804190

20.

Younis

Fahmy

HEED: a hybrid, energy-efficient, distributed clustering approach for ad hoc sensor networks

IEEE Transactions on Mobile Computing 2004 3 4 366 379

2-s2.0-10944266504

10.1109/TMC.2004.41

21.

Soro

Heinzelman

W. B.

Cluster head election techniques for coverage preservation in wireless sensor networks

Ad Hoc Networks 2009 7 5 955 972

2-s2.0-60249083032

10.1016/j.adhoc.2008.08.006

22.

Youssef

Younis

Overlapping multihop clustering for wireless sensor networks

IEEE Transactions on Parallel and Distributed Systems 2009 20 12 1844 1856

2-s2.0-74449091634

10.1109/TPDS.2009.32

23.

Yeo

M. H.

Kim

Y. M.

Yoo

J. S.

An energy-efficient distributed clustering approach in wireless sensor networks

IEICE Transactions on Communications 2009 92-B 2 620 623

2-s2.0-77952664629

10.1587/transcom.E92.B.620

24.

Yeo

Seo

Yoo

Data correlation-based clustering algorithm in wireless sensor networks

KSII Transactions on Internet and Information Systems 2009 3 3 331 343

2-s2.0-67649464498

10.3837/tiis.2009.03.007

25.

Silberstein

Braynard

Ellis

Munagala

Yang

A sampling-based approach to optimizing top-k queries in sensor networks

Proceedings of the 22nd International Conference on Data Engineering (ICDE '06)

April 2006

2-s2.0-33749605102

10.1109/ICDE.2006.10

26.

Zeinalipour-Yazti

Vagena

Gunopulos

Kalogeraki

Tsotras

Vlachos

Koudas

Srivastava

The threshold join algorithm for top-k queries in distributed sensor networks

Proceedings of the 2nd International Workshop on Data Management for Sensor Networks (DMSN '05)

August 2005

61 66

2-s2.0-77954452193

10.1145/1080885.1080896

27.

Zeinalipour-Yazti

Andreou

Chrysanthis

P. K.

Samaras

MINT views: materialized In-Network Top-k views in sensor networks

Proceedings of the 8th International Conference on Mobile Data Management (MDM '07)

May 2007

182 189

2-s2.0-48649108962

10.1109/MDM.2007.34

28.

Mai

H. T.

Y. L.

W. L.

Myoung

H. K.

Processing top-k monitoring queries in wireless sensor networks

Proceedings of the 3rd International Conference on Sensor Technologies and Applications (SENSORCOMM '09)

June 2009

545 552

2-s2.0-70449500750

10.1109/SENSORCOMM.2009.91

29.

Chen

Zhou

Guan

Towards energy-efficient skyline monitoring in wireless sensor networks

Proceedings of the European Conference on Wireless sensor networks (EWSN ’07)

2007

101 116

30.

Xin

Wang

Chen

Zhang

Wang

Continuously maintaining sliding window skylines in a sensor network

Proceedings of the International Conference on Database systems for Advanced Applications (DASFAA ’07)

2007

509 521

31.

Xin

Wang

Zhang

Energy-efficient skyline queries over sensor network using mapped skyline filters

Proceedings of the Asia-Pacific web and International conference on web-age information management conference on Advances in data and web management (APWeb/WAIM ’07)

2007

144 156

32.

Shen

Chen

Deng

Location-based skyline queries in wireless sensor networks

Proceedings of the International Conference on Networks Security, Wireless Communications and Trusted Computing, (NSWCTC '09)

April 2009

391 395

2-s2.0-67650479279

10.1109/NSWCTC.2009.314

33.

Seong

D. O.

Park

J. H.

Yeo

M. H.

Yoo

J. S.

An energy-efficient skyline query processing method using priority-based bottom-up filtering

Proceedings of the International Conference on Convergence and Hybrid Information Technology (ICHIT '09)

August 2009

159 162

2-s2.0-77954709525

10.1145/1644993.1645022

34.

Ergen

S. C.

Varaiya

TDMA scheduling algorithms for wireless sensor networks

Wireless Networks 2010 16 4 985 997

2-s2.0-77954087704

10.1007/s11276-009-0183-0

35.

Live from Earth and Mars (LEM) Project

2013, http://www-k12.atmos.washington.edu/k12/grayskies/index.html