Prediction-Based Filter Updating Policies for Top-k Monitoring Queries in Wireless Sensor Networks

Abstract

Processing top-k queries in an energy-efficient manner is an important topic in wireless sensor networks. Installing filters on sensor nodes avoids transmitting redundant data, so the communication overhead between the base station and the sensor nodes is decreased. However, existing algorithms such as FILA and DAFM consume much energy when updating the filter windows. In this paper, we propose a new top-k algorithm named PreFU, which uses prediction models to update the window parameters of filters. PreFU predicts the next s sensor values with time series prediction models built from historical data. By estimating the cost of updating window parameters based on the predicted sensor values, PreFU reduces the number of filter window updates and thus the cost of updating window parameters. Experimental results show that PreFU is more energy-efficient than existing algorithms while guaranteeing the accuracy of top-k query results.

1. Introduction

Despite all the progress in research, energy consumption is still a major issue for wireless sensor networks [1, 2]. In addition, the generated data is large and dense, whereas users are usually interested only in the k largest or smallest objects among them. Thus, top-k query processing is in high demand for many applications in uncertain databases and relational databases [3–5]. A top-k query in a wireless sensor network differs from general queries in that the data on a single sensor node cannot determine whether it belongs to the final result. An intuitive approach is to let the base station determine the top-k results after collecting data from all sensor nodes. For our top-k queries, k is not restricted and the query determines the k highest observed values. The query also determines the full set of nodes that report the k highest values, and it is executed periodically, starting at some point in time and reporting values for a number of subsequent rounds. Such centralized query processing incurs a large communication cost and wastes a lot of energy. How to process queries in an energy-efficient manner, supplying top-k results to users while minimizing energy consumption, is therefore an important topic in wireless sensor networks. Generally speaking, query processing in wireless sensor networks is essentially different from that in traditional databases. A wireless sensor network containing N sensor nodes can be viewed as a distributed system, but it differs from a general distributed system in that there is no single powerful node serving as the collection center for the data of all sensors. Each sensor transmits its data to the base station through multihop relays, and every data transmission consumes energy.
From another aspect, the network lifetime, which is the major optimization objective for query processing in wireless sensor networks, is determined not only by the total energy consumption of all sensors but also by the maximum energy consumption among the sensors. The sensors near the base station consume more energy than the others because they relay data for the others, so they exhaust their batteries first. Once they run out of energy, the rest of the sensors are disconnected from the base station, no matter how much residual energy those sensors have left. Consequently, the network no longer functions even if the total energy consumption per query is reasonably small. Hence, evaluating queries effectively and efficiently in wireless sensor networks poses great challenges.

A typical solution for answering top-k queries in wireless sensor networks is to use filters. Filters are broadcast into the network and used by each individual sensor node to decide whether its value is relevant for computing the query result. FILA (filter-based monitoring approach) [6] is an energy-efficient approach among these algorithms. The basic idea is to install a filter at each child sensor node, which keeps redundant data from being transmitted to parent nodes or the base station. However, the approach consumes much energy on updating filters. Mai et al. [7, 8] proposed the DAFM algorithm on the basis of FILA. The algorithm predicts the next sensed values of sensor nodes with a linear regression model. Then, the base station decides whether to update the filtering windows based on the predicted benefits. When the data follows complicated distributions and varies widely over time, the prediction performance of linear regression is unsatisfactory, so the performance of DAFM degrades. The contributions of this paper are summarized as follows.

(i) We adopt a powerful time series model, ARIMA, to predict the next s sensor values based on historical sensor data. The ARIMA time series model is suitable for sensor data because of their temporal correlations. Our proposed PreFU approach focuses on reevaluation when filters need updating. PreFU introduces a prediction mechanism to achieve lower energy consumption than the eager and lazy filter update policies of the FILA approach. Also, compared to the DAFM approach, PreFU guarantees a smaller mean squared error and thus performs effectively (Algorithm 1).

(ii) Instead of the one-step-forward prediction of the DAFM approach, we adopt adaptive s-step prediction based on prediction errors. By selecting suitable s values for the given sensor data, PreFU outperforms existing approaches.

(iii) Extensive experiments are conducted to evaluate the proposed PreFU approach using real data traces. The results show that PreFU outperforms DAFM, FILA-E, and FILA-L in terms of both energy consumption and network lifetime under single-hop and multihop network configurations.

Algorithm 1: PreFU algorithm.

(1) top-$k$ ← Query($k$) ⊳ Initialize the top-$k$ set after collecting data from all sensor nodes

(2) if Get new values on base station then

(3) $N_{u}$ ← reevaluation() ⊳ Obtain the sensor node set in which the filter window may be updated

(4) end if

(5) $step$ ← 0

(6) repeat

(7) for $i = 1$ to $| N_{u} |$ do

(8) $v_{pre}$ ← predict($N_{i}$) ⊳ Using ARIMA models to predict

(9) end for

(10) $step$ ← $step + 1$

(11) Calculate $C_{e}$, $C_{update}$, $C_{old}$ for s-step-ahead prediction

(12) until $step = s$

(13) if $C_{e} + C_{update} < C_{old}$ then

(14) Update-Window() ⊳ Update windows for nodes in $N_{u}$

(15) end if

The rest of this paper is organized as follows. In Section 2, we discuss related work. Typical filtering methods for top-k query processing are introduced in Section 3, and our proposed PreFU algorithm is presented in Section 4. In Section 5, extensive simulations are conducted to show the efficiency and accuracy of the proposed method. Finally, we conclude this paper in Section 6.

2. Related Work

A naive implementation of top-k monitoring is a centralized approach in which all sensor readings are periodically collected by the base station, which then computes the top-k result set directly. However, a wireless sensor network can be viewed as a distributed network consisting of many energy-limited sensor nodes, and communication is the main source of energy consumption. Therefore, the centralized approach undoubtedly consumes extra energy because of the transmission of massive data. In order to reduce the communication cost of data collection, Madden et al. proposed an in-network aggregation technique known as TAG [9], which avoids transmitting unneeded data compared with centralized algorithms. However, this approach incurs unnecessary updates in the network and is not really energy-efficient.

Wu et al. [6] propose the FILA approach for top-k monitoring, which installs a filter on each sensor node to filter out unnecessary data that do not contribute to the final results. Reevaluation and filter setting are two critical aspects that ensure its correctness and effectiveness. When the new filters differ from the old ones maintained by the sensor nodes, the base station needs to update them. However, when the sensed values on nodes vary widely, the base station must send updated filters to the related nodes frequently, which leads to a large updating cost and degrades the performance of the algorithm. Mai et al. [8] propose the DAFM approach, which aims to reduce both the communication cost of sending probe messages during reevaluation and the transmission cost of filter updating in FILA. DAFM uses a linear regression model to predict the next values of sensor nodes and thus whether those values will fall outside their filtering windows. To some extent, this approach decreases the cost of filter updating. However, the sensed values are affected by various factors, and the performance is worse when the linear regression model is applied to complicated data.

Besides these, filtering and aggregation are currently the two main strategies, and they cooperate with each other to process top-k queries in wireless sensor networks. Chen et al. propose the QF (quantile filter) [10] approach, which treats each sensing value together with its sensor as a point. A top-k query returns the k points with the highest sensing values. The goal of the algorithm is to decrease energy consumption and prolong the lifetime of the network, not only by minimizing the total energy consumption but also by consuming less energy on each node. Liu et al. [11] propose a new cross pruning (XP) aggregation framework for top-k queries in wireless sensor networks. The framework contains a cluster-tree routing structure that aggregates more objects locally and a broadcast-then-filter approach. In addition, it provides an in-network aggregation technique to filter out redundant values, which enhances in-network filtering effectiveness. Abbasi et al. [12] proposed the MOTE (model-based optimization technique) approach, which assigns filters to nodes by model-based optimization. Nevertheless, finding the optimal filter settings for the top-k set is an NP-hard problem.

In addition, there are other related works on processing top-k queries in wireless sensor networks. Yeo et al. propose a novel technique called PRIM (priority-based top-k monitoring), which relies on the semantics of the top-k query [13]. Its basic idea is to gather data according to priority; that is, higher readings are collected earlier. Cho et al. propose the POT (partial ordered tree) algorithm, which considers spatial correlation to maintain the k sensor nodes with the highest sensing values [14]. Michel et al. propose a framework, KLEE, to process top-k queries [15], which allows trading off efficiency against result quality and bandwidth savings against communications. Different from the above, Silberstein et al. propose a sampling-based approach to evaluate approximate top-k queries in wireless sensor networks [16].

Energy efficiency is a critical issue in wireless sensor networks and also an important indicator for evaluating the effectiveness and practicality of an algorithm. In recent years, some researchers have introduced time series models to wireless sensor networks. Tulone and Madden [17] apply autoregressive models to data collection in wireless sensor networks. The basic idea is to build a model on the base station and on each sensor node. The base station predicts the values of the nodes, and a node sends its sensed readings to the base station only when outlier readings are produced. When a reading is not properly predicted by the model, the model is relearned to adapt to the changes. The readings collected by this approach are approximate. The approach in [18] is similar to [17]; both are based on time series prediction in wireless sensor networks. The main difference is that the former is based on ARIMA. Although time series models have been applied to wireless sensor networks, they have been used only to minimize energy consumption in data collection and have not been applied to top-k query processing. In this paper, we propose a novel top-k monitoring algorithm, PreFU, which combines the ARIMA time series model with the FILA approach to avoid unnecessary filter updating cost and to minimize sensor node energy consumption.

3. Filtering Method for Top-k Query Processing

A typical system architecture of a wireless sensor network includes a base station and a number of sensor nodes. The base station has enough energy, while the sensor nodes are powered by batteries and are energy-limited. When the base station is beyond a sensor node's radio coverage, data are sent to the base station via other sensor nodes through multiple hops. Otherwise, sensed data are sent to the base station directly.

Sensor nodes sense the physical phenomenon at a fixed sampling rate, such as temperature, humidity and light. When receiving the top-k query request, base station starts query and returns the results set to users at regular intervals. Assume that the sensed value on node $N_{i}$ is $v_{i}$ , a top-k query is to return the ordered list of sensor nodes ℛ with the highest readings at every epoch; that is, $ℛ = 〈 N_{1}, N_{2}, \dots, N_{k} 〉$ , $\forall i < j$ , $v_{i} \geq v_{j}$ , and $\forall l \neq i$ ( $i = 1,2, \dots, k$ ), $v_{l} \leq v_{k}$ . The results are maintained by base station and returned to users finally. The goal in this paper is to prolong the lifetime of wireless sensor networks by minimizing the overall energy consumption.

Initially, after collecting the readings from all sensor nodes, the base station sorts the readings and gets the initial top-k result set. The base station computes a filtering window $[l_{i}, u_{i}]$ for each node based on the initial top-k result set and sends these windows to the corresponding sensor nodes. At the next sampling epoch, if the value on node $N_{i}$ is within $[l_{i}, u_{i}]$ , node $N_{i}$ need not update its value maintained by the base station. Otherwise, an update request is sent to the base station. The base station then reevaluates the top-k result and adjusts the filter settings for the affected sensor nodes. The base station sends the new filters to the relevant sensor nodes according to its updating strategy. FILA provides two strategies: lazy filter update (FILA-L) and eager filter update (FILA-E).

In order to ensure the correctness of the algorithm, the space which is formed by all filtering windows should be continuous; that is, we set $u_{i + 1}$ to be equal to $l_{i}$ . FILA devises two filter setting approaches: uniform filter setting and skewed filter setting. In this paper, we adopt uniform filter setting; take Figure 1 as an example; the filtering windows of a top-k result are set by the following equation:

\begin{matrix} u_{i + 1} = l_{i} = \frac{v_{i} + v_{i + 1}}{2} (1 \leq i \leq k) . \end{matrix}

(1)

Figure 1

$k + 1$ windows in FILA.

To maximize the filtering capability, the upper bound of the top-1 node is set to $+ \infty$ and the lower bound of the nontop-k nodes is set to $- \infty$ . Then, the filtering window shared by the nontop-k nodes, which are ranked below the kth node, is $[- \infty, l_{k}]$ . As seen from the above, the base station just needs to maintain $k + 1$ filtering windows, as shown in Figure 1.
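The uniform filter setting of (1), together with the infinite outer bounds described above, can be sketched as follows. This is a minimal illustration rather than the FILA implementation: the function names `uniform_filter_windows` and `violates`, the dictionary layout, and the sample readings are assumptions made for the example.

```python
import math

def uniform_filter_windows(readings, k):
    """Return (top_k_ids, windows) under the uniform filter setting of Eq. (1).

    readings: dict mapping node id -> latest value known to the base station.
    windows:  dict mapping node id -> (lower, upper) filtering window.
    """
    if not 1 <= k < len(readings):
        raise ValueError("k must satisfy 1 <= k < number of nodes")
    # Rank nodes by value, highest first.
    ranked = sorted(readings, key=readings.get, reverse=True)
    # bounds[i] is the midpoint between the values ranked i+1 and i+2 (1-based),
    # i.e. l_{i+1} = u_{i+2} in the notation of Eq. (1).
    bounds = [(readings[ranked[i]] + readings[ranked[i + 1]]) / 2 for i in range(k)]
    windows = {}
    for rank, node in enumerate(ranked):
        if rank < k:
            upper = math.inf if rank == 0 else bounds[rank - 1]
            windows[node] = (bounds[rank], upper)
        else:
            # All nontop-k nodes share the single window [-inf, l_k].
            windows[node] = (-math.inf, bounds[k - 1])
    return ranked[:k], windows

def violates(value, window):
    """True if a newly sampled value falls outside the node's window."""
    lower, upper = window
    return not (lower <= value <= upper)

# Made-up readings for four nodes, top-2 query.
top, w = uniform_filter_windows({"A": 30, "B": 20, "C": 10, "D": 5}, k=2)
```

With these made-up readings, only three distinct windows exist for k = 2, which matches the $k + 1$ windows the base station has to maintain.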

To some extent, FILA avoids redundant data from transmission and saves energy. However, there is massive unnecessary energy consumption. Assume that, after sampling in epoch t, the filtering windows are shown in Figure 2.

Figure 2

Fluctuation of $v_{3}$ at $t_{i}$ and $t_{i + 1}$ .

In Figure 2(a), at epoch $t_{i}$ , before sampling the top-3 set has the values $\{v_{1}, v_{4}, v_{3}\}$ , and the result set is $\{v_{1}, v_{4}, v_{3}\}$ after sampling, as shown in Figure 2(b); that is, the value $v_{3}$ jumps out of its filter and falls into the filtering window installed on $N_{4}$ . The base station needs to adjust the filtering windows of the relevant nodes, as shown in Figure 2(c). However, after sampling in epoch $t_{i + 1}$ , as shown in Figure 2(d), the value $v_{3}$ changes again and jumps back into its former window. In FILA, such fluctuation of data causes frequent updates of the filter windows, which incurs a large communication cost. Mai et al. [8] proposed a prediction-based updating algorithm that determines whether to update the filters according to the possible cost of updating. The algorithm decreases the cost of updating the filters to some extent. However, this approach is limited by the prediction performance of the linear regression model.

In addition, Figure 2 shows that the values on the nontop-k nodes vary within a small range, so their filtering windows are useful for filtering out irrelevant sensor nodes. However, the upper bound of the nontop-k nodes is determined by the values of the kth and $(k + 1)$ th sensor nodes; when the value ranked kth varies, the filters on the nontop-k nodes are affected directly. A wireless sensor network generally consists of a large number of sensor nodes, and k is usually small in a top-k query, so a large number of sensor nodes fall into the nontop-k node set. If the algorithm updates the filters every time it reevaluates the top-k results, it consumes more energy on updating the filters of the nontop-k nodes.

The inherent defects of FILA and DAFM call for better mechanisms that avoid excessive communication while updating windows. We therefore propose a new updating algorithm, PreFU, based on prediction by autoregressive integrated moving average (ARIMA) models. The algorithm evaluates the possible communication cost based on the s-step-ahead predicted values; usually s is smaller than 10. Then, the base station decides whether or not to update the relevant filtering windows.

4. PreFU Approach

In FILA, when the filtering windows change after reevaluating the top-k results, the filters on the nontop-k nodes are affected, and frequent fluctuation of the data leads to unnecessary communication cost for updating windows. Considering these two aspects, our prediction-based updating approach lets the algorithm decide whether to send the new filtering window parameters to the corresponding nodes. The exact future values of each sensor node cannot be known in advance, but they can be estimated by prediction. In this paper, our PreFU approach predicts the next value(s) with ARIMA models.

4.1. ARIMA Model

ARIMA is a time series prediction model that predicts the next value(s) according to historical data. A typical ARIMA model consists of three components: AR (autoregressive), I (integrated), and MA (moving average). AR is a linear regression of the current value of the time series against one or more prior values of the series, which captures the relationship between the current value and the latest p historical values. Given a time series $Z_{t}$ , the value at time t is represented by the latest p historical values, as shown below:

\begin{matrix} Z_{t} = φ_{1} Z_{t - 1} + φ_{2} Z_{t - 2} + \dots + φ_{p} Z_{t - p} + a_{t} . \end{matrix}

(2)

Equation (2) is an autoregressive model and p is the order of the model. The parameters of the model are $φ_{1}, φ_{2}, \dots, φ_{p}$ and $a_{t}$ is a white noise series. Since a time series usually receives random shocks in a noisy environment, the MA component is introduced to capture the influence of random shocks on future values. We call it the moving average model MA(q), in which the value at time t is a linear combination of the random errors at time t and at the q preceding time steps, as shown below:

\begin{matrix} Z_{t} = a_{t} - θ_{1} a_{t - 1} - θ_{2} a_{t - 2} - \dots - θ_{q} a_{t - q}, \end{matrix}

(3)

where $θ_{1}, θ_{2}, \dots, θ_{q}$ are the moving average parameters of the model, which will be estimated. In general, the orders of AR or MA are high in order to describe the dynamic structure adequately. Without the integrated component of ARIMA, the model is simplified as follows:

\begin{matrix} Z_{t} = φ_{0} + \sum_{i = 1}^{p} ‍ φ_{i} Z_{t - i} + a_{t} - \sum_{i = 1}^{q} ‍ θ_{i} a_{t - i}, \end{matrix}

(4)

where $φ_{0}$ is a constant term. However, the ARMA model assumes that the data is stationary; that is, the statistical properties of the data do not change over time. This assumption does not hold for most real data series; therefore, the integrated term is introduced to remove the impact of nonstationarity by differencing. The time series satisfies an ARMA model after being differenced several times. In general, first-order differencing is sufficient. Therefore, a time series is represented by an ARIMA( $p, d, q$ ) model if it is stationary after d rounds of differencing. Here, p represents the number of historical values with successive timestamps, and q represents the number of the latest random shocks on the time series.
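The role of the integrated term can be illustrated with a few lines of NumPy. This is only a sketch on made-up readings with a linear trend; the helper names `difference` and `undo_first_difference` are hypothetical and not part of the paper.

```python
import numpy as np

def difference(series, d=1):
    """Apply d rounds of first-order differencing: W_t = Z_t - Z_{t-1}."""
    return np.diff(np.asarray(series, dtype=float), n=d)

def undo_first_difference(diffs, first_value):
    """Invert one round of differencing given the first original value."""
    return np.concatenate(([first_value], first_value + np.cumsum(diffs)))

# Made-up readings that rise by 0.5 per epoch (nonstationary mean).
z = np.array([20.0, 20.5, 21.0, 21.5, 22.0])
w = difference(z, d=1)                      # constant series: the trend is removed
restored = undo_first_difference(w, z[0])   # predictions on w map back to z this way
```

A model fitted to the differenced series `w` produces forecasts of the increments, so a forecast of the original reading is obtained by adding the increments back, as `undo_first_difference` does.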

4.2. Time Series Model-Driven Prediction

Prediction by ARIMA model usually consists of two steps. The first step is model identification and parameter estimation. The second is prediction.

4.2.1. Model Identification and Parameter Estimation

Each parent node builds an ARIMA model to predict the next threshold $Q_{filter}$ . In order to predict, parent v maintains enough thresholds, $L_{i} = 〈 t_{1}, z_{i, 1} 〉, 〈 t_{2}, z_{i, 2} 〉, \dots, 〈 t_{T}, z_{i, T} 〉$ , where $t_{j}$ ( $1 \leq j \leq T$ ) is an epoch number and $z_{i, j}$ is the sensor reading of $v_{i}$ at epoch $t_{j}$ . The ARIMA model on node $v_{i}$ is built from $L_{i}$ . For simplicity, we suppose that $d = 0$ ; that is, the time series of $L_{i}$ is stationary and follows an ARMA( $p, q$ ) model.

In order to get p and q, we introduce the autocorrelation function (ACF) and the partial autocorrelation function (PACF). Autocorrelation describes the simple correlation between values in a time series. If $γ_{l}$ denotes the autocorrelation coefficient at lag l, then it describes the correlation between sensor values that are l epochs apart. The computation is shown below:

\begin{matrix} γ_{l} = \frac{\sum_{t = 1}^{T - l} ‍ (Z_{t} - \bar{Z}) (Z_{t + l} - \bar{Z})}{\sum_{t = 1}^{T} ‍ {(Z_{t} - \bar{Z})}^{2}} . \end{matrix}

(5)

T is the number of samples and l is the lag; in general, $l = 20$ . $\bar{Z}$ is the arithmetic mean of the samples, which is used to estimate the expectation of the time series. PACF describes the conditional correlation between $Z_{t}$ and $Z_{t - k}$ given $Z_{t - 1}, Z_{t - 2}, \dots$ , and $Z_{t - k + 1}$ . The degree of correlation at lag l is measured by the partial autocorrelation coefficient $φ_{l l}$ , which is estimated as follows:

\begin{matrix} φ_{l l} = \frac{γ_{l} - \sum_{j = 1}^{l - 1} ‍ φ_{l - 1, j} γ_{l - j}}{1 - \sum_{j = 1}^{l - 1} ‍ φ_{l - 1, j} γ_{j}} . \end{matrix}

(6)
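Equations (5) and (6) can be evaluated directly from a window of readings. The sketch below is an illustration under one added assumption: the intermediate coefficients $φ_{l, j}$ for $j < l$, which (6) needs but the text does not spell out, are updated with the standard Durbin-Levinson recursion $φ_{l, j} = φ_{l - 1, j} - φ_{l l} φ_{l - 1, l - j}$. The function names and the sample series are hypothetical.

```python
import numpy as np

def sample_acf(z, max_lag):
    """Sample autocorrelations gamma_0 .. gamma_max_lag as in Eq. (5)."""
    z = np.asarray(z, dtype=float)
    dev = z - z.mean()
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[: len(z) - l] * dev[l:]) / denom
                     for l in range(max_lag + 1)])

def sample_pacf(z, max_lag):
    """Partial autocorrelations phi_11 .. phi_LL via Eq. (6); index 0 is set to 1."""
    gamma = sample_acf(z, max_lag)
    pacf = np.ones(max_lag + 1)
    prev = np.array([])  # holds phi_{l-1, 1} .. phi_{l-1, l-1}
    for l in range(1, max_lag + 1):
        j = np.arange(1, l)
        num = gamma[l] - np.sum(prev * gamma[l - j])
        den = 1.0 - np.sum(prev * gamma[j])
        phi_ll = num / den
        # Durbin-Levinson update (assumption, see lead-in).
        prev = np.append(prev - phi_ll * prev[::-1], phi_ll)
        pacf[l] = phi_ll
    return pacf

# Made-up readings, used only to exercise the functions.
z = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 8.0, 7.0, 9.0]
acf = sample_acf(z, 3)
pacf = sample_pacf(z, 3)
```

Plotting `acf` and `pacf` against the lag is the usual way to look for the cut-off points that suggest candidate values of q and p.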

We can get the estimated values of p and q according to PACF and ACF. For p, the sample PACF cuts off at lag p. As for q, we can get it from the ACF: for a time series $Z_{t}$ with autocorrelations $γ_{l}$ , if $γ_{l} \neq 0$ when $l \leq q$ and $γ_{l} = 0$ when $l > q$ , then the order of MA is q. However, these are only estimates of p and q; the final values are decided by an information criterion called AIC (Akaike information criterion). The AIC criterion selects the $p^{*}$ and $q^{*}$ that minimize the value of AIC. We view $p^{*}$ and $q^{*}$ as the optimal estimated values of p and q. The computation of AIC is shown below:

\begin{matrix} AIC = \ln {\hat{σ}}^{2} + \frac{2 (p + q)}{T} . \end{matrix}

(7)

T is the sample size, and ${\hat{σ}}^{2}$ is the maximum-likelihood estimate of $σ^{2}$ . Consider

\begin{matrix} {\hat{σ}}^{2} = \frac{\sum_{i = c + 1}^{T} {(Z_{i} - \sum_{j = 1}^{c} φ_{j j} \times Z_{i - j})}^{2}}{T - c} . \end{matrix}

(8)
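Order selection with (7) can be sketched as a small search over candidate orders. To stay self-contained, the example below makes two simplifying assumptions that are not from the paper: it considers pure AR models only ($q = 0$), and it estimates the residual variance from an ordinary least-squares fit, dividing the residual sum of squares by the number of fitted samples. All function names are hypothetical, and the series is synthetic.

```python
import numpy as np

def fit_ar_least_squares(z, p):
    """Fit Z_t = phi_0 + sum_i phi_i Z_{t-i} + a_t by least squares.

    Returns ([phi_0, phi_1, ..., phi_p], residual variance).
    """
    z = np.asarray(z, dtype=float)
    T = len(z)
    # The row for time t holds [1, Z_{t-1}, ..., Z_{t-p}].
    X = np.column_stack([np.ones(T - p)] + [z[p - i: T - i] for i in range(1, p + 1)])
    y = z[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, float(np.mean(resid ** 2))

def aic(sigma2, p, q, T):
    """Eq. (7): AIC = ln(sigma^2) + 2 (p + q) / T."""
    return np.log(sigma2) + 2.0 * (p + q) / T

def select_ar_order(z, max_p):
    """Return the AR order in 1..max_p with the smallest AIC, plus all scores."""
    scores = {p: aic(fit_ar_least_squares(z, p)[1], p, 0, len(z))
              for p in range(1, max_p + 1)}
    return min(scores, key=scores.get), scores

# Synthetic AR(1) series with coefficient 0.7 (illustration only).
rng = np.random.default_rng(0)
z = np.zeros(500)
for t in range(1, len(z)):
    z[t] = 0.7 * z[t - 1] + rng.normal()
coef, sigma2 = fit_ar_least_squares(z, 1)
best_p, scores = select_ar_order(z, max_p=4)
```

The penalty term $2 (p + q) / T$ is what keeps the search from always preferring the largest order, since a larger model never fits the training window worse.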

After determining the order of the ARMA model, we estimate the parameters $(φ_{0}, φ_{1}, \dots, φ_{p})$ by the conditional least-squares method. Now, we can predict the next value of the threshold with the ARIMA model. In this procedure, the order of the model is not changed, but the parameters of the model are relearned while processing top-k queries, which ensures that the prediction error stays acceptable.

4.2.2. Predicting

As illustrated by (4) in Section 4.1, suppose that we are at time epoch h and Z is the time series formed by the epochs up to h; the one-step-ahead predicted value $v^{pre}$ of the time series is calculated as below:

\begin{array}{l} v^{pre} = E (Z_{h + 1} | Z_{h}, Z_{h - 1}, \dots, Z_{1}) \\ = φ_{0} + \sum_{m = 1}^{p} φ_{m} Z_{h + 1 - m} - \sum_{m = 1}^{q} θ_{m} a_{h + 1 - m} . \end{array}

(9)
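A direct transcription of (9) into code is shown below. The extension to s steps is an assumption of this sketch, not something stated in the text: each forecast is fed back as if it were an observation, and the unknown future shocks are replaced by their expected value of zero. The function name, argument layout, and coefficients are hypothetical.

```python
def forecast_arma(phi0, phi, theta, history, shocks, s=1):
    """Return s successive forecasts for the ARMA model of Eq. (4).

    phi:     [phi_1, ..., phi_p]
    theta:   [theta_1, ..., theta_q]
    history: observed values Z_1 .. Z_h, oldest first (at least p values)
    shocks:  past residuals a_1 .. a_h, oldest first (at least q values)
    The first returned value is the one-step-ahead prediction of Eq. (9).
    """
    values = list(history)
    resid = list(shocks)
    forecasts = []
    for _ in range(s):
        ar = sum(phi[m - 1] * values[-m] for m in range(1, len(phi) + 1))
        ma = sum(theta[m - 1] * resid[-m] for m in range(1, len(theta) + 1))
        pred = phi0 + ar - ma
        forecasts.append(pred)
        values.append(pred)  # feed the forecast back in for the next step
        resid.append(0.0)    # future shocks are unknown; use their mean, 0
    return forecasts

# Made-up ARMA(1,1) coefficients and history, for illustration only.
preds = forecast_arma(phi0=1.0, phi=[0.5], theta=[0.2],
                      history=[2.0, 4.0], shocks=[0.0, 1.0], s=2)
```

For this made-up input the first forecast is $1 + 0.5 \cdot 4 - 0.2 \cdot 1 = 2.8$ and the second is $1 + 0.5 \cdot 2.8 = 2.4$, because the shock term has dropped out after one step.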

4.3. Evaluation of the Cost

After predicting the next s values for each node in $N_{u}$ , the algorithm evaluates the possible cost of adopting the new and the old filtering windows separately. We denote these costs by $C_{new}$ and $C_{old}$ . The cost refers to the number of communications caused by node values violating their filters at the predicted sampling epochs. Only when $C_{new} < C_{old}$ does the base station send the new filtering windows to the nodes in $N_{u}$ .

Let $v_{i_{j}}^{pre}$ be the jth-step predicted value of node $N_{i}$ ( $N_{i} \in N_{u}$ ); we calculate $C_{old}$ and $C_{new}$ as follows:

\begin{matrix} C_{old} = \sum_{i = 1}^{N} \sum_{j = 1}^{s} δ_{v_{i_{j}}^{pre}}, \quad δ_{v_{i_{j}}^{pre}} = \begin{cases} 1 & v_{i_{j}}^{pre} \notin [l_{i}, u_{i}] \\ 0 & \text{else}, \end{cases} \end{matrix}

(10)

\begin{array}{l} C_{new} = C_{e} + C_{update} = | N_{u} | + \sum_{i = 1}^{N} \sum_{j = 1}^{s} δ_{v_{i_{j}}^{pre}}^{'}, \\ δ_{v_{i_{j}}^{pre}}^{'} = \begin{cases} 1 & v_{i_{j}}^{pre} \notin [l_{i}^{'}, u_{i}^{'}] \\ 0 & \text{else}, \end{cases} \end{array}

(11)

where $C_{e}$ is the cost of updating the filter windows of the sensor nodes in $N_{u}$ ; for s-step-ahead prediction, each sensor node in $N_{u}$ needs to be updated only once, that is, $C_{e} = | N_{u} |$ . The bounds $l_{i}^{'}$ and $u_{i}^{'}$ are computed in the same way as (1); that is,

\begin{matrix} u_{i + 1}^{'} = l_{i}^{'} = \frac{v_{i}^{'} + v_{i + 1}^{'}}{2} (1 \leq i \leq k) . \end{matrix}

(12)

If the value of sensor node $N_{i}$ violates the filter, the new value sent to the base station is used as $v_{i}^{'}$ in the calculation. Otherwise, $v_{i}^{'} = v_{i}$ ; that is, the old value is used.

In our PreFU algorithm, if the base station sends updated filters to the nodes, the cost is composed of two parts: the cost of sending the new filters and the cost of sending to the base station the values that violate the new filters during the next s sampling steps, denoted as $C_{e}$ and $C_{update}$ , respectively, in accordance with (11). If the base station does not update the filters, the cost of using the old filters is referred to as $C_{old}$ . When $C_{new} < C_{old}$ , the base station sends the new filters. Otherwise, all nodes keep the old filters.
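The comparison between $C_{new}$ and $C_{old}$ in (10) and (11) amounts to counting predicted filter violations under both sets of windows. The following sketch illustrates that decision; the function names, the dictionary layout, and the two-node scenarios are assumptions made for the example, and the predicted values would come from the ARIMA models in practice.

```python
import math

def count_violations(predictions, windows):
    """Count predicted values that fall outside their node's window.

    predictions: dict node id -> list of s predicted values
    windows:     dict node id -> (lower, upper)
    """
    total = 0
    for node, values in predictions.items():
        lower, upper = windows[node]
        total += sum(1 for v in values if not (lower <= v <= upper))
    return total

def should_update(predictions, old_windows, new_windows, nodes_to_update):
    """Return (decision, C_old, C_new) following Eqs. (10) and (11)."""
    c_old = count_violations(predictions, old_windows)
    c_e = len(nodes_to_update)                         # one update message per node in N_u
    c_update = count_violations(predictions, new_windows)
    c_new = c_e + c_update
    return c_new < c_old, c_old, c_new

old = {"A": (25, math.inf), "B": (15, 25)}   # A currently ranked above B
new = {"A": (15, 25), "B": (25, math.inf)}   # windows if the ranks are swapped

# Stable change: B stays above A for all s = 3 predicted steps.
stable = should_update({"A": [24, 23, 24], "B": [26, 27, 28]}, old, new, ["A", "B"])

# Fluctuation: the values swap back after one step, so updating does not pay off.
flutter = should_update({"A": [24, 26, 27], "B": [26, 24, 23]}, old, new, ["A", "B"])
```

In the stable scenario the update costs 2 messages against 6 predicted violations under the old windows, so the filters are updated; in the fluctuating scenario the new windows would cost 6 against 2, so the old filters are kept.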

5. Experiments

5.1. Experiment Settings

We evaluate the performance of the proposed algorithm PreFU by MATLAB simulations. The experimental data is from Intel Lab [19]. We adopt temperature, humidity and light data on March 1st, 2004, as experimental data. The data were collected from 54 sensor nodes, and data are collected every 31 seconds. In the paper, we treat the average of every two sample values as one.

We compare our proposed PreFU algorithm with FILA-E, FILA-L, and DAFM under the uniform filter setting [6] for each window. We use the number of communications for a top-k query as the metric under different k. Considering that the energy consumption of sending differs from that of receiving, we simulate the Mica2Dot sensor node, for which the energy consumption of sending is 0.37 times that of receiving. The network lifetime is defined as the time duration before the first sensor node runs out of power [6]. We adopt 2 AAA batteries with a total energy capacity of 15120 J, while establishing a connection consumes 0.645 J and sending one byte of data consumes 0.0144 J.

5.2. Experiment Results

5.2.1. Comparison of Prediction Performance in Different Models

We learn the ARIMA(2,0,2) model from historical data in the experiment. The parameters for the different nodes are maintained dynamically by the base station. Figure 3 compares the prediction performance of the two models for s-step-ahead prediction. The prediction performance of the ARIMA model is superior to that of the linear regression model: with ARIMA, the error rate is much smaller than with linear regression when $k = 5, 6, 7$ , and 8. We set s to 5; that is, after every 5-step-ahead prediction we check whether to update the window parameters for each sensor.

Figure 3

Error comparison of LR and ARIMA models.

5.2.2. Comparison of Four Algorithms in Different Network Structures and Data Distributions

First, different network structures may affect the performance of the algorithm. We evaluate the effectiveness of the algorithm on single-hop and multihop networks, as shown in Figure 4 (the hop number is 2). Second, different data distributions affect the performance of the algorithm. We evaluate the algorithm using the sensor attributes temperature, humidity, and light in the different network structures to illustrate the efficiency of the proposed algorithm. Figures 5 and 6 show the results of our proposed algorithm and the existing algorithms under different network structures and data distributions.

Figure 4

2-hop structure for Intel Lab sensors.

Figure 5

Comparison of different approaches for top-k query processing on Intel Lab data (single hop).

Figure 6

Comparison of different approaches for top-k query processing on Intel Lab data (multihop).

As shown in Figures 5 and 6, under the two network structures (single-hop and multihop) and the three data types (temperature, humidity, and light), the proposed PreFU algorithm is superior to the existing algorithms. Take Figure 5(b) as an example: when $k = 10$ , PreFU requires 7000 fewer communications than FILA-L and 4000 fewer than DAFM. The advantages lie in two aspects. First, k is small compared with the total number of nodes in the wireless sensor network, and the nodes in the nontop-k set share the same filter, which is affected by the value on the node ranked kth. With PreFU, when the value on the node ranked kth changes, the filtering windows of the nontop-k set do not necessarily need to be updated; instead, we update the filters based on s-step-ahead prediction, which decreases the energy consumption. Second, the ARIMA model is superior to the linear regression model in prediction performance.

5.2.3. Comparison of Lifetime under Different Approaches

We now consider the lifetime of the DAFM, FILA-E, FILA-L, and PreFU approaches under single-hop (Figure 7) and multihop (Figure 8) network structures. In Figure 7, our PreFU approach is superior to the other three approaches on temperature and humidity data, and it performs much better when k is small. However, on light data, when k is small, PreFU is worse than DAFM. The reason is that light changes rapidly in the real world, so one-step prediction works better for small k values. When k increases, more sensor nodes must be considered as top-k results, and our proposed PreFU approach performs better. In Figure 8, the same situation occurs for the same reason.

Figure 7

Comparison of lifetime for top-k query processing on Intel Lab data (single hop).

Figure 8

Comparison of lifetime for top-k query processing on Intel Lab data (multihop).

6. Conclusion

To address the unnecessary updating cost in top-k query processing, we propose a new top-k query algorithm called PreFU, which is based on time series prediction models. The algorithm evaluates the cost of updating the filters using ARIMA prediction models built on historical sensor data. PreFU reduces the energy consumption and prolongs the lifetime of the network by avoiding unnecessary updates of filtering windows. Extensive experiments demonstrate the correctness and effectiveness of the proposed PreFU approach compared with the FILA-E, FILA-L, and DAFM approaches.

Footnotes

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported by the National Program on Key Basic Research Project of China (973 Program) under Grant no. 2014CB744900, the Fundamental Research Funds for the Central Universities of China under Grant no. NZ2013306, the Ph.D. Programs Foundation of Ministry of Education of China under Grant no. 20103218110017, the Aeronautical Science Foundation of China under Grant no. 20115552030, and NUAA Research Funding of China under Grant no. NS2013089. The authors would also like to thank the reviewers for their helpful comments and advice, which improved the presentation of the paper.

References

1. Akyildiz I. F., Sankarasubramaniam, Cayirci. Wireless sensor networks: a survey. Computer Networks. 2002;38(4):393-422. doi:10.1016/S1389-1286(01)00302-4

2. Anastasi, Conti, Di Francesco, Passarella. Energy conservation in wireless sensor networks: a survey. Ad Hoc Networks. 2009;7(3):537-568. doi:10.1016/j.adhoc.2008.06.003

3. Ilyas I. F., Beskales, Soliman M. A. A survey of top-k query processing techniques in relational database systems. ACM Computing Surveys. 2008;40(4), article 11. doi:10.1145/1391729.1391730

4. Soliman M. A., Ilyas I. F., Chang K. C.-C. Top-k query processing in uncertain databases. Proceedings of the 23rd International Conference on Data Engineering (ICDE '07); April 2007; pp. 896-905. doi:10.1109/ICDE.2007.367935

5. Soliman M. A., Ilyas I. F., Chang K. C.-C. Probabilistic top-k and ranking-aggregate queries. ACM Transactions on Database Systems. 2008;33(3), article 13. doi:10.1145/1386118.1386119

6. Tang, Lee W.-C. Top-k monitoring in wireless sensor networks. IEEE Transactions on Knowledge and Data Engineering. 2007;19(7):962-976. doi:10.1109/TKDE.2007.1038

7. Mai H. T., Y. L., W. L., Myoung H. K. Processing top-k monitoring queries in wireless sensor networks. Proceedings of the 3rd International Conference on Sensor Technologies and Applications (SENSORCOMM '09); June 2009; pp. 545-552. doi:10.1109/SENSORCOMM.2009.91

8. Mai H. T., Lee Y. W., Lee K. Y., Kim M. H. Distributed adaptive top-k monitoring in wireless sensor networks. Journal of Systems and Software. 2011;84(2):314-327. doi:10.1016/j.jss.2010.10.018

9. Madden, Franklin M. J., Hellerstein J. M., Hong. Tag: a tiny aggregation service for ad-hoc sensor networks. Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI '02); 2002; pp. 131-146.

10. Chen, Liang, Zhou J. X. Energy-efficient top-k query processing in wireless sensor networks. Proceedings of the 19th International Conference on Information and Knowledge Management (CIKM '10); October 2010; New York, NY, USA: ACM; pp. 329-338. doi:10.1145/1871437.1871482

11. Liu, Lee W.-C. A cross pruning framework for Top-k data collection in wireless sensor networks. Proceedings of the 11th IEEE International Conference on Mobile Data Management (MDM '10); May 2010; pp. 157-166. doi:10.1109/MDM.2010.41

12. Abbasi, Khonsari, Farri. Mote: efficient monitoring of top-k set in sensor networks. Proceedings of the 13th IEEE Symposium on Computers and Communications (ISCC '08); July 2008; Marrakech, Morocco; pp. 957-962. doi:10.1109/ISCC.2008.4625647

13. Yeo M. H., Seong D. O., Yoo J. S. Prim: priority-based top-k monitoring in wireless sensor networks. Proceedings of the International Symposium on Computer Science and its Applications (CSA '08); October 2008; pp. 326-331. doi:10.1109/CSA.2008.20

14. Cho, Son, Chung Y. D. POT: an efficient top-k monitoring method for spatially correlated sensor readings. Proceedings of the 5th International Workshop on Data Management for Sensor Networks (DMSN '08); August 2008; pp. 8-13. doi:10.1145/1402050.1402053

15. Michel, Triantafillou, Weikum. KLEE: a framework for distributed top-k query algorithms. Proceedings of the 31st International Conference on Very Large Data Bases (VLDB '05); September 2005; pp. 637-648.

16. Silberstein, Braynard, Ellis, Munagala, Yang. A sampling-based approach to optimizing top-k queries in sensor networks. Proceedings of the 22nd International Conference on Data Engineering (ICDE '06); April 2006. doi:10.1109/ICDE.2006.10

17. Tulone, Madden. Paq: time series forecasting for approximate query answering in sensor networks. Proceedings of the 3rd European Conference on Wireless Sensor Networks (EWSN '06); 2006; pp. 21-37.

18. Chong, Kui, Min. Energy efficient information collection with the ARIMA model in wireless sensor networks. Proceedings of the IEEE Global Telecommunications Conference (GLOBECOM '05); December 2005; pp. 2470-2474. doi:10.1109/GLOCOM.2005.1578206

19. Intel Berkeley Research Lab. http://www.select.cs.cmu.edu/data/labapp3/