Abstract
Event query processing is a very important issue in wireless sensor networks (WSNs). In order to detect event early and provide monitoring information and event query timely in WSNs, an efficient intelligent collaborative event query (ICEQ) algorithm is proposed, in which sensor nodes that are near to the boundary of events are selected to accomplish complex event monitoring and query processing through intelligent collaboration. ICEQ will select range-nearest neighbors as the basic components of surrounding nodes. Then it will identify the gaps between the surrounding nodes and try to select the nearest neighbor collaborative nodes for enclosing the event in the node selection phase, which can avoid redundant sensor nodes to join surrounding nodes via identifying a set of association surrounding nodes between the nearest sensor nodes and the query events. Detailed experimental results and comparisons with existed algorithm show that the proposed ICEQ algorithm can achieve better performance in terms of query-processing time, average number of selected collaborative nodes, and query message consumption.
1. Introduction
The rapid development in computing, sensing, and wireless communication technologies has made the availability of wireless sensor networks (WSNs) [1, 2]. Their low cost, small size, and untethered nature make them sense information at previously unobtainable resolutions [3]. WSNs can be deployed in battlefield applications, and a variety of vehicle health management, habitat monitoring, environment monitoring, and condition-based maintenance applications on industrial, military, space platforms [4, 5].
In WSNs, an important task is to monitor dynamic and unpredictable events. Since the sensor network can be viewed as a distributed database [6, 7], due to the distributed nature and resource constraint of WSNs, we cannot maintain a centralized index to support query processing in WSNs [8]. Meanwhile, because of limitations imposed by impoverished computing environment, data collection and query in WSNs must support an unusual set of software requirements. Several previous works [8, 9] proposed declarative SQL-like query which enable users to acquire the information about the network through issuing queries to the sink. Even though each sensor node will be rather limited in terms of storage, processing, and communication capabilities, they will be able to accomplish complex event monitoring and query processing through intelligent collaboration, especially in large-scale WSNs.
Since sensor nodes have rigid energy constraints, it is hard to displace sensor nodes in the monitoring region [8, 10] due to unattended and untethered deployment. Most existing data collection systems [11–14] are query-based ones. Most existed event query schemes will decrease the lifetime of WSN greatly due to power consumption for the real-time monitoring. Traditional query-processing techniques of WSNs mainly deal with retrieving sensor node locations, sensing values, and aggregating the sensed values. However, in a lot of applications, users expect event and data information about areas of their interests. If a sensor is queried by many users, it may experience congestion and great power consumptions. Thus, a natural requirement is that each user sets a proper query range to both avoid overhead and achieve a global optimality at the same time.
In order to balance the inherent tradeoff between query reliability versus energy consumption in query-based wireless sensor systems, an adaptive fault-tolerant quality of service (QoS) control algorithms based on hop-by-hop data delivery utilizing “source” and “path” redundancy is proposed in [11] to maximize the lifetime of the system. In order to allocate the multihop query range for each user such that a certain global optimality is achieved, Han et al. [15] investigated the NP complete scheme in its generic form and proposed a distributed heuristic to resolve the query problem. A data-querying scheme was proposed in WSN [16] where queries formed for each sensing task are sent to task sets, and the sensed data is retrieved from a sensor network in the level of detail specified by users, and a tradeoff mechanism between data resolution and query cost is provided. To disseminate data required for processing monitoring queries in a WSN, the notion of event-monitoring queries and algorithms were proposed in [17] for building and maintaining efficient collection trees that provide the conduit to minimize important resources such as the number of messages exchanged among the nodes or the overall energy consumption. In order to improve the performance of area query-processing in wireless sensor networks, an energy-efficient in-network area query processing scheme is proposed in [18], which partitioned the monitored area into grids and constructed a reporting tree to process merging areas and aggregations and conserve energy consumption. In [19], two approaches were proposed for processing such queries in WSN in-network instead of collecting all data at the base station of the spatiotemporal queries in WSN.
In order to improve the query performance, range nearest-neighbor (RNN) query [20], and nearest-surrounder (NS) queries [21], retrieve data based on location information in sensor networks. This kind of query schemes may enable us to find out surrounding nodes needed and is an efficient way to monitor event with less power consumption. In [22], a distributed Bayesian algorithm was proposed based on the concept of spatial correlation. However, it assuming that event measurements are either much larger or much smaller than normal measurements. In order to find out approximate real boundary, an efficient event query scheme is proposed in [23], which considers that WSN is composed of two distinct homogeneous regions. In order to achieve the boundary node efficiently, the localized fault-tolerant event boundary detection scheme was proposed in [24]. An efficient noise-tolerant event boundary detection algorithm was proposed in [25], which defined boundary nodes as sensor nodes which lie within real boundary with certain confidence interval guarantee. The problem of in-network processing and queries of trajectories of moving targets in a sensor network is investigated in [26], which exploits the spatial coherence of target trajectories for opportunistic information dissemination with no or small extra communication cost, as well as for efficient probabilistic queries searching for a given target signature in a real-time manner. In [27], collaborative query processing among multiple heterogeneous sensor networks was investigated and formulated into an optimization problem with respect to energy efficiency. WinyDB [28], a relational query-processing system on Windows CE-based personal digital assistants (PDAs) for sensor networks, is proposed to improve both the energy efficiency and the data quality collaboratively. To overcome the faulty data query problem to improve the accuracy of data query, an efficient fault-tolerant event query algorithm (FTEQ) was proposed in [29], which takes the short-term and the long-term spatial and temporal similarities between sensors and environment into consideration to decrease faulty detection rate and data query cost.
Although a number of event query schemes have been proposed to improve the query performance in WSNs, event query processing is still a very challenging task due to its complexity and ill-posed nature, and all of these works do not comprehensively consider the correlation between sensors and environment. And the most existing research work has focused on data aggregation to provide efficient data transmission. The overhead of query processing is generally ignored with the assumption that query transmission contributes to only a small portion of overall data transmission in the sensor network. However, there are many cases where this assumption does not hold any more. Therefore, the methods mentioned above all use statistical methods to differentiate whether sensor nodes are boundary nodes or not.
Another problem is that existed work always assumes that the monitoring nodes are often interested in obtaining either the actual readings or their aggregate values; from sensor nodes that detect interesting events, the detection of such events can often be identified by the readings of each sensor node. In such scenarios, each sensor node is not forced to include its measurements in the query output at each epoch, but rather such query participation is evaluated on a per epoch basis, depending on its readings and the definition of interesting events. However, in actual complex environment, due to the characteristics of WSNs, sensors are usually deployed in a noneasily accessible or harsh environment, and sensors are prone to failure, and these faulty sensors are likely to report arbitrary data very different from the true environmental phenomenon, and the faulty data of sensors are very common, which greatly influence the accuracy of data query. Hence, how to select appropriate nodes to accomplish complex event monitoring and query processing through intelligent collaboration is an important task.
Motivated by the above reasons, an efficient intelligent collaborative event query (ICEQ) algorithm is proposed, in which sensor nodes that are near to the boundary of events are selected to accomplish complex event monitoring and query processing through intelligent collaboration. ICEQ includes initial phase and node selection phase. In initial phase, ICEQ will select range nearest-neighbors as the basic components of surrounding nodes. Then, it will identify the gaps between surrounding nodes and try to select nearest neighbor collaborative nodes for enclosing the event in node selection phase, which can avoid redundant sensor nodes to join the surrounding nodes via identifying a set of association surrounding nodes between the nearest sensor nodes and query events. The main contributions of ICEQ may be summarized as follows.
To retrieve a set of the nearest collaborative nodes of a specific event, ICEQ can identify a set of association surrounding nodes between the nearest sensor nodes and the query events that frequently appear in the system, which converts the demographic values and sensed data items presented in each query transaction into demographic types and event categories, respectively. Hence, ICEQ can select the nodes appropriately to decrease the number of selected nodes and prolong the lifetime of WSNs. ICEQ is able to identify where gaps exit between surrounding nodes by finding large or frequent demographic query itemsets of query, and then try to select proper collaborative nodes for enclosing the event with rule decision and computing confidence between rules. Hence, ICEQ can select the appropriately nodes according to the network topology and environment.
The rest of this paper is organized as follows. The proposed intelligent collaborative event query algorithm is given in Section 2. Performance studies are conducted in Section 3. This paper concludes with Section 4.
2. Proposed Algorithm
2.1. Problem Description
In a distributed WSN, assume that that each sensor node
In order to select the nodes appropriately, the proposed ICEQ algorithm will identify a set of association surrounding nodes between the nearest sensor nodes and the query events that frequently appear in the system, which will consider the spatial and the temporal correlation between sensors and environment. Suppose there are k demographic attributes with domains being
Since the goal is to identify the associations between demographic types and event categories; the demographic values and sensed data items presented in each query transaction must be converted into demographic types and event categories, respectively, resulting in an extended query transaction. Here we include all demographic types of each demographic value and all sensed data categories of all item appeared in the sink node. Therefore, the ith query transaction can be translated to the extended query transaction:
Similarly, we say that
2.2. Intelligent Collaborative Event Query Algorithm
The proposed ICEQ algorithm consists of two phases: initial phase and node selection phase. ICEQ will select range nearest neighbors as the basic components of surrounding nodes in initial phase. Node selection phase is to identify gaps between the rough surrounding nodes and then try to select proper surrounding nodes for monitoring the event by intelligent collaborative processing among nodes to decrease power consumption.
In the initial processing step, let Q denote a priority queue, let
In order to differentiate different segments of corresponding query-line nearest-neighbor nodes, end nodes of subsegments of a specified query line
1: Initialization (4). 2: 3: Dequeue node 4: 5: 6: 7: Add an end node 8: Update 9: Add an end node 10: 11: 12: 13: 14: Collaborative node selection; 15:
2.3. Collaborative Node Selection
The goal of collaborative node selection phase is to select proper sensors to enclose the event. From Algorithm 1, we can see that the rough nearest surrounding nodes of the event have been put in the selected nodes set S. Then, we construct neighbourhood relationships for each sensor node
If we sort nodes in the queue S counter clockwise with reference to the centre point of the approximate polygonal boundary of the event, a sensor node
In the phase, each sensor node
The proposed ICEQ algorithm will check whether gaps exit between this node and its adjacent neighbors in S. When a sensor node
A neighbor node
We also construct neighborhood relationship for
In order to select the appropriate node, we need to identify generalized profile association rules in the rough node set S, the itemsets that will interest us are of the following form
Let
Considering that some of the strong generalized profile association rules could be related to each other in either the demographic itemset part or the sensor nodes, and, therefore, the existence of one such rule could makes some others not interesting. To overcome the problem, let Π be the set of all demographic attribute types:
We call a rule
a D-ancestor of another rule
In context of collaborative nodes selection, we say a rule is valid if it can be used for making decision. Given a set of strong rules say
Let the immediate D-descendants of R be
Let the immediate B-descendants of R be
Suppose that we have obtained the confidences of both the D-deductive rule and the B-deductive rule of a given rule R. Let E-Conf(DB-deductive rule∣D-deductive rule) be the estimated confidence of R, DB-deductive rule given R; let D-deductive rule and E-Conf(DB-deductive rule∣B-deductive rule) be the estimated confidence of R's DB-deductive rule given R, B-deductive rule. We have
Therefore, we define the estimated interestingness of R, denoted
We approximate the confidence of a D-deductive rule by using the following theoretic results.
Lemma 1.
Let
Proof.
Since
Similarly,
Therefore,
Theorem 2.
Without loss of generality, let
Proof.
Let
By adding the denominators and numerators, respectively, from the left-hand sides of the two equations, we can obtain
Since
Now we discuss how to compute the confidence of a B-deductive rule
If
At last, we report all sensor nodes in S as nearest surrounding nodes of the event. And the proposed collaborative node selection algorithm is shown in Algorithm 2.
1: 2: Find out strong rules 4: 5: 6: Choose Construct neighbourhood relationship for Insert 7: 8: Choose Construct neighbourhood relationship for Insert 9: 10: 11:
3. Simulation Results
3.1. Simulation Setup
In order to evaluate the performance of the proposed ICEQ algorithm, we implemented the ICEQ in the well-known simulation tool NS-2 [30]; the range nearest neighbor (RNN) query algorithm [20] is simulated as discussed here. There are 5000 sensor nodes deployed in our monitoring region. The shape of the event that occurs in the monitoring region is a circle. The approximate polygonal boundary of the event can be obtained via the boundary nodes of the event. Thus, the deployment strategy totally ensures the assumption that there is no communication hole in the network. And the system generates critical and noncritical events randomly. The performance analysis has been done by deploying variable number of nodes on the fixed squared area, to verify the effect of node densities on the data gathering path length and average number of hop counts. The deployment of sensor nodes is dense enough so that we can find out surrounding nodes of the event. And we set the
3.2. Validation in Different Range of Event
In the first scenario, we vary the size of the event with varying its radius from 5 m to 30 m. Figures 1 and 2 show the query-processing time and average number of selected nodes of different algorithms, respectively.

Processing time with varying event radius in grid topology.

Number of selected nodes with varying event radius in grid topology.
As shown in Figure 1, it is observed that the proposed ICEQ outperforms in terms of query-processing time irrespective of the event radius. For ICEQ, the query processing is less than 0.23 s when the event radius, increases, while for RNN, the query processing time is higher than 0.3 s. The reason is that RNN needs to search RNNs edge by edge according to the approximate polygonal boundary of the event, and the cost of RNN is essentially higher than ICEQ. It is noticed that the event radius has some impact on the query-processing time for both ICEQ and RNN. It is reasonable since larger event radius means that more nodes will be evaluated in node selection. So when we increase the size of the event, the cost to find out surrounding nodes of the event will raise accordingly.
Figure 2 shows the numbers of selected nodes for the monitoring event. With the number of event radius increasing, the number of selected nodes of two schemes will increase obviously. When the event radius increases from 5 m to 30 m, the number of selected nodes of RNN increases to 61, which is slightly larger than that of ICEQ. We can see that the number of selected nodes of ICEQ always is slightly less than that of ICEQ. The reason is that the objective of the ICEQ is to minimize the selected nodes for the given constrained conditions. So in collaborative node selection phase, ICEQ considers the sufficient confidence and rules between nodes, which will avoid redundant sensor nodes to join surrounding nodes via identifying a set of association surrounding nodes between nearest sensor nodes and query events.
Figures 3 and 4 show the query-processing time and average number of selected nodes of different algorithms in random topology, respectively.

Processing time with varying event radius in random topology.

Number of selected nodes with varying event radius in random topology.
From Figure 3, we can see the similar results as in Figure 1. And the proposed ICEQ outperforms in terms of query-processing time irrespective of the event radius. For ICEQ, the query processing is less than 0.5 s, and it will increase slightly when the event radius increases. While for RNN, the query-processing time is obviously higher than that of ICEQ. When the event radius increases, the query-processing time of RNN will increase suddenly. And the query-processing time will larger than 2.5 s. It is noticed that the event radius has great impact on the query-processing time for RNN. The reason is the RNN needs to search RNNs edges, which depends on the network topology. Hence, the cost of RNN is essentially higher than ICEQ. For ICEQ, the reason of processing time increasing slightly is that ICEQ will visit more candidates with the radius increasing.
The results of number of selected nodes of different algorithms with the varying event radius are shown in Figure 4. As the event radius increases, the number of selected nodes of different algorithms will increase obviously. Especially for RNN, the number of selected nodes increases to 57. The main reason is that RNN selects surrounding nodes by searching according to the approximate polygonal boundary of the event, which leads to select more nodes. So the event radius has a great influence on RNN. The event detection ratio of RNN is higher than that of ICEQ clearly. Because ICEQ can identify the gaps between surrounding nodes and try to select nearest neighbor collaborative nodes for enclosing the event in node selection phase, which can avoid redundant sensor nodes to join surrounding nodes via identifying a set of association surrounding nodes between the nearest sensor nodes and the query events. For the proposed ICEQ, the number of selected nodes is less than that of ICEQ. For example, when the event radius increases to 30m, the number of selected nodes of ICEQ increases to 42. Compared with RNN, ICEQ tries to select proper collaborative nodes for enclosing the event with rule decision and computing confidence between rules. And the collaborative node selection scheme of ICEQ also helps to decrease the number of selected nodes.
3.3. Validation in Different Query Lines
In this scenario, we vary query lines in random topology. And sensor nodes are deployed in random topology. The radius of the circular event is fixed at 15 m, and we assume that it occurs arbitrarily in the monitoring region so that we obtain different number of edges of the approximate polygonal boundary. Figures 5 and 6 show the query-processing time and the average number of selected nodes of different algorithms with varying query lines in random topology, respectively.

Processing time with varying query lines in random topology.

Number of selected nodes with varying query lines in random topology.
The results of query-processing time of different algorithms with the varying number of query lines are shown in Figure 5. As the number of query lines increases, the query-processing time of different algorithms will increase. Especially for RNN, the query-processing time increases to 0.98 s when the number of query lines increases to 15. The main reason is that RNN needs to search edges according to the approximate polygonal boundary of the event, which leads to longer delay time and increases query-processing time. So the number of query lines has a great influence on RNN. For the ICEQ, the query-processing time is obviously lower than that of RNN. When the number of query lines increases to 15, the query processing time of ICEQ only increases to 0.19 s, which is less than that of RNN 0.79 s. Another phenomenon is that the number of query lines has no much effect on the ICEQ algorithm. The query-processing time almost keeps in the range of 0.15 s to 0.2 s. The reason is that ICEQ will determine the number of surrounding node candidate to traverse in the search process. Therefore, ICEQ can adaptively select appropriate collaborative nodes to decide the number of surrounding node candidates to form the approximate polygonal boundary. We also notice that the advantage of ICEQ over RNN becomes more evident when the number of query lines becomes large.
Figure 6 shows the number of selected nodes by different schemes when the number of query lines varies from 10 to 15. It can be seen that the number of selected nodes of two schemes oscillates slightly as the number of query lines increases. For RNN, the average number of selected nodes increases slightly, from 20 (10 query lines) to 23 (15 query lines). The reason is that RNN must find out the edge and the search surrounding nodes, which will increase the selected nodes when query lines increas. However, ICEQ always outperforms RNN due to the concurrent use of the rule mining among nodes and collaborative nodes. For instance, it achieves average 18.4% nodes saving compared with RNN scheme. Less selected nodes lead to longer network lifetime since sensors can turn to the power-saving mode once the event monitoring in their region is done.
3.4. Validation in Query Consumption
In this scenario, we investigate the total message consumption with varying network size from 100 to 10000 sensor nodes. Figures 7 and 8 show the query total message consumption by varying network size from 100 to 10000 sensor nodes. The duration of each query is set by 300 s and number of event types.

Total messages with varying network size.

Standard deviation with varying network size.
In Figure 7, the total number of messages of two mechanisms increases with the network size. The network size impacts on a number of messages significantly in RNN because the number of sensor nodes selected increases as the network size grows. In ICEQ, only a few numbers of reply messages exist so that the network size impact ICEQ a little due to the increasing network size. As the network size is small, the total number of messages of ICEQ is smaller than RNN. This is because overhead rose from the change of rough event boundary node larger than the benefit obtained from the nodes due to small network size. That is because network size is large enough, and the benefit of surrounding nodes is cost effective. Hence, ICEQ works efficiently in a large-scale WSN. Moreover, in ICEQ, the involvement reduces the duplicated data packets, making that ICEQ outperforms RNN in a large-scale network.
Figure 8 compares the degree of power balance of all mechanisms in various network sizes. Each sensor node has the same probability for detecting event data, and the network traffic is uniformly distributed over the whole WSN. When the network size is smaller than 500, standard deviation of RNN is small. This is because that the network size is too small, and the sensor nodes intend to select rough event boundary nodes; therefore, the difference of query event on nodes is small. However, as the network size increases, the RNN sensor nodes that are close to event boundary have higher traffic than nodes in other location, and; therefore, their standard deviations is higher than ones of ICEQ mechanisms as the network size is 300 nodes. However, as the network size increases larger than 400, the standard deviations of RNN decrease significantly because the number of detected event is fixed and the event detected is distributed in the whole WSN. The ICEQ mechanism obtains a smaller standard deviation as network size is larger than 500. The results also validate the effectiveness of the proposed ICEQ in power consumption.
4. Conclusion
In this paper, we presented an efficient intelligent collaborative event query (ICEQ) algorithm to detect the event early and provide monitoring information and event query timely in WSNs. ICEQ can identify a set of association surrounding nodes between the nearest sensor nodes and the query events that frequently appear in the system, which converts the demographic values, and sensed data items presented in each query transaction into demographic types and event categories, respectively. Hence, ICEQ can select the nodes appropriately to decrease the number of selected nodes and prolong the lifetime of WSNs. ICEQ is able to identify where gaps exit between surrounding nodes by finding large or frequent demographic query itemsets of query, and then try to select proper collaborative nodes for enclosing the event with rule decision and computing confidence between rules. Hence, ICEQ can select the appropriately nodes according to the network topology and environment. Through ICEQ, we can select a set of surrounding nodes of the event instead of all the sensor nodes in the monitoring region to check if there is any event evolution. Therefore, sensor nodes which are not surrounding nodes can enter into sleep modes temporarily to save their battery energies and thus extend the lifetime of sensor networks. The future work will focus on the issues of query moving objects and track objects in WSNs.
Footnotes
Acknowledgments
This work was supported by the National Natural Science Foundation of China (no. 60902053), the Science and Technology Research Planning of Educational Commission of Hubei Province of China (no. B20110803), and the Natural Science Foundation of Hubei Province of China (no. 2008CDB339). The author also gratefully acknowledges the helpful comments and suggestions of the reviewers.
