Abstract
Storage, query cost, and energy efficiency are among the most important data dissemination issues in wireless sensor networks. Data-centric storage approach can be viewed as energy efficient and low cost solution to these issues, which itself suffers from hotspot storage problem. Some provided solutions tried to handle this mentioned problem and also brought load balancing and scalability to the network. In this article, we present a method called adaptive resilient data-centric storage which reduces the storage cost and also improves the load balancing and fault tolerance in the network by making data-centric storage adaptive based on the frequency of sensed events. Then, we propose a mathematical model to determine the threshold for the frequency of occurrence of sensed events. Finally, we developed a discrete-event-based simulator for two-dimensional graphical view of the network and collected the results. Simulations and analytical results show the superiority of the proposed method over the compared approaches with the miscellaneous frequency of events.
Keywords
Introduction
Wireless sensor networks (WSNs) are interacting sensor nodes with energy and connectivity constraints. They are deployed in order to gather raw information from pervasive environments or industrial applications. In all the applications related to WSN, the most challenging issue is how to store sensed data through the network. The effective use of the vast amount of data gathered by large-scale sensor networks will require scalable, self-organizing, and energy-efficient data dissemination algorithms. 1 Two main approaches were proposed in literature: distributed data storage (DDS) and data-centric storage (DCS). 2 In DDS, each node sensing the data is responsible for data storage. Although the cost of data storage is low in this model, since the station has to query all the nodes within a network using flooding method, the cost of query process increases dramatically as the network size increases. On the other hand, DCS methods use geographic hash tables (GHTs) 3 to map an event-type to a specific geographic location. As a result, whenever an event is sensed by a sensor, data will be sent toward a geographic location using geometric routing algorithm like Greedy Perimeter Stateless Routing (GPSR). 4 Then, data will be stored in the nearest node to that hash location. Queries are also sent toward the storage location instead of flooding the network. Therefore, the performance of DCS is much better than DDS in a similar situation.
Some restrictions of WSN were not considered in the basic version of DCS. Since sensor nodes are more prone to failure, data loss will occur in WSN. Therefore, data availability becomes very low. Use of data replication mechanisms will help to avoid such situations.5–7 Approaches such as structured-replication data-centric storage (SR-DCS) 1 and resilient data-centric storage (R-DCS) 8 have enhanced the basic DCS in order to overcome the mentioned limitation. These methods bring information redundancy by replicating the data from storage node to the nodes called replicas. In R-DCS, to store the sensed data in its specific area, network is divided to a collection of zones. This reduces the load on replicas by distributing the data storage load across the network, consequently increases the average lifetime of the WSN. Also, the monitor nodes are involved in the storage and query process for directing the data, queries and responses. Using monitors for data query and data storage management causes traffic reduction over the replicas and their neighbors. Although R-DCS brings reliability and load balancing to WSN storage and query process, it has its own issues: First, since monitors participate in storage and query process, the network traffic will increase, especially when the frequency of occurrence of events is high. Second, monitors work as a gateway in storage and query process. Therefore, their neighbors tolerate high amount of traffics and lose their energy fast. This brings more destinations to reach monitors and finally unreachability of them.
In order to overcome the R-DCS issues, a novel hybrid method called adaptive resilient data-centric storage (AR-DCS) is proposed in this article, which is an adaptive version of R-DCS. 8 In AR-DCS, the number of zones is determined adaptively based on the frequency of occurrence of sensed events. Whenever the frequency of events rises over the predetermined threshold in a specific zone, division happens and subzones are created. On the other hand, as the frequency of events returns to its normal state, zones are aggregated and create their widespread parent zone. This makes a WSN adaptive based on the frequency of sensed events. The benefit is threefold: load balancing, reduction in the storage cost, and avoidance of the unreachability of the monitors with a negligible effect on the query cost.
The rest of the article is organized as follows: section “Related work” covers related works previously carried out in different areas contributing to design and implementation of DCS solutions. Section “AR-DCS” proposes our method and compares the suggested method with both basic DCS and R-DCS and section “Simulation” evaluates the results of simulation. Section “Conclusion” provides the conclusion of the article.
Related works
The DCS basic method introduced in Ratnasamy et al. 1 uses a hash function to map an event-type to a specific location. Therefore, DCS avoids flooding mechanisms used in DDS and brings energy saving and scalability to WSN. However, some issues such as node failure and mobility remain unsolved. Hence, various methods were proposed to solve the mentioned issues and bring data recovery and fault tolerance to the basic DDS method. DCS methods that work on the mentioned issues are categorized into two groups: load-balancing-based DCS and reliable-based DCS.
The load-balancing DCS methods offer load balancing over data dissemination on storage and query process which brings energy saving and consequently better network lifetime. Grid-based dynamic load-balancing (DLB) 9 resolves the node failure problem and provides load balancing using multi-threshold levels in each grid. DLB divides the whole network into a grid of cells, and each cell consists of the nodes which are within one hop distance. Sink nodes use the hash function to map an event-type to a grid ID. After detecting an event, the node sends a put packet to the grid ID and uses GPSR to forward the mentioned packet to the closest node to that grid point. 7 The main drawback of this method is less data availability due to node failure. Query hotspot is one of the problems that happen when the stored data in few specific nodes are targeted by numerous queries. To resolve this problem, several methods have been proposed such as zone partitioning and zone partial replication 10 based on Multi-dimensional Range Query which locally detect and dissolves query hotspots. Also, in time-parameterized DCS, 11 the time at which the sensed data is generated determines the node which is used to store the data. Since data zones are altered periodically, both storage and query process are scattered across the network. Structured replication DCS 3 is a method to support load balancing and scalability. In this method, events are assigned to a hierarchical depth based on the decomposition of the space key. This mechanism can be considered as a median method between DDS in which there is no storage cost but there is high query cost, and the basic DCS in which the storage and query cost are at an average level. In SR-DCS, if a clustered failure occurs, the network will not be able to retrieve the data of the specific event-type. Also the root storage node is a single point of failure. Adaptive structured replication DCS (ASR-DCS) 12 is another novel mechanism to bring load balancing into the WSN and reduce energy consumption on the bottlenecks, especially in the time of high frequency of sensed events. In addition to solve the problem of root node bottleneck and high query cost, it provides SR-DCS benefits such as load balancing on storage nodes and scalability to the network. Two thresholds are defined for this method. If the event frequency exceeds the first threshold, storage node will create the first level of hierarchy and event data are sent to first-level replicas. If event frequency also exceeds the second threshold, first-level replicas will act like the root node and will create the second level of hierarchy. Upon the return of the network to its normal state and the decline in the event frequency, replica nodes of each level will be merged into their immediate upper-level root node. This makes SR-DCS adaptive based on the various frequency of the sensed events.
The reliable DCS methods work on data recovery due to node failure by replicating the stored data over multiple storage nodes. Aly et al. 13 introduced K-D trees-based DCS schema (KDDCS) to address the node failure problem in DIM. 14 In this method, the whole network is divided into equal-sized data regions with equal number of sensors, leading to a balanced K-D tree. Although KDDCS increases the quality of data and data persistence, it causes more delay in query process. A dynamic DCS mechanism is suggested in Lai et al. 15 This method is able to reduce the storage cost by mapping the sensed data to the storage points. Despite the increase in the overhead on data storage process, network robustness is improved by enhancing GPSR routing protocol to store several copies of data. A new dynamic DCS mechanism 16 can be a solution which changes storage node regularly over a fixed period of time called epoch. This makes it possible having temporal queries to the previous storage nodes and providing access to the historical data, which results in more data availability. But storing all events during an epoch in the selected storage nodes brings node failure and high storage cost problem to the network. Ahmed and Gregory 17 proposed a method called data-centric storage with metric-based similarity searching (DCSMSS) which is particularly useful where users seek data within a WSN that is either a match or close to a match. In this method, the sensed data which is related to an event-type is sent toward a specific node called sector head (SH). SH nodes aggregate the data and then transmit the result toward the storage nodes. This enhances reliability and provides efficient similarity searching within a distributed network. Adaptive DCS 18 offers a hybrid method to solve the node failure problem, which dynamically determines the network conditions such as query rate and event production for each event-type. Decision-making is done in the sink where a suitable storage method is determined, based on the rate of sensed data and query for each event-type. This method provides more energy efficiency and increment in network lifetime, with the cost of less data availability. Gonizzi et al. 2 presented a memory-based replication method based on the remaining local memory capacity of nodes. In this method, each node broadcasts the information about memory availability to its neighbor. On the other hand, each node receives an update from other node indicating memory availability, stores the data in its in-memory table. Whenever lack of storage happens, the most memory-available neighbor is chosen as a replica. Thus, it brings reliability by the cost of local broadcast between neighbors. Also, it may be a situation where all the neighbors of the storage node suffer from lack of memory. Then, there is no neighbor to be chosen as a replica so that the storage node flushes its memory and data loss will happen. Ghose et al. 8 presented the resilient-DCS (R-DCS) which proposes two classes as “control nodes” and “data nodes” in the strategic location through WSN. This method reduces the energy consumption of a single node and brings scalability to the network. In R-DCS, the sensor network space divides into “Z” zones. Using this method, data related to an event-type is stored on data nodes called replicas. On the other hand, control nodes called monitors are responsible for data storage and query management. Having increment in storage nodes related to an event-type, and preserving control and summary data in multiple nodes, R-DCS leads to reduction in average storage and query cost and makes network scalable. Another method named as efficient resilient schema DCS 19 uses multiple geographical locations to store an event-type in mobile sensor networks. Locations will be selected from the different distributed points across the network. In this method, hash function works based on the event-type, data importance, and initial collection zone. In another research, Joung et al. 20 offered an adaptive and cost-optimal mechanism called Tug-of-War (ToW). In this approach, rather than using just a single home location, the mechanism can dynamically adjust the number of home locations based on the storage and query rate to minimize the total communication cost.
AR-DCS
Data storage is one of the most challenging issues in WSN because it is directly related to the sensor lifetime and network topological structure. As a result, various approaches try to optimize the storage cost. DDS and DCS are the main approaches in the literature dealing with storage process optimization. In DDS, sensed data are stored locally so that queries are flooded through the network to gain specific data. This brings high traffic of query. On the other hand, DCS uses GHT 3 to map an event-type to a specific location. Nodes sensing the data send storage traffic toward the hash location and store the data in the nearest node to the mentioned location using geometric routing algorithm like GPSR. 4 Queries are also forwarded to the hash location of specific event-type instead of being flooded through the network.
As DCS introduced in Ratnasamy et al., 1 some serious challenging issues such as node failure and mobility were not considered. Therefore, improvements in the field of DCS were continued until methods such as SR-DCS 1 and R-DCS 8 have been proposed to handle data recovery from node failure and mobility. These methods bring information redundancy by replicating the data from storage node to the nodes called replicas. This makes WSN fault tolerant against the mentioned situations. In SR-DCS, storage is done in a hierarchy of replicas. Each event-type is mapped to a specific location and a specific depth of hierarchy. In R-DCS, network is divided into partitions called zones. In each zone, in addition to replicas, another node is used to manage storage and query traffic called monitor. Both R-DCS and SR-DCS are proposed to cover data loss due to node failure but they have their own issues. In SR-DCS, due to static structure of hierarchy of replicas, load on replica’s neighbors increases dramatically, especially when the frequency of occurrence of event rises. This will increase the discharged nodes near the replica. Finally, the replicas become unreachable. Also, it suffers from single point of failure and the traffic of query process is relatively high. Therefore, in Hejazi and Amin, 12 a method called ASR-DCS is proposed to make SR-DCS adaptive, based on the frequency of occurrence of events. As the frequency rises, the depth of hierarchy increases so that more replicas are used in the storage process. Whenever it returns to its normal state, the depth of hierarchy decreases in order to reduce the query cost. On the other hand, R-DCS has two main challenging issues: first, the monitors also participate in storage and query process, which cause more traffic of storage and query of data. Second, as the monitor is a gateway node and is responsible for data query and storage management, its neighbors discharge fast and the monitor becomes a neighborless node. Therefore, storage and query traffic cannot reach the monitor and the storage and query process will be disturbed. To solve R-DCS issues while keeping its capabilities, we proposed a novel hybrid approach called AR-DCS. In AR-DCS, the number of zones within a WSN network is determined adaptively based on the frequency of occurrence of sensed events. Whenever this frequency passes a predefined threshold, each zone divides into four subzones which have their own replicas and a monitor. As a result, the traffic of storage does concentrate on monitors of subzones instead of one root monitor. This brings traffic reduction over the neighbors of the monitor. Also, as zones are divided, the distance between sensing nodes, the monitor, and the replica reduces and this will cause traffic decrease in the storage process. However, when the frequency returns to its normal state, subzones are merged to the parent zone with the main monitor and the main replica for the reduction of the query traffic. The simulation results for basic DCS, R-DCS, and AR-DCS demonstrate that AR-DCS keeps R-DCS capabilities in reliability and data-loss recovery and also it performs better on the storage cost. AR-DCS relatively increases the query cost, but this increment is less than the rate of reduction of the storage cost. AR-DCS also has some points to consider:
1. Rapid swing in the frequency of occurrence of events: These swings will increase the rate of division and aggregation of zones. As a result, stored data are always in transfer between root zone and its subzones replicas. This will bring more traffic over the network and neighbors of replicas. To avoid this, rapid swings are handled in our simulation. Sampling of frequency is done on every storage packet arrived at monitor but evaluation is done after every five received storage packets.
2. Queries sent during the divisions and aggregations: The divisions and aggregations of zones are done based on the frequency of storage packet received at monitor and it is independent of traffic of query. Therefore, there may be a situation where queries are sent during division or aggregation of zones toward the replicas. In this situation, queries are received by a node which is not a replica anymore. If this happens at division time, queries will be received at the root replica and it can answer it directly, which it is not an issue. However, if the situation mentioned above happens at aggregation time, there will be no data to answer. Therefore, replicas in subzones keep their data valid for the specific amount of time to answer queries during aggregation. This traffic is also evaluated in the query cost of our simulation.
3. Threshold definition for zone division and aggregation: The main advantages of AR-DCS over ASR-DCS are single point of failure avoidance due to monitor node usage and dynamic determination of threshold for division and aggregation. Assume that
In order to compute the density of nodes (
where
Finally, to calculate the threshold, equation (4) is used where
4. AR-DCS is complicated: In the aspect of process cost, AR-DCS is more complicated than basic DCS and R-DCS. However, there is a rule that declares the following: The cost of process of 3000 instructions is equal to send 1 bit over 100 m. 3 Therefore, converting communication cost to process cost will bring growth in the node lifespan and the network lifetime.
Maximum number of sensors sensing an object, excerpted from Graham et al. 21
Table 2 characterizes the mentioned methods based on the dominant parameters in WSN scenarios.
Performance characterization of methods.
DCS: data-centric storage; SR-DCS: structured-replication data-centric storage; ASR-DCS: adaptive structured-replication data-centric storage; R-DCS: resilient data-centric storage; AR-DCS: adaptive resilient data-centric storage.
AR-DCS is a hybrid method based on R-DCS 8 and ASR-DCS 12 bringing load balancing and fault tolerance over the nodes involved in storage and query process. This will cause balanced energy consumption over these mentioned nodes and brings better performance in WSN lifetime. As mentioned above, AR-DCS is a suitable method for decreasing the traffic load over neighbors of replicas so that it has positive impact on network lifetime. Also, AR-DCS has R-DCS capabilities in storage and query process using monitors and redundant replicas. Therefore, recovery of lost data and reliability are better preformed than basic DCS. It should be mentioned that although AR-DCS has better performance than R-DCS and basic DCS in storage process, due to zone division, traffic of query process is partly raised. Finally, the total amount of traffic in AR-DCS is near the boundary of basic DCS.
Simulation
Energy consumption is the most critical issue in WSN. Wireless communications between sensor nodes need energy. Therefore, data dissemination approaches in WSN try to optimize communications in order to improve the node battery lifetime. However, mapping communication and energy consumption are complicated tasks which are related to parameters such as hardware and transmission pattern but experiences show that parameters such as “Total number of packets sent for storage:
Input parameters for the network.
Input parameters for the simulation.
DCS: data-centric storage; R-DCS: resilient data-centric storage; AR-DCS: adaptive resilient data-centric storage.
As shown in Tables 3 and 4, input parameters are categorized into two major groups: The first group contains parameters that describe network characteristics. The second one contains simulation-run parameters. Each parameter has a default value, which the simulation is based on. Default values are determined according to Ratnasamy et al. 1 and Ghose et al. 8
For evaluation and analysis of the mentioned methods, we developed a discrete-event system simulator based on event-scheduling approach explained in Jerry 23 in which events and their effects on the state of the system during the predetermined amount of time are used for system conceptualization and modeling. The superiority of the developed simulator against the one introduced in Ratnasamy et al. 1 can be seen in each run of simulation, where the three mentioned methods are evaluated serially based on the same circumstances and target movements. Also, each run is repeated three times and the collected data are averaged. Therefore, the collected data are closer to real-world scenarios. Another advantage of this simulator is random propagation of nodes according to uniform distribution instead of using predetermined positions.
The final advantage is the sensing which is performed from the moving targets within the network instead of injecting unreal sensed data to the nodes. This simulator was developed with three built-in engines mentioned below:
Simulation engine: This engine is the implementation of event-scheduling algorithm based on Jerry. 23
Two-dimensional (2D) view engine: As its name illustrates, this engine is used for graphical drawing of the network.
Network engine: DCS, R-DCS, and AR-DCS are implemented within a simulated WSN by this engine.
The 2D graphical views of the simulator using DCS, R-DCS, and AR-DCS methods are shown in Figures 1–3, respectively. Figure 1 shows the simulation sample for data dissemination of DCS; sensing nodes are illustrated in blue. The yellow nodes are receiving sensed data and the brown ones are replicas. The light-blue node specifies a Personal Digital Assistant (PDA) querying the network. Figure 2 shows the simulation sample with R-DCS. As the figure illustrates, in each zone, there is a green node which acts as a monitor for storage and query management. Figure 3 shows the behavior of AR-DCS in the burst traffic of storage process. When the frequency of storage packets sensed by the monitor passes the second-level threshold, AR-DCS acts like Figure 3. If the frequency of storage packets sensed by the monitor is between the first-level and second-level thresholds, then its behavior is like Figure 2 and whenever the mentioned frequency falls below the first-level threshold, AR-DCS performs data dissemination like Figure 1.

Storage data propagation from the sensing nodes to the replicas (DCS).

Storage data propagation from the sensing nodes to the monitors and replicas (R-DCS).

Storage data propagation from the sensing nodes to the monitors and replicas (AR-DCS burst fashion).
To determine the performance of the mentioned methods, output parameters including

Number of packets sent for storage process (

Number of packets sent for query process (

Number of packets sent for response process (

Total number of packets sent for storage and query process (
Figure 4 illustrates the
Figures 5 and 6 illustrate the total packets in query/response process. As the figures show, AR-DCS partly increases the traffic of query/response process compared with DCS and R-DCS. The reason for such a behavior is the zone division. Queries must be sent to more monitors. Monitors will query more replicas and more replicas will answer to the PDA. But AR-DCS adapts itself to the frequency of packets of storage process so that whenever the mentioned frequency is low, zones are aggregated and queries will be sent to less monitors and replicas. This will cause overall downgrade in the query process cost. That is why query/response process increases relatively. If we compare the increase in query process and decrease in storage process of AR-DCS with the DCS, we will recognize that they are similar in overall performance (Figure 7). But it should be mentioned that AR-DCS covers node failure in the storage process by means of replicating the data over replicas.
According to the results, AR-DCS not only brings load balancing and fault tolerance to the WSN, but also shows better performance in the storage process against R-DCS and basic DCS. The only drawback of AR-DCS is a short increase in query/response process cost compared with basic DCS. It should be mentioned that the overall costs of the storage and query/response process of AR-DCS and basic DCS are similar. However, AR-DCS is resilient against the node failure which is not the feature of basic DCS.
Conclusion
DCS approach can be viewed as energy efficient and low cost solution for data dissemination in WSN. However, in the basic version of the DCS, nodes are reliable and the network topology will not change during the long period of time so that the node failure is not considered. Therefore, methods like R-DCS have been proposed in order to overcome such an unsolved issue. This article proposed a novel hybrid approach called AR-DCS which makes R-DCS adaptive to the various circumstances of the network. As explained in section “AR-DCS,” instead of a predetermined threshold which introduced in Hejazi and Amin, 12 a mathematical one is used for zone division and aggregation. This makes storage process more adaptive to the number of targets and their movements which bring load balancing and fault tolerance to the WSN. Also, a discrete-event-based simulator is developed in order to analyze the basic DCS, R-DCS, and AR-DCS in various circumstances of target number and their movements. The analytical results show that AR-DCS has better performance against R-DCS and DCS in storage process but also brings a short overhead in query/response process.
Footnotes
Academic Editor: Shigeng Zhang
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
