Abstract
In wireless sensor networks, the aim of storage protocols is to efficiently replicate data across nodes and to improve data collection and querying by sinks. Among them, in-network storage protocols replicate data in a set of nodes that depends on characteristics such as network topology and geographic location. Researchers have proposed various techniques to implement in-network storage. In this article, we summarize and highlight the key ideas of existing protocols, which are classified into three categories (reactive, unstructured proactive, and structured proactive) based on topology, load balancing, transmission strategy, and reliability. The benefits and drawbacks of each protocol are studied and compared against different requirements. Finally, future research directions are provided for efficient in-network storage in wireless sensor networks.
Introduction
Over the last decades, wireless sensor networks (WSNs) have been widely studied in both academia and industry. A wireless sensor node is a low-cost communicating electronic device that monitors its environment. Unlike classic ad hoc networks, WSNs have limited resources, including processing capabilities, memory, and energy (battery). Thus, specific algorithms and protocols have been developed for WSNs at the physical, medium access control (MAC), routing, transport, and application layers.1,2 The main MAC protocols for WSNs can be found in the study of Huang et al., 3 whereas the most important routing protocols are presented by Guo and Zhang. 4 Transport protocols for WSNs are summarized in the study of Ghaffari. 5 Finally, application scenarios for WSNs are presented by Borges et al. 6 In addition, various low-power technologies have been standardized for WSNs, such as IEEE 802.15.4, 7 which covers the physical and MAC levels. On top of IEEE 802.15.4, the ZigBee standard 3 was also proposed as a routing and application protocol.
The common goal of a WSN is to collect environmental data and send it to users. Data leave the WSN through sink nodes. The sink is a powerful node that links the WSN to an external system such as the Internet. WSN data can reach the sink node through one of the following methods: local storage, external storage, or in-network storage.
Local storage is a data management scheme that stores data in the node producing them. To gather the data, the sink sends a query to the node that stores it. This process has some disadvantages. First, when nodes capture many events, they have to store a lot of data and thus soon exhaust their scarce resources. Second, sinks usually do not know in advance which node is storing the data. Thus, sinks have to flood a query to all nodes (i.e. the communication overhead problem).
External storage is a data management scheme that sends data toward the sink using routing protocols as soon as they are produced. This process has some disadvantages. First, the produced data cannot be aggregated. Second, if many sinks are deployed in the WSN, the node duplicates the data and sends it toward each sink. In addition, a mobile sink could miss the data. The external storage approach is therefore not feasible in applications where the WSN has an intermittent connection with a mobile sink.
The in-network storage scheme aims to solve the above drawbacks by replicating data in a set of nodes in the WSN. In this scheme, rendezvous nodes across the network are used for data storage (i.e. they act as data replicators). When a node detects an event, it sends data to the rendezvous nodes. In turn, a sink interested in that data queries the rendezvous nodes to retrieve the information using the minimal number of hops (i.e. reducing the complexity and the energy consumption of the sink query). In the work of Rumín et al., 8 in-network and external storage processes are compared. They show that in-network storage offers lower energy consumption and thus improves the WSN lifetime during data collection. The concept of in-network storage and its benefits during sink querying are illustrated in Figure 1(a) and (b), respectively. Rendezvous nodes are illustrated by gray dots and the data producer by a black dot.

In-network storage in WSN and its advantages: (a) in-network storage and (b) external sink request.
In the literature, in-network storage schemes have been developed mainly for the following WSN purposes:9–14
Mobile sink management: each node replicates its data in rendezvous nodes. A mobile sink could then collect data by only visiting or querying these rendezvous nodes.
Node failure resilience: each node replicates its data in various nodes in the network. The information is thus available in WSN even when a set of nodes are unreachable (e.g. damaged or battery-exhausted).
The in-network storage concept was first introduced by Shenker et al. 15 in 2003 as a new protocol allowing data dissemination and replication in WSNs. Various authors have since developed in-network storage protocols that improve or extend the original proposal depending on network topology, sink position, sink mobility, optimal rendezvous node position, and so on. Several well-known WSN techniques have been exploited for this purpose, including position-aware communication, flooding, trees, clustering, and random walks. This article aims to provide an overview of existing in-network storage protocols. To our knowledge, there is no survey on this subject in the literature, so this article is a first contribution toward an overview of this research field. It surveys the protocols published from 2003 to 2018 (the in-network storage mechanism was first introduced in 2003, as discussed above). The existing protocols are classified into different categories, and their techniques are detailed. Their advantages and drawbacks are discussed and compared with regard to various requirements of in-network storage in WSNs. The performance of each protocol is reported based on existing comparison works in the literature, which are cited throughout the article. Finally, based on our literature investigation, this article outlines future research issues for an efficient in-network storage solution.
The article is organized as follows. Design requirements of in-network storage protocols and the associated challenges are explained in section “Requirements and challenges.” In section “In-network storage protocols for WSN,” the existing protocols are classified and discussed. The open research issues are outlined in section “Opportunity and open issues for efficient in-network data storage in WSN.” Finally, section “Conclusion” concludes the article.
Requirements and challenges
This section first presents the WSN features. Then, the in-network storage requirements and the corresponding challenges are summarized.
WSN features
In WSNs, many sensor nodes are usually deployed in zones that are inaccessible to humans, as in environmental supervision applications.16,17 To reflect these application requirements, the following WSN features are considered.
Large scale
The communication range of each node is limited to a few meters. Thus, to cover the whole deployment area, a large number of nodes is used (i.e. hundreds or even thousands).
Limited resources
Sensor nodes have very limited memory, processing, and energy resources (i.e. nodes are battery powered). Despite this, a WSN has to operate successfully for long periods of time.
Low power
Since nodes are battery powered, several approaches are used in WSN applications to reduce energy consumption and hence ensure a long network lifetime. 17 For example, some applications employ a duty-cycle mode: each node stays in sleep mode most of the time to keep power consumption to a minimum, and periodically wakes up to process a few tasks. Other applications reduce the rate of radio transmissions and receptions as much as possible (radio communication is the main activity responsible for energy consumption in WSNs 3 ). Some other applications exploit energy harvesting sources such as solar, vibration, and piezoelectricity to improve the network lifetime. 1 Despite these techniques, reducing energy consumption remains a major concern in WSNs.
Dynamic network
WSNs are ad hoc-based networks; thus, environmental changes affect the network topology. In some application scenarios, replacing node batteries is impossible (e.g. volcano supervision and nodes integrated inside the concrete of a building). Thus, battery-exhausted nodes might cause significant topology changes in the WSN. In other applications, topology changes are due to the mobility of several nodes (e.g. a sink attached to a mobile object and nodes attached to animals or vehicles).
Requirements of in-network storage
In-network storage spreads data from a source node to all other nodes in WSN. To ensure sufficient quality of service (QoS), in-network storage defines three requirements as follows.
Reliability
High reliability is usually required for in-network storage. Here, high reliability means (1) each data message can reach any node in the network; (2) the data message must be received by nodes without errors (i.e. erroneous packets should not be stored). Any fault in these two points leads to network inconsistency and data loss. Thus, high reliability is a major requirement for successful in-network storage process.
Efficiency
The two major efficiency requirements of in-network storage are time and energy. Time efficiency means that the in-network storage process should finish as soon as possible. If the wireless channel is occupied for a long time by a high number of messages, the network remains ineffective during this period. 18 Hence, this period has to be reduced as much as possible.
Energy efficiency means the in-network storage process should be executed with low power consumption. This is due to the limited energy resources of sensor nodes as discussed above in WSN features. Energy consumption includes flash memory operation, radio communication, and idle listening. The flash memory operation is unavoidable to store data. Wireless communication consumes most of the energy in WSN, 19 thus the radio-on time for executing in-network storage should be limited.
Scalability
A successful in-network storage approach should adapt to any WSN scale (i.e. to any number and density of nodes).
Challenges of in-network storage
Given the WSN features and the above requirements, in-network storage faces several key challenges, as follows.
Reliability
High reliability is usually required; however, it is not trivial to obtain. The network connectivity may change over time due to the dynamic nature of WSNs. The sink or several nodes may leave the network and come back later. Thus, they could miss the disseminated data, which leads to in-network storage inconsistency and data loss. 20 In-network storage must cope with this issue and ensure that all nodes and sinks are able to receive the data.
Communication overhead
Communication overhead is the excessive redundant transmission that leads to the broadcast storm problem. 21 High traffic toward rendezvous nodes causes many interferences and collisions between messages. In addition, excessive radio transmissions incur high energy consumption in the WSN. This problem should be avoided for a reliable and energy-efficient in-network storage process.
Hotspot problem
Nodes near the rendezvous location are more likely to expend their energy resources earlier than other nodes. This is due to multi-hop route intersection and traffic concentration around the rendezvous nodes as shown in Figure 2. Hotspot nodes are illustrated by red dots in Figure 2. When these nodes expend their energy earlier, rendezvous location will no longer be accessible for data source nodes. This issue is known as the hotspot problem 22,23 which must be avoided.

Illustration of the hotspot problem.
Memory overhead
Limited resources also constrain storage capacity. For example, a MicaZ node has 4KB RAM and 128KB flash, and a TelosB node has 10KB RAM and 48KB flash. In-network storage should fit the memory capacity of sensor nodes. For example, if the same rendezvous nodes (storage nodes) are always used over a long period of time, their memory becomes saturated, leading to the memory overhead problem. The network is then no longer able to store further data. As a solution, different nodes should be selected as rendezvous nodes in each period of time.
Energy efficiency
For energy-efficient storage process, the wireless communication and the radio-on time should be minimized. As discussed before, excessive redundant transmissions incur high energy consumption in WSN. This issue must be avoided.
In the following section, in-network storage protocols are discussed in terms of these requirements and challenges.
In-network storage protocols for WSN
In this section, our research methodology to accomplish this study is first detailed. Then, the existing in-network storage protocols are classified into various categories, and their benefits and disadvantages are discussed.
Research methodology
In our study, we collected articles published in popular scientific journals and conferences that provide the most important information to researchers investigating in-network storage in WSNs. To this end, an extensive search was carried out for the terms “data storage, in-network storage, data replication, or data dissemination” in the titles, abstracts, and keywords of WSN papers. Our study attempts to document the high interest in, and the techniques of, in-network storage protocols in WSNs and to provide an overview of the literature. We first looked for survey papers in the literature and found that no article attempts to overview existing in-network storage protocols. Then, we collected and studied only articles that develop protocols for in-network storage in WSNs (31 protocols were collected and studied). Furthermore, some additional papers were studied in an effort to propose a research direction for an efficient in-network storage solution in WSNs.
During the overview process, we targeted four main library databases that cover most of the in-network storage studies in WSNs, namely, IEEE Xplore (41% of the studied papers), ScienceDirect (32%), Springer (12%), and ACM Digital Library (10%). Other library databases (5% of the studied papers) were also targeted, including SAGE, MDPI, and Wiley. All the conference papers cited in this study were published in IEEE Xplore and ACM Digital Library. We focused on papers published over the last two decades, starting from 2003 (the in-network storage mechanism was first introduced in 2003 15 ). Figure 3 shows the number of papers per year that are overviewed and discussed in this study.

Number of papers studied (in this survey) per year.
Overview results
A considerable number of WSN protocols have been published in the literature to achieve in-network data storage. In this article, existing protocols are classified as reactive or proactive, as shown in Figure 4. In the reactive scheme, nodes send and store data in the zone around an event location once the event occurs (e.g. the appearance of a mobile sink). In contrast, in the proactive scheme, nodes replicate their data in the WSN in anticipation of future events. Two approaches are identified in the proactive scheme: structured and unstructured. In the structured approach, data are stored in a virtual structure of nodes (i.e. hexagon, border, line, grid, quadtree, rail, or ring), making data available later for sink collection (i.e. the sink sends a request toward the structure to gather information). In the unstructured approach, by contrast, data are stored in multiple nodes across the whole network. The protocols of each category are presented in the following.

Classification of in-network storage protocols for WSN.
Reactive storage
This class of in-network storage protocols relies on position and trajectory prediction to react to the appearance of mobile sinks in the WSN. As shown in Figure 4, reactive storage protocols include QBDCS, ALURP, MobiQuery, and BPPDD. Each protocol is presented as follows.
QBDCS
QBDCS 24 is designed for data collection by a mobile sink and is based on sink trajectory prediction. First, the sink sends requests to its zone of interest in the WSN. The request message contains the following sink information: direction, speed, and current position. When the request is received, each node in the zone of interest predicts the future sink position. Data messages are then sent toward the predicted location, where each data message is locally flooded and replicated in nodes within the expected sink zone, as shown in Figure 5. When crossing the area, the sink can then easily collect the data.

Illustration of QBDCS, the storage nodes are represented by light gray dots and the mobile sink is the triangle.
Advantages and drawbacks: QBDCS limits the flooding to a few nodes in the WSN, which reduces the overall communication overhead and the energy consumption during the storage process. However, the reliance on trajectory prediction restricts the application domain of QBDCS, especially in indoor scenarios, as discussed in Cheng et al. 24
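The prediction step described for QBDCS can be illustrated with a minimal sketch. It assumes the sink moves in a straight line at constant speed and that the expected sink zone is a circle around the predicted position; both are simplifying assumptions made here for illustration, and the function names and the `radius` parameter are hypothetical rather than taken from the QBDCS specification.

```python
import math


def predict_sink_position(x, y, speed, heading_rad, dt):
    """Dead-reckoning estimate of the sink position after dt seconds.

    Assumes constant speed and heading (an illustrative assumption).
    """
    return (x + speed * math.cos(heading_rad) * dt,
            y + speed * math.sin(heading_rad) * dt)


def nodes_in_expected_zone(nodes, center, radius):
    """Return the ids of nodes lying within `radius` of the predicted position.

    `nodes` maps a node id to its (x, y) position; the circular zone is
    a hypothetical model of the expected sink zone.
    """
    cx, cy = center
    return [nid for nid, (nx, ny) in nodes.items()
            if math.hypot(nx - cx, ny - cy) <= radius]


if __name__ == "__main__":
    # Sink at the origin, moving along the x axis at 2 m/s, predicted 5 s ahead.
    target = predict_sink_position(0.0, 0.0, 2.0, 0.0, 5.0)
    storage = nodes_in_expected_zone({1: (10, 1), 2: (30, 0), 3: (9, 0)},
                                     target, radius=2.0)
    print(target, storage)
```

In this sketch, only the nodes returned by `nodes_in_expected_zone` would take part in the local flooding and replication, which is what keeps the flooding limited to a few nodes.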
ALURP
Unlike QBDCS, ALURP 25 does not require the position of nodes. First, the mobile sink uses global flooding to notify all nodes of its presence. As a consequence, the source node sends its data message toward a local zone (a few hops wide) in the network instead of sending it to the sink. Data are replicated in the nodes of the local zone and further transmitted to the sink when it is within their radio range. Then, the mobile sink re-notifies its movement using local flooding in this zone. When the sink leaves the local zone, global flooding is employed again.
Advantages and drawbacks: Similar to QBDCS, ALURP introduces a local zone around the sink to limit flooding, and hence it reduces the communication overhead and the energy consumption. Nevertheless, the global flooding is still required. ALURP performance depends highly on the local area width. A large local area width increases the overhead of local flooding, and a small local area width increases the frequency of global flooding. 25
MobiQuery
MobiQuery 26 relies on a pre-fetch approach, detailed by Lu et al., 26 in which knowledge of the sink’s trajectory is used to prepare the appropriate nodes for storing the data. To this end, each node on the sink trajectory starts a local tree construction (i.e. each such node is a tree root). Each tree is limited to a specified depth called Tdhops. Then, the nodes within each tree transmit their data toward the tree root. These data are aggregated and stored at the tree root, which then transmits them to the sink when the latter is within its radio range.
Advantages and drawbacks: MobiQuery avoids redundant transmissions through data aggregation within each tree structure, thus reducing the memory overhead and the network contention. However, MobiQuery suffers from imperfect data collection by the sink due to considerable data forwarding delays (caused by tree construction) and location errors (the data are stored only in root nodes, which risk missing the sink). 26 Unlike MobiQuery, QBDCS and ALURP replicate data in several nodes in the expected sink area to avoid missing the sink, as discussed previously.
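The depth-limited local trees used by MobiQuery can be sketched as follows. The source does not state how each tree is built, so this example assumes a breadth-first construction over a known neighbor table; `adjacency`, `build_local_tree`, and `max_depth` (standing in for Tdhops) are hypothetical names used only for illustration.

```python
from collections import deque


def build_local_tree(adjacency, root, max_depth):
    """Build a tree rooted at `root`, limited to `max_depth` hops.

    `adjacency` maps each node to a list of its radio neighbors.
    Returns a dict mapping every node in the tree to its parent
    (the root maps to None). Breadth-first construction is an
    assumption of this sketch, not a detail given by the protocol.
    """
    parent = {root: None}
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not grow the tree beyond the depth limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in parent:
                parent[neighbor] = node
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return parent


if __name__ == "__main__":
    links = {"r": ["a", "b"], "a": ["r", "c"], "b": ["r"],
             "c": ["a", "d"], "d": ["c"]}
    # With a depth limit of 2, node "d" (3 hops away) is left out.
    print(build_local_tree(links, "r", 2))
```

Each node in the returned mapping would forward its data to its parent, so that the data converge on the root for aggregation and storage.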
BPPDD
BPPDD 27 proposes a position-aware comb-needle scheme based on the push–pull technique. The BPPDD principle is relatively simple compared to the protocols discussed previously. When an event is detected, data are replicated in a set of nodes forming a needle. For example, a node pushes data toward its vertical neighbors and thus builds a vertical needle. The sink can then gather data by sending its query across the network along a comb-shaped pattern. For high reliability, the mobile sink can generate several requests at different locations; hence, several comb structures can be built along the sink trajectory. To minimize the communication overhead, BPPDD dynamically adjusts the comb and the needle based on the query frequency. If the number of queries is low, needles are short and combs are fine, and vice versa.
Advantages and drawbacks: BPPDD provides a solution with low communication overhead. The comb-needle strategy achieves better performance than the original push–pull mechanism, as shown by Liu et al. 27 However, BPPDD is inefficient for large-scale and sparse WSNs. 27
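The interplay between needle length and comb spacing can be illustrated on an idealized grid of nodes. This is only a sketch under strong assumptions: nodes sit on integer grid cells, the needle is a vertical segment centered on the source, and the comb is reduced to horizontal rows spaced `spacing` cells apart. None of these names or simplifications come from the BPPDD paper.

```python
def needle_cells(source, half_length, height):
    """Cells holding a replica: a vertical segment centered on the source."""
    x, y = source
    low = max(0, y - half_length)
    high = min(height - 1, y + half_length)
    return {(x, row) for row in range(low, high + 1)}


def comb_cells(spacing, width, height):
    """Cells visited by the query: full horizontal rows every `spacing` rows."""
    return {(col, row)
            for row in range(0, height, spacing)
            for col in range(width)}


def query_finds_data(source, half_length, spacing, width, height):
    """True if at least one comb row crosses the needle."""
    needle = needle_cells(source, half_length, height)
    return bool(needle & comb_cells(spacing, width, height))


if __name__ == "__main__":
    # Needle covering rows 4..6 meets the comb row at 6 (spacing 3).
    print(query_finds_data((4, 5), 1, 3, 10, 10))
    # A single-cell needle at row 5 falls between comb rows 3 and 6.
    print(query_finds_data((4, 5), 0, 3, 10, 10))
```

In this simplified model, a shorter needle lowers the push cost but requires a finer comb (smaller `spacing`) for the query to meet the data, which mirrors the trade-off that BPPDD adjusts according to the query frequency.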
Structured proactive storage
In structured proactive storage, the rendezvous nodes form a virtual structure (i.e. grid, hexagon, border, line, rail, quadtree, and ring) within the WSN. Data are thus available for sinks which send requests toward these storage structures. This class of protocols uses the node’s position to form a virtual structure within the WSN. The protocols are classified by types of structure (see Figure 4): grid-, hexagon-, border-, rail-, line-, quadtree-, and ring-based. The protocols of each sub-class are presented as follows.
Grid-based technique
As shown in Figure 4, grid-based storage protocols include TTDD, GBEER, CMR, and DLB. Each protocol is presented as follows.

Structured proactive techniques for in-network data storage in WSN: (a) grid, (b) hexagon, (c) border, (d) rail, (e) line, (f) quadtree, and (g) ring.
Advantages and drawbacks: TTDD allows easy sink access to the grid. The sink reaches the grid structure using a minimum number of hops. Nevertheless, TTDD incurs high communication overhead as each source node independently constructs a separate grid 28 (i.e. the broadcast storm problem, especially for applications where many sensor nodes generate data).
Advantages and drawbacks: Unlike TTDD, GBEER proposes a common single grid structure rather than separate grids for each source node. Thus, GBEER allows a lower communication overhead than TTDD. 29 However, nodes in the grid structure are hotspots. To avoid this issue, the grid structure has to be periodically changed.
Advantages and drawbacks: The drawbacks of CMR are similar to those of GBEER. CMR introduces a message duplication mechanism to increase reliability. Nevertheless, this mechanism further aggravates the hotspot problem for nodes in the grid.
Advantages and drawbacks: DLB reduces the overall energy consumption and increases the storage efficiency due to the multi-threshold and cover-up schemes. Unlike GBEER and CMR, DLB efficiently avoids the hotspot problem. However, DLB incurs high message loss rate in dense WSN as discussed in the work of Liao et al. 31
Hexagon-based technique
HPDD and HexDD are two hexagon-based storage protocols in WSN as shown in Figure 4. Each protocol is presented as follows.
Advantages and drawbacks: HPDD offers lower delay than GBEER for storage and collection processes. 33 Similar to GBEER, HPDD nodes in the hexagon structure are hotspots.
Advantages and drawbacks: In HexDD, border lines confine data messages and sink requests within a subset of the grid, resulting in low communication overhead. Nevertheless, similar to HPDD, nodes on the hexagon border lines are hotspots. No countermeasure is proposed to avoid this problem.
Border-based technique
As shown in Figure 4, EAPDD 35 is a border-based storage protocol in WSN. It stores data in an aggregation point on the network edges (see Figure 6(c)). EAPDD was developed to enhance the network lifetime and reduce the processing time of sink query. To this end, each node that has a distance d from the deployment zone’s boundary is selected as an edge node. A set of nodes are selected as aggregation points (APs) among edge nodes. APs are then used as rendezvous nodes. Once identified, AP nodes begin a network self-organization process: each AP broadcasts a hello message which is forwarded by only its neighbor edge nodes as shown in Figure 7. This process allows each node to recognize its nearest neighbor AP to construct the network perimeter and also to delimit the perimeter into multiple edge portions. The portion delimited by two APs is referred to as an edge as shown in Figure 7. Then, the storage process consists of aggregating network data within AP nodes. Data generated by source nodes are sent toward the edges through horizontal and vertical paths. Data are then routed to the closest AP through edge nodes as shown in Figure 7.

Illustration of EAPDD: the aggregation points are represented by light gray dots, the edge node by dotted circle, and the data source node is the black dot.
Advantages and drawbacks: EAPDD offers significantly faster query processing since data are stored on the network border. However, APs are hotspots and, thus, the edges of the network have to be changed periodically. In addition, EAPDD can be applied only to static networks, as discussed in the study of Doss et al. 35
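The edge-node selection rule of EAPDD can be sketched as follows. The example assumes a rectangular deployment zone and reads "a distance d from the boundary" as "within distance d of the boundary"; both are interpretations made for this illustration, and the function names are hypothetical.

```python
def distance_to_boundary(position, width, height):
    """Shortest distance from a point to the border of a width x height zone."""
    x, y = position
    return min(x, width - x, y, height - y)


def select_edge_nodes(nodes, width, height, d):
    """Return the sorted ids of nodes lying within distance d of the border.

    `nodes` maps a node id to its (x, y) position. The rectangular zone
    and the "within d" reading are assumptions of this sketch.
    """
    return sorted(nid for nid, pos in nodes.items()
                  if distance_to_boundary(pos, width, height) <= d)


if __name__ == "__main__":
    deployed = {1: (1, 50), 2: (50, 50), 3: (98, 3), 4: (50, 95)}
    # Node 2 sits in the middle of the zone and is not an edge node.
    print(select_edge_nodes(deployed, 100, 100, 5))
```

The aggregation points would then be chosen among the returned edge nodes; the choice criterion is not detailed here and is therefore left out of the sketch.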
Rail-based technique
RailRoad protocol 36 constructs a virtual infrastructure called “rail.” The rail introduces a closed-loop band which has the shape defined by the network outline as shown in Figure 6(d). Nodes of the rail are called rail nodes. The source nodes send information about their data (meta-data) to the nearest rail node. This rail node then constructs a station which is a portion of the rail (centered on this rail node). The meta-data is replicated on all the nodes of the station. The sink query, including the sink position, is sent toward the rail for meta-data. When a station node is reached, it informs the data source node of the sink position. Thus, the source node can send the requested data directly to the sink.
Advantages and drawbacks: RailRoad avoids the broadcast storm problem by restricting broadcasts on the rail structure through the station mechanism. In addition, storing only meta-data avoids memory overhead in rail nodes. Nevertheless, RailRoad data delivery delays are high, as the sink query first has to travel through a long rail structure. 36 Then, when the query reaches the corresponding station, the source node has to be informed of the sink position before data delivery can finally start.
Line-based technique
LBDD 37 introduces a vertical band of nodes which splits the network deployment area into two equal portions as shown in Figure 6(e). Nodes in this structure are called in-line nodes. The source node sends data toward this structure, and the first encountered in-line node stores the data. As for query process, the sink sends query toward the line structure. The query is flooded throughout the line until an in-line node storing the data is reached. Then, this in-line node transmits the data directly to the sink.
Advantages and drawbacks: LBDD introduces a line structure that is simple to construct using the position-awareness mechanism detailed by Hamida and Chelius. 37 Since query flooding on the line increases the energy consumption rate, the line structure should be wide enough to avoid the hotspot and broadcast storm problems.
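Because LBDD relies only on node positions, membership in the line structure can be decided locally by each node. The sketch below assumes a rectangular area whose vertical band is centered at half the area width; `band_width` and the function names are hypothetical parameters introduced for illustration.

```python
def is_inline(position, area_width, band_width):
    """True if the node lies inside the vertical band centered on the area."""
    x, _ = position
    return abs(x - area_width / 2) <= band_width / 2


def inline_nodes(nodes, area_width, band_width):
    """Return the sorted ids of in-line nodes.

    `nodes` maps a node id to its (x, y) position.
    """
    return sorted(nid for nid, pos in nodes.items()
                  if is_inline(pos, area_width, band_width))


if __name__ == "__main__":
    deployed = {1: (10, 20), 2: (48, 70), 3: (55, 5), 4: (90, 40)}
    # Band of width 12 centered at x = 50 covers x in [44, 56].
    print(inline_nodes(deployed, 100, 12))
```

Widening `band_width` in this sketch increases the number of in-line nodes that can share the storage and query load.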
Quadtree-based technique
QDD and DQT are two quadtree-based storage protocols in WSN as shown in Figure 4. Each protocol is presented as follows.
Advantages and drawbacks: Quadtree construction ensures lower communication overhead than other structures discussed above in this section. 38 However, no mechanism is proposed to counter hotspots around the center of each quadrant.
Advantages and drawbacks: Compared to QDD, DQT has a lower quadrant construction delay, thus improving the data storage process throughout the WSN. However, DQT suffers from the same hotspot problem as QDD.
Ring-based technique
As shown in Figure 4, ring-based storage protocols include RingRouting and Artery which are presented as follows.
Advantages and drawbacks: RingRouting allows fast data delivery since the ring structure is easily accessible for quick sink position acquisition as discussed in the study of Tunca et al. 41 In addition, RingRouting ensures memory overhead resilience by only storing sink position in the ring structure instead of storing source node data. Furthermore, the proposed mechanism for hotspot problem resilience has low overhead. The drawback of RingRouting is its scalability: 41 the initial ring construction incurs high communication overhead for large/sparse networks.

Illustration of Artery: the clusters of the ring are delimited by a dotted red line, cluster-heads in the ring are represented by black dots, and cluster-heads outside the ring are the light gray dots.
Advantages and drawbacks: Artery offers advantages similar to RingRouting, with the difference that Artery does not support memory overhead resilience (i.e. only the sink position is stored in RingRouting, whereas all data are stored in the ring using Artery). Similar to RingRouting, Artery also suffers from limited scalability: its efficacy decreases as the number of nodes increases. 42
Unstructured proactive storage
In unstructured proactive data storage, different techniques are proposed by researchers as shown in Figure 4: flooding-, random-walk-, tree-, and cluster-based. In the following, protocols belonging to each technique are presented.
Flooding-based technique
Flooding-based protocols rely on the message broadcasts throughout the entire WSN for delivering data to rendezvous nodes. Since flooding incurs high energy consumption due to excessive radio communication rate as discussed in literature,19,44 unstructured proactive flooding-based protocols use mechanisms such as counter-based and probabilistic flooding. These mechanisms attempt to avoid unnecessary broadcasts in WSN (i.e. to avoid broadcast storm problem). Flooding-based storage protocols include DEEP, USEE, MHopC, tinyDSM, CStorage, and CNCDS. Each protocol is presented as follows.
Advantages and drawbacks: Using DEEP, the sink visits only a subset of nodes to collect data produced by the entire network. However, the protocol introduces excessive message redundancy in the WSN. 47 The probabilistic flooding mechanism could be made more selective to reduce communication overhead and improve reliability. Moreover, a node in DEEP keeps track of all received messages during its operation, 45 which is impracticable when a lot of messages are flooded.
Advantages and drawbacks: USEE offers good performance in dense WSN, uniformity of data replication, and high storage capacity of WSN. However, USEE suffers from increasing delay as it uses the counter-based flooding: this mechanism is based on random assessment delay (RAD). Instead of immediately broadcasting a message m, a node i adds m to its casting queue during RAD and in parallel counts the number of duplicated/received messages of m from its neighbors. If the counter reaches a preset threshold Cth, it cancels the broadcasting. This mechanism reduces the communication overhead while having a high reliability (i.e. data reach all the nodes). However, it increases the latency of message treatment in each node which increases the overall storage process delay throughout the network.
USEE was compared to DEEP in the work of Mekki et al. 47 They show that USEE and DEEP have similar performance in terms of reliability since they uniformly replicate data throughout the whole network. However, compared to DEEP, USEE offers lower memory overhead and lower communication overhead since counter-based flooding results in a lower broadcasting rate than the probabilistic flooding used by DEEP.
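The counter-based flooding decision described above can be sketched as a small per-node state machine. This is a minimal illustration rather than the USEE implementation: the class and method names are hypothetical, the first reception of a message is assumed to count as one copy, and the RAD timer itself is reduced to a uniformly drawn delay.

```python
import random


class CounterBasedNode:
    """Per-node bookkeeping for counter-based flooding (illustrative only)."""

    def __init__(self, threshold, max_rad=0.1, rng=None):
        self.threshold = threshold      # Cth in the text
        self.max_rad = max_rad          # upper bound of the RAD, in seconds
        self.rng = rng or random.Random()
        self.pending = {}               # message id -> copies heard so far

    def on_receive(self, msg_id):
        """Count one copy of msg_id.

        Returns a freshly drawn RAD if this is the first copy (the caller
        would start a timer with it), otherwise None.
        """
        first_copy = msg_id not in self.pending
        self.pending[msg_id] = self.pending.get(msg_id, 0) + 1
        return self.rng.uniform(0.0, self.max_rad) if first_copy else None

    def on_rad_expired(self, msg_id):
        """Decide whether to rebroadcast once the RAD has elapsed.

        The broadcast is cancelled when the counter reached the threshold.
        """
        copies = self.pending.pop(msg_id, 0)
        return copies < self.threshold


if __name__ == "__main__":
    node = CounterBasedNode(threshold=3)
    for _ in range(2):
        node.on_receive("m")
    print(node.on_rad_expired("m"))  # two copies heard, below Cth = 3
```

The delay drawn in `on_receive` is what adds latency at every hop, which is the drawback noted for USEE, while the counter check is what suppresses redundant broadcasts in dense neighborhoods.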

Illustration of MHopC (example for Hop = 1): the storage nodes are represented by light gray dots and the data source by black dot.
Advantages and drawbacks: Similar to USEE, MHopC offers low communication overhead and high reliability by employing counter-based flooding to broadcast data within each neighborhood. In addition, MHopC uniformly replicates data in the whole network when a low counter value is used (e.g. Hop = 1 or Hop = 2), as discussed in the study of Mekki et al. 48 If a high counter value is used (e.g. Hop = 12), many large zones of the WSN hold no data, which could lead to data loss during collection by a mobile sink.

Illustration of tinyDSM (example for locality of two hops and four hops): the storage nodes are represented by light gray dots and the data source by black dot.
Advantages and drawbacks: tinyDSM broadcasts data messages only within a limited area around the source node, which reduces the communication overhead in the network. However, once a node is configured as a replicator, it keeps this role until it dies. Thus, no mechanism against the hotspot problem is proposed. Furthermore, the replicator node stores the current data value of the source node as well as its historical values. Thus, frequent value changes make the list of stored data long, causing memory overhead as discussed by Piotrowski et al. 50
Advantages and drawbacks: CStorage is fully scalable and distributed since nodes independently make broadcast decisions without using any neighborhood information. Similar to DEEP, however, CStorage introduces excessive message redundancy in the WSN and therefore increases the energy consumption. 51
Advantages and drawbacks: CNCDS achieves lower energy consumption than CStorage by further reducing the total number of transmissions during the dissemination process. 13 However, the CNCDS scheme is based only on the spatial correlation of sensed data. CNCDS could further improve the energy efficiency of WSN by using both spatial and temporal correlations.
Random-walk-based technique
Random-walk is a dissemination mechanism that requires neither broadcasting nor a virtual routing structure. In random-walk, a source node disseminates messages across the network by randomly selecting the next hop among its neighbor nodes, and each visited node repeats the same selection. In the literature, DoubleCross and RaWMS are two random-walk-based storage protocols in WSN as shown in Figure 4. Each protocol is presented as follows.
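A minimal sketch of the random-walk principle is given below. It is a generic illustration, not the DoubleCross or RaWMS algorithm. The hop budget max_hops and the rule that every visited node keeps a copy are assumptions, since the stopping and storage rules differ from one protocol to another.

```python
import random


def random_walk(neighbors, source, max_hops, seed=0):
    """Forward one message along a random walk and return the visited path.

    neighbors: dict mapping node id -> list of neighbor ids.
    max_hops:  assumed hop budget after which the walk stops.
    """
    rng = random.Random(seed)
    path = [source]
    current = source
    for _ in range(max_hops):
        candidates = neighbors[current]
        if not candidates:
            break  # isolated node: the walk cannot continue
        # The next hop is drawn uniformly among the current node's neighbors.
        current = rng.choice(candidates)
        path.append(current)
    return path


# Usage: a six-node ring; every node on the path would store a replica.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
replica_holders = set(random_walk(ring, source=0, max_hops=8))
```

Because the walk may revisit nodes, the number of distinct replica holders can be much smaller than the number of transmissions. This is consistent with the high communication overhead reported for this class.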

Illustration of DoubleCross.
Advantages and drawbacks: DoubleCross efficiently reduces the energy consumption while achieving a high data delivery rate of about 98%, as discussed by Shi et al. 56 However, DoubleCross performance decreases at low network density since forwarding nodes cannot find appropriate next hops, so that the RLW line is terminated. 58

Illustration of RaWMS.
Advantages and drawbacks: RaWMS ensures a uniform data replication in WSN. The sink only needs to visit a subset of nodes to collect about 90% of the data generated in the whole network. 59 However, this is achieved at the cost of high communication overhead resulting in a short network lifetime (i.e. a high number of messages have to be repeatedly transmitted in order to achieve uniform data replication in WSN). 45 RaWMS was evaluated and compared to DEEP and USEE in literature.45,47 The authors show that RaWMS incurs higher communication overhead than the flooding-based mechanisms of DEEP and USEE, especially when the number of messages sent by source nodes is low.
Tree-based technique
Tree-based protocols rely on the construction of an overlay virtual tree across the network. The data distribution usually starts from the root and proceeds toward the leaves. Tree-based storage protocols include Supple, ProFlex, and Z-DaSt. Each protocol is presented as follows.
Advantages and drawbacks: Supple significantly reduces the number of transmissions in the WSN during data dissemination. Supple was evaluated and compared to RaWMS, DEEP, and USEE in the study of Mekki et al.47,49 The authors show that Supple offers the lowest communication overhead and ensures low energy consumption. However, no countermeasure against hotspots is proposed around the root node.
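The following sketch illustrates the general tree-based principle on a static graph. It does not reproduce Supple, ProFlex, or Z-DaSt. The breadth-first construction and the function names are assumptions chosen for simplicity.

```python
from collections import deque


def build_bfs_tree(neighbors, root):
    """Return the parent map of a breadth-first spanning tree rooted at root."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nb in neighbors[node]:
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)
    return parent


def forwarding_nodes(parent):
    """Nodes that must transmit when data flows from the root toward the leaves."""
    return {p for p in parent.values() if p is not None}


def upstream_load(parent, sources):
    """Messages handled by each node when every source sends one message to the root."""
    load = {node: 0 for node in parent}
    for src in sources:
        node = parent[src]
        while node is not None:
            load[node] += 1  # every ancestor of the source handles the message
            node = parent[node]
    return load
```

On a four-node line rooted at node 0, upstream_load assigns the largest load to the root and its child. This illustrates why nodes close to the root tend to deplete their energy first when a single tree is used.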

Illustration of ProFlex multi-tree construction based on H-sensor nodes.
Advantages and drawbacks: ProFlex achieves good collection efficiency. It exploits data correlation to reduce the communication overhead. Unlike Supple, the use of different root nodes (H-sensors) mitigates the hotspot problem. Maria et al. 61 show that nodes around the roots in ProFlex relay about 91% fewer messages than in Supple. In addition, the authors show that ProFlex offers the lowest communication overhead when compared to Supple, RaWMS, and DEEP.
Advantages and drawbacks: Z-DaSt benefits from the efficiency of the ZigBee protocol. However, Z-DaSt is applicable only to WSN composed of ZigBee motes. In addition, it is better suited to low or moderate network density, 62 and no countermeasure against hotspots is proposed around the root node. The data dissemination delay of Z-DaSt could be further reduced by employing an improved ZigBee scheme as discussed in the work of Nefzi and Song. 63
Cluster-based technique
Protocols in this class use a clustering mechanism, and cluster-heads usually act as storage nodes. Data storage using clusters is reported to be more efficient than storage using a tree structure, as discussed in literature.64,65 SDS, CBDS, and EEMSRA are three cluster-based storage protocols in WSN as shown in Figure 4. Each protocol is presented as follows.

Illustration of SDS clustering and the data replication between cluster-heads.
Advantages and drawbacks: SDS reduces the communication overhead using the spatial-temporal correlation and ensures efficient data storage in dynamic environments as discussed by Shen et al. 66 However, SDS suffers from the hotspot problem in cluster-heads: once a node is selected as head, it keeps this role until it dies.
Advantages and drawbacks: CBDS advantages include self-adaptation, load balancing, and low storage latency as shown in the study of Wang et al. 67 However, CBDS incurs high communication overhead for self-adaptation and continuous construction of clusters in the network.
Advantages and drawbacks: To increase the energy efficiency, EEMSRA uses an enhanced TDMA algorithm which is described by Xun-Xin and Rui-Hua. 69 Similar to CBDS, EEMSRA mitigates the hotspot problem by periodically changing the cluster-heads. Yet, EEMSRA offers lower communication overhead than CBDS for periodic cluster construction as discussed in the work of Xun-Xin and Rui-Hua. 69
Synthesis
Table 1 summarizes all the above-mentioned in-network data storage protocols in this article. Existing protocols are classified into three categories (reactive, structured proactive, and unstructured proactive) based on topology, load balancing, storage strategy, and transmission mechanism. In addition, Table 2 compares these protocols regarding in-network storage requirements, performance metrics used by authors, simulation tool, existence of testbed experimentation, and existence of a radio propagation model description.
Table 1. Overview of in-network data storage protocols for WSN.
Table 2. Performance comparison of in-network data storage protocols for WSN.
Reactive protocols are executed only if an event is detected in WSN (e.g. mobile sink appearance), which reduces the storage rate during the network lifetime as well as the energy consumption. The structured proactive approaches exploit a virtual structure (grid, hexagon, border, rail, line, quadtree, and ring) serving as a rendezvous zone for data and sink queries. These structures minimize the communication overhead of querying operations, which are confined to a few zones of the network. Nevertheless, nodes belonging to the structure are likely to become hotspots since they process and handle more traffic. Furthermore, these structured solutions usually depend on location information, mainly obtained from a Global Positioning System (GPS) chipset, which increases the energy consumption and cost of the WSN and limits its applicability in indoor scenarios.
Flooding-based unstructured proactive approaches incur high communication overhead. The protocols employing the flooding technique are aware of this issue. Hence, they attempt to reduce the broadcasting rate using probabilistic or counter-based flooding. Other protocols also attempt to restrict the flooding within a limited area around the source node.
Tree-based unstructured proactive approaches simplify the storage process by employing the root node as a relay to disseminate data messages. Nevertheless, this technique increases the severity of hotspots on nodes around the root.
Random-walk-based unstructured proactive approaches are very simple to implement. However, achieving well-distributed data requires a high communication overhead, which leads to a short network lifetime.
Finally, cluster-based unstructured proactive approaches can greatly improve the energy efficiency and the overall network scalability. Clustering is an efficient way to minimize the energy consumption in WSN, since data aggregation and fusion decrease the number of messages transmitted between nodes. On the other hand, these approaches can overload the cluster-heads, leading to high latency and to the hotspot problem in these nodes. Some protocols employing this mechanism are aware of this issue. Hence, they periodically change the cluster-heads to balance the storage load between all nodes.
Opportunity and open issues for efficient in-network data storage in WSN
The issues presented in this section provide perspectives which could enhance the in-network storage in WSN, making it more applicable and effective.
Among the above-mentioned storage classes, the cluster-based approach improves the energy efficiency and the storage process by electing cluster-heads (i.e. each cluster-head is responsible for storing data in its zone) and scheduling the communication of cluster-members (i.e. cluster-members are in sleep mode most of the time). Moreover, clustering is a popular technique in the WSN community 70 for efficiently reducing the energy consumption and for performing data aggregation/fusion to reduce the number of messages transmitted between nodes.71,72 Despite that, few works in the literature use the clustering mechanism for in-network data storage in WSN. Further analysis in this direction is warranted.
Most existing in-network cluster-based storage protocols employ the LEACH algorithm for cluster construction and cluster-head selection. LEACH allows the cluster-heads to be periodically changed in order to mitigate the hotspot problem and improve the network lifetime. Despite this advantage, LEACH has problems which affect different performance metrics such as data throughput and network latency as discussed in the study of Xu et al. 73 For example, the cluster-heads are selected randomly (and periodically); thus, neither a good spatial distribution nor the optimal number of cluster-heads can be ensured. Nodes with low and high residual energy have the same chance to be selected as cluster-heads. Hence, low residual energy nodes may be chosen as cluster-heads, causing these nodes to die first. In the literature, recent LEACH-based algorithms were proposed for more efficient clustering, such as LEACH-MAC, 74 P-LEACH, 75 YA-LEACH, 76 and LEACH-ICE. 77 Other improved LEACH-based protocols are presented in the study of Arora et al. 78 These algorithms improve LEACH by considering parameters such as cluster construction strategy (centralized or distributed), cluster-head selection criteria (random, energy-based, position-based, or connectivity), node roles (cluster-head, cluster-member, vice-cluster-head, or relay node), round period length for cluster-head selection, and communication model (one-hop or multi-hop). Thus, in the future, in-network storage protocols could benefit from these improved LEACH algorithms for more efficient data replication/storage.
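The random and periodic cluster-head selection discussed above can be sketched as follows. The sketch uses the threshold T(n) = P / (1 - P (r mod 1/P)) as commonly stated in the LEACH literature. The function names, the epoch bookkeeping through last_head_round, and the assumption that 1/P is an integer are illustrative choices, not taken from the surveyed protocols.

```python
import random


def leach_threshold(p, round_no, eligible):
    """Election threshold for one node in round round_no.

    p:        desired fraction of cluster-heads per round (1/p assumed integer).
    eligible: True if the node has not yet been a head in the current epoch.
    """
    if not eligible:
        return 0.0
    period = round(1 / p)  # number of rounds in one epoch
    return p / (1 - p * (round_no % period))


def elect_cluster_heads(node_ids, p, round_no, last_head_round, rng):
    """Return the nodes that elect themselves cluster-head in this round."""
    period = round(1 / p)
    heads = []
    for n in node_ids:
        last = last_head_round.get(n)
        # A node is eligible if it has not served during the current epoch.
        eligible = last is None or last // period < round_no // period
        if rng.random() < leach_threshold(p, round_no, eligible):
            heads.append(n)
            last_head_round[n] = round_no
    return heads
```

Residual energy does not appear anywhere in the threshold, so a nearly depleted node is as likely to be elected as a fresh one. Energy-aware variants typically scale this threshold by a residual-energy term.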
Other perspectives have to be considered for in-network storage in WSN, including heterogeneous networks and security. Nowadays, WSN applications tend to rely on cooperation between heterogeneous networks, which must be taken into account when performing in-network storage. Storage in such networks has received little attention in the literature. For example, Peng et al. 79 analyzed the problem of data replication in heterogeneous WSN. The problem is formulated as a minimum non-leaf node Steiner tree problem. They also proposed a protocol called HSR to address this issue. However, they considered heterogeneous networks as different storage zones. Nodes within each zone are first separated, and then data are replicated across each network (i.e. different strategies are used in each zone). This approach is inefficient in terms of communication overhead and energy consumption as discussed by Peng et al. 79 Thus, to improve data storage in heterogeneous networks, a research opportunity consists of developing protocols that avoid the overhead of separating these networks.
Furthermore, security and integrity are vital in WSN applications such as military surveillance and monitoring. In such contexts, there is a high probability that a malicious device plays the sink role and queries or tampers with the stored data.80,81 Therefore, secure communication for in-network storage needs to be addressed. Classic encryption methods are ill-suited to WSN given their complexity and the limited resources of nodes. Thus, simple alternative algorithms have been proposed in the literature. Parthasarathy et al. 82 performed several data operations in the system file and the boot loader of the iMote2 platform to prevent its remote programming by a malicious third party. Tan et al. 83 presented an algorithm to ensure confidentiality and data protection during multi-hop dissemination. Other secure approaches for data storage were proposed in the literature84,85 based on polynomial key and adaptive polynomial management schemes. However, none of these solutions simultaneously meets the requirements of efficiency, high availability, and bit integrity. 86 Therefore, more research is needed to achieve an efficient and secure scheme for in-network storage.
Finally, two issues should be considered in the future to validate and compare in-network storage protocols: (1) evaluation under various radio propagation models during simulation; (2) experimental study using testbeds. As shown in Table 2, most of the studied papers do not discuss or define the radio propagation model (fading, path loss, etc.). Taking these models into account is a major factor for effective performance evaluation of a protocol. In the literature, several papers analyze the existing WSN simulation tools. For example, the Castalia simulator offers a realistic wireless channel, radio models, and node behavior as discussed in literature.87,88 In addition, experimental studies can confirm the actual performance of in-network storage protocols. In the literature, the number of protocols evaluated through testbeds is relatively low. As shown in Table 2, only two protocols (MobiQuery and CBDS) were evaluated using the MICA2 platform. Currently, experimental evaluations have become an essential part of WSN innovations. 89 Existing testbeds are summarized in the study of Abuarqoub et al. 89 Using testbeds will provide a real demonstration of the reliability and applicability of in-network storage protocols and of their ability to handle uncertainty and randomness.
Conclusion
In this article, a survey of existing in-network storage protocols for WSN is presented. These protocols are first classified, and then the benefits and disadvantages of each approach are discussed in terms of in-network storage requirements and challenges. We also outline several opportunities and remaining challenges to guide future research. In-network storage has major advantages for improving data collection performance in WSN and for overcoming the problem of node failure. Therefore, developing an efficient in-network storage solution remains a promising research field.
Footnotes
Handling Editor: Luca Reggiani
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research work has been supported and funded by the PERCCOM Erasmus Mundus Program of the European Union.
