Abstract
We study the problem of data gathering in wireless sensor networks and compare several approaches belonging to different research fields; in particular, signal processing, compressive sensing, information theory, and networking related data gathering techniques are investigated. Specifically, we derive a simple analytical model able to predict the energy efficiency and reliability of different data gathering techniques. Moreover, we carry out simulations to validate our model and to compare the effectiveness of the above schemes by systematically sampling the parameter space (i.e., number of nodes, transmission range, and sparsity). Our simulation and analytical results show that there is no single best data gathering technique for all possible applications and that the trade-off between energy consumption and reliability could drive the choice of the data gathering technique to be used. In this context, our model could be a useful tool.
1. Introduction
Wireless sensor networks (WSNs) are composed of many tiny, low-power, and cheap wireless sensors deployed in a geographic area to perform distributed tasks, for example, to monitor a physical phenomenon [1]. In 2004, MIT Technology Review ranked WSNs as the number one emerging technology [2], and today they are effectively employed in many applications, such as surveillance (e.g., real-time area audio or video surveillance), security (e.g., detection of biological agents or toxic chemicals), habitat monitoring (e.g., environmental measurement of temperature, pressure, or mechanical vibration), home automation, military systems, and, in general, scientific experiments.
In a typical WSN topology, we can distinguish between ordinary wireless sensor nodes and base stations named sinks. The sink is usually connected to a power supply and is capable of performing more complex operations than the ordinary nodes. Ordinary wireless sensor nodes, which transfer processed or raw sensed data to the sink, are instead, for economic reasons, usually powered by small batteries that in most application scenarios are difficult or even impossible to replace or recharge.
Thus, in contrast to many other wireless devices (e.g., cellular phones, PDAs, and laptops), the energy supply of a wireless sensor node is usually not expected to be renewed during the life of the WSN. For this reason, each sensor node is required to work under very low power consumption conditions.
In general, to design a highly energy-efficient WSN, it is extremely important to take into account capture, transmission, and routing issues, that is, the data gathering techniques that specify how ordinary sensors gather information and deliver it to the sink. As a consequence, data gathering is the main and most critical function provided by a WSN.
The main aim of this paper is to compare some of the state-of-the-art data gathering techniques considering their trade-off between reliability (i.e., packet loss and reconstruction error) and energy consumption (i.e., network lifetime) by taking into account both compression and networking aspects. To the best of our knowledge, this is the first paper that considers this type of comparison for data gathering techniques belonging to different research fields (i.e., signal processing, compressive sensing, information theory, and networking related techniques are discussed and compared in this paper). Specifically, we derive a simple analytical model able to predict the energy efficiency and reliability of several data gathering techniques. Moreover, we carry out simulations to validate our model and to compare the effectiveness of the above schemes by systematically sampling the parameter space (i.e., number of nodes, transmission range, and sparsity).
The rest of the paper is organized as follows. In Section 2, we present a summary of related work. In Section 3, further details about existing data gathering techniques are provided, highlighting their advantages and drawbacks. In Section 4, the simulation scenario used for comparisons is detailed and an analytical model able to predict the energy efficiency and reliability of different data gathering techniques is derived. In Section 5, the metrics used to compare data gathering techniques are introduced. In Section 6, simulation results are provided and the developed analytical model is validated. Finally, in Section 7, some concluding remarks and future works are drawn.
For the sake of clarity, symbols and notations used throughout the paper are reported in Notations section.
2. Related Work
In the past few years, several data gathering techniques have been proposed for WSNs with the main aim of reducing energy consumption by exploiting correlations among sensory data. We can divide them into two broad categories: compression-oriented and networking-oriented.
The first category, named compression-oriented, is focused on maximizing network lifetime by taking advantage of data compression techniques [3–10]. In particular, [3, 4] analyze different lossless compression schemes for WSNs exploiting the temporal correlation in the sampled signals; in [5, 6], the authors exploit spatial correlation by using distributed source coding techniques based on the Slepian-Wolf theorem; finally, [7–10] investigate the fundamental limits of data gathering techniques based on the new paradigm of compressive sensing [11, 12]. A comprehensive review of existing data compression approaches in WSNs is provided in [13].
Further details about compression-oriented data gathering techniques will be provided in Section 3. In particular, signal processing, compressive sensing, and information theory related techniques are discussed, respectively, in Sections 3.1, 3.2, and 3.3.
Since radio transmission is the primary source of power consumption in WSNs, a second category of data gathering techniques, named networking-oriented, has dealt with the problem of maximizing network lifetime by taking into account network protocols and, more specifically, forwarding/routing mechanisms [14–17].
In particular, in [14], the authors show how it is possible to maximize the lifetime of a WSN by exploiting routing algorithms. In [15], the authors study the problem of forest construction for maximizing the network lifetime and adopt a simple data aggregation model where an intermediate sensor can aggregate multiple incoming messages into a single outgoing message. A different approach is proposed in [16] where a smart splitting technique is used in order to achieve different trade-offs between reliability and energy saving. Finally, in [17], the authors address the problem of maximizing network lifetime by taking into account also latency and reliability.
However, with the exception of [15], the above papers do not perform any kind of data aggregation with the aim of not introducing extra delay.
As shown in [18–21], by combining data aggregation and routing mechanisms, efficient data gathering schemes can be obtained.
In particular, in [18], the problem of jointly optimized routing and data aggregation is investigated; in [19], the authors combine data compression and multipath routing techniques to obtain a reliable and low-latency data aggregation scheme; in [20], an energy-balanced data gathering and aggregating scheme is proposed, which integrates a clustering hierarchical structure with compressive sensing to optimize and balance the amount of data transmitted; finally, in [21], a data gathering technique based on the network coding paradigm is proposed.
Further details about the above works will be provided in Section 3.4.
Several other papers exist which compare signal processing techniques in WSNs from energy efficiency and network lifetime perspectives and several works highlight the effect of using different routing mechanisms for data gathering.
Nevertheless, techniques belonging to different research fields such as compressive sensing, information theory, and networking are seldom evaluated against one another and this is the main goal of this paper.
Only recently have such comparisons started to appear; for instance, in [22], lossy data aggregation techniques are evaluated and compared in terms of reconstruction errors and energy consumption. However, the authors consider neither the impact of network reliability (i.e., packet loss) nor networking-based data gathering techniques such as those based on the network coding paradigm.
The aim of this paper is to fill this gap by investigating the effectiveness of all the above data gathering techniques also in terms of reliability. In particular, an analytical model able to predict the energy efficiency and reliability of different data gathering techniques is derived.
3. Data Gathering Techniques
We can classify data gathering schemes on the basis of the research field from which the technique used to exploit correlation among sensor nodes is drawn, that is,
(i) signal processing;
(ii) compressive sensing;
(iii) information theory;
(iv) networking.
Techniques belonging to the above fields are discussed in the next subsections by highlighting their advantages and drawbacks.
3.1. Signal Processing Techniques
High spatial and/or temporal correlations among sensor readings frequently exist. In this case, it is inefficient to deliver the entire raw data to the destination [8, 9], and signal processing, in particular Transforms and Encoding Compression (TEC) techniques, can be exploited in order to reduce the amount of data to send.
In the case of local TEC techniques, each node collects measurements following the Shannon-Nyquist sampling theorem; these measurements are transformed and properly encoded, and the output of such transformation is stored in the payload of one or more packets and sent to the sink. In particular, either lossy or lossless techniques can be used depending on the particular application scenario.
With lossy techniques [3], the original data is compressed by discarding some of the original information; this allows achieving higher compression ratios, but at the receiver side one can only reconstruct the data with a certain accuracy.
However, in some types of monitoring, the accuracy of observations is critical for understanding the underlying physical processes. In other cases, it is not possible to have a priori knowledge about the magnitude of observational errors that are tolerable without affecting correct data gathering. Moreover, some application domains (e.g., body area networks (BANs), in which sensor nodes permanently monitor and log vital signs) demand sensors with high accuracy and cannot tolerate measurements corrupted by lossy compression processes.
In all these kinds of WSNs, lossless data gathering is essential and desirable. Examples of local lossless compression schemes have been proposed in [4, 23, 24].
Lossy compression techniques have been evaluated and compared in terms of reconstruction errors and energy consumption in [22]; therefore, in this paper we concentrate our attention on lossless techniques.
Due to space constraints, we do not consider distributed TEC techniques in this paper but refer the reader to [13]. Nevertheless, the major distributed approaches of signal processing applied to WSNs will be discussed in the next subsections.
3.2. Compressive Sensing
Compressive sensing (CS) is a new paradigm introduced by Candès and Tao [11] and Donoho [12] used to capture and to compress signals in WSNs, where compression and sampling are merged and carried out at the same time. Basically, CS compresses a signal while acquiring data at its information rate (without relying on the Shannon-Nyquist sampling theorem). CS theory states that if a signal is sparse or compressible in a certain basis, then it can be reconstructed from a small number of linear measurements by solving an ℓ1-minimization problem.
More precisely, let us define k-sparse signals: a signal x of length n is said to be k-sparse in a basis Ψ if x = Ψs and the coefficient vector s has at most k nonzero entries. CS acquires m linear measurements y = Φx, where Φ is an m × n sensing matrix.
Note that, considering that k is a small value in comparison to n, it follows that m can be much smaller than n and therefore high compression ratios can be achieved using CS (i.e., by transmitting CS measurements y instead of raw data x).
Reconstruction is achieved by solving a complex optimization problem of the following form: minimize ‖s‖₁ subject to y = ΦΨs.
Several algorithms exist which are able to solve the above optimization problem (Basis Pursuit [25], OMP [26], and CoSaMP [27], to name just a few) and several theoretical results describe when these algorithms recover sparse solutions. In particular, as proved in [28], a signal x can be recovered with high probability if Φ satisfies the Restricted Isometry Property (RIP). Formally, a matrix Φ satisfies the RIP if there exists a constant δₖ ∈ (0, 1) such that, for all k-sparse signals x, (1 − δₖ)‖x‖² ≤ ‖Φx‖² ≤ (1 + δₖ)‖x‖². Examples of matrices that satisfy the RIP condition are random Gaussian and Bernoulli matrices. As proved in [29], in the noise-free case, exact recovery with a Gaussian matrix can be obtained if m ≥ c·k·log(n/k) measurements are collected, for a suitable constant c.
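As a concrete illustration of the recovery step, the following sketch (assuming, for simplicity, that the signal is sparse in the canonical basis, i.e., Ψ = I, and that Φ is a random Gaussian matrix; the dimensions are illustrative) generates a k-sparse signal, takes m linear measurements, and recovers it with OMP [26]:

```python
import numpy as np

rng = np.random.default_rng(0)

n, k, m = 256, 8, 64          # signal length, sparsity, number of measurements

# k-sparse signal x (sparse in the canonical basis, i.e., Psi = I)
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.standard_normal(k)

# Gaussian sensing matrix (satisfies the RIP with high probability)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
y = Phi @ x                    # the m linear CS measurements

def omp(Phi, y, k):
    """Orthogonal Matching Pursuit: greedily build the support of x."""
    residual, idx = y.copy(), []
    for _ in range(k):
        # pick the column most correlated with the current residual
        idx.append(int(np.argmax(np.abs(Phi.T @ residual))))
        # least-squares fit on the selected columns, then update residual
        coef, *_ = np.linalg.lstsq(Phi[:, idx], y, rcond=None)
        residual = y - Phi[:, idx] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[idx] = coef
    return x_hat

x_hat = omp(Phi, y, k)
print("reconstruction error:", np.linalg.norm(x - x_hat))
```

With m on the order of k·log(n/k) measurements, recovery succeeds with high probability; reducing m well below this threshold typically makes the reconstruction fail.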
We will exploit the above results to derive a reliability model for CS.
CS can be applied in cluster-based WSNs considering that each sensor node in a cluster sends its reading to the cluster head, which computes the CS measurements y = Φx and forwards them to the sink.
On the basis of CS theory, under the sparsity condition, by collecting a sufficient number of CS measurements, the sink will be able to reconstruct the original sensor data x.
CS can also be applied in tree-based WSNs, considering that the source node includes in its packet sensing information which is the product of its acquired value and a random coefficient and then sends it to its next-hop node [30]. In such a way, CS compression can be performed with low complexity at source nodes [9] and data traffic over the network is reduced [8]. However, in the latter case, a high number of hops is needed to accumulate each CS measurement.
In [7], comparisons of CS-based and conventional signal processing techniques for WSNs have been carried out in terms of energy efficiency and network lifetime.
However, there are several challenges that must be addressed in order to use CS:
(i) Decoding Time. The decoding time for reconstruction can be high.
(ii) Reliability. CS-based techniques introduce nonnegligible losses (recovery errors), reducing reliability, and work best for large-scale networks (at least a thousand nodes).
(iii) Sparsity Assumptions. CS assumes that the sensed data has a known constant sparsity, ignoring that the sparsity of real signals varies in the temporal and spatial domains. In particular, the sparsifying basis Ψ is assumed to be given and fixed in time, but this is not the case in a realistic WSN scenario, where the signal of interest is unknown and its statistical characteristics can vary over time.
(iv) Quantization Effects. CS theory has mostly focused on real-valued measurements, but in practice measurements must be represented with a finite number of bits. As a consequence, a trade-off exists between the number of measurements m and the number of bits per measurement.
In this paper, we concentrate on the last two problems, by analyzing the effect of sparsity and quantization on energy saving and reliability. Further details on CS and how to exploit it for WSNs will be given in the next sections.
3.3. Information Theory Related Techniques
In order to exploit the correlation of data concurrently acquired by different sensors, DSC techniques, inspired by the Slepian-Wolf theorem, can be applied [2]. DSC techniques imply that each sensor node sends its compressed output to the sink for joint decoding. This means that the nodes need to cooperate in groups of two or three, so that one node provides the side information and another one can compress its information down to the Slepian-Wolf or the Wyner-Ziv limit. Furthermore, DSC approaches are also difficult to apply in such scenarios since they work under the assumption that the statistical characteristics of the underlying data distribution are known in advance [9, 31].
The most practical and well-known implementation of DSC is DISCUS [5, 32], where sensor nodes are divided into clusters. For each cluster, a node (the cluster head) sends uncompressed data (as side information) while all other nodes transmit encoded (i.e., compressed) data.
To encode data, a sensor node firstly divides all possible values into disjoint sets (named bins) so that values in the same bin have a minimum distance d. Each piece of sensory data is then compressed with a code that identifies the unique bin where the sampled value lies.
To better explain how DISCUS works let us consider a simple example.
Let us suppose that (quantized) measurements are integers in the range [0, 7], so that 3 bits are needed to represent each raw value. The eight possible values can be divided into four bins {0, 4}, {1, 5}, {2, 6}, and {3, 7}, each identified by a 2-bit index; within each bin, the minimum distance between values is d = 4.
In the above example, only the cluster head transmits 3 bits for each measurement while for all the other nodes 2 bits are enough; therefore, compression is achieved.
However, DSC relies on the assumption that the statistical characteristics (i.e., the correlation function) of the underlying data are known a priori, which is difficult to obtain in practical scenarios. For instance, the simple DISCUS scheme discussed above works only if the difference between the value sampled by the cluster head and the values sampled by all the other nodes in the same cluster is less than d/2 = 2.
A simple way to improve reliability is to retransmit cluster head packets multiple times, but this reduces compression efficiency. Therefore, a trade-off exists between energy consumption and reliability on the basis of the maximum allowed number of retransmissions. In this paper, we investigate such a trade-off.
3.4. Networking Techniques
3.4.1. Routing-Based Techniques
Since radio transmission is the primary source of power consumption at the nodes, the design of energy-efficient routing is another important topic in the design of data gathering techniques. The basic idea is to route packets through paths that minimize the overall energy consumption for delivering a packet from the source to the destination. The problem focuses on computing the flow and transmission power to maximize the lifetime of the network, which is the time at which the first node in the network runs out of energy [18]. Specifically, the energy consumption rate per unit of information transmission for each node depends on the choice of the next hop, that is, the routing decision. This choice can influence the energy required to reach the sink [14].
One of the most recent works which addresses the problem of maximizing network lifetime taking into account the routing mechanism is [17]. The authors try to achieve both low latency and high reliability. They construct a data gathering tree based on a reliability model, schedule data transmissions for the links on the tree, and assign transmitting power to each link accordingly. However, they do not perform any kind of data aggregation or data compression with the aim of not introducing extra delay.
Data aggregation can be performed on top of the routing algorithm. The aggregation function is usually performed by extracting some statistical values (e.g., maximum, minimum, and average) and then transmitting only these [15]. In this way, it is possible to reduce the amount of communicated data in dense sensor networks and reduce power consumption. However, this technique loses much of the structure of the originally acquired data.
In particular, the authors of [15] study the problem of forest construction for maximizing the network lifetime. They adopt a simple data aggregation model and assume that an intermediate sensor can aggregate multiple incoming B-bit messages, together with its own message, into a single outgoing message. Moreover, they provide a polynomial time algorithm to build the tree and demonstrate that it is close to optimal.
In [14, 33, 34], the authors consider the problem of maximizing the lifetime of WSNs through routing algorithms, recasting it as a linear programming problem solvable in polynomial time. The proposed algorithm is a shortest-cost-path routing whose link cost is a combination of the transmission and reception energy consumption and the residual energy levels at the two end nodes.
In [18], the authors try to jointly optimize routing and data aggregation so that the network lifetime can be extended considering two dimensions. In the first dimension, the traffic across the network is reduced by data aggregation, so that one can reduce the power consumption of the nodes close to the sink node. In the second dimension, the traffic is balanced to avoid overwhelming the bottleneck nodes. A smoothing function is used to approximate an original maximization function by exploiting the special structure of the network. The necessary and sufficient conditions for achieving the optimality of this smoothing function were derived and a distributed gradient algorithm was accordingly designed.
Yang et al. propose in [35–37] a joint design of energy replenishment and data gathering by exploiting mobility. The SenCar, a multifunctional mobile entity, periodically chooses a subset of sensors to visit based on their energy status. It utilizes wireless energy transmissions to deliver energy to the visited sensors and, meanwhile, it collects data from nearby sensors via short-range multihop communications and can convey this data to the sink.
3.4.2. Network Coding Based Techniques
A different approach exploits the network coding (NC) paradigm.
NC is an effective information transmission approach originally introduced by Ahlswede et al. [38] to improve network capacity of multicast networks.
Differently from the classical store-and-forward network paradigm, where nodes simply replicate and forward incoming packets, with NC intermediate nodes in the network have the ability to forward functions of received packets (e.g., linear combinations). In this manner, throughput gain, robustness, and energy saving can be achieved by exploiting the fact that each newly generated packet carries information contained in several original packets [39].
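To make the idea concrete, the following toy sketch combines packets by XOR using random coefficient vectors over the binary field GF(2) and decodes at the sink via Gaussian elimination (the packet sizes, redundancy level, and choice of GF(2) are illustrative assumptions, not a specific scheme from the literature):

```python
import numpy as np

rng = np.random.default_rng(1)

n = 4                                                  # original packets
packets = rng.integers(0, 256, size=(n, 8), dtype=np.uint8)  # 8-byte payloads

def encode(num_coded):
    # each coded packet carries its GF(2) coefficient vector in the header
    coeffs = rng.integers(0, 2, size=(num_coded, n), dtype=np.uint8)
    coded = np.zeros((num_coded, packets.shape[1]), dtype=np.uint8)
    for i, c in enumerate(coeffs):
        for j in np.flatnonzero(c):
            coded[i] ^= packets[j]                     # XOR combination
    return coeffs, coded

def decode(coeffs, coded):
    # Gaussian elimination over GF(2); fails unless rank == n
    # (this is exactly the "all-or-nothing" behavior of NC)
    A, B = coeffs.copy(), coded.copy()
    row = 0
    for col in range(n):
        pivots = [r for r in range(row, len(A)) if A[r, col]]
        if not pivots:
            return None                                # not yet decodable
        p = pivots[0]
        A[[row, p]], B[[row, p]] = A[[p, row]], B[[p, row]]
        for r in range(len(A)):
            if r != row and A[r, col]:
                A[r] ^= A[row]
                B[r] ^= B[row]
        row += 1
    return B[:n]

coeffs, coded = encode(n + 2)       # a little redundancy helps reach rank n
recovered = decode(coeffs, coded)
```

With fewer than n linearly independent coded packets, `decode` returns nothing at all; with enough of them, every original packet is recovered exactly.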
NC has received increasing attention also in WSNs as a promising tool to improve network lifetime and reliability by exploiting the broadcast nature of the wireless channel [40].
However, most of the proposed techniques developed so far [41–43], whilst being useful for data dissemination (e.g., traffic from the sink node to the sensor nodes), cannot be applied for data collection, which is the most important traffic in WSNs.
In fact, to apply NC for data gathering in WSNs, some issues have to be solved.
(i) Header Overhead. NC schemes are mostly based on random linear codes [44, 45], which allow implementing them in a distributed manner but introduce large overhead because the coefficients used for the linear combinations must be specified in the packet header. The header size is proportional to the number of aggregated packets, which, in the specific case of data collection in WSNs, could be equal to the number of nodes in the network.
(ii) All-or-Nothing Problem. When n packets are combined using NC, the sink has to receive at least n (linearly independent) coded packets in order to be able to recover the original information. Thus, even if the sink receives n − 1 of them, it cannot recover any of the original packets.
(iii) Delay. The delay introduced by NC might be prohibitive for large networks where a large number of packets should be combined and decoded. Instead, many sensor networks applications, for instance, WSNs developed for control/automation or real-time audio/video streaming, require small bounded delays.
(iv) Duty Cycling. Most of NC schemes are based on overhearing; that is, nodes should remain in active mode to participate in NC-based routing, which increases the energy consumption of the sensor nodes. So, it is difficult to couple NC paradigm and duty-cycling techniques commonly used in WSNs.
(v) Reliability. Full (or at least high) reliability is desirable in sensor networks and is mandatory in several scenarios, for instance, in new scientific experiments, where accuracy of observations is critical, or in the case of biomedical applications, where it is necessary to ensure that important details are not lost causing errors in medical diagnosis. When random codes are used, even in a reliable network, the original messages can be retrieved with “high probability” (though not “certainty”), and high probability is achieved through the use of large finite fields (i.e., large coefficients and therefore large headers).
(vi) Complexity. NC techniques should be simple to cope with low computational and memory resources of sensor nodes.
We refer the reader to [46] for further considerations on the applicability of NC to WSNs.
The above issues have been solved in [16] where the authors proposed a new forwarding technique for WSNs based on the Chinese Remainder Theorem (CRT) able to achieve different trade-offs between reliability and energy saving.
Basically, CRT can be seen as a splitting technique able to transform an integer number Z into a vector of smaller numbers named CRT components, obtained as the remainders of Z modulo a set of pairwise coprime (prime) numbers.
CRT states that every integer number Z can be exactly recovered from its CRT components provided that the product of the prime numbers is greater than Z.
The coefficients needed for reconstruction depend only on the chosen set of primes and can therefore be precomputed by the sink.
CRT can be applied in WSNs to split the packets produced by sensor nodes. Such smaller packets (i.e., CRT components) can be sent through different paths by exploiting the path diversity of WSNs. The fact that relayer nodes forward smaller packets allows reducing energy consumption [47].
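A minimal sketch of CRT splitting and reconstruction (the prime set below is purely illustrative):

```python
from math import prod

primes = [13, 17, 19, 23]            # pairwise coprime moduli (assumed set)

def crt_split(Z):
    # CRT components: the (small) remainders of Z modulo each prime
    return [Z % p for p in primes]

def crt_reconstruct(components):
    # classic CRT reconstruction; exact only if Z < prod(primes)
    P = prod(primes)
    Z = 0
    for r, p in zip(components, primes):
        Pi = P // p
        Z += r * Pi * pow(Pi, -1, p)  # pow(Pi, -1, p): modular inverse of Pi
    return Z % P

Z = 31415                             # must be < 13 * 17 * 19 * 23 = 96577
print(crt_split(Z), crt_reconstruct(crt_split(Z)))
```

Each component needs only enough bits to represent a value below its prime, so relayers forward packets far smaller than the original Z.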
Moreover, CRT has several advantages in comparison to other NC techniques:
The set of prime numbers, differently from the coefficients used for NC techniques, can be obtained directly by the sink (i.e., CRT avoids header explosion). Moreover, CRT can be efficiently combined with duty-cycling techniques [48] and distributed compression algorithms [19] to achieve an efficient data aggregation technique.
Considering the above advantages and the fact that this paper is focused on data gathering techniques for WSNs, we will consider CRT as representative of networking-based data gathering techniques.
4. Simulation Scenario
In this section, we will discuss the WSN model used for comparisons and simulations of data gathering techniques.
4.1. Network Model
We assume a WSN where the sink is located in the center of a square sensing area of size
The network is partitioned into nonoverlapped clusters using the procedure described in [16, 49]. The above-mentioned procedure is mainly based on the exchange of Initialization Messages (IMs) and allows organizing the network in clusters minimizing the number of hops needed by a sensor node to reach the sink. The sink is supposed to belong to cluster 1 (denoted as
We assume that the above initialization procedure is carried out only once, so we neglect the related energy consumption.
In the following, nodes along the path from a source to the sink are referred to as relayers and nodes located one hop away from the source along the path to the sink are specifically called one-hop relayers.
We assume that, independently of the specific data gathering technique, relayers transmit packets through a load-balancing shortest-path scheme; that is, a node in cluster
4.2. Data Gathering Model
Until now, we have classified data gathering techniques on the basis of the data aggregation technique used. However, data gathering techniques can be classified also considering the factors that drive data acquisition. In particular, four broad categories can be distinguished [50]: event-driven, time-driven, query-based, and hybrid.
In the event-driven category, data are generated when an event of interest occurs, while in the time-driven category data are periodically sent to the sink at constant intervals of time; in the query-based category, data are collected according to sink requests. Finally, the hybrid approach is a combination of one or more of the above.
For simulation purposes, with the aim of evaluating energy consumption and reliability, all the above categories can be unified through an abstraction of the concept of event, that is, by simply considering that data must be sent as a consequence of an event.
In particular, in query-based networks an event is triggered by the reception of the query message, while in time-driven networks the event can be associated with the rising clock edge of the sampling unit or, more practically, with the instant at which a sufficient number of measures has been collected and a packet is ready to be transmitted.
Considering the above abstraction, energy consumption and network reliability can be evaluated in terms of the number of events (i.e., packets sent), without taking into account what drives the event.
Therefore, in our simulation scenarios we will consider that
In event-driven networks, usually small packets are sent to specify that an event has been detected (a single w-bit word could be sufficient in most cases). Instead, in the case of time-driven or query-driven networks, packets represent M measures collected in the time interval between two events. Both cases can be taken into account considering that for each event raw information of
With the aim of reducing energy consumption (i.e., the overall number of bits sent), raw data are not directly transmitted; instead, data are processed according to the chosen data gathering technique.
More precisely, we have the following.
(i) TEC. Nodes using TEC techniques exploit temporal correlation to reduce the number of bits.
Here we do not consider a specific TEC technique but assume that the compression factor,
As already stated, we assume that packets are transmitted through a load-balancing shortest-path scheme; that is, a node in cluster
(ii) DSC. According to DISCUS, we assume that only one node for each cell (henceforward named the cell head) sends uncompressed measures (i.e., side information) into a packet of
So considering that
(iii) CS. We assume that the cell header collects the packets of the other nodes in the same cell and sends them by applying CS. More precisely, the collected measures can be represented by a matrix
For simulation purpose, we suppose that CS measurements are represented by
So the overall number of bits transmitted by the cell head for each event when CS is used is
Other choices are possible without altering the overall number of transmitted bits
(iv) CRT. CRT is exploited as shown in [16].
In particular, we suppose that for each event
As in the case of CS, the collected measures can be represented by a matrix
CRT relayers process the data of each column in two steps. In the first step, the received data are compressed with a compression algorithm, obtaining a binary sequence S. In the second step, CRT is applied to improve reliability by splitting the binary sequence S so that each CRT relayer forwards a CRT component.
It is worth noting that each CRT relayer independently compresses the received packets, obtaining the same sequence S: since all relayers receive the same data set and apply the same compression algorithm, the compressed sequence S is identical at every relayer.
Henceforward, we indicate by
CRT relayers split the binary sequence S they have constructed and forward it. Specifically, the sequence S is interpreted as an integer
Note that
From the theory of CRT, the sink will be able to reconstruct all raw measurements from the CRT components provided that the reconstruction condition is satisfied (i.e.,
Note that the reconstruction condition can be satisfied by multiple sets of prime numbers; however, to reduce the number of bits needed to represent values
For instance, if
However, when the set of primes is chosen as above, the message can be reconstructed if and only if all the CRT components are correctly received by the sink. So, to take into account the possible losses due to the wireless medium unreliability, we use the
4.3. Source Model
As shown in [13] and references therein, differences between two consecutive samples of several real-world data (temperature, humidity, solar radiation, etc.) fit well with Gaussian distributions. So, in this paper we consider that sensed data
This choice is motivated also by the fact that several analytical results are well known for Gaussian distribution and can be readily exploited to obtain the maximum lossless compression factor for correlated Gaussian sources.
As is well known, when compression of discrete sources is considered, Shannon's entropy H gives the lossless compression limit.
For Gaussian correlated data, under suitable assumptions and without loss of generality, it can be shown that, considering
Moreover, considering that for a broad range of values (i.e.,
Therefore, ideal (maximum) lossless compression factor for Gaussian variables considering blocks of N correlated values of w-bit each can be obtained as
Throughout the paper, we assume that
As regards CS, the actual compression factor is related to the sparsity level
Note that
So we consider two cases: an ideal sparsity level
Finally, in the case of CRT, we consider that the simple MinDiff algorithm proposed in [4] is used for compression.
Basically, MinDiff encodes a set of uncompressed data
The number of bits
Therefore, its compression factor considering blocks of N values of w-bit each can be obtained as
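A minimal MinDiff-style encoder can be sketched as follows, based on our reading of the algorithm's description: transmit the block minimum once at full width, then each value as an offset from the minimum using just enough bits to cover the largest offset. The exact bit layout in [4] may differ.

```python
import math

def mindiff_encode(values, w):
    """MinDiff-style encoding sketch: the block minimum (w bits) plus
    each value as an offset, using the smallest bit-width b that
    covers the largest offset."""
    m = min(values)
    diffs = [v - m for v in values]
    top = max(diffs)
    b = max(1, math.ceil(math.log2(top + 1))) if top > 0 else 1
    encoded_bits = w + len(values) * b   # minimum + all offsets
    return m, b, diffs, encoded_bits

def mindiff_compression_factor(values, w):
    """Compression factor: raw bits over encoded bits."""
    _, _, _, encoded_bits = mindiff_encode(values, w)
    return (len(values) * w) / encoded_bits

# Correlated 16-bit readings compress well because offsets are small
data = [1000, 1003, 1001, 1007, 1002]
cf = mindiff_compression_factor(data, w=16)
assert cf > 1.0
```

As in the text, the factor grows with the correlation of the block: the closer the readings, the fewer bits per offset.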
4.4. Energy Model
Similarly to other works (e.g., [16, 19]), we consider a simple energy model where for each bit to be transmitted a node spends an energy equal to
For instance, let us suppose that, for sensing and processing M measures of w bits, a node needs an energy equal to
Finally, if the energy needed for reception must also be included and it differs from the energy used for transmission, then, considering that for almost all nodes the number of bits received equals the number of bits transmitted, it is sufficient to use
Therefore, the main simplification introduced by our model is that we consider
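Under this per-bit model, the energy of a node can be sketched in a few lines; the per-bit energy values below are illustrative placeholders, not the constants used in [16, 19].

```python
def node_energy(bits_tx, bits_rx, e_tx, e_rx, e_proc=0.0):
    """Per-node energy under the simple per-bit model:
    energy grows linearly with transmitted and received bits,
    plus an optional sensing/processing term."""
    return bits_tx * e_tx + bits_rx * e_rx + e_proc

# A relayer forwarding 1 kbit (bits received == bits transmitted,
# as noted above); 50 nJ/bit is a placeholder value.
e = node_energy(bits_tx=1000, bits_rx=1000, e_tx=50e-9, e_rx=50e-9)
```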
5. Performance Metrics
In order to estimate the energy efficiency of the above techniques, let us introduce the energy reduction factor,
When an ideal lossless network is considered, the
For instance, in the case of TEC-based data gathering is
However, in the case of a lossy network, the numbers of transmitted and received bits differ, and the expected energy reduction factor has to be expressed taking into account the actual number of bits forwarded.
For comparison purpose, we decided to evaluate energies considering nodes belonging to cluster 2 (i.e.,
We restrict our analysis to the nodes of the second cluster for two reasons. Firstly, these nodes are the most critical as they are the sink's neighbors: if these nodes run out of energy, the sink remains isolated. Secondly, network lifetime is defined as the time until the first node in the network dies and, with high probability if not certainty, this node belongs to
Finally, considering that network lifetime is related to the maximum energy consumed by a node in this paper, we investigate also the energy reduction factor related to the maximum energies:
Concerning reliability, we consider that a node fails to forward a packet with probability
In fact if h is the number of hops needed to reach the sink and
Note that the reliability of TEC techniques is related only to network parameters
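The multi-hop loss model above can be made concrete with a short sketch; hop losses are assumed independent, each with probability p.

```python
def multihop_delivery_prob(p_loss, hops):
    """End-to-end delivery probability over `hops` independent hops,
    each failing with probability p_loss: (1 - p)^h."""
    return (1.0 - p_loss) ** hops

# e.g., 3 hops with 10% loss per hop
r = multihop_delivery_prob(0.1, 3)   # 0.9**3 = 0.729
```

This is the whole story for TEC: reliability is fixed by p and h and cannot be traded against energy.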
Differently from TEC techniques, for all the other data gathering techniques reliability can be improved with a proper setting of the design parameters. Nevertheless, a trade-off exists between reliability and energy saving, as briefly discussed below.
(i) CRT. In the case of CRT data gathering, prime numbers can be selected so that all raw measures can be reconstructed even if at most f CRT components are lost.
As shown in [16], when f is fixed the reliability can be estimated as
As a general rule, high reliability can be obtained by fixing f so that
This result can be justified by considering that
For instance, in Figure 1 we show the reliability

Higher reliability can be obtained by further increasing the value of the parameter f. However, increasing f also increases energy consumption, so in the next section we investigate the trade-off between reliability and energy consumption for different values of f.
(ii) DSC. Also in the case of DSC, the probability to lose a packet is
So, in order to improve the reliability, we considered that the cell head transmits
In this case, DSC reliability can be evaluated as
In Figure 2, we can compare reliability of TEC and DSC for two different values of the loss probability per hop (

As it is possible to observe by fixing
Obviously, reliability increases with the number of retransmissions
(iii) CS. By indicating with m the number of packets sent with CS, the probability to receive at least
Therefore, by increasing m it is possible to guarantee that, with the desired probability
However, differently from all previous techniques, CS reliability is not related only to the number of received packets so that
Nevertheless, if we assume that raw data
Therefore, in our simulations, the reconstruction error,
From simulation point of view, actual CS reliability can be evaluated as
We show in the next section that by choosing
Two reconstruction algorithms are considered for
6. Simulations Results
In this section, we compare data gathering techniques in terms of energy consumption and reliability.
The results have been obtained through a custom C++ simulator. For each set of parameters, mean results are reported considering 20 random topologies where nodes are uniformly distributed in a square area of size
If not otherwise stated,
6.1. ERF with Reliable Networks
To assess the simulator, we first analyzed an ideal (fully reliable) WSN and evaluated the
As shown in Figures 3 and 4, the results obtained through the analytical model (see (20)) and those reported by the simulator are very close to each other for all the values of σ and

ERF of data gathering techniques in the case of reliable network (

ERF of data gathering techniques in the case of reliable network (
The same results are obtained for different values of M, that is, by changing the number of raw measures per packet. This can be easily justified by the fact that
On the basis of the previous results, we can state that in the case of reliable networks TEC and DSC have a greater
Note that in our simulations
Finally, note that although CRT appears to be the worst in terms of
By comparing Figures 3 and 5, we can see that CRT and CS achieve different values of ERF when the sensing radius, r, changes. In particular,

ERF of data gathering techniques in the case of reliable network (
This can be justified considering that when r decreases, the sparsity level
It is worth noting that our analytical model is able to predict this result.
Also in the case of CRT, the ERF slightly decreases for lower values of r due to the fact that when
Similar considerations can be made about the node density ρ. In fact, as it is possible to observe by comparing Figures 3 and 6,

ERF of data gathering techniques in the case of reliable network (
It is worth noting that Figures 6 and 5 report quite similar ERF values despite different values of ρ and r being used for simulation. This result can be justified by considering that network density and sensing radius have been changed but without altering the overall number of source nodes (i.e.,
6.2. ERF with Unreliable Networks
In Figures 7 and 8, reliability and ERF of data gathering techniques for

Reliability of data gathering techniques for

ERF of data gathering techniques for
On the basis of the simulation results, we can state that even in the case of unreliable networks TEC and DSC have a greater ERF (see Figure 8). However, their reliability is fully determined by the packet loss probability and cannot be improved; instead, by using CRT and CS, higher reliability can be achieved by increasing f and m, respectively.
In particular, as shown in Figure 7, by fixing
It is important to note that the reliability plotted for CS is the actual reliability obtained after reconstruction with an ideal (oracle-based) reconstruction algorithm (i.e.,
By choosing m so that
Obviously, high reliability is achieved at the cost of a lower ERF but, by comparing Figures 8 and 3, we can state that the impact on the ERF is quite low (both
As a consequence when high reliability is needed even with unreliable networks, CRT and CS should be preferred.
6.3. CS Reliability
In all the previous simulations, we have fixed the number of bits
To convince the reader, in Figure 9 we report simulation results about the actual reliability

As it is possible to observe, when
Similar results have been obtained for different values of w.
However, in practice, actual reliability depends also on the reconstruction algorithm used. For the sake of completeness, we report in Figure 10 CS reliability when the CoSaMP [27] algorithm is used for reconstruction.

As it is possible to observe also in this case
6.4.
and Network Lifetime
The ERF metric is an expression of mean energy consumption; network lifetime is instead more closely related to the maximum energy consumption.
In Figure 11, we report

As it is possible to observe, TEC and DSC have higher
Finally, note that CRT and CS have similar performance if the actual sparsity degree of CS is 20% more than the minimum (ideal) value. These observations can be extended also to all the previous simulations.
7. Conclusions and Future Works
In this paper, we have compared several data gathering techniques used in WSNs by using both simulation results and analytical models. In particular, the effectiveness of the above techniques has been investigated in terms of reliability (packet loss and reconstruction errors) and energy efficiency (i.e., ERF and network lifetime) by systematically sampling the parameter space (i.e., number of nodes, transmission range, and sparsity). Basically, we can summarize our results as follows:
DSC and TEC techniques should be preferred for maximizing network lifetime. CS should be preferred when high reliability is needed. CRT should be preferred for its inherent low complexity.
As a consequence, we can state that there is no best solution for all possible applications and that only the trade-off between energy consumption, reliability, and complexity can drive the choice of the data gathering technique to be used for a specific application.
As future work, we plan to refine and improve the model to deal with actual correlated measurements (i.e., not only Gaussian data) and more realistic propagation channels (i.e., by taking into account actual distance between nodes).
Competing Interests
The authors declare that they have no competing interests.
