Abstract
Data aggregation promises a new paradigm for gathering data via collaboration among wireless sensors deployed over a large geographical region. Many real-time applications impose stringent delay requirements and ask for time-efficient schedules of data gathering in which data sensed at sensors are aggregated at intermediate sensors along the way towards the data sink. The Minimal Aggregation Time (MAT) problem is to find the schedule that routes data appropriately and has the shortest time for all requested data to be aggregated and sent to the data sink.
In this article we consider the MAT problem with collision-free transmission where a sensor can not receive any data if more than one sensors within its transmission range send data at the same time. We first prove that the MAT problem is NP-hard even if all sensors are deployed on a grid. We then propose a (Δ −1)-approximation algorithms for the MAT problem, where Δ is the maximum number of sensors within the transmission range of any sensor. By exploiting the geometric nature of wireless sensor networks, we obtain some better theoretical results for some special cases. We also simulate the proposed algorithm. The numerical results show that our algorithm has much better performance in practice than the theoretically proved guarantees and outperforms other existing algorithms.
Keywords
Introduction
Due to various existing and potential applications, Wireless Sensor Networks (WSNs) have recently emerged as a premier research topic. A WSN usually consists of a large number of small-sized and low-powered sensors deployed over a geographical area and a sink node where the end user can access data. All nodes are equipped with capabilities of sensing, data processing, and communicating with each other by means of a wireless ad hoc network. A wide range of tasks can be performed by these tiny devices, such as condition-based maintenance and the monitoring of a large area with respect to some given physical quantity, e.g., temperature, humidity, gravity, and seismic information.
In contrast to traditional networks (e.g., the Internet) which are address-centric, WSNs are intrinsically data-centric. In some applications of WSN, the end user needs to extract information from the sensor field with low latency. In this case, data sensed at some sensors related to the same physical phenomenon need to be aggregated and sent to the data sink efficiently. Real time data aggregation is a combination of data from different sensors according to a certain aggregation function, e.g., duplicate suppression, logical AND/OR, minima and maxima, and all requested data should be periodically delivered to the sink node within a certain period of time from the moment they are requested (after that data may be useless).
The stringent resource constraint and the sheer number of sensor nodes in WSNs pose unique challenges on time-efficient data aggregation. First, the sensor nodes operate on batteries and employ low-power radio transceivers to enable communications. Data packet sent by a senor (sender) reaches all its neighbor nodes within the transmission range of the sender; Sensors far from the data sink have to use intermediate nodes to relay data transmission. Second, collision resulting from a large number of simultaneous sending creates response implosion [1]: when two or more sensors send data to a common neighbor at the same time, collision occurs at this node, which will not receive any of these data.
Third, the data sent by a sender is received by any its neighbor (receiver) at which no collision occurs; the receiver fuses the data received with its own data (possibly null), and stores the fused data as its new data. In addition, the time consumed by a single sending-receiving-fusing-storing is typically normalized to one; parallel sending-receiving are desirable for reducing the network delay. Fourth, with the large population of sensor nodes, it may be impractical or energy consuming to pay attention to each individual node in all situations; for instance, the user wound be more interested in querying “what is the highest temperature in some specified areas?”
Motivated by various applications of time-efficient data aggregation for query-based monitoring WSNs, we study in this paper the Minimum Aggregation Time (MAT) Problem under collision-free transmission model, which guarantees the energy-efficiency since no data need to be transmitted more than once. The problem is how to, given a WSN with a distinguished data sink d which is interested in data on a subset S of sensor nodes, determine a data transmission schedule such that all data on S are sent and aggregated to d in the minimum time.
The remainder of the article is organized as follows. In Section 2, we first specify the network model and formalize the MAT problem, and then present some related works and summarize our contributions in this paper. In Section 3, we prove that the MAT problem is NP-hard even for some special case. In Section 4, we propose some approximation algorithms for the MAT problem, and give the theoretical proofs of their performance guarantees. In Section 5, we evaluate the average performance of the proposed algorithm through simulation and compare it with the existing algorithm. In Section 6, we conclude the article discussing on how to implement the proposed algorithm to achieve energy-efficiency as well as time-efficiency.
Preliminaries
Model Description
In view of the miniature design of sensor devices, we assume that all sensors in WSNs are fixed and homogeneous. More specifically, the WSN under investigation consists of stationary nodes (sensor nodes and a sink node) distributed in a Euclidean plane. Assuming the transmission range of any sensor node is a unit disk (circular region with unit radius) centered at the sensor, we model a WSN as a unit disk graph (UDG) G = (V, E) in which two nodes u,v ∈ V are considered neighbors, i.e., there is an edge uv ∈ E joining u and ϛ, if and only if the Euclidean distance ‖ u− v‖ between u and ϛ is at most one. Hereafter we reserve symbol G for UDGs modelling WSNs, and Δ for the maximum degree of G. It is always assumed that G is connected. We assume in this paper that communication is deterministic and proceeds in synchronous rounds controlled by a global clock. In each time round,
Each node can send data (be a sender) or receive data (be a receiver) but cannot do both; Each node can receive data from at most one of its neighbors; Data packet sent by any sender reaches all its neighbors simultaneously; Any node can receive data only if exactly one of its neighbors sends data.
Note that the above conditions guarantee collision-free data transmit/receiving between senders and receivers. In Fig. 1a, two time rounds are required if si needs to send its data to ri for i = 1, 2 since when they send their data in the same time round r2 will receive data from both s1 and s2 causing collision, which is not allowed (due to Conditions (3–4)). For the same reason, in Fig. 1b, two time rounds are required if s1 and s2 both need to send their data to r1. Moreover, we assume that each receiver updates its data as the combination of all data received in different rounds; this enforces that each node needs to send data at most once.

Description of network assumptions.
An instance of the MAT problem is denoted by (G, S, d), where the set
As usual (e.g., in [2]), we assume that each sensor node knows its geometric position in the network, which is considered the unique ID of the sensor (the aggregated data may include some of these IDs). We further assume that the sink has global knowledge of IDs of all sensors in the WSN. When it needs some data of particular interests at some sensor nodes, it informs those nodes, by multicasting, of the schedule {(S1, R1), …, (Ss, Rs)} which may be represented by IDs of senders and receivers. Upon receiving the request, sensor nodes will send their data or receive data from others as specified in the schedule. In such a way, the schedule guarantees collision-free data aggregation. It also enables significant energy savings since sensor nodes are in an energy conserving state when they do not participate in sending/receiving. Prior to the scheduled time for data aggregation, a node switched from the energy conserving state to the energy consuming state, transmits or receives data and then goes back to the energy conserving state.
Most of the works on data aggregation focus on energy efficiency such as [3], [4]. There are two recent works on time efficiency [5], [6]. They studied a special case of MAT problem, called convergecasting problem, where S = V\d (i.e., data at all sensors are required to be sent and aggregated to the data sink). Annamalai et al. [5] proposed a centralized heuristic that constructs a tree rooted at the sink node according to the proximity criterion (a node is assigned as a child to the closest possible parent node) and to assign each node a code and a time slot to communicate with its parent node. However, the miniature hardware design of nodes in WSNs may not permit employing complex radio transceivers required for spread spectrum codes or frequency bands systems. Additionally, the heuristic is evaluated only through simulations and no theoretical analysis for their methods was given. More recently, Kesselman and Kowalski [6] devised a randomized distributed algorithm for convergecasting that has the expected running time O(log|V|). An assumption central to their model is that sensor nodes have the capability of detecting collisions and adjusting transmission ranges, and that the maximum transmission range might be as large as the diameter of the network. This complicates the sensor hardware design and poses challenge to low-transmission-range constraint on sensors.
Minimum Broadcast Time (MBT) problem [7] is very similar to the MAT problem. Initially, d has a message to be broadcasted to every node in the network; At each time round any node that has received the message is allowed to communicate the message to at most one of its neighbors. MBT problem asks for the broadcast schedule of minimum number of time rounds required for every node receiving the message. This problem can be considered as a relaxed version of MAT problem with S = V\d which allows, in Fig. 1a, s1 and s2 to send their data to r1 and r2 at the same time round, respectively. Clearly, MBT problem schedules the data to flow from d to all nodes in S while MAT problem schedules data to flow from all nodes towards d. It is known that MBT problem is NP-hard [7] and has a 2Δ-approximation algorithm [8].
Given an MAT instance (G, S, d), a shortest path tree (SPT) T of (G, S, d) is a tree in G consisting of shortest paths from d to nodes in S. The height of T, denoted by h(G, S, d), equals to the length of the longest path in T from d to leaves of T. The following lower bound can be easily obtained by applying the same argument used in the estimation of multicasting time in telephone networks [9].
Lemma 2.1
tOPT(G, S, d) ≥ max{h(G, S, d), log2 |S|} for any MAT instance (G, S, d). However, data aggregation in WSNs is not simply the reverse of broadcast/multicast in traditional telephone network. For example, it was shown in [10] that the broadcast time in a WSN is at most 648 times the height of SPT. Note also when the underlying topology G of WSNs is a complete graph, we have tOPT(G, S, d) = |V| while SPT gives a multicast time equal to 1 and MBT problem has minimum broadcast time of log2|V|.
Our Contributions
The first contribution of our work is a new and simple data aggregation model. It is collision-free and does not require any specialized codes so that data aggregation can be conducted in an energy-conserving manner. The second contribution is the NP-hardness proof of the corresponding problem even when all nodes are deployed at integer coordinates in the Euclidean plane. The third contribution is an approximation algorithm proposed for the MAT problem with theoretically proved performance guarantees. The algorithm uses a new technique that allows certain flexibility of tree-structures while scheduling parallel transmissions; in other words, instead of making a schedule after a tree is constructed as the existing methods do, it forms a data aggregation tree after a schedule is made.
Complexity Analysis
In order to prove the NP-hardness of MAT problem, we will apply some known results on orthogonal planar drawing. An orthogonal planar drawing of a planar graph H is a planar embedding of H in the plane such that all edges are drawn as sequences of horizontal and vertical segments. A point where the drawing of an edge changes its direction is called a bend of this edge. All vertices and bends are drawn on integer points. If the drawing can be enclosed by a box of width g and height g, we call it an embedding with grid size g X g. By a plane graph we mean a planar graph together with a planar embedding of it. Biedl and Kant [11] proved the following lemma.
Lemma 3.1
Given a simple plane graph H on g vertices that is not an octahedron and has maximum degree at most 4, there is a linear algorithm which produces an orthogonal planar drawing of H with grid size g × g such that the number of bends along each edge is at most 2.
Using the above lemma we can deduce the following lemma whose proof is very sophisticated and thus given in the appendix at the end of the paper.
Lemma 3.2
Let H be a plane graph on g vertices with maximum degree at most 4. Suppose that H is not an octahedron, and let H' be the graph obtained from H by replacing each edge in H with a path of length 120g2. Then H' is a unit disk graph and an orthogonal planar embedding of H' of grid size (40g2 + 40g) × (40g2 + 40g) can be computed in time polynomial in g.
We now explain how to prove that MAT problem is NP-hard by reducing the restricted planar 3-SAT problem to it. Let {x1, …, xn} and {c1, …, cm} denote, respectively, the sets of variables and clause in a Boolean formula Φ in conjunctive normal form, where each clause has at most 3 literals. Associate with Φ the formula graph each variable (unnegated or negated) appears at most three clauses, both unnegated and negated forms of each variable appear; and at every variable node in the planar embedding of GΦ, the edges in E2 that are incident with node x separate the edges in E1 incident to x such that all edges representing a nonnegative appearance are incident to one side of x and all edges representing a negative appearance are incident to the other side. It is known that the restricted planar 3-SAT problem is NP-complete [12].
Theorem 3.1
The decision version of MAT problem is NP-complete even when the underlying topology is a subgraph of a grid.
Proof
The proof is based on a reduction from restricted planar 3-SAT. Given any restricted planar 3-SAT instance ϕ on n variable and m clauses, from its planar formula graph Gϕ, we construct planar graphs Gk for positive integer k as follows.
To every variable xi, 1 ≤ i ≤ n, we associate a rectangle Xi and two node-disjoint paths Xi has exactly 10k nodes among which equally spaced nodes both II
i
and Pij is of length (6
i
— 4)k when it has an end in
Let
Next we show that the restricted planar 3-SAT instance ϕ is satisfiable if and only if the MAT problem on ( Every shortest path from Suppose the contrary that P is a shortest path from For For For
Now we assume that all data on V( for For for
Note from (c) that the data on oi (resp. for
Let us consider an arbitrary
Conversely, we consider the case where the restricted planar 3-SAT instance ϕ has a true assignment {x1∗, x2∗, xn∗}. Notice that for each
In this section we present an approximation algorithms SDA for the MAT problem that adopts the shortest data aggregation strategy: aggregating data along the shortest paths towards the sink. Theoretical analysis provide the worst-case performance ratios of the algorithm.
Basic Algorithm
Algorithm SDA proceeds by incrementally constructing smaller and smaller shortest path trees rooted at d that span all nodes in S. It, initially, sets T1 to a shortest path tree of (G, S, d). A number of iterations are implemented by SDA (refer to the pseudo-code below) and each iteration produces a schedule of a round. In the r-th iteration, Tr is a shortest path tree rooted at d spanning a set of nodes that possess all data aggregated from S till round r — 1. SDA selects from the leaves of Ti as the senders for round r. In Step 4–9, the variable Zr with initial value {leaves of Tr}\{d} is used for selection. The set Zr maintains the property that every non-leaf neighbor of a leaf in Tr other than d has a neighbor in Zr. The leaves of Tr other than d are examined in the decreasing order of the number of their neighbors in G that are non-leaf node in Tr. A leaf is eliminated from Zr if and only if the elimination does not destroy the property of Zr. When all leaves of Tr other than d are examined, the remaining nodes in Zr form the set Sr of the senders in round i. Subsequently, SDA eliminates Sr from its consideration by setting Tr+1 = Tr\Sr and ends the (r + 1)-th iteration. For a node set U in G = (V, E), the notation NG(U) is a shorthand of
Algorithm SDA S
One of the main ideas of our shortest-data-aggregation-based algorithms is to apply degree sorting and assign parallel transmissions (e.g., Step 5–9 of SDA). Intuitively speaking, we prefer to assign nodes of small degrees to send data before those of large degrees, and to arrange nodes of similar degrees to send data simultaneously. Both preferences increase potentially the number of parallel sendings/receivings and therefore reduce potentially the data aggregation time. Next we analyze theoretically the correctness and the performance of SDA.
Lemma 4.1
Let Ir consists of all informed nodes at the beginning of round r. Tr is a subtree of Yr and Zr are nonempty subsets of V(Tr) such that There is an 1-1 mapping between Sr and Rr in such a way that every sender z ∈ Sr corresponds to its receiver
Proof
We apply inductive arguments on r. First we examine the base case in which r = 1. Statements (i)-(iii) are trivially true since T is a shortest path tree whose leaves must be all in S. Using
Then we proceed to inductive steps. We check (i)–(vi) one by one for
Corollary 4.1
(i) S1, S2, …, Ss are pairwise disjoint. (ii)
Theorem 4.1
Given an instance (G, S, d) of MAT problem, Algorithm SDA produces a schedule in time of
Proof
The termination of SDA is guaranteed by Corollary 4.1 (iii). From Ts+1 = {d} and Lemma 4.1 (v) and (vi), it can be verified that Ts is a 2-node tree on Rs = {d} and it is the only sender in Ss. Since, by Lemma 4.1 (ii), all data on S have been aggregated to V(Ts) at the end of round s — 1, the schedule {(S1, R1), …, (Ss—1, Rs—1), (Ss, Rs)} output by SDA aggregates all data on S to d within s rounds.
To estimate the running time of Algorithm SDA, note that the computation of a SPT in Step 1 requires time
We now study the approximation performance ratio of Algorithm SDA. Denote h = h(G, S, d). Set Li = {nodes in T at i hops away from d} for every 0 ≤ i ≤ h + 1; in particular, L0 = {d} and Lh+1 = θ. Set Ti = θ for all i ≥ s + 2.
Lemma 4.2
Proof
The proof is by induction on i. The base case where i = 0 is justified by
Theorem 4.2
Given any instance (G, S, d), Algorithm SAD produces a schedule whose data aggregation time tSDA (G, S, d) ≤ min{(Δ — 1)h + 1, (Δ — 1)tOPT(G, S, d)}.
Proof
To prove the theorem, it suffices to show (i) |tSDA(G, S, d)| ≤ (Δ — 1)h + 1 and (ii) |tSDA(G, S, d)| ≤ (Δ — 1)TOPT(G, S, d). Recall from Lemma 2.1 that tOPT(G, S, d) ≥ h. If s = |SDA(G, S, d)| ≤ (Δ — 1)(h — 1) + 1 then we are done. So we assume s >(Δ — 1)(h — 1) + 1.
To justify (i), we deduce from Lemma 4.2 that
Next we prove (ii). In case of |Lh| ≥ 2, we have tOPT(G, S, d) ≥ h + 1 and (i) implies s ≤ (Δ — 1)tOPT(G, S, d). It remains to consider the case where |Lh| = 1. We may assume Δ ≥ 3 (since otherwise, G is a path or a cycle and s = h = tOPT(G, S, d)). It is obvious that S1 = Lh. Let G′ = G\Lh and S' consist of the nodes in S\Lh and the neighbor (parent) of Lh in T. Then T\Lh is a shortest path tree of (G′, S′, r) that has height h — 1; moreover, there is an implementation of SDA on (G′, S', d) which outputs {S′1, S′2, …, S′s—1} with S′i = Si+1 for all 1 ≤ i ≤ s — 1. Using (i), we have s — 1 = |tSDA(G′, S′, d)| ≤ (Δ — 1)(h — 1) + 1 since the maximum degree of G′ is upper bounded by Δ. It follows from Δ ≥ 3 that s ≤ (Δ — 1)h ≤ (Δ — 1)tOPT(G, S, d), and (ii) is proved.
In this subsection, we show that, when Algorithm SDA is applied to some special instances of the MAT problem, it will produce solutions with better theoretical guarantees. First, in view of Theorem 4.2, SDA provides a (Δ — 1)-approximation for UDGs with maximum degree Δ. In a realistic situation, sensor devices cannot be too close or overlapped; thus it is reasonable to assume that the distance between any two nodes is no less than a positive constant λ. The UDG modelling such a sensor network is called a λ-precision unit disk graph [?], [14]. Krumke et al. [15] showed that the maximum degree of a λ-precision UDG is at most [2π/λ2]. Consequently, we have the following result.
Corollary 4.2
Given any instance (G, S, d) with λ-precision G, Algorithm SAD produces a schedule whose data aggregation time
Next we exhibit some local properties of UDGs, which ensure that Algorithm SDA has a better performance guarantee in some other special cases. Let v be a node in a UDG G =(V, E) of degree d. Then all nodes in NG(v) = {v0, …, vd—1} are located within a disk centered at v with radius 1 and boundary B of length 2π. Corresponding to every vi(0 ≤ i ≤ d — 1), let b
i
be a point on B such that ||b
i
— vi|| is minimized. If ||vi — vj||>1, then the angle between the ray originated at v through vi, b
i
and the ray originated at v through vi, b
i
and the ray originated at v through vj, b
j
is greater than π/3. This implies ||b
i
— b
j
||>1. Thus
Lemma 4.3
Let G =(V, E) be a planar UDG. Then its maximum degree Δ ≤ 15.
Proof
Let
If an arc in B of length at most π/3 contains four distinct points
To see Δ ≤ 15, assuming the contrary Δ ≥ 16, we consider vi, b
i
, i = 0, 1, …, 15, and do all additions involving subscripts in modulo 16. Without loss of generality suppose that b0, …, bΔ—1 are on B in clockwise order. It is instant from (3) that
If li,i+1 > π/3 for some i, then (4) implies a contradiction
Similarly, if both
If
By (6), suppose without loss of generality that
Corollary 4.3
Algorithm SDA produces a schedule to the MAT problem within approximation ratios 3 for grid graphs, 14 for planar UDGs, and
Proof
The first two bounds come directly from Theorem 4.2 and Lemma 4.3 immediately. We prove the third bound by showing
Establishment of a sensor network can be carried out in either a random way (e.g., dropped from an airplane) or a nonrandom way (e.g., fire alarm sensors in facility). In the latter case, WSNs have nice properties including bounded-degree; consequently our theoretical results assure satisfactory approximation for the MAT problem. Thus we focus on the former case and use 100-node network shown in Fig. 2a as our test network. This network was randomly generated within a 200 × 200 square region. The sink d is selected to be the leftmost node (the larger white node in Fig. 2) among the 100 nodes. We use a similar simulation technique to that in [16]. In our simulations, the transmission range varies from 21.692 to 40 so that the number of edges of the UDG modelling the WSN varies from 167 to 546 (see Fig. 2b and c), where 21.692 is the minimum transmission range that guarantees the network to be connected. The variations of other parameters are summarized in Table 1. To evaluate the proposed algorithm SDA we compare its performance tSDA with that of convergecast algorithm in [5], denoted by tAGS.

Randomly generated WSNs used in simulation.
Graph parameters and convergecasting times
m: the number of edges Δ: the maximum degree.
h: the height of the bread first search tree with root d.
It can be observed from Table 1 and Fig. 3 that SDA has a performance much better than the other algorithm. Moreover, the ratio of tSDA to h is much less than the theoretical performance ratio Δ — 1. This highlights the advantages of assigning parallel transmissions according to degree order in our shortest-data-aggregation-based algorithm SDA.

Comparison of performances.
In this paper we have studied the MAT problem aiming for time-efficient data aggregation in WSNs. We first prove the problem is NP-hard even for some special cases, and then propose an approximation algorithm for the problem with provable performance guarantee.
In order to achieve both time-efficiency and energy-efficiency, we can implement our shortest data aggregation algorithm in such a way that it saves more energy at the expense of a small increase of data aggregation time. Given a MAT instance (G, S, d), a tree of G spanning
In our work we assume that all sensor nodes have the same transmission range. For future work it would be interesting to study the case with adjustable transmission ranges. In addition, it is worthy of studying how to deal with 3-dimensional cases.
Footnotes
Appendix
Proof of Lemma 3.1. By Lemma 3.1, the orthogonal planar embedding stated below is derived in time polynomial in g.
H has an orthogonal planar drawing D of grid size gXg in which every edge of H has at most 2 bends. An orthogonal planar drawing D′ of H with grid size (40g2 + 40g) X (40g2 + 40g) can be obtained from D in a direct way that point (x, y) in D maps to point (40nx, 40ny) in D′, and every vertical (resp. horizontal) segment of unit length (segment for short) in D maps to a vertical (resp. horizontal) path of length 40g in D′ between the images (in D′) of the ends of the segment (in D). (The additional 40g is set for the further modification on the drawing.) It is straightforward from (a) that each edge of H has length at most 120g2 in D′. We propose to modify D′ into an orthogonal planar drawing of H′ with grid size (40g2 + 40g) X (40g2 + 40g) in which every edge of H′ is a vertical or horizontal segment and any two nonadjacent vertices of H′ are drawn on two points with distance at least two. To this end, we consider a 18gx18g grid K and For every even integer j with 18g ≤ j ≤ 120g2, K contains an Without loss of generality suppose that We turn back to the drawing D′ Consider an arbitrary edge e of H of length In D′, point Now we put a 18gX18g grid Ke such that In the embedding D″ of H′ the distance between any point (vertex) in Pe\{ Suppose the contrary that cc ∈ V(Pe) — { Combining (c) and (e), we conclude that H′ is a UDG and establish the lemma.
