Abstract
This article focuses on data aggregation scheduling problem with delay constraint in wireless sensor networks. Prior works on this problem have dealt with a tree topology wireless sensor network. However, in fact it is more common that the topology of wireless sensor network is graph topology. Delay-constrained data aggregation problem is formulated. To solve this problem, we propose an algorithm based on dynamic programming in the shortest path tree of the wireless sensor network. This approach classifies conflicts into two types, tree-inside conflicts and tree-outside conflicts with the aggregating tree. First, scheduling transmission time utilizes a dynamic programming algorithm. Then, transmissions with tree-outside conflicts are removed with maximum weight independent set in tree-outside conflict graph. As the scheduling performance depends on the aggregation tree, we propose another idea, simultaneous execution of aggregation tree construction and scheduling. We propose a greedy algorithm in wireless sensor networks. This approach is to maximize the number of scheduled nodes in every time slot from deadline to time slot 1. Simulation results show that greedy algorithm in wireless sensor networks outperforms dynamic programming in the shortest path tree and naive algorithm in terms of the effectiveness and the average delay.
Keywords
Introduction
In many wireless sensor network (WSN) applications, such as environmental monitoring, spatial exploration, and battlefield surveillance, sensed data are aggregated and transmitted to the sinks for analysis. 1 Sensor nodes generate real-time data and route data to the sink node by multi-hop forwarding. The sink node deploys a database, sends control commands, and analyzes data to detect some events.
Data aggregation is an important aspect of extracting useful information for many kinds of applications in WSNs. Because of the measurement error and the interference from the environment, the data from a single sensor node are unreliable. When a sink node requests aggregation function value of the sensor data, in-network data aggregation can get the aggregated data in the process of routing and save the energy cost as much as possible. Therefore, data aggregation can effectively decrease the amount of data forwarding and save energy cost of the data transmission process.
Delay is an important parameter in the process of data aggregation in WSNs, especially for the applications that need the data with high real-time performance. Minimizing maximum data aggregation time (MDAT) 2 problem is widely studied. However, the delay of the aggregated data cannot be limited in the MDAT problem. Actually, the applications usually need to get the aggregated data in the given deadline. For instance, suppose that a target is tracked, in order to ensure good tracking quality, the sink must obtain previous measurements in a timely manner so that the best subset of sensors for the next measurement is chosen. If the sink does not get the measurements in a timely manner, the target might have moved too far resulting in a poor measurement quality during future measurements. Therefore, data aggregation with deadline constraint (DCDA) problem is a more valuable problem for real-time applications. MDAT problem is to aggregate all the data to a sink node, but DCDA problem is to discard some data of sensor nodes and aggregate other data to the sink node with deadline constraints. Therefore, the algorithm for MDAT problem is not fit for DCDA problem. This article studies delay-constrained data aggregation scheduling problem in WSNs, which is to schedule the transmission time to maximize the aggregated information collected by the sink node until the deadline.
At present, the research about DCDA problem had considered tree topology WSNs.3–7 In the tree topology WSNs, data routing is determined, and wireless interference only exists between brother nodes. So, data transmission scheduling is not a severe task. It has been proved that the problem of delay-constrained data aggregation in the tree topology WSNs is a P problem. 4 However, in fact the graph topology WSNs are more general. In the graph topology WSNs, the aggregation tree is unknown. Therefore, the scheduling results not only include the transmission time but also the aggregation tree. Moreover, not only brother nodes on the aggregation tree would compete for wireless channel resource but also wireless interference exists between neighbor nodes. Therefore, data aggregation scheduling algorithm in the graph topology WSNs schedules the time slots that sensor nodes send the data without conflicts and simultaneously decides the aggregation routing tree structure.
The contributions of this article can be summarized as follows:
We address the issue of scheduling data aggregation based on a more generic network model, in which the communication topology is graph.
We propose a dynamic programming scheduling algorithm Dynamic Programme for DCDA Problem, which includes creating shortest path tree (SPT), dynamic programming on the aggregation tree, and removing transmissions with tree-outside collisions.
As the scheduling performance depends on the aggregation tree, we propose a greedy algorithm Greedy algorithm for DCDA Problem, which is to simultaneously create the aggregation tree and schedule.
The rest of the article is organized as follows: related works are reviewed and analyzed in section “Related works.” Section “System model” describes the system model. Section “Problem formulation and analysis” shows the definition of DCDA problem. The DPDCDA algorithm and GDCDA algorithm are presented in section “Our data aggregation scheduling algorithms in WSNs.” Section “Simulation experiment and analysis” contains the results and the analysis of an experimental study on data aggregation scheduling with deadline constraints. Finally, in section “Conclusion,” conclusions are given and some future research directions are outlined.
Related works
Until now, data aggregation research mainly focused on two aspects. One was to balance aggregation delay and aggregation benefits.2–14 The other was to balance the accuracy of the aggregated data and aggregation benefits.15–17 The data aggregation research about delay mainly focused on two aspects, one of which was to minimize the maximum delay of all the aggregated data2,11–14 and the other was to maximize the amount of the aggregated data with deadline constraints.3–10
MDAT problem was proposed by X Chen et al.
2
and was proved to be a nondeterministic polynomial time (NP)-complete problem. They designed SDA algorithm and analyzed that its approximate rate was
DCDA problem was proposed by Hariharan and Shroff.
3
Hariharan and Shroff
3
researched the problem of maximum aggregated information collected by sink node with deadline constraints in WSNs with tree topology. Its basic idea was to design a dynamic programming algorithm to schedule transmission time in layers with maximum weight matching in the bipartite graph. The application scene of the problem in the work by Hariharan and colleagues4,6 was similar to the work by Hariharan and Shroff,
3
but the difference was that the amount of time slots every transmission was held was not fixed. Therefore, the scheduled variables were not only the time when every node sent the data but also the number of time slots occupied. This article claused that this problem was strong NP-hard. The optimal goal of Zheng and Shroff
7
was a little bit different from that of Hariharan and Shroff,3,4 which was to maximum a sub-modular utility. This problem was also strong NP-hard. This article proposed an algorithm with constant algorithm ratio under the data routing model. Under the data aggregation model, a bi-criteria approximation can reach
Until now, the research about DCDA problem only considered tree network topology; however, the graph topology is more general. The research about DCDA problem on the graph topology WSNs has not yet been done.
System model
We model a WSN as a graph G = (V, E), where V is the node set, which includes n sensor nodes
On medium access control (MAC) layer, we consider single-hop interference model, in which two links with the same node cannot receive the data or send the data at the same time. When one node is scheduled to send the data in one time slot, all neighbor nodes can listen to this transmission, but only one node will receive the data. Assume the amount of the data that every sensor node sends in one transmission is less than C.
Every sensor node generates the sensor data, receives the data from other nodes, aggregates all received data and its own data, and sends it. Here, the aggregation operations include MIN, MAX, and SUM. We also assume the time of calculating aggregation function can be negligible. In this article, the aggregation and forwarding strategy is as follows: one node receives data from other nodes until this node is scheduled to send the data; at this moment, the node aggregates all received data and self data to one aggregated data and sends aggregated data to another node. It means that every node may receive data at more than one time slot, but it aggregates data and sends data only once. This aggregation strategy is widely used in the work by Hariharan and Shroff.3,4
Problem formulation and analysis
At time slot 0, sensor nodes generate the data. Sink node sends a request of deadline information. Then, sensor nodes are scheduled to aggregate and send the data. We need to schedule whether node i sends data and when it occupies wireless link to send data. Our goal is to make the sink node receive aggregated information as much as possible. Here, we introduce several important definitions to describe this problem.
Let
DCDA problem
Formula (2) is the optimal goal, which is to maximize the aggregated information that sink node s receives until the deadline D. Formulae (3)–(5) mean the range of scheduled variables. Formula (6) restricts that when node i and node
Important notations are summarized in Table 1 for the ease of reference.
Notations.
WSN: wireless sensor network.
Our data aggregation scheduling algorithms in WSNs
DCDA problem needs to decide two aspects, one of which is the receiving node of aggregated data and the other of which is the time that every node sends the data without conflicts. This means that the solution of DCDA problem includes the routing tree and the scheduled transmission time. Therefore, DCDA problem can be decomposed into two subproblems: an aggregation routing tree construction and the scheduling of the transmissions based on the routing tree.
Dynamic programming scheduling algorithm in the given SPT
We consider two steps to solve DCDA problem. At first, we create an aggregation tree. Then, the transmission time of the sensor nodes is scheduled on the given aggregation tree. The direct idea of the routing tree is an Shortest Path Tree (SPT), on which the hops of routing path from every node to the sink node are minimum. Scheduling on the SPT is fit for DCDA problem because the hops of every sensor node away from sink node in the SPT are smallest, which means the occupied time slots data routed sink node are smallest. There are numerous algorithms to get the SPT, such as Floyd–Warshall algorithm. Next, we concentrate on the scheduling transmission time algorithm without the conflicts in the SPT.
First, we analyze when the transmissions exist the conflicts. One case is tree-inside conflict, which is that one node cannot receive the data from two different nodes at the same time, as shown in Figure 1(a). Node x cannot receive the data from node u and node v at the same time; otherwise, interference between node u and node v would make node x cannot receive any data. Another case is tree-outside conflict, which is that when one node sends data to receiving node, another node in the neighbor set of receiving node cannot send data. Figure 1(b) shows that node u is sending the data to node x; if node v in the neighbor set of node x sends the data to node y, then node x cannot receive the data from node v with the interference of node v.

(a) Tree-inside conflict and (b) tree-outside conflict.
To solve tree-inside conflicts, we use a dynamic programming algorithm in the work by Hariharan and Shroff, 3 which can get the scheduling time without conflicts in the tree-model WSN. When this algorithm is implemented on the SPT of graph-model WSN, tree-inside conflicts and tree-outside conflicts exist at the same time. Therefore, we need to remove the transmissions with tree-outside conflicts.
First, we construct a tree-outside conflict graph TOCG. The node in the tree-outside conflict graph is the subset of node set of WSN G. The edges in the tree-outside conflict graph are constructed by the edges that are in network topology G but not in SPT T. Let

Tree-outside conflict graph: (a) network topology, (b) shortest path tree, and (c) tree-outside conflict graph.
Then, we construct TOCG at time slot t, denoted as
Then, we use a greedy algorithm for maximum weight independent set (MWIS) S of the tree-outside conflict graph in time i. Let the set of nodes of which the transmission time is i denote
The pseudocode of DPDCDA algorithm is as follows.
Therefore, the total time complexity of DPDCDA algorithm is
Greedy scheduling algorithm in WSNs
As the scheduling performance mainly depends on the supplied aggregation tree, the approach that is to create a routing tree and to schedule on this tree cannot guarantee the optimal performance. To address this problem, we propose GDCDA algorithm that is based on the feature of simultaneous execution of constructing the aggregation tree and scheduling the transmission time.
The optimal goal of DCDA problem is to maximize the aggregated information of sink node s. In every time slot from 1 to D, some nodes are scheduled to send data. Moreover, the scheduled times of ordered edges from every leaf node to the sink node on the routing tree are strictly increasing. Therefore, we consider scheduled nodes in every time slot from D to 1. First, according to the shortest hops from node i to sink node, network topology is layered. We introduce several sets for running the algorithm. SCHEDULED set stores nodes that have been scheduled, denoted as S. NOT SCHEDULED set is the complement set of SCHEDULED set, denoted as NS. CANDIDATE set stores the candidate nodes which may be scheduled at this time slot, denoted as C. CANDIDATE FATHER set stores the candidate nodes which may be scheduled to receiving nodes at this time slot, denoted as CF. Now, we describe the process of the algorithm. Started from time slot D, one node in the neighbor set of the sink node is scheduled at time slot D and is added to S set. Other nodes in the neighbor set of the sink node and the nodes in the neighbor set of the node scheduled at time slot D not in S are added to C set. Sink node and scheduled node are added to CF set. Then, calculate maximum conflict-free matching in the bipartite graph BG composed by C set and CF set. The nodes in C set of the matchings are added to S set. The nodes in the neighbor set of these nodes are added to C set. The nodes in S of which there exist edges with the nodes in C set compose CF set. The algorithm repeats above operations until time slot 1.
Here, the key point is to calculate maximum conflict-free matching in the bipartite graph BG. There are two kinds of conflicts. As Figure 3 shows, one of which is that the nodes in CANDIDATE set cannot match more than one node in CANDIDATE FATHER set and the other of which is that edges cannot exist between nodes of pairs in the matching. Therefore, we need to calculate maximum matching without these two kinds of conflicts. We transform BG to new bipartite graph

Two kinds of conflicts: (a) tree-inside conflict and (b) tree-outside conflict.
The pseudocode of GDCDA algorithm is as follows.
Simulation experiment and analysis
This article analyzed the performance of DPDCDA and GDCDA algorithms with the naive algorithm through simulation experiments. In our simulation, we randomly deployed sensors into a region of 250 m × 250 m. Figure 4 shows an example of the distributions of the sensor nodes and a sink node. All sensors have the same transmission radius. Sink node is always the node with ID 0. The details of the simulation are listed in Table 2. Since there is no algorithm that can solve DCDA problem in graph-model WSN, we compared the performance of our algorithms DPDCDA and GDCDA with the naive algorithm. The naive algorithm is to schedule data aggregation by the order of node ID with routing tree with deadline constraint. At each simulation point, the simulation is done 20 times, and we took the average of obtained results. We use NS2 as our network simulator.

Example of a WSN.
Default parameters.
MAC: medium access control.
In order to analyze the performance of DPDCDA and GDCDA algorithms, we defined one metric which measures the effectiveness of the algorithm.
Data collection ratio measures the effectiveness of the data aggregation algorithm, which is the proportion to number of sensor nodes collected data with the number of all sensor nodes
where m is the amount of data delivered to sink node and n is the number of sensor nodes.
Table 3 lists the performance values of three algorithms with the default parameters. From Table 3, we can find that the effectiveness of GDCDA algorithm is highest. We also can find the running time of DPDCDA algorithm is longer than other algorithms. Then, we compared the performance of DPDCDA and GDCDA algorithms from data collection ratio with the change of number of sensor nodes, the change of transmission range, and deadline.
Comparison of the performance of three algorithms with the default parameters.
Figure 5 shows the curves of the three algorithms’ data collection ratio versus the transmission ranges, where the number of the sensor node is set at 150, and the deadline is 25. Note that bigger transmission ranges make graphs denser (more edges between nodes). The more the transmission opportunity, the shorter the transmission range. From Figure 5, we concluded the effectiveness of GDCDA algorithm performs better than DPDCDA algorithm and the naive algorithm, particularly when the transmission range is larger. The reason is that GDCDA constructs better routing tree during the process of data aggregation scheduling. The data collection ratio for all algorithms monotonously decreases as the transmission range increases.

Data collection ratio versus transmission ranges.
We examined the effectiveness of DPDCDA algorithm and GDCDA algorithm in Figure 6 when the number of sensor nodes varies, transmission radius is set to 25, and the deadline is set to 35. The figure shows data collection ratio by running three algorithms, while the number of sensor nodes increases. It can be seen that GDCDA algorithm outperforms other algorithms with higher effectiveness. The bigger the number of nodes, the better the improvement in GDCDA in comparison with the naive algorithm. The effectiveness of DPDCDA is less than GDCDA, but obviously higher than the naive algorithm.

Data collection ratio versus number of nodes.
This group of simulations examines the impact of the deadline. Let the number of sensor nodes is 250, transmission radius is 25, with deadline varying from 10 to 55. From Figure 7, the data collection ratio increases with the improvement in the deadline. It can be seen that when the deadline is relatively small, the effectiveness of GDCDA and DPDCDA is obviously higher than the naive algorithm. When deadline constraint is 55, data collection rate of GDCDA and DPDCDA has reached to 100%, but the data collection rate of the naive algorithm is about 70%.

Data collection ratio versus deadline.
Conclusion
We study the issue of data aggregation scheduling with deadline constraint in the WSNs. To the best of our knowledge, this is the first article to do data aggregation scheduling with deadline constraint in the graph topology WSNs. Most existing works concentrate on data aggregation scheduling with deadline constraint problem in the tree topology WSNs, which is a P problem and has been well solved. In fact, the graph topology WSNs are more common. We have formulated this problem as DCDA problem. We proposed DPDCDA algorithm based on dynamic programming. The basic idea is to divide DCDA problem into three steps, which are to create an SPT as the aggregation tree, to do aggregation scheduling based on dynamic programming, and to remove tree-outside conflicts with MWIS in tree-outside conflict graph. This approach uses a tree-based structure as an input for the scheduling algorithm, so it forces the scheduling algorithm to follow an aggregation order imposed by the supplied aggregation tree. To deal with this issue, GDCDA algorithm has been proposed. The key of GDCDA algorithm is to simultaneously execute aggregation tree construction and scheduling. We compare DPDCDA algorithm and GDCDA algorithm with the naive algorithm through simulations, and the results show that GDCDA significantly outperforms the DPDCDA and the naive algorithm.
Future works involve more constraints such as energy and link capacity. These problems are more thorough to model the WSNs in real situations. They are significantly more challenging.
Footnotes
Academic Editor: Sergio Toral
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Fundamental Research Funds for the Central Universities (No. 2572017BB02).
