Abstract
Processing the gathered information efficiently is a key functionality for wireless sensor networks. In generally, the sensor networks often use in-network data aggregation and clustering to optimize network communication. The set of aggregating nodes forms a dominating set of the network graph. Finding the weakly connected dominating set (WCDS) is a promising approach for clustering the WSN. However, finding a minimum WCDS is NP-hard problem for most graphs, and a host of approximation algorithm has been proposed. The aim of the paper is to construct a minimum WCDS as a clustering scheme for WSN. Our clustering schemes construction algorithm includes two phases. First of all, we construct a maximal data aggregation tree (DAT) of the network. The second phase of the algorithm is to choose the nodes (called connectors) to make the WCDS connected. The correctness and performance of our algorithms are confirmed through theoretical analysis and comprehensive simulations.
1. Introduction
A wireless sensor network (WSN) is a multihop wireless communication network. In WSN, each node assumes the role of a router and relays the packets toward the final destinations if a source cannot directly send the packets to a final destination due to the limitation of the radio transmission range. In addition, the energy efficiency is one of the major constraints in WSN. The network topology may also change unpredictably due to node failure, running out of power, or adding new nodes into the network. Most topology changes are localized within a small area of the network. Therefore, it is desirable to abstract the network structure as local changes which need not be seen by the entire network. This is done by using logical substructures called clusters. It is believed that clustering can dramatically improve a network's broadband utilization and delivery ratio, extend network lifetime, and reduce packet retransmission [1]. A natural method for forming clusters is based on the idea of graph domination [2]. The most basic clustering methods that have been studied in ad hoc networks and WSN are based on the dominating sets (DSs). Moreover, among various existing clustering schemes, dominating set-based clustering [3, 4] is a promising approach.
The main advantage of dominating set-based clustering is that it simplifies the clustering process to the one in a smaller subnetwork generated from the connected dominating set (CDS). The efficiency of this approach depends largely on the process of finding and maintaining a CDS and the size of the corresponding sub-network. In addition, the CDS formation algorithm should be localized (i.e., based on local information) for low overhead and fast convergence. The research that works on selecting a minimum CDS has never been interrupted because of its dramatic contributions to wireless networks. Unfortunately, finding a minimum CDS is NP complete for most graphs, even if global information is available and no constraint [5].
In addition, in wireless channels, packets are usually dropped when the channel goes into deep fade and thus an outage occurs. In particular, the outage happens when instantaneous channel capacity falls below the amount of information carried in the packet [6]. Recently, the cooperative communication technique was exploited to study energy management issues for ad hoc and sensor networks [7, 8]. Such as in [7], a network model using cooperative communication is developed to deal with broadcasting in ad hoc networks and WSN. Transmitting independent copies of a packet generates diversity and combats the effects of fading. The selected relay
Motivated by cooperative communication in ad hoc networks and WSN, Alzoubi et al. proposed an algorithm for weakly connected dominating set (WCDS) based on a spanning tree [4]. In this scheme, a maximal independent set (MIS) is elected such that each node in the MIS can be connected to the spanning tree via an extra node. Chen and Liestman [13, 14] proposed a zonal algorithm, in which the graph is divided into regions, a WCDS is constructed for each region, and adjustments are made along the borders of the regions to produce a WCDS for the whole graph. Their algorithm for the partitioning phase is partly based on a minimum spanning tree (MST) algorithm of Gallager et al. [15]. Han and Jia [16] also proposed an area-based distributed algorithm for WCDS construction in ad hoc networks with constant approximation ratio, linear time, and message complexity. While it has a lower message complexity than the zonal algorithm proposed by Chen and Liestman, it outperforms the mentioned algorithm. Basagni et al. [17] presented a performance comparison of the protocols proposed for clustering and backbone formation in large scale ad hoc network. Wu [3] presented two distributed algorithms for finding a WCDS in ad hoc networks. The first algorithm was implemented by first electing a leader among the nodes, which was going to be the root of a spanning tree. The spanning tree is then traversed, and the dominator nodes are selected. But the distributed leader election is extremely expensive in practice and exhibits a very low degree of parallelism. The second algorithm first constructs a maximum independent set (MIS) by an iterative labeling strategy and then modifies the MIS by selecting one intermediate node between each pair of dominators separated by exactly three hops.
At present, the study of WCDS is not more. As mentioned different above, we consider the WCDS as a better method for clustering [4] an ad hoc network and WSN. In this paper, based on the characteristics of communication under the cooperative communication, we extend the dominative capability of nodes in the corresponding network, and we turn the clustering scheme construction problem of a cooperative network into the WCDS problem in the graph model of cooperative communication. A novel algorithm (called DAT-WCDS) to find WCDS for clustering in ad hoc networks and WSN is proposed. And their good performance is confirmed by simulations.
2. Preliminaries and Definitions
2.1. A Network Environment
In this paper, the aim of the proposed algorithm is to form a clustering scheme for the WSN by finding a connected dominating set problem. We consider a monitor area A with N wireless sensors, represented by the set

Network environment.
At first, the goal of the proposed algorithm is to construct the data aggregation tree (DAT) in this N nodes network, where DAT is consisted of
2.2. Connected Dominating Set
For simplicity, we assume a simple and yet general enough model that is widely used in the community. Wireless sensor networks are modeled as unit disk graphs
Analogously, for
A normal transmission range r, using the Euclidean distance
In graph theory, a dominating set (DS) of a graph
A CDS S of a given graph G is a dominating set whose induced subgraph, denoted
Finding the minimum WCDS of the network graph is one of the most investigated methods for cluster formation in which a dominator node assumes the role of a cluster head, and its one-hop neighbors and 2-hop neighbors are assumed to be cluster members. The structure of the network graph can be simplified using WCDS and made more succinct for transmitting in ad hoc networks and WSN [15, 16].
In this paper, we focus on clustering mechanisms to elect a minimum and sufficient number of links to serve as the communication backbone of the network. Accordingly, the clustering approach to topology management can be modeled as the relevant minimum WCDS problem in graph theory.
2.3. Dominating Set Extension
In this subsection, we extend the dominative capabilities of nodes for finding a small WCDS for a WSN. Wu et al. proposed the notion of an extended dominating set (EDS) [12]. A subset S of V is an EDS if every node of V is (a) in the subset, (b) a regular neighbor of a node in S, or (c) 2-hop neighbor of k nodes in S.
Dominative capabilities extension of nodes: each node is extended such that it dominates not only itself and its 1-hop distance neighbors fully, but also its 2-hop distance neighbors partly. For example, in Figure 2, the node dominates not only itself and nodes

In [12], they used a notion of contribution; each forward node contributes 1 to all its 1-hop neighbors, and 1/k to all its 2-hop neighbors. The effective contribution of u to v is u's contribution to v before the signal energy of v reaches 1. The initial signal energy of each node is 0. A node is said to have the maximum effective contribution if it has the maximum total effective contribution to its neighbors and 2-hop neighbors. If we consider the contribution of each forward node as its dominative capability to all its neighbors, thus each forward node can fully dominate its 1-hop neighbors, and partly dominate its 2-hop neighbors. The following definitions will be used throughout the paper.
Definition 1.
For any vertex
Definition 2.
For a vertex v, the 2-hop-independent neighbors of v are
Definition 3.
Let vertex w be called as connector if it is common neighbor between dominators u and v, where v is the 2-hop neighbor of u.
2-hop WCDS is also a CDS. It requires that, for any two nodes with distance equal to 2, there exists at least one shortest path between them, whose intermediate node should be included in 2-hop WCDS. The formal definition is shown in details as follows.
Definition 4.
The 2-hop shortest path weakly connected dominating set problem (2-hop WCDS) is to find a minimum-size node set the induced graph
We do not consider the situation of
Definition 5.
The degree of a node u is denoted by
Definition 6.
The “diameter” X of a set of nodes S in a graph G is the maximum of the pairwise shortest paths between these nodes
When WCDS is constructed, only nodes in WCDS may forward data. In broadcasting [16], nodes in WCDS can help spread data to the whole network. In routing, data will be sent to WCDS and be delivered via nodes in WCDS. Thus, how to construct a WCDS is closely related to the performance of WCDS-based broadcasting and routing. Our approach to establishing a minimal WCDS is based on two phases that implement the data aggregation tree (DAT) and WCDS elections, respectively. We discuss the construction of WCDS in the following sections.
3. Algorithm Description
The aim of the proposed algorithm is to construct a minimum WCDS as a clustering scheme for WSN. We employ a CDS in this paper since it can behave as the virtual backbone of a sensor network. Our clustering schemes construction algorithm includes two phases: DAT construction and then to select connectors to make the MIS nodes connected into a WCDS construction. In the first phase, we construct a maximal DAT of the network. The second phase of the algorithm is to choose the nodes (called connectors) to make the WCDS connected.
3.1. Construction of Data Aggregation Tree
We assume that each node knows the node ID and degree of all its 1-hop neighbors and 2-hop neighbors, this can be achieved through requiring each node to broadcast its node ID initially. After each node knows all its neighbors, it can broadcast its degree, one more round of “Hello” message is needed to construct 2-hop information.
Let the target region be A, and sensor node set in the region be
Dynamic topology has a significant impact on DAT algorithms. Two actions of a node lead to network topology changes: withdrawing and joining. Withdrawing refers to the functional termination of a node in the network, and it happens when a node fails, runs out of power, or exits from the network. Joining refers to the functional start of a node in the network, and it happens when a new node is added, or a node recovers from a failure. Moving of a node can be treated as two separate actions of withdrawing and joining if the node can be assumed to stop receiving and transmitting messages when in motion. To cover a broader range of situations, this paper assumes no special notification sent from the withdrawing or joining node. Relying on such notification, even if possible, imposes high expectation on the ability of nodes. The neighbors of a changing node must rely on other mechanisms to detect the changes.
The changing neighborhood resulted from a node withdrawing or joining affects the generated DAT. Generally, there are two methods to handle it: recalculating and updating. With the recalculating method, a distributed DAT algorithm starts at a fixed interval or is triggered by some event (e.g., when disconnection of the dominating set is detected), and a new DAT is generated from scratch. With the updating method, the DAT is maintained by updating a portion of the existing dominating set according to the topology changes. A practical strategy may use the updating method most of the time and use the recalculating method when necessary. This paper only discusses the updating method.
Let depth of the tree T be p. Algorithm 1 constructs a DAT with given depth. When a node chooses its two children, it will choose the two biggest span nodes, ensuring that the tree T covers more target regions as far as possible. In the process of the multiple regressions, it can achieve the high accuracy. After the DAT is formed, in each subdomain all residual nodes send data to the nearest tree node away from themselves. In this paper, we used the literature [19] design method to construct the aggregation tree. That is, constructing process through three kinds of messages: Beacon, Probe, and Join. Figure 3 describes the process about the exchange of different signals to construct the tree. For more details, see literature [19].
/*W be vertex set in tree T, P be depth of the tree T, K be stage number */ Let D denote the dominating set constructed at stage K and be initially set to null (1) begin (2) (3) While (4) Choose a vertex (5) (6) (7) End while (8) Choose a vertex (9) (10) while (11) Let v be the parent of u; (12) (13) partner (14) end (15) while (16) Choose a child v of u such that (17) (18) end (19) Let I be a maximal independent set of T with (20) Increment stage number k (21) Until depth of the tree T is greater than P or the stage number k is threshold K (22) end
Algorithm 1

Exchange of message to construct the DAT.
After given data aggregation tree (DAT), a data communication operation consists of (possibly repeated) two phases: a propagation phase where the query demands are pushed down into the sensor network along the tree, and an transmission phase where the aggregated values are propagated up from the children to their parents.
3.2. Clustering Formation
In this section, a data aggregation tree-based algorithm (called DAT-WCDS) is proposed for clustering formation in WSN, which focuses on finding a WCDS problem in the network graph. In the algorithm, a special dominating set using a MIS of the network is constructed, and then a CDS is constructed to connect dominators and the other nodes.
Given T be a DAT and D is a dominating set of T containing. It suffices to determine an independent set J of vertices which is disjoint from D and contains a neighbor of every vertex in D, because a maximal independent set I which contains J but is disjoint from D is clearly a dominating set of
If this strategy succeeds, then the selected vertices will clearly form an independent set. Nevertheless, this strategy fails in the presence of vertices u in D all children of which are also in D. For such a vertex, we have to choose its parent. Working out the consequences of this reasoning leads to Algorithm 1 in the following sections.
We will hope that there are some dominator(s) and some dominatee(s) in maximal independent set of each layer of DAT. Here a connector node x (a dominatee of a dominator u) is said to be redundant for the dominator u if removing x will not disconnect any of the 2-hop dominators of u from u. For every dominatee, it has at least one-dominator neighbor in the same or upper level. Thus, every dominator (except the root) has at least one dominator in the upper level within 2 hops. Using this property, we can ensure that all the data in the dominators can reach the root finally if every dominator transmits its data to some dominator in upper level within two hops. From another point of view, considering dominators in the decreasing order of their levels, a dominator u in level L aggregates data from all dominators in level
Input: The DAT tree with root (1) Let T be the final data aggregation tree. (2) Initially all independent set nodes form different components, each node in I broadcasts dominatees message so that dominatees can know of adjacent independent set nodes in different components. (3) for (4) while a dominatee node v exists having i-adjacent independent nodes of I in different components do (5) Choose all dominators, denoted as (6) For every dominator (7) Node (8) Node lower level (9) Dominatees or (10) Each level select among them the node with highest bank as its connector and informs it. (11) Node (12) Every node w in (13) Every node z that is a parent of some nodes in parent of z in T). (14) End for (15) (16) End for /* The identified DAT nodes connect the dominator nodes. Thus, independent set nodes and DAT nodes forms the CDS of (17) The root
Algorithm 2
In Algorithm 2, we only concentrate on communications between dominators. Since dominators cannot communicate directly, we have to rely on some dominates (NT node), each of which acts as a bridge between two dominators. The algorithm runs from lower level to upper level in DAT, every dominator will remain silent until the level where it locates begins running. When it is its turn, the dominator will try to gather all the data from other dominators in lower levels that have not been aggregated. If a dominator's data have been collected before, then it is unnecessary to be collected again. After the end of the second phase, the algorithm has identified MIS and the connectors. Iteratively, the dominator nodes are picked which connects independent set nodes in different components. The following phases are performed to establish and form clusters
Initially, the sink creates an empty cluster associated with an unclustered node of S. Each sensor
Algorithm 3 is executed by the sink once upon deployment, and thus all nodes will become clustered. If a node joins to the network, it has to send its position (
Input: The DAT tree with has identified I and the connectors; /* Let neighbors of (1) begin (2) Initially, all nodes are unmarked. (3) u sends out CH message with the information of though CM message to dominator relative proximity as rank. u ranks each node v in (4) u node with the highest rank among its unmarked 2-hop neighbors becomes a cluster-head and broadcasts CH messages to all its neighbors; (5) After receiving a CH message, for a node v, v sends a message so that all available nodes in its dominates; (6) Each w node in turn sends another message to its 2-hop neighbors, making the available nodes as potential dominators; (7) For all (8) v becomes a cluster-member if it is a 1-hop neighbor of node u, and its current state is unmarked. If it is the first time that v receives a CH message, v will broadcast CM messages to all its neighbors; (9) If v is a 2-hop neighbor of node u, its current state is unmarked, and it is the first time that v receives a CH message, v becomes half-dominated; (10) If node v is a 2-hop neighbor of node u and its current state is half-dominated, v becomes a cluster-member if v receives a different CH message for the second time. V will broadcast CM messages to all its neighbors; (11) If node v is a 2-hop neighbor of node u and its current state is half-dominated, v becomes a cluster-head if v does not receive a different CH message again. v will broadcast CH messages to all its neighbors; (12) Dominator node to become dominator in DAT tree and sends out CM message to the dominator node; (13) Connectors CM message to its independent dominators; (14) Switching from (15) The same procedure is repeated among the remaining nodes, until each node in DAT becomes either a cluster-head or cluster-member; (16) End for (17) End
Algorithm 3
When a node dies, the first member will notify the rest of the members about the new cluster set and will reconfigure any parameter related to the cluster. The first member also periodically notifies to its cluster members about its availability. If a first-member dies, the cluster members will notify to the sink their availability to belong to another cluster or to create a new cluster. Note that the beaconing among cluster members implies low overhead since cluster sizes have few nodes.
4. Analysis of Algorithm
In the next subsections we first analyze the correctness of the algorithm and then analyze its complexity for running time and messages exchanged of the algorithm.
4.1. Correctness of the Algorithm
Theorem 7.
The output of the proposed Algorithm 1 is a maximal independent set.
Proof.
By contradiction, we consider the first execution of the while-loop in line 11 for which the vertex u has no parent which does not belong to D; that is, either u is the root x of T or the parent of u belongs to D.
Let
Let
Let
Altogether, we obtain that
Theorem 8.
After the above two phases, the constructed DS is a WCDS of the whole graph.
After the first phase, Algorithm 1 constructs a DAT with the given depth. When a node chooses its two children, it will choose the two biggest span nodes, ensuring that the tree T covers more target regions as far as possible. It is possible that there exist two dominators that are apart by at least 2 hops in the graph. However, these dominators are apart by at most 3 hops. According to the definition of WCDS, we know that the IDS constructed in the second phase is a WCDS. Although the second phase reduces the size of dominators, the connectivity is not destroyed. Therefore, after the two phases, the constructed IDS is a WCDS of the whole graph.Proof.
4.2. Complexity Analysis
Theorem 9.
The algorithm DAT-WCDS has time complexity
Proof.
Assume that in a given unit disk the size of an MIS is always less than maximum degree of a node in G; therefore,
While establishing the relationship between connectors and dominators, the message complexity is only size of CDS which is at most
5. Simulation and Discussion
In simulations, all algorithms in discussion are implemented by using MATLAB, and all nodes are randomly deployed in a square area A. Every node uses a radio range
Values of simulation parameters used.
5.1. Node Density
Node density determines how many neighbors a node can have. With a higher node density, a node has more neighbors to compete with to become a dominator. But after a node becomes a dominator, all its neighbors are covered as NT nodes. Usually, a node that can cover more neighbor nodes has a greater chance to become a dominator because of its greater degree. Thus, a new dominator will try to cover a new area of the network by given a connected network. Therefore, if the algorithm is well designed, the CDS size should be mainly determined by the network size and has less to do with the node density. Figures 4 and 5 show that DAT-WCDS generates CDS of almost the same size and the same diameter in networks with various node densities. But it takes longer time for the algorithm to converge in high-density networks (Figure 6).

Impact of node density on CDS size.

Impact of node density on CDS diameter.

Impact of node density on convergence time.
5.2. Size of WCDS
Figure 4 shows the results when the node's transmission range is set as 30 units and the number of nodes in the networks ranges from 20 to 160. When the transmission range increases, as more nodes may be connected, the network becomes denser. In this case, the size of WCDS only increases slightly as the size of the network increases. When the number of nodes in the network reaches 160, the number of nodes in the WCDS constructed by the DAT-WCDS algorithm is only about 31% of that constructed. The reason why our algorithm always outperforms is that for each pair of 2 hops is away cluster-heads adds one additional node to the WCDS, whereas our algorithm only “weakly connects” 2 hops away cluster-heads in different areas. We find that increasing the node's transmission range can increase the coverage area of each node, and therefore, increasing the density of the network, which leads to a smaller size of the WCDS.
5.3. Comparison with Other Algorithms
The DAT-WCDS algorithm is compared with two multiple-phase CDS algorithms: ZS [20] and KM [21]. However, KM does not generate smallest size CDS, but it converges fast. Therefore, KM here serves as a good comparison candidate as we will show various aspects of algorithms at different performance levels.
Figure 7 shows that, in terms of CDS size, DAT-WCDS performs better than KM and ZS. The connected dominating sets built by KM have smaller diameters in large networks (Figure 8), but the tradeoff is much greater dominator population. DAT-WCDS converges much faster than ZS, as illustrated in Figure 9. The DAT-WCDS algorithm always converges in no more than 11 rounds for a wide range of network sizes in our simulations. Here, each round of ZS is the time for generating a new layer of dominators. The convergence time of ZS is mainly affected by the network size and the node radio range.

Comparison with other algorithms.

Comparison with other algorithms.

Comparison with other algorithms.
6. Conclusion
In this paper, we extend the dominative capabilities of nodes, and a data aggregation tree-based algorithm called DAT-WCDS is proposed for clustering formation in WSN, which focuses on finding a WCDS problem in the network graph. Our clustering schemes construction algorithm includes two phases: DAT is constructed and a special dominating set using a MIS of the network is constructed, then selecting connectors to make the MIS nodes connected into a WCDS construction. The correctness and performance of our algorithms are confirmed through theoretical analysis and comprehensive simulations.
Footnotes
Acknowledgments
This work is supported by National Natural Science Foundation of China under Grants nos. 61070169 and 61170021, Natural Science Foundation of Jiangsu Province under Grant no. BK2011376 Specialized Research Foundation for the Doctoral Program of Higher Education of China no. 20103201110018, and Application Foundation Research of Suzhou of China no. SYG201118 and sponsored by Qing Lan Project.
