Routing-Aware Clustering Algorithms for Two-Tiered Sensor Networks

Abstract

In hierarchical two-tiered sensor networks, higher-powered relay nodes can be used as cluster heads for designing scalable sensor networks. It has been shown that, in such networks, the assignment of sensor nodes to clusters plays an important role in determining the lifetime of the network. In this paper, we have proposed two routing-aware, distributed algorithms for assigning sensor nodes to clusters in two-tiered networks. The first heuristic assumes that all relay nodes, acting as cluster heads, send their data directly to the base station. The second heuristic relaxes this assumption and is to be used with any network where each relay node uses a multihop route to send its data to the base station. Unlike conventional clustering algorithms, our approaches take into consideration the routing scheme used by the relay nodes, and attempt to balance the energy dissipation of the nodes. We have compared the results of our distributed approaches with the optimal solutions obtained using an integer linear program (ILP) formulation, as well as existing techniques, based on heuristics. The results indicate that our approaches, on average, can produce results that are close to the optimal solutions and consistently outperform existing heuristics.

1. Introduction

Sensor nodes are tiny, low-powered, and multifunctional devices operated by lightweight batteries. A network of sensor nodes performs tasks through the collaborative efforts of a large number (hundreds or even thousands) of sensor nodes that are densely deployed within the sensing area [1]. Data from each sensor node are gathered at a central entity, often called the base station [1, 2]. Replacing or recharging the batteries of sensor nodes is usually not feasible, so that a sensor network becomes nonfunctional when the batteries in a “sufficient” number of nodes are depleted. The lifetime of a sensor network is a measure of the duration, from the time the network is deployed, to the time it becomes nonfunctional. The limited transmission range and the battery power of sensor nodes affect the scalability and the lifetime of sensor networks. As a consequence, energy conservation in a sensor network to maximize its lifetime is a major research topic.

Recently, some special nodes, known as relay nodes (also called gateway nodes [3] and application nodes [4]), have been used to design sensor networks with increased lifetime. The sensor nodes in such a network are organized in clusters. Each cluster is assigned a relay node acting as its cluster head, defining a two-tier hierarchical sensor network [4, 5], with the sensor (relay) nodes defining the lower (higher) tier. The sensor nodes belonging to a cluster in a two-tier network send their data directly to the relay node acting as the cluster-head for that cluster. Each relay node collects the data it receives from its own cluster and transmits that data, either directly to the base station using the single-hop data transmission model (SHDTM), or forwards the data towards the base station using a multi-hop path, using the multi-hop data transmission model (MHDTM). In the case of multihop routing, in addition to the data gathered from its own cluster, each relay node also relays any data it receives from its neighboring relay nodes.

We have assumed the periodic [6] model for data reporting/gathering, so that data are collected and forwarded to the base station periodically, following a predefined schedule. Each period of data gathering is referred to as a round [7, 8]. In each round of data gathering, each relay node gathers the data it receives from its own cluster and transmits that data, either directly to the base station (i.e., using SHDTM), or forwards the data towards the base station using a multihop path (i.e., using MHDTM).

Relay nodes are typically provisioned with more energy than sensor nodes but, like sensor nodes, they are battery operated, so both relay nodes and sensor nodes are power constrained. Some relay nodes may dissipate energy at higher rates than others, due to heavier load and/or longer transmission distances. Assuming that the initial energy provisioning for all relay nodes is equal, this uneven energy dissipation may lead to the faster “death” of some relay nodes (i.e., the power supplies of these relay nodes are depleted to such an extent that the nodes are no longer functional). This may affect the functionality of the sensor network, as the inoperative relay node(s) will not be able to perform their assigned functions, and may even cause the network to lose its usefulness, even though many other relay nodes still retain power. Further, if MHDTM is used and the multihop routing strategy is not changed when a relay node fails, all data sent to the failed relay node is “lost”, since it is not forwarded to the base station. The loss of power of a relay node can, therefore, impact the functionality of the network much more severely than the depletion of the battery of a single sensor node.

In hierarchical networks, the lifetime is determined by the time interval from the deployment of the network to the time a critical number of relay nodes die. Researchers have proposed different ways to define these critical relay nodes, depending on their design objectives [4, 9–12]. In this paper we have measured the lifetime of a network, following the N-of-N [4] metric, by the number of rounds the network operates from the start, until the first relay node depletes its energy completely and ceases to function. We note that other metrics can be used as well, with straight-forward modifications to the algorithms.

The allocation of the sensor nodes to clusters in a hierarchical network is decided by the clustering scheme used. A proper clustering scheme can play an important role in balancing the load on different relay nodes [13] and, hence, significantly improving the lifetime of the network.

It has been shown that a proper clustering scheme can significantly affect the lifetime of the relay node network by balancing the load on different relay nodes [14]. The effectiveness of a clustering scheme depends on a number of factors such as the distribution of the sensor nodes, the number, and the locations of the relay nodes and the specific routing strategy used.

In this paper, we have proposed two distributed algorithms for load-balanced clustering to maximize the lifetime of a two-tier network. The first (second) algorithm is applicable to networks using the SHDTM (MHDTM). Some authors have characterized this type of algorithm as a localized algorithm [15], a specialized class of distributed algorithms. Both approaches assume that the relay node routing scheme for a given network is known beforehand, that is, our proposed approaches are “routing-aware.” Our objective, following [3, 16], is to form appropriate clusters of sensor nodes with the relay nodes acting as cluster heads. We take into account the cardinality of each cluster, the routing scheme, and the energy dissipation for transmitting/receiving data. We directly maximize the network lifetime, rather than optimizing a secondary objective such as the variance in the cardinalities. This makes our algorithms much more effective than existing load-balanced clustering techniques. We have used a top-down approach that starts with an initial clustering scheme. Each successive iteration produces an improved clustering scheme, until the algorithm terminates. A preliminary version of this work was presented in [17].

The main features of each of our algorithms are as follows.

(i) It operates on local information only and can quickly generate efficient solutions for networks with hundreds of sensor nodes, with little communication overhead.

(ii) It produces a valid clustering in each iteration. Therefore, convergence is not a problem, since the algorithm can always be terminated after a specified number of iterations.

(iii) In each iteration, it is guaranteed that the overall network lifetime will never decrease, compared to the lifetime obtained in the previous iteration.

(iv) Our algorithm consistently outperforms existing clustering heuristics available in the literature [14, 16].

(v) The solutions obtained using our algorithm are comparable to optimal solutions generated using a centralized integer-linear-program- (ILP-) based approach [16].

The remainder of the paper is organized as follows. In Section 2, we briefly review some relevant background material. In Section 3, we present our algorithms. We discuss and analyze the experimental results in Section 4 and conclude with a critical summary in Section 5.

2. Review

Clustering of nodes in a sensor network is a well-researched field [2, 3, 16, 18–20]. The problem of clustering is illustrated in Figure 1, where the sensor nodes in the shaded region can be assigned to any of the clusters with A, B, or C as their cluster heads. Depending on the routing scheme and the energy dissipation of relay nodes A, B, and C, one assignment may be more advantageous than the others. The goal of a clustering algorithm is to assign each sensor node to an appropriate cluster in a way that extends the lifetime of the network.

Figure 1: Sensor nodes in overlapping coverage area.

In [3], G. Gupta and M. Younis investigated the problem of forming clusters around a few high-energy gateway nodes. The authors defined “cardinality” of a cluster as the number of sensor nodes associated with the cluster and provided a heuristic that attempts to minimize the variance of the cardinality of each cluster in the system. The idea was to distribute the sensor nodes as evenly as possible, over all the clusters. They have shown [14] that suitable clustering techniques can be used to increase the system lifetime. The clustering problem has also been addressed in [16], where an optimal solution is obtained using a centralized approach employing ILP.

Clustering techniques proposed in the literature [3] typically use simple heuristics to balance the amount of data that each relay node is required to forward towards the base station. However, this clustering approach, which measures the “load” on a relay node in terms of the cardinality of each cluster, defined as the number of sensor nodes associated with the cluster, fails to take into account the specific routing strategy used. For example, in the SHDTM model, where each relay node transmits directly to the base station, it may be more effective to assign fewer sensor nodes to clusters which are further away from the base station, rather than to distribute the sensor nodes equally among the different clusters. Existing heuristics cannot guarantee optimality in terms of directly extending the lifetime of the network, which is the primary objective of load-balanced clustering. Centralized approaches, based on integer linear program (ILP) formulations, that guarantee optimality (in terms of maximum possible lifetime), have been proposed [16]. Since autonomous systems are not under the control of a single, centralized agent, an ILP-based formulation is not appropriate.

In a hierarchical sensor network, if the N-of-N metric is used [4], assuming equal initial energy provisioning in each relay node, the lifetime of the network is defined by the ratio of the initial energy to the maximum energy dissipated by any relay node in a round, so that $N_{lifetime} = ⌊ E_{initial} / F_{\max} ⌋$ , where $N_{lifetime}$ denotes the lifetime of the network in terms of rounds, $E_{initial}$ denotes the initial energy of each relay node, and $F_{\max}$ is the maximum energy dissipated by any relay node in a round. In such a model, it is easy to see that maximizing the lifetime following the N-of-N metric is equivalent to minimizing the maximum energy dissipated by any relay node in a round.
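The lifetime expression above can be checked with a few lines of Python. This is a minimal sketch: the function name `network_lifetime`, the relay labels, and the energy figures (in millijoules) are illustrative assumptions and are not taken from our experiments.

```python
import math


def network_lifetime(initial_energy, per_round_dissipation):
    """N-of-N lifetime in rounds: floor(E_initial / F_max).

    per_round_dissipation maps each relay node to the energy it
    dissipates in one round; F_max is the largest of these values.
    """
    f_max = max(per_round_dissipation.values())
    return math.floor(initial_energy / f_max)


# Hypothetical per-round loads (mJ) for three relay nodes.
unbalanced = {"A": 4.0, "B": 2.5, "C": 3.0}
balanced = {"A": 3.25, "B": 3.0, "C": 3.25}

print(network_lifetime(5000.0, unbalanced))  # 1250
print(network_lifetime(5000.0, balanced))    # 1538
```

In this made-up instance the total load is the same in both cases; only the maximum load changes, and the lifetime changes with it.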

The transmission power dissipated by a sender node to transmit each bit of data to a receiver node increases rapidly with the distance between the sender and the receiver [1–3]. We have assumed that the energy dissipation for communication is based on the first-order radio model [2], where the energy required to transmit (receive) b bits, at a distance d, is given by $E_{T_{x}} (b, d) = α_{1} b + β b d^{q}$ ( $E_{R_{x}} (b) = α_{2} b$ ), where $α_{1}$ ( $α_{2}$ ) is the energy coefficient for the transmitter (receiver), $β$ is the energy coefficient for the transmit amplifier, and $q$ is the path-loss exponent.
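The two expressions of the first-order radio model translate directly into code. The sketch below is for illustration only: the default coefficient values (50 nJ/bit for both electronics coefficients, 100 pJ/bit/m² for the amplifier, q = 2) are assumed values of the kind often used with this model, not parameters stated in this section.

```python
def tx_energy(bits, distance, alpha1=50e-9, beta=100e-12, q=2):
    """E_Tx(b, d) = alpha1 * b + beta * b * d**q, in joules."""
    return alpha1 * bits + beta * bits * distance ** q


def rx_energy(bits, alpha2=50e-9):
    """E_Rx(b) = alpha2 * b, in joules."""
    return alpha2 * bits


# Two relay nodes forwarding the same 4000-bit packet over
# different distances to the base station (hypothetical distances).
near = tx_energy(4000, 100.0)
far = tx_energy(4000, 200.0)
print(near, far, far / near)
```

With q = 2 the amplifier term quadruples when the distance doubles, which is why a relay node far from the base station can afford a smaller cluster than a nearby one under SHDTM.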

We assume that all nodes are stationary after deployment and that the relay nodes are aware of the location of the base station. In the case of multihop routing, we assume that the routing scheme has been determined beforehand, so that each relay node knows the predecessor nodes (if they exist) and the successor node. We have assumed periodic data gathering, where each sensor node measures certain physical attribute(s) from the environment (e.g., temperature, pressure, humidity, pollution level) at predetermined intervals, and communicates data periodically [6]. Therefore, the average amount of data to be communicated by each sensor node in a round is known. We further assume that the placement strategy applied during the deployment phase of the network ensures that each sensor node is able to send its data to at least one relay node.

3. Distributed Algorithms for Load Balanced Clustering

In this section, we present two heuristics for distributed load-balanced clustering. The first heuristic assumes that each relay node sends its data directly to the base station using SHDTM. In the second heuristic, the relay nodes form a network, and, in general, each relay node uses MHDTM to send its data to the base station. Both heuristics use a top-down approach, where the relay nodes first form an initial feasible clustering. At each successive iteration, some sensor nodes are reassigned from their respective clusters to other clusters so that

(i) each iteration results in a valid set of clusters, and

(ii) the lifetime of the network never decreases as a result of any reassignment.

3.1. Network Model

In a distributed environment, identifying the node dissipating the maximum energy is a time-consuming task. To reduce the energy needed, as well as the set-up time required when the clusters are being formed, each relay node takes decisions about clustering, after taking into account only the situation in the “neighboring” relay nodes. Our experiments, discussed later, show that such decisions, based on local information only, still give excellent results.

For convenience, we assign each node (sensor or relay node) a unique ID number. In our model, $𝒮 = ⋃ 𝒞^{j}$ , where the union is taken over all $j \in ℛ$ and $𝒞^{j} \cap 𝒞^{k} = \emptyset$ , $\forall j, k \in ℛ, j \neq k$ . We note that for SHDTM, $succ_{j} = B, \forall j$ , where B denotes the base station, and for MHDTM, $k \in 𝒫^{j}$ , if and only if $succ_{k} = j$ . If $j_{p}, 0 \leq p \leq h - 1$ , is the relay node in the multi-hop path $j = j_{0} \to j_{1} \to j_{2} \to \dots \to j_{h - 1} \to j_{h} = B$ from j to B that requires the maximum energy of all nodes in the path from j to B, then $𝔼_{j}^{c} = E_{j_{p}}^{M}$ . We also note that for MHDTM, if a sensor node is added to cluster $𝒞^{j}$ , and j is using the route $j = j_{0} \to j_{1} \to j_{2} \to \dots \to j_{h - 1} \to j_{h} = B$ , the energies of the relay nodes $(j, j_{1}, j_{2}, \dots, j_{h - 1})$ become $(E_{j} + ξ_{j}, E_{j_{1}} + ξ_{j_{1}}, E_{j_{2}} + ξ_{j_{2}}, \dots, E_{j_{h - 1}} + ξ_{j_{h - 1}})$ , respectively. Finally, we point out that $n_{j}$ includes all sensor nodes from $𝒞^{j}$ and the sensor nodes whose data is sent to j from all relay nodes in $𝒫^{j}$ , that is, $n_{j} = | 𝒞^{j} | + \sum_{k \in 𝒫^{j}} n_{k}$ .
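The recurrence for $n_{j}$ and the effect of adding one sensor node along a multi-hop route can be illustrated as follows. This is a centralized sketch for clarity only (the algorithms themselves are distributed); the function names, the dictionary-based representation of the successor relation, and the numbers are assumptions made for this example.

```python
def downstream_counts(cluster_size, succ):
    """n_j = |C^j| + sum of n_k over all predecessors k (succ[k] == j)."""
    preds = {}
    for k, j in succ.items():
        preds.setdefault(j, []).append(k)

    counts = {}

    def n(j):
        if j not in counts:
            counts[j] = cluster_size[j] + sum(n(k) for k in preds.get(j, []))
        return counts[j]

    for j in cluster_size:
        n(j)
    return counts


def route(j, succ, base="B"):
    """Relay nodes on the path from j to the base station (base excluded)."""
    path = []
    while j != base:
        path.append(j)
        j = succ[j]
    return path


def add_sensor(energy, xi, j, succ, base="B"):
    """Energies after one sensor joins cluster j: E + xi on every hop of the route."""
    updated = dict(energy)
    for node in route(j, succ, base):
        updated[node] += xi[node]
    return updated


# Hypothetical topology: a -> b -> B and c -> b -> B.
succ = {"a": "b", "b": "B", "c": "b"}
print(downstream_counts({"a": 3, "b": 2, "c": 4}, succ))
print(add_sensor({"a": 10, "b": 30, "c": 12}, {"a": 1, "b": 2, "c": 1}, "a", succ))
```

Here relay node b carries the data of all nine sensor nodes, and adding a sensor node to the cluster of a raises the energy of both a and b.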

3.2. Definitions and Notations

In our distributed algorithms, we will use the following definitions and notations.

(i) A sensor node s is covered by a relay node j, if and only if s can communicate directly with j, that is, j is within the transmission range of s.

(ii) A sensor node $s \in 𝒮$ is essential to the cluster $𝒞^{j}$ of a relay node $j \in ℛ$ if s can communicate only with relay node j. Obviously, if s is essential to j, then s is covered by j.

(iii) The transmission range of a sensor node is r.

(iv) If the distance between relay nodes j and k does not exceed $2 r$ , nodes $j, k \in ℛ$ are considered to be neighbors. This ensures that a sensor node can communicate with both relay nodes j and k only if j and k are neighbors. The set of neighboring relay nodes of j is denoted by $𝒩_{j}$ .

(v) For all j, $succ_{j}$ denotes the node immediately following j in the path from j to the base station. In other words, if a relay node j transmits its data to node k, then $succ_{j} = k$ . In SHDTM, $succ_{j} = B, \forall j$ , where B denotes the base station.

(vi) For MHDTM, $𝒫^{j}$ denotes the set of relay nodes which send their data directly to relay node j. Clearly, $k \in 𝒫^{j}$ , if and only if $succ_{k} = j$ .

(vii) When describing MHDTM, $E_{j}^{M}$ denotes the total energy currently required by relay node j for sending the data (generated in $𝒞^{j}$ as well as the data j received from all nodes $k \in 𝒫^{j}$ ) to $succ_{j}$ .

(viii) When describing SHDTM, $E_{j}^{S}$ denotes the total energy dissipation of relay node j per round. The value of $E_{j}^{S}$ depends only on its current cluster $𝒞^{j}$ .

(ix) Let $j_{p}, 0 \leq p \leq h - 1$ , be the relay node in the multihop path $j = j_{0} \to j_{1} \to j_{2} \to \dots \to j_{h - 1} \to j_{h} = B$ from j to B that requires the maximum energy of all nodes in the path from j to B. $𝔼_{j}^{c}$ denotes the maximum energy of all nodes in the path from j to B (i.e., $𝔼_{j}^{c} = E_{j_{p}}^{M}$ ).

(x) $ξ_{j}$ denotes the amount of energy per sensor node that relay node j dissipates, to receive information corresponding to one sensor node and then transmit it to $succ_{j}$ . We note that for MHDTM, if a sensor node is added to cluster $𝒞^{j}$ , and j is using the route $j = j_{0} \to j_{1} \to j_{2} \to \dots \to j_{h - 1} \to j_{h} = B$ , the energies of the relay nodes $(j, j_{1}, j_{2}, \dots, j_{h - 1})$ become $(E_{j} + ξ_{j}, E_{j_{1}} + ξ_{j_{1}}, E_{j_{2}} + ξ_{j_{2}}, \dots, E_{j_{h - 1}} + ξ_{j_{h - 1}})$ , respectively.

(xi) $ξ_{j}^{c}$ denotes the energy required for receiving and transmitting the data corresponding to a single sensor node by the most heavily loaded relay node in the path $j \to j_{1} \to j_{2} \to \dots \to B$ from j to B.

(xii) A sensor node s is a favored node of j if (a) s is not essential to j, (b) s is covered by j, and (c) $\forall k \in 𝒩_{j}$ , if k covers s, then $ξ_{j} < ξ_{k}$ , where $𝒩_{j}$ is the set of neighboring relay nodes of j. Two relay nodes j and k may cover a sensor node and have $ξ_{j} = ξ_{k}$ . In such a situation, it is easy to break the “tie” arbitrarily (e.g., by node ID). Using this definition, it is clear that every sensor node in this heuristic is either an essential node or a favored node of exactly one relay node.

(xiii) A relay node j is saturated, if and only if the cluster $𝒞^{j}$ contains all sensor nodes that are covered by j.

(xiv) ℂ denotes the set of relay nodes that can accept, from more heavily loaded relay nodes, additional sensor nodes in any given iteration.

(xv) For SHDTM, relay node $j \in ℂ$ if and only if (a) j is not a saturated node and (b) $E_{j}^{S} \leq E_{k}^{S}, \forall k \in 𝒩_{j}$ . Again, a “tie” can be broken arbitrarily (e.g., by node ID).

(xvi) For MHDTM, relay node $j \in ℂ$ if and only if (1) j is not a saturated node, and (2) $𝔼_{j}^{c} < 𝔼_{k}^{c}, \forall k \in 𝒩_{j} - \{succ_{j}\}$ . Again, a “tie” can be broken arbitrarily (e.g., by node ID).

(xvii) $n_{j}$ is the total number of sensor nodes whose data is currently transmitted by relay node j to $succ_{j}$ . This includes all sensor nodes from $𝒞^{j}$ and the sensor nodes whose data is sent to j from all relay nodes in $𝒫^{j}$ , that is, $n_{j} = | 𝒞^{j} | + \sum_{k \in 𝒫^{j}} n_{k}$ .

(xviii) accepted_list is the list of sensor nodes sent by $succ_{j}$ to a relay node j, specifying all sensor nodes that can be added to the cluster of node j and its “up-stream” relay nodes.

(xix) $Th_{j}$ is the energy threshold for node j, which cannot be exceeded when j is accepting additional sensor nodes into its cluster $𝒞^{j}$ .
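Several of the definitions above are simple predicates. As an illustration, the sketch below evaluates the neighbor relation (iv), saturation (xiii), and membership in ℂ for SHDTM (xv) on a small, made-up instance. The function names, coordinates, and energy values are hypothetical, and the computation is done centrally here although each relay node would evaluate it using local information only.

```python
import math


def neighbors(positions, r):
    """N_j: relay nodes whose distance from j does not exceed 2r."""
    result = {j: set() for j in positions}
    for j, pj in positions.items():
        for k, pk in positions.items():
            if j != k and math.dist(pj, pk) <= 2 * r:
                result[j].add(k)
    return result


def is_saturated(j, clusters, covered):
    """j is saturated iff C^j already contains every sensor node j covers."""
    return covered[j] <= clusters[j]


def collectors_shdtm(energy, clusters, covered, nbrs):
    """Relay nodes in the set of collectors for SHDTM.

    Ties in energy are broken by node ID (one arbitrary but fixed rule),
    using tuple comparison on (energy, ID).
    """
    result = set()
    for j in energy:
        if is_saturated(j, clusters, covered):
            continue
        if all((energy[j], j) < (energy[k], k) for k in nbrs[j]):
            result.add(j)
    return result


positions = {"A": (0.0, 0.0), "B": (15.0, 0.0), "C": (50.0, 0.0)}
nbrs = neighbors(positions, r=10.0)
covered = {"A": {1, 2, 3}, "B": {3, 4}, "C": {5}}
clusters = {"A": {1, 2}, "B": {3, 4}, "C": {5}}
energy = {"A": 2.0, "B": 5.0, "C": 1.0}
print(nbrs)
print(collectors_shdtm(energy, clusters, covered, nbrs))  # {'A'}
```

Relay node C has the lowest energy but is saturated, so only A qualifies as a collector in this instance.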

A summary of the notations used to describe the algorithms is given in Table 1 for convenience.

Table 1: Notation.

Symbol	Meaning
𝒮	Set of sensor nodes
ℛ	Set of relay nodes
B	Base station
r	Transmission range of a sensor node
$𝒞^{j}$	Set of sensor nodes belonging to the jth cluster, with relay node j as the cluster head
ℂ	Set of collector nodes in any given iteration
$succ_{j}$	The node immediately following j in the path from j to the base station
$ξ_{j}$	Amount of energy per sensor node that relay node j dissipates, to receive information corresponding to one sensor node and then transmit it to $succ_{j}$
$E_{j}^{S}$	Energy dissipation of relay node j (per round) in SHDTM
$𝒩_{j}$	Set of neighboring relay nodes of j
$𝒫^{j}$	Set of those relay nodes, which send their data directly to relay node j
$E_{j}^{M}$	Energy currently required by relay node j for sending the data to $succ_{j}$
$𝔼_{j}^{c}$	Maximum energy of all nodes in the path from j to B in MHDTM
$ξ_{j}^{c}$	Energy required for receiving and transmitting the data corresponding to a single sensor node by the most heavily loaded relay node in the path j to B
$n_{j}$	Number of sensor nodes whose data is currently transmitted by relay node j to $succ_{j}$
$Th_{j}$	Energy threshold for node j, which cannot be exceeded when j is accepting additional sensor nodes into its cluster $𝒞^{j}$

3.3. Distributed Clustering for Single-Hop Routing (DCSR)

The DCSR heuristic uses a top-down, iterative approach for clustering. We first obtain an initial assignment of sensor nodes to clusters. This can be done using any existing clustering technique, such as greedy clustering (GC) or least-distance clustering (LDC) [14, 16]. Based on this initial clustering, we calculate an initial value for the network lifetime, which provides a lower bound for the overall network lifetime. In each successive iteration, we form a new set of clusters, by reassigning one or more sensors from relay nodes having high energy dissipation, to other nodes with lower energy dissipation. The new sensor assignment is such that the overall network lifetime is never lower than that of the previous iteration. This process terminates when the specified stopping criteria (e.g., specified number of iterations completed or change in lifetime over a period of time is below a given threshold) have been met. The DCSR algorithm (Algorithm 1) has three phases, as described below.

Algorithm 1: Distributed clustering for single-hop routing (DCSR).

(1) // PHASE 1: Sensor Identification

(2) Sensor nodes broadcast a “hello” message.

(3) Each relay node discovers the sensor nodes it can cover and exchanges this information with its neighbors.

(4) Each relay node also broadcasts its own distance from the base station to its neighbors.

(5) // PHASE 2: Cluster Initialization

(6) Each relay node $j \in ℛ$ initializes its own cluster by including the “essential” and “favored” sensor nodes.

(7) Each relay node j broadcasts its current value of $𝒞^{j}$ to its neighbors.

(8) // PHASE 3: Iterative Cluster Refinement

(9) while stopping criteria not met do

(10)   Each relay node j checks whether $j \in ℂ$ in its neighborhood $𝒩_{j}$ .

(11)   for all $y \in ℂ$ do

(12)     y accepts sensor node(s) from one or more neighboring clusters into $𝒞^{y}$ .

(13)     y broadcasts the updated value of $𝒞^{y}$ to all other relay nodes $k \in 𝒩_{y}$ .

(14)   end for

(15)   for all $k \in ℛ, k \notin ℂ$ do

(16)     k updates $𝒞^{k}$ based on messages from neighboring collector nodes and broadcasts this information to its neighborhood.

(17)   end for

(18) end while

PHASE 1: Sensor Identification

During the sensor identification phase (lines 1–4 of Algorithm 1), each relay node $j \in ℛ$ becomes aware of all sensor nodes covered by j, as well as the nodes covered by its neighboring relay nodes. First, each sensor node $s \in 𝒮$ broadcasts a “hello” message. When a relay node j receives a “hello” message from a sensor node s, it identifies s as one of the nodes it covers. It then communicates this information to other relay nodes in its neighborhood. So, at the end of the sensor identification phase, each relay node becomes aware of the sensor nodes it covers and the sensor nodes covered by each of its neighbors. Each relay node j also broadcasts, to its neighbors, the value of $ξ_{j}$ . This information is used by a relay node to estimate the energy dissipation of neighboring relay nodes, for a given cluster configuration.

PHASE 2: Cluster Initialization

Once each relay node is aware of the set of sensor nodes covered by itself and its neighbors, the cluster initialization phase (lines 5–7 of Algorithm 1) begins. During this phase, each relay node $j \in ℛ$ selects which sensor nodes to include in its initial cluster. Any simple assignment scheme can be used for forming the initial clusters. We constructed the initial clusters by assigning $s \in 𝒞^{j}$ , if and only if (i)

s is an essential node for cluster $𝒞^{j}$ or

(ii)

s is a favored node of j.

Since each sensor node s is either an essential node or a favored node of exactly one relay node j, this approach ensures that a sensor node is always assigned to exactly one cluster, and hence guarantees a feasible initial clustering. Each relay node then broadcasts, to its neighborhood, its current cluster $𝒞^{j}$ .
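The initialization rule can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure run on each node: the function name `initial_clusters` and the data are hypothetical, the assignment is computed centrally for readability, and ties in $ξ_{j}$ are broken by node ID.

```python
def initial_clusters(covered, xi):
    """Assign every sensor node to exactly one relay node.

    covered[j] is the set of sensor nodes that relay node j covers and
    xi[j] is the per-sensor energy cost of j. A sensor node covered by a
    single relay node is essential to it; otherwise it is favored by the
    covering relay node with the smallest (xi, ID) pair.
    """
    clusters = {j: set() for j in covered}
    sensors = set().union(*covered.values())
    for s in sensors:
        candidates = [j for j in covered if s in covered[j]]
        owner = min(candidates, key=lambda j: (xi[j], j))
        clusters[owner].add(s)
    return clusters


covered = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}}
xi = {"A": 3.0, "B": 1.0, "C": 1.0}
print(initial_clusters(covered, xi))
# {'A': {1, 2}, 'B': {3, 4}, 'C': {5}}
```

Sensor nodes 1, 2, and 5 are essential; node 3 is favored by B because of its lower cost, and node 4 goes to B through the ID tie-break. Because every sensor node has exactly one owner, the result is a partition of the covered sensor nodes.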

PHASE 3: Iterative Cluster Refinement

The third and final phase (lines 8–18 of Algorithm 1) is an iterative process, where we modify the current clustering scheme (either the initial clusters or the clusters obtained during the previous iteration) to generate a set of new clusters, which improve (or at least maintain) the overall network lifetime. In each iteration, after a relay node j has received the current cluster information from all its neighbors, it determines whether $j \in ℂ$ . If so, it attempts to take sensor nodes from “heavily loaded” neighboring clusters and add these nodes to its own cluster. During this process, it gives preference to accepting sensor nodes from neighboring nodes, which currently have higher energy dissipations. Since $j \in ℂ$ implies that j is not a saturated node, it can move at least one sensor node from a neighboring cluster into its own cluster. In any given iteration, the number of sensor nodes that j accepts into its cluster $𝒞^{j}$ is based on a threshold $T h_{j}$ , which is always less than the maximum value of $E_{k}^{S}, k \in 𝒩_{j}$ that node j is aware of in the current iteration. Each node j accepting additional sensor nodes into its cluster ensures that its new energy dissipation rate $E_{j}^{S} < T h_{j}$ . This guarantees that the energy dissipation of the most heavily loaded node in the network will never increase as a result of reassignment of sensors from one cluster to another. Consequently, the network lifetime is guaranteed to never decrease with each iteration. After accepting an appropriate number of sensor nodes, each relay node $j \in ℂ$ broadcasts its updated cluster to its neighborhood $𝒩_{j}$ . The neighboring relay nodes receiving this information update their own cluster information and send that information to their respective neighbors.
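A single refinement step for one collector node might look like the sketch below. Several simplifications are assumed and are not part of the description above: every sensor node contributes the same load, so that $E_{j}^{S} = ξ_{j} | 𝒞^{j} |$ ; the threshold is taken as a fixed fraction `slack` of the largest neighboring energy; and the function name `refine_step` and all data are hypothetical.

```python
def refine_step(j, clusters, covered, xi, nbrs, slack=0.9):
    """Collector j pulls sensor nodes from its most loaded neighbors.

    Assumes E^S_k = xi[k] * |C^k|. The threshold is slack * max neighbor
    energy (slack < 1), and j stops before its own energy reaches it.
    Returns the list of sensor nodes moved into C^j.
    """
    def energy(k):
        return xi[k] * len(clusters[k])

    if not nbrs[j]:
        return []
    threshold = slack * max(energy(k) for k in nbrs[j])
    moved = []
    # Prefer donors with the highest current energy dissipation.
    donors = sorted(nbrs[j], key=lambda k: (-energy(k), k))
    for k in donors:
        for s in sorted(clusters[k] & covered[j]):
            if xi[j] * (len(clusters[j]) + 1) >= threshold:
                return moved
            clusters[k].remove(s)
            clusters[j].add(s)
            moved.append(s)
    return moved


clusters = {"A": {1, 2}, "B": {3, 4, 5, 6}}
covered = {"A": {1, 2, 3, 4}, "B": {3, 4, 5, 6}}
xi = {"A": 1.0, "B": 1.0}
nbrs = {"A": {"B"}, "B": {"A"}}
moved = refine_step("A", clusters, covered, xi, nbrs)
print(moved, clusters)  # [3] {'A': {1, 2, 3}, 'B': {4, 5, 6}}
```

In this instance the maximum energy drops from 4 to 3 units. Since the collector stays below the threshold and donors only lose load, the maximum can never increase in such a step.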

3.4. Distributed Clustering for Multi-Hop Routing (DCMR)

In order to handle arbitrary multi-hop routing schemes, we need a more generalized form of our initial DCSR algorithm. In the case of multi-hop networks, relay node j, in general, sends data to the base station B through a multi-hop path $j = j_{0} \to j_{1} \to j_{2} \to \dots \to j_{h - 1} \to j_{h} = B$ of length h, using intermediate relay nodes $j_{1}, j_{2}, \dots, j_{h - 1}$ . A relay node transmits data either to another relay node (which is the next hop in the multi-hop path from the relay node to the base station) or to the base station itself. A relay node may receive data from any number of other relay nodes (again, decided by the actual multi-hop routing scheme used).

Multi-hop routing introduces additional complexities into the clustering scheme, since the decision to include a sensor node in cluster $𝒞^{j}$ affects the energy dissipation not only of node j, but also of all intermediate nodes that j uses in its path to the base station. Our proposed heuristic, distributed clustering for multi-hop routing (DCMR), assumes that the multi-hop routing scheme to be used for data transmission is known beforehand. So, each relay node j is aware of the location of its next-hop node $succ_{j}$ , and of its set of predecessor nodes $𝒫^{j}$ . As with DCSR, we divide the entire algorithm (Algorithm 2) into the following 3 phases.

PHASE 1: sensor identification,

PHASE 2: cluster initialization,

PHASE 3: iterative cluster refinement.

Algorithm 2: Distributed clustering for multihop routing (DCMR).

(1) // PHASE 1: Sensor Identification

(2) Relay node j determines $𝒮^{j}$ and exchanges $𝒮^{j}$ and $ξ_{j}$ with its neighbors in $𝒩_{j}$ .

(3) // PHASE 2: Cluster Initialization

(4) j initializes $𝒞^{j}$ to the set of all “essential” and “favored” sensors of j.

(5) // PHASE 3: Iterative Cluster Refinement

(6) while stopping criteria not met do

(7)   j waits until it receives a down-stream message from each of its predecessors (i.e., nodes $k \in 𝒫^{j}$ ).

(8)   j computes the value of $n_{j}$ and communicates this value as a down-stream message to $succ_{j}$ .

(9)   if $succ_{j} = B$ then

(10)     j computes and sends, as an up-stream message to all $k \in 𝒫^{j}$ , the values of $𝔼_{j}^{c} = E_{j}^{M}$ and $ξ_{j}^{c} = ξ_{j}$ .

(11)   else

(12)     j waits until it receives the values of $𝔼_{k}^{c}$ and $ξ_{k}^{c}$ as an up-stream message from its successor k (i.e., $k = succ_{j}$ ).

(13)     j determines and communicates the values of $𝔼_{j}^{c}$ and $ξ_{j}^{c}$ to all nodes $k \in 𝒫^{j}$ .

(14)   end if

(15)   j broadcasts the values of $𝔼_{j}^{c}$ and $ξ_{j}^{c}$ to its neighborhood.

(16)   j waits until it receives a down-stream message from each node $k \in 𝒫^{j}$ .

(17)   if $j \in ℂ$ then

(18)     j prepares a request list of sensor node(s) it proposes to accept into $𝒞^{j}$ from cluster $𝒞^{k}$ of some node k, along with the values of $𝔼_{k}^{c}$ and $ξ_{k}^{c}$ , $k \in 𝒩_{j} - \{succ_{j}\}$ .

(19)   end if

(20)   j combines all request lists it has received, and its own request list (if any), and eliminates requests that it cannot handle.

(21)   j communicates a down-stream message containing the updated request list to $succ_{j}$ .

(22)   j waits until it receives an up-stream message from $succ_{j}$ , containing the accepted_list.

(23)   j updates $𝒞^{j}$ by including all sensor nodes for j which are in accepted_list and sends accepted_list to all $k \in 𝒫^{j}$ .

(24)   j broadcasts, to $𝒩_{j}$ (including all nodes $k \in 𝒫^{j}$ ), the list of all new sensor nodes it has included in $𝒞^{j}$ .

(25)   j excludes, from $𝒞^{j}$ , the sensor nodes that are now included in other clusters.

(26) end while

The first two phases are almost identical to those in DCSR. During PHASE 1, each relay node j discovers the set $𝒮^{j}$ of sensor nodes it can cover, and exchanges $𝒮^{j}$ and $ξ_{j}$ with all relay nodes $k \in 𝒩_{j}$ . After receiving this information from its neighboring nodes, in PHASE 2 each relay node j determines its favored and essential sensor nodes and includes them in its own cluster $𝒞^{j}$ . An initial valid clustering for each relay node is obtained during this cluster initialization phase.

Iterative cluster refinement (PHASE 3, lines 5–26 of Algorithm 2) comprises the major portion of the DCMR algorithm. In this phase, we use the term “down-stream” to describe messages going from a relay node towards the base station and “up-stream” for messages going away from the base station. A single iteration of PHASE 3 consists of three main subtasks that each relay node j in the network must perform.

(a) Collect Critical Energy Information (Lines 7–15 of Algorithm 2)

In this first subtask of PHASE 3, our goal is to determine $𝔼_{j}^{c}$ and $ξ_{j}^{c}$ for all $j \in ℛ$ and to broadcast these values to the neighborhood of j. A node j without any predecessors can calculate the values of $n_{j}$ and $E_{j}^{M}$ for itself, based only on its own cluster $𝒞^{j}$. Every other node must wait for this information to arrive from all of its predecessors before it can compute its own values of $n_{j}$ and $E_{j}$. Node j then forwards the value of $n_{j}$ to $succ_{j}$. If $succ_{j} = B$, then j sets $𝔼_{j}^{c} = E_{j}^{M}$ and $ξ_{j}^{c} = ξ_{j}$ and forwards the values of $𝔼_{j}^{c}$ and $ξ_{j}^{c}$ to all nodes $k \in 𝒫^{j}$. Otherwise, on receiving this information from $succ_{j}$, node j adopts the received values as $𝔼_{j}^{c}$ and $ξ_{j}^{c}$ if the received energy value is at least $E_{j}^{M}$; if not, it sets $𝔼_{j}^{c} = E_{j}^{M}$ and $ξ_{j}^{c} = ξ_{j}$. In either case, it forwards the resulting values to all its predecessors. In this way, each node eventually learns the values of $𝔼_{j}^{c}$ and $ξ_{j}^{c}$ along its path.
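
The propagation described in subtask (a) can be illustrated with a minimal Python sketch. It simulates the message passing centrally rather than in a distributed fashion, and it treats $ξ_{j}$ as an opaque value attached to node j. The function name `critical_values`, the label `"B"` for the base station, and the dictionary-based inputs are assumptions made for illustration only.

```python
from typing import Dict, Hashable, Tuple

BASE = "B"  # hypothetical label for the base station


def critical_values(
    succ: Dict[Hashable, Hashable],
    e_max: Dict[Hashable, float],
    xi: Dict[Hashable, Hashable],
) -> Dict[Hashable, Tuple[float, Hashable]]:
    """Return {j: (E_j^c, xi_j^c)} for every relay node j.

    succ[j]  -- next hop of j towards the base station (BASE on the last hop)
    e_max[j] -- E_j^M, the energy value node j computes for itself
    xi[j]    -- xi_j, treated here as an opaque value attached to node j
    """
    result: Dict[Hashable, Tuple[float, Hashable]] = {}

    def resolve(j: Hashable) -> Tuple[float, Hashable]:
        if j in result:
            return result[j]
        if succ[j] == BASE:
            # A node whose successor is the base station starts the chain
            # with its own values (line 10 of Algorithm 2).
            result[j] = (e_max[j], xi[j])
        else:
            # Otherwise the node waits for its successor's values and keeps
            # them only if they are at least as large as its own E_j^M
            # (lines 12-13 of Algorithm 2).
            e_up, xi_up = resolve(succ[j])
            if e_up >= e_max[j]:
                result[j] = (e_up, xi_up)
            else:
                result[j] = (e_max[j], xi[j])
        return result[j]

    for j in succ:
        resolve(j)
    return result
```

With this rule, the value each node ends up holding is the largest $E^{M}$ found on its path to the base station, together with the $ξ$ value of the node that attains it.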

(b) Create List of Potential Sensor Nodes to Be Included in ℂ (Lines 16–21 of Algorithm 2)

During this subtask, each relay node j determines the sensor nodes it should accept, if any, from other clusters and sends a request for this reclustering towards the base station. It also forwards reclustering requests from its up-stream nodes, which use j as an intermediate node in their respective paths to the base station. In order to determine if j is a potential candidate for accepting sensor nodes in the next iteration, node j first checks if $j \in ℂ$ . If so, it may choose to accept some sensor nodes from neighboring clusters, to reduce the load on the other relay nodes. The number of sensor nodes that may be accepted into cluster $𝒞^{j}$ must be low enough that the maximum-path-energy for node j ( $𝔼_{j}^{c}$ ) does not exceed a specified threshold. Each node j (irrespective of whether $j \in ℂ$ or not) must also wait for cluster reassignment requests from all $k \in 𝒫^{j}$ . On receiving all such requests, it creates a new list, selecting as many sensor nodes as possible from its predecessors' lists as well as its own list (if any) without exceeding its energy threshold. In case it is not possible to accommodate all requests, j gives preference to reassigning sensor nodes from more highly loaded paths. Node j forwards this new request list to $suc c_{j}$ . If $suc c_{j} = B$ , then the base station simply returns this same list to j, as accepted_list.
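
One way to picture the list-merging step of subtask (b) is the greedy sketch below. It is only an approximation under stated assumptions: each request is assumed to carry the energy value of the path the sensor node would leave and an estimated extra energy cost at node j, and the energy threshold is expressed as a single remaining `budget`. None of these names (`Request`, `merge_requests`, `source_energy`, `cost`, `budget`) come from the algorithm itself, and a greedy pass does not always select the largest possible number of sensor nodes.

```python
from typing import List, NamedTuple


class Request(NamedTuple):
    sensor: str           # sensor node proposed for reassignment
    source_energy: float  # assumed: energy value of the path the sensor leaves
    cost: float           # assumed: extra energy the move adds at node j


def merge_requests(
    own: List[Request],
    from_predecessors: List[List[Request]],
    budget: float,
) -> List[Request]:
    """Combine request lists, dropping requests that exceed the budget."""
    candidates = list(own)
    for request_list in from_predecessors:
        candidates.extend(request_list)

    # Prefer sensor nodes that come from the most heavily loaded paths.
    candidates.sort(key=lambda r: r.source_energy, reverse=True)

    kept: List[Request] = []
    seen = set()
    used = 0.0
    for req in candidates:
        if req.sensor in seen:
            continue  # the same sensor node is never moved twice
        if used + req.cost <= budget:
            kept.append(req)
            seen.add(req.sensor)
            used += req.cost
    return kept
```

The list returned here corresponds to the updated request list that node j would forward to $succ_{j}$.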

(c) Update Clusters (Lines 22–25 of Algorithm 2)

This is the last subtask in PHASE 3, in which node j, if applicable, updates its cluster by including some sensor nodes from other clusters or by shedding some sensor nodes to be acquired by other clusters. The accepted_list received by node j from its down-stream neighbor $succ_{j}$ specifies all sensor nodes that can be reassigned for j and its “up-stream” relay nodes. On receiving accepted_list, node j updates its own cluster, if necessary, and forwards the list to all its predecessors. To update its cluster, node j accepts all sensor nodes allocated to it (if any) in accepted_list and broadcasts the updated cluster $𝒞^{j}$ to all its neighbors. Since all nodes execute the same steps, j receives, from its neighbors, their respective clusters. If any node k indicates (line 24) that it has accepted a sensor node $s \in 𝒞^{j}$, then node j removes s from $𝒞^{j}$ and broadcasts the new $𝒞^{j}$ to its neighborhood (line 25). After updating $𝒞^{j}$, node j has completed one iteration of the algorithm.

When node j completes all steps in an iteration, a valid clustering is formed with an overall lifetime no less than that of the previous clustering. This process continues, in an iterative manner, until the stopping criteria are met. The stopping criteria can be based on a predefined number of iterations, a threshold limit for the relative amount of improvement, a combination of these factors, or any other suitable means. For our simulations, we allowed the iterations to continue until a stage was reached where there was no significant update in any cluster.

We note that a “valid clustering” is an assignment of sensor nodes to cluster heads (relay nodes) such that each sensor node is assigned to exactly one cluster head and this cluster head is within its transmission range. In each iteration, the proposed distributed algorithms produce a solution that is never worse than that of the previous iteration. In this sense, the algorithms converge towards the best solution attainable under the given stopping criteria. However, the final solution is not guaranteed to be optimal. Treating this as a convergence problem and developing techniques to improve the convergence speed is an interesting extension that can be addressed in future work.
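
The validity condition stated above can be checked mechanically. The following sketch assumes planar coordinates and a single transmission range shared by all sensor nodes; the function name `is_valid_clustering` and the dictionary layout are illustrative assumptions.

```python
import math
from typing import Dict, Set, Tuple

Point = Tuple[float, float]


def is_valid_clustering(
    clusters: Dict[str, Set[str]],
    relay_pos: Dict[str, Point],
    sensor_pos: Dict[str, Point],
    sensor_range: float,
) -> bool:
    """Check that every sensor has exactly one cluster head within range."""
    assigned: Dict[str, str] = {}
    for relay, members in clusters.items():
        for s in members:
            if s not in sensor_pos:
                return False  # unknown sensor node
            if s in assigned:
                return False  # sensor node assigned to two cluster heads
            if math.dist(sensor_pos[s], relay_pos[relay]) > sensor_range:
                return False  # cluster head outside the sensor's range
            assigned[s] = relay
    # Every sensor node must be assigned to some cluster head.
    return set(assigned) == set(sensor_pos)
```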

4. Experimental Results

4.1. Experimental Setup

We have studied the performance of our distributed clustering schemes through simulations. We considered two different network configurations, the first with 12 relay nodes, covering an area of 160 m × 160 m, and the second with 24 relay nodes, covering an area of 240 m × 240 m. The number and the positions of the relay nodes are determined, based on the distribution of sensor nodes, using an existing placement scheme proposed in [7]. For each configuration, we generated the sensor locations randomly, with the number of sensor nodes varying from 75 nodes to 1000 nodes. We set the transmission range of each relay (sensor) node to 100 m (40 m). We calculated the energy dissipation and network lifetime using the models given in Section 2, and followed [2] to set the values of the constants as given below:

(1) $α_{1} = α_{2} = 50$ nJ/bit,

(2) $β = 100$ pJ/bit/m$^{2}$,

(3) path loss exponent, $q = 2$.
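
To give a sense of scale for these constants, the sketch below evaluates them under the commonly used first-order radio model, in which receiving b bits costs $α_{1} b$ and transmitting b bits over distance d costs $α_{2} b + β b d^{q}$. This form is an assumption here, since the model of Section 2 is not reproduced in this section, as is the reading of $α_{1}$ as the receive coefficient and $α_{2}$ as the transmit coefficient (the two are numerically equal). The function names are illustrative.

```python
ALPHA_1 = 50e-9   # J/bit, assumed receive-circuit coefficient
ALPHA_2 = 50e-9   # J/bit, assumed transmit-circuit coefficient
BETA = 100e-12    # J/bit/m^q, amplifier coefficient
Q = 2             # path loss exponent


def rx_energy(bits: float) -> float:
    """Energy (J) to receive `bits` bits under the assumed model."""
    return ALPHA_1 * bits


def tx_energy(bits: float, distance_m: float) -> float:
    """Energy (J) to transmit `bits` bits over `distance_m` metres."""
    return bits * (ALPHA_2 + BETA * distance_m ** Q)


if __name__ == "__main__":
    # A 1000-bit packet sent over the 100 m relay transmission range.
    print(tx_energy(1000, 100.0))
    print(rx_energy(1000))
```

Under these assumptions, sending 1000 bits over 100 m costs about 1.05 mJ, of which only 0.05 mJ is the distance-independent part, which illustrates why transmission distance dominates the energy budget of a relay node.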

For a given network and a given number of sensor nodes, we randomly generated 10 different sensor node distributions. For each relay/sensor node distribution, we compared the performance of our heuristics with the following existing clustering approaches [14, 16]:

(1) Greedy Clustering (GC). In GC, the relay nodes are considered in sequence, and each relay node greedily picks all sensor nodes that can communicate with it.

(2) Least Distance Clustering (LDC). In LDC, a relay node j picks a sensor node s if j is the closest to s among all relay nodes and j lies within the transmission range of s.

(3) ILP for Single-Hop Routing (ILP-S). ILP-S is an approach based on an integer linear program (ILP) formulation [16] that produces optimal results for networks using SHDTM.

(4) ILP for Multi-Hop Routing (ILP-M). ILP-M is based on an ILP formulation [16] that produces optimal results for networks using MHDTM.
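
The two baseline heuristics, GC and LDC, can be sketched in a few lines of Python. The sketch assumes planar coordinates, a common sensor transmission range, and, for GC, that a relay node picks only sensor nodes not already taken by an earlier relay node. These details, and the function names, are assumptions and are not taken from [14, 16].

```python
import math
from typing import Dict, Optional, Sequence, Set, Tuple

Point = Tuple[float, float]
Clusters = Dict[str, Set[str]]


def greedy_clustering(
    relays: Dict[str, Point],
    sensors: Dict[str, Point],
    sensor_range: float,
    order: Optional[Sequence[str]] = None,
) -> Clusters:
    """GC: relays act in sequence and take every free sensor in range."""
    clusters: Clusters = {j: set() for j in relays}
    unassigned = set(sensors)
    for j in (order if order is not None else list(relays)):
        picked = {
            s for s in unassigned
            if math.dist(sensors[s], relays[j]) <= sensor_range
        }
        clusters[j] = picked
        unassigned -= picked
    return clusters


def least_distance_clustering(
    relays: Dict[str, Point],
    sensors: Dict[str, Point],
    sensor_range: float,
) -> Clusters:
    """LDC: each sensor joins its closest relay, if that relay is in range."""
    clusters: Clusters = {j: set() for j in relays}
    for s, ps in sensors.items():
        nearest = min(relays, key=lambda j: math.dist(ps, relays[j]))
        if math.dist(ps, relays[nearest]) <= sensor_range:
            clusters[nearest].add(s)
    return clusters
```

Neither heuristic looks at how the relay nodes route their data, which is the gap the routing-aware approaches aim to fill.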

We obtained the results for ILP-S and ILP-M using ILOG CPLEX [21] and used custom-made simulators to obtain the results for the other approaches. To evaluate the performance of our distributed algorithms fairly, it is important to take into consideration the energy overhead incurred by the relay nodes during the clustering phase. We did this by applying a penalty, equivalent to the energy required to transmit a 1 kb packet over a distance of $2r$, to each relay node for every iteration executed by the clustering algorithm. We note that this communication penalty applies only to the proposed distributed algorithms and not to the optimal ILP solutions or the centralized heuristics. This is because, in the centralized algorithms, the base station calculates the clustering based on global information about node positions, data rates, and so forth, and then transmits the final clustering information directly to each relay node. The relay nodes are therefore not required to exchange information among themselves during the clustering phase.
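
The size of this penalty can be estimated with a short sketch. It assumes a first-order radio model for the transmit energy (the model of Section 2 is not reproduced here), reads “1 kb” as 1000 bits, and leaves r as a parameter; the function name `clustering_penalty` is illustrative.

```python
ALPHA_2 = 50e-9     # J/bit, assumed transmit-circuit coefficient
BETA = 100e-12      # J/bit/m^q, amplifier coefficient
Q = 2               # path loss exponent
PACKET_BITS = 1000  # assumption: "1 kb" is read as 1000 bits


def clustering_penalty(iterations: int, r: float) -> float:
    """Total energy (J) charged to one relay node for the clustering phase.

    Each iteration is charged the cost of sending one packet over 2r.
    """
    per_iteration = PACKET_BITS * (ALPHA_2 + BETA * (2.0 * r) ** Q)
    return iterations * per_iteration


if __name__ == "__main__":
    # Ten iterations with r = 40 m, used here purely as an example value.
    print(clustering_penalty(10, 40.0))
```

With these example values the penalty is about 0.69 mJ per iteration, so the overhead grows linearly with the number of refinement iterations.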

4.2. Performance Evaluation for DCSR

In this section, we present the simulation results for our DCSR approach, where the network uses SHDTM. Our objective was to cluster the sensor nodes around the relay nodes such that the maximum energy dissipation of a relay node is minimized. Under the N-of-N lifetime metric [4] considered in this paper, this is equivalent to maximizing the network lifetime, as discussed in Section 2. Figure 2 (Figure 3) shows the average lifetime, measured in rounds, achieved by these approaches for networks with 12 (24) relay nodes, where the number of sensor nodes differed from one network to another. In each figure, the lifetimes are shown in the order GC, LDC, DCSR, and ILP-S. Both Figures 2 and 3 demonstrate that the proposed approach not only significantly outperforms existing clustering techniques (GC and LDC), with improvements in the range 30%–40%, but also produces results very close to the optimal solution obtained using ILP-S.
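
The link between the maximum per-round energy dissipation and the N-of-N lifetime can be made concrete with a small sketch. It assumes that every relay node starts with the same initial energy and dissipates a fixed amount of energy in each round, so that the network lifetime ends when the most heavily loaded relay node is depleted. The function name and the sample numbers are hypothetical.

```python
from typing import Dict


def n_of_n_lifetime(energy_per_round: Dict[str, float],
                    initial_energy: float) -> int:
    """Number of complete rounds before the first relay node is depleted."""
    worst = max(energy_per_round.values())
    return int(initial_energy // worst)


if __name__ == "__main__":
    # Two clusterings with the same total load but different balance.
    unbalanced = {"r1": 0.25, "r2": 0.5}
    balanced = {"r1": 0.375, "r2": 0.375}
    print(n_of_n_lifetime(unbalanced, 10.0))
    print(n_of_n_lifetime(balanced, 10.0))
```

In this example both clusterings dissipate the same total energy per round, yet the balanced one lasts longer because its maximum per-node dissipation is smaller.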

Figure 2

Lifetimes for a 12-relay node network with different clustering methods.

Figure 3

Lifetimes for a 24-relay node network with different clustering methods.

Table 2 shows the performance of the different clustering approaches relative to the optimal lifetimes (computed by ILP-S) for networks with 24 relay nodes. The results for networks with 12 relay nodes are similar. We see that, on average, GC and LDC never achieve more than 70% of the maximum lifetime, while DCSR typically achieves 84%–98% of it. As the network size and the number of sensor nodes increase, the gap between DCSR and the optimal solution widens.

Table 2

Percentage of optimal lifetime achieved by different approaches on the 24 relay node network.

	% of the optimal lifetime achieved when the number of sensor nodes is
	200	300	400	500	750	1000

GC	64.1	59.8	58.6	59.8	57.9	57.3
LDC	67.7	68.4	66.4	64.6	64.4	68.8
DCSR	98.2	95.8	95.4	96.4	93.0	84.5

4.3. Performance Evaluation for DCMR

In this section, we present the simulation results for our DCMR approach. Since DCMR takes into consideration the specific routing scheme, we used two well-known strategies for multi-hop routing.

(i) Minimum-Transmission Energy Routing (MTER). In MTER [2, 16], each relay node transmits to its nearest neighbor that is closer to the base station (or to the base station itself).

(ii) Minimum-Hop Routing (MHR). In MHR, each relay node finds a path to the base station such that the number of hops required is minimized.
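
The two routing strategies can be sketched as next-hop computations. The sketch below is one reading of the descriptions above, not the exact schemes of [2, 16]: for MTER, a relay node chooses the nearest in-range candidate among the base station and the relay nodes that are closer to the base station than itself; for MHR, a breadth-first search from the base station yields minimum-hop paths. The function names, the label `"B"`, and the planar coordinates are assumptions.

```python
import math
from collections import deque
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]
BASE = "B"  # hypothetical label for the base station


def mter_next_hop(relays: Dict[str, Point], base: Point,
                  relay_range: float) -> Dict[str, Optional[str]]:
    """Nearest in-range candidate that is closer to the base station."""
    succ: Dict[str, Optional[str]] = {}
    for j, pj in relays.items():
        dj = math.dist(pj, base)
        candidates = [(dj, BASE)] if dj <= relay_range else []
        for k, pk in relays.items():
            d = math.dist(pj, pk)
            if k != j and d <= relay_range and math.dist(pk, base) < dj:
                candidates.append((d, k))
        succ[j] = min(candidates)[1] if candidates else None
    return succ


def mhr_next_hop(relays: Dict[str, Point], base: Point,
                 relay_range: float) -> Dict[str, str]:
    """Breadth-first search from the base station gives min-hop next hops."""
    succ: Dict[str, str] = {}
    queue = deque()
    for j, pj in relays.items():
        if math.dist(pj, base) <= relay_range:
            succ[j] = BASE
            queue.append(j)
    while queue:
        k = queue.popleft()
        for j, pj in relays.items():
            if j not in succ and math.dist(pj, relays[k]) <= relay_range:
                succ[j] = k
                queue.append(j)
    return succ
```

On a line of relay nodes, MTER tends to produce many short hops, whereas MHR skips intermediate nodes whenever the transmission range allows it, so the two schemes load the relay nodes differently.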

Figure 4 (Figure 5) shows the average lifetimes obtained using existing heuristics (GC and LDC) compared to our approach (DCMR) and the optimal solution (ILP-M), for MTER on a 12- (24-) node network. The results indicate that DCMR consistently outperforms existing heuristics, although the differences in the overall lifetimes are not as large as in the single-hop case. We attribute this to

(i) the cost associated with the additional broadcasts required by each relay node to update the energy information of the entire path, and

(ii) the lack of global knowledge about the network at each relay node (each relay node has only local knowledge).

Figure 4

Network lifetimes for a 12-node network with different clustering methods when MTER routing is used.

Figure 5

Network lifetimes for a 24-node network with different clustering methods when MTER routing is used.

However, as shown in the figures, the lifetime achieved by our approach is always within 10% of the corresponding optimal lifetimes, obtained using the ILP-M formulation.

Figure 6 shows the average lifetimes obtained for a network with 24 nodes under MHR, where the transmission range of the relay nodes is set to 100 m, using (a) existing heuristics (GC and LDC), (b) the optimal solution (ILP-M), and (c) the distributed clustering algorithm DCMR. We have noticed that, in most situations, LDC performs better than GC. However, there are some cases where GC outperforms LDC. One advantage of our proposed approach is that we can use any existing clustering scheme to form the initial clusters. Therefore, our approach is guaranteed to be at least as good as any specific clustering scheme (e.g., GC, LDC), since we can use that scheme during the cluster initialization phase itself.

Figure 6

Network lifetimes for a 24-node network with different clustering methods when MHR routing is used.

We note that, in our simulations, the sensor nodes were distributed randomly over the entire sensing area. For this type of distribution and the number of sensor nodes we considered, the lifetimes of the relay nodes were determined primarily by the distance over which they had to transmit data. So, for SHDTM, the nodes farthest from the base station depleted their energy much more quickly, and since the network lifetime, under the N-of-N lifetime metric, ends as soon as the first node dies, the overall lifetime for SHDTM was found to be lower than that for MHDTM. However, we would like to note that for very densely populated networks, the load (total amount of data to be transmitted) on a relay node may become more important than the transmission distance. In such cases, SHDTM would be expected to perform better.

In MHDTM, each relay node transmits over a relatively smaller distance. So, the effect of moving a few sensor nodes from one cluster to another does not have as big an impact on the energy consumption of a relay node as it would for SHDTM, where a node might transmit over a much larger distance. So, the effects of clustering are not as pronounced for MHDTM.

5. Conclusions

In this paper, we have proposed the concept of “routing-aware” clustering for sensor networks that takes into consideration the effect of different routing strategies and adapts the clusters accordingly. We have developed two distributed clustering algorithms based on this concept—the first (DCSR) for single-hop routing and the second (DCMR) for multihop routing. The proposed approach is suitable for autonomous systems without any centralized control and provides quick, efficient solutions for networks with hundreds of sensor nodes. A useful feature of our approach is that each iteration of the algorithms DCSR or DCMR generates a feasible solution, and the network lifetime is guaranteed never to decrease as a result of cluster reassignments in future iterations. This means that our algorithm can be used in conjunction with any clustering heuristic (e.g., GC or LDC) to generate an initial cluster assignment. Then the iterative steps can be applied to further extend the network lifetime.

Through simulations, we have demonstrated that both DCSR and DCMR can produce results that are close to the optimal solutions obtained using an ILP-based optimization approach. For a given set of relay nodes, the network lifetime decreases, as expected, as the number of sensor nodes increases. However, in all cases, our approaches consistently outperform existing clustering algorithms, which do not consider the routing schemes used for data transmission.

Acknowledgment

The work of A. Jaekel and S. Bandyopadhyay has been supported by research grants from the Natural Sciences and Engineering Research Council of Canada (NSERC).

References

1. Akyildiz I. F., Sankarasubramaniam, Cayirci. Wireless sensor networks: a survey. Computer Networks, vol. 38, no. 4, pp. 393–422, 2002. Scopus 2-s2.0-0037086890. doi:10.1016/S1389-1286(01)00302-4.

2. Heinzelman, Chandrakasan, Balakrishnan. Energy efficient communication protocol for wireless micro-sensor networks. In Proceedings of the Hawaii International Conference on System Sciences (HICSS'00), 2000, pp. 3005–3014.

3. Gupta, Younis. Load-balanced clustering of wireless sensor networks. In Proceedings of the International Conference on Communications (ICC '03), May 2003, pp. 1848–1852. Scopus 2-s2.0-0037632539.

4. Pan, Hou Y. T., Cai, Shi, Shen S. X. Topology Control for Wireless Sensor Networks. In Proceedings of the 9th Annual International Conference on Mobile Computing and Networking (MobiCom '03), 2003, pp. 286–299. Scopus 2-s2.0-1542358975.

5. Tang, Hao, Sen. Relay node placement in large scale wireless sensor networks. Computer Communications, vol. 29, no. 4, pp. 490–501, 2006. Scopus 2-s2.0-32644439091. doi:10.1016/j.comcom.2004.12.032.

6. Nayak, Stojmenovic. Applications, models, problems, and solution strategies. In Wireless Sensor and Actuator Networks: Algorithms and Protocols for Scalable Coordination and Data Communication. New York, NY, USA: John Wiley & Sons, 2010.

7. Bari, Jaekel, Bandyopadhyay. Optimal placement and routing strategies for resilient two-tiered sensor networks. Wireless Communications and Mobile Computing, vol. 9, no. 7, pp. 920–937, 2008. doi:10.1002/wcm.639.

8. Kalpakis, Dasgupta, Namjoshi. Efficient algorithms for maximum lifetime data gathering and aggregation in wireless sensor networks. Computer Networks, vol. 42, no. 6, pp. 697–716, 2003. Scopus 2-s2.0-0038148549. doi:10.1016/S1389-1286(03)00212-3.

9. Blough D. M., Santi. Investigating upper bounds on network lifetime extension for cell-based energy conservation techniques in stationary ad hoc networks. In Proceedings of International Conference on Mobile Computing and Networking (ACM MobiCom2 '02), September 2002, pp. 183–192. Scopus 2-s2.0-0036949028.

10. Cardei, D. Z. Improving wireless sensor network lifetime through power aware organization. Wireless Networks, vol. 11, no. 3, pp. 333–340, 2005. Scopus 2-s2.0-24044520078. doi:10.1007/s11276-005-6615-6.

11. Dietrich, Dressler. On the lifetime of wireless sensor networks. ACM Transactions on Sensor Networks, vol. 5, no. 1, article 5, pp. 1–39, 2009. Scopus 2-s2.0-60449083692. doi:10.1145/1464420.1464425.

12. Madan, Cui, Lall, Goldsmith. Cross-layer design for lifetime maximization in interference-limited wireless sensor networks. In Proceedings of the IEEE INFOCOM, March 2005, pp. 1964–1975. Scopus 2-s2.0-25844490513.

13. Al-Karaki J. N., Kamal A. E. Routing techniques in wireless sensor networks: a survey. IEEE Wireless Communications, vol. 11, no. 6, pp. 6–28, 2004. Scopus 2-s2.0-11144277843. doi:10.1109/MWC.2004.1368893.

14. Gupta, Younis. Performance evaluation of load-balanced clustering of wireless sensor networks. In Proceedings of the 10th International Conference on Telecommunications, 2003, pp. 1577–1583.

15. Zhang, Datta. Sensor networks: distributed algorithms reloaded or revolutions? In Proceedings of the International Colloquium on Structural Information and Communication Complexity (SIROCCO '06), vol. 4056 of LNCS, 2006, pp. 24–28.

16. Bari, Jaekel, Bandyopadhyay. Clustering strategies for improving the lifetime of two-tiered sensor networks. Computer Communications, vol. 31, no. 14, pp. 3451–3459, 2008. Scopus 2-s2.0-49649114550. doi:10.1016/j.comcom.2008.05.038.

17. Bari, Luo, Jaekel, Bandyopadhyay. Distributed clustering techniques for improving lifetime in two-tiered sensor networks. In Proceedings of the 7th Annual Communication Networks and Services Research Conference (CNSR '09), May 2009, pp. 185–192. Scopus 2-s2.0-67650314387. doi:10.1109/CNSR.2009.37.

18. Gupta, Younis. Fault-tolerant clustering of wireless sensor networks. In Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC '03), 2003, pp. 1579–1584.

19. Ossama, Sonia. HEED: a hybrid, energy-efficient, distributed clustering approach for ad hoc sensor networks. IEEE Transactions on Mobile Computing, vol. 3, no. 1, pp. 366–379, 2004. Scopus 2-s2.0-10944266504. doi:10.1109/TMC.2004.41.

20. Manjeshwar, Agarwal D. P. APTEEN: a hybrid protocol for efficient routing and comprehensive information retrieval in wireless sensor networks. In Proceedings of the International Parallel and Distributed Processing Symposium, 2002, pp. 195–202.

21. Plassard M. F. The impact of new technology on document availability and access. Interlending and Document Supply, vol. 17, no. 1, pp. 3–10, 1989. Scopus 2-s2.0-55249126055.