Optimizing Cluster Heads for Energy Efficiency in Large-Scale Heterogeneous Wireless Sensor Networks

Abstract

Many complex sensor network applications require deploying a large number of inexpensive and small sensors in a vast geographical region to achieve quality through quantity. Hierarchical clustering is generally considered as an efficient and scalable way to facilitate the management and operation of such large-scale networks and minimize the total energy consumption for prolonged lifetime. Judicious selection of cluster heads for data integration and communication is critical to the success of applications based on hierarchical sensor networks organized as layered clusters. We investigate the problem of selecting sensor nodes in a predeployed sensor network to be the cluster heads to minimize the total energy needed for data gathering. We rigorously derive an analytical formula to optimize the number of cluster heads in sensor networks under uniform node distribution, and propose a Distance-based Crowdedness Clustering algorithm to determine the cluster heads in sensor networks under general node distribution. The results from an extensive set of experiments on a large number of simulated sensor networks illustrate the performance superiority of the proposed solution over the clustering schemes based on k-means algorithm.

1. Introduction

Multiple sensor systems have been the target of active research since the early 1990s due to their widespread use in many agricultural, civil, industrial, and military applications that involve environmental monitoring, target tracking, and situational assessment [1, 2]. Recent developments in Microelectromechanical Systems (MEMS) make it now possible to deploy a large number of inexpensive and small sensors to perform complex tasks by obtaining quality through quantity. In some important applications, sensor networks are deployed for remote operations in vast unstructured geographical areas. In such deployments, wireless networks with low bandwidth are usually the only means of communication among the sensors. These sensors are typically powered by irreplaceable batteries with limited energy supply, a large portion of which has to be spent on data communication among sensors and the Base Station (BS). Therefore, minimizing the Total Energy Consumption (TEC) for sensor data gathering is critical to ensuring sustained operations of these large-scale Wireless Sensor Networks (WSNs), even though minimizing TEC does not necessarily maximize network lifetime, which also depends on the balance of residual energy across the network.

It has been well recognized that clustering provides an efficient and scalable way to design and organize large-scale WSNs for energy efficiency of data communication. In a typical hierarchical WSN that deploys a large number of homogeneous or heterogeneous sensors, clusters are formed around a set of strategically selected or randomly designated Cluster Heads (CHs). The sensors within each cluster, often referred to as Leaf Nodes (LNs), collect and send environmental measurements to their corresponding CH, which performs an appropriate form of data processing (i.e., aggregation and compression) and sends the result to a BS for integration with data from other clusters. The BS, which is located either inside or outside the sensor network region, could be wire connected or rechargeable, and hence is often considered to have unlimited energy supply. Apparently, the number and location of CHs in hierarchical WSNs have a significant effect on the energy consumption for data communication from LNs to the BS. Many existing clustering algorithms assume that the number of clusters is predetermined and each CH is also designated a priori or on a random basis [3, 4]. In practice, however, such information on CHs is not always readily available during network implementation and operation, especially when the network is deployed in an unstructured environment with unpredictable disturbances and threats. As a matter of fact, determining the optimal number and location of CHs has been raised as one of the most fundamental problems in clustering-based hierarchical WSNs.

The global knowledge of sensor locations is crucial for determining the optimal number and location of CHs. Once deployed, each sensor can acquire and report its geographical location through a certain location discovery process. The Global Positioning System (GPS) is a very effective solution, but equipping every sensor with a GPS receiver is prohibitively expensive given the hardware cost and the constrained energy supply of sensors. Many other location discovery approaches use received signal strength, neighbor node position, or arrival time difference to estimate the distance between two sensors [5]. Based on the distance estimations, sensors can compute their locations by distance or angle combining (hyperbolic trilateration, triangulation, or maximum likelihood estimation). To make such combining feasible in a two-dimensional region, the computation requires at least three known, noncollinear reference points, which can be obtained through either GPS or a deterministic deployment.

We investigate the problem of selecting an optimal subset of sensors, which are designated as CHs to form clusters, to minimize the TEC for data communication per round from LNs to the BS in a predeployed WSN under a uniform or general node distribution. We rigorously derive an analytical solution to calculate the number of CHs for minimum TEC in WSNs where nodes are uniformly distributed. In WSNs where nodes are deployed according to an unknown probability distribution, we propose a heuristic algorithm, Distance-based Crowdedness Clustering (DCC), to determine the optimal number and location of CHs. The extensive experimental results on simulated WSNs ranging from small to large scales show the performance superiority of the proposed methods compared to the clustering and optimization schemes based on classical k-means algorithm in terms of TEC for data communication per round. However, the main purpose of these performance comparisons is to recognize the necessity of such CH optimization and illustrate the efficacy of the proposed research approaches. In fact, the proposed optimization schemes, which could be performed offline using global information prior to the actual deployment, are largely orthogonal to the main body of current research efforts on sensor network clustering, and the optimization results obtained by our approaches can be used to effectively guide the operation of, and therefore greatly improve the performance of, the classical clustering algorithms that require a priori knowledge on CHs.

The rest of the paper is organized as follows. We describe related work in Section 2. The energy consumption models are presented and the optimization problems are formulated in Section 3. In Section 4, we derive an analytical solution for WSNs under uniform distribution and develop a heuristic approach for WSNs under general distribution. Implementation details and performance evaluations are provided in Section 5. We conclude our work and discuss future efforts in Section 6.

2. Related Work

In WSNs, a CH, which may collect environmental sensing data as well in some cases, is responsible for receiving, processing, and transmitting data from the LNs in its service area to the BS, and hence it consumes much more energy than an LN. The sensor nodes in the close proximity of a CH may also run out of battery quickly due to frequent data forwarding. Therefore, designating an optimal subset of sensor nodes as CHs at appropriate locations is critical to minimizing the TEC for prolonging the lifetime of the entire network. There exist a large number of research efforts in the literature that have been devoted to solving various clustering problems with different objectives [3, 6–15].

Several clustering algorithms have been proposed to minimize the energy consumption or satisfy the network connectivity requirement with the assumption that the number of CHs is known a priori [3, 4, 12, 16]. Some other researchers tackle the problem of optimizing the number and location of CHs, which is closely related to our work. Kim et al. derived analytical formulas to estimate the optimal number of CHs to achieve the minimum TEC of the entire network, based on the assumption of even partition (i.e., each cluster covers an equal number of sensors) and one-hop communication in both intracluster and intercluster routing [6]. In [7], Wang et al. calculated the optimal number of CHs for a WSN by applying a cross-layer approach from both perspectives of the power efficiency in the medium access control (MAC) layer and the coverage performance in the physical layer. In [9], Xu and Cassandras determined the optimal location of CHs for minimum communication energy consumption in a two-level heterogeneous WSN. They formulated the optimization problem with the constraint that each LN is connected to at least p CHs and each CH has at most q LNs as a Mixed Integer Nonlinear Programming (MINLP) problem and achieved a global optimum based on an iterative decomposition algorithm and a randomized multistart technique. Srinivas et al. studied the problem of minimizing the number of backbone nodes and referred to it as the Connected Disk Cover problem in [17], where they controlled the mobility of the backbone nodes to maintain the connectivity of the network. In [18], Tao and Zheng proposed an improved algorithm that combines the optimal number of CHs with energy adaptive CH selection algorithm based on the classical Low-Energy Adaptive Clustering Hierarchy (LEACH) [3] protocol.

One commonly adopted way to ensure load balance and meet energy constraint of the entire network is to rotate the role of a CH and form a corresponding cluster on a random and periodical basis among all sensor nodes. The classical energy-efficient algorithm, LEACH [3], employed randomized rotation of CHs to evenly distribute the energy consumption among the sensors in the network. Yang and Sikdar [8] applied a sleep-wakeup scheme based on a decentralized MAC protocol to LEACH and further proposed an analytical framework for achieving the optimal probability with which a sensor becomes a CH in order to minimize network energy consumption and prolong network lifetime, assuming uniform distribution of all the sensors. A similar node election strategy that considers the amount of residual energy was also used in [19] with multihop data communication. Du and Xiao presented an energy-efficient Chessboard Clustering routing protocol for heterogeneous sensor networks [20] to balance node energy consumption and increase network lifetime, in which some sensors are initially set to be in sleeping mode and can be activated later on according to a certain procedure. Some research efforts along a different line allow sensors to adjust their locations after initial deployment. Mao and Wu proposed two new sensor location updating algorithms, VFSec and Weighted Centroid, to jointly optimize sensing coverage and secure connectivity [21].

Nonlinear programming has been widely used to model wireless sensor data communication as network flows to minimize TEC in WSNs. Besides [9], Ergen and Varaiya also formulated their problem as a nonlinear programming problem that determines the optimal location of relay nodes and the optimal energy provided to them so that the network is alive during the desired lifetime with minimum total energy [22]. In [23], Dasgupta et al. presented the sensor deployment problem for maximum lifetime with coverage constraints and proposed an algorithm motivated by force-directed/potential field-based approaches in robotics and graph drawing to place and assign the role of each sensor in the system to maximize network lifetime.

The probability of a sensor to be a CH is also considered in some work. Bandyopadhyay and Coyle proposed a distributed and randomized clustering algorithm to organize the sensors in a WSN into clusters, and they further extended this algorithm to generate a hierarchy of CHs and observed more energy savings as the number of levels in the hierarchy increases [24].

The main differences between our work and the aforementioned ones arise from the following aspects: (i) we investigate both uniform distribution and general random deployment for large-scale sensor networks; (ii) we propose both analytical derivation and heuristic algorithm to solve the CH optimization problems; (iii) we consider complete data aggregation and use a compression ratio to reflect the reduction in data size at CHs; (iv) we do not assume even partition of sensor networks as in [6].

3. Network Model and Problem Formulation

3.1. Energy Consumption Model

We consider two different types of energy consumption for data transmission and receiving, respectively: a transmitter consumes energy to run both the radio electronics and the power amplifier, while a receiver only consumes energy to drive the radio electronics. The mobile radio channels on typical sensor nodes are predominantly in the VHF (frequency from 30 MHz to 300 MHz, wavelength from 1 m to 10 m) and UHF (frequency from 300 MHz to 3 GHz, wavelength from 10 cm to 1 m), respectively [25, 26]. Since the intercluster communication distance is typically much longer than the intracluster communication distance, we employ (i) the free space (fs) fading channel model for intracluster wireless communication that incurs a $d^{2}$ power loss, and (ii) the multipath (mp) fading channel model for intercluster wireless communication that incurs a $d^{4}$ power loss [3, 6, 19, 25]. Multipath fading is mainly caused by the reflection of radio waves off structures such as buildings, mountains, trees, and other landscape features in the environment where the nodes are deployed. The impact of multipath fading is particularly strong in rich scattering environments such as offices and other indoor locales, and could be also significant in some outdoor deployments [25].

In a real communication system, the transmission power could be adjusted by suitably configuring the power amplifier. Therefore, the energy dissipation in transmitting one unit of data message over a directed wireless communication link can be modeled as $E_{t} (i)$ :

\begin{matrix} E_{t} (i) & = E_{elec} + E_{amp} (d_{i, j}) \\ & = \begin{cases} E_{elec} + ϵ_{fs} \cdot {d_{i, j}}^{2}, & \text{intracluster}, \\ E_{elec} + ϵ_{mp} \cdot {d_{i, j}}^{4}, & \text{intercluster}, \end{cases} \end{matrix}

(1)

where $E_{elec}$ denotes the energy for driving the electronics, which depends on various factors including digital coding, modulation, filtering, and spreading of the signals, for both transmitter electronics and receiver electronics; $ϵ_{fs}$ and $ϵ_{mp}$ are the coefficients for calculating the amplifier energy $E_{amp}$ , which depends on the Euclidean distance $d_{i, j} = \sqrt{(x_{i} - x_{j})^{2} + (y_{i} - y_{j})^{2}}$ between transmitter $v_{i}$ located at $(x_{i}, y_{i})$ and receiver $v_{j}$ located at $(x_{j}, y_{j})$ as well as the acceptance bit-error rate. The energy consumed by a sensor $v_{i}$ in receiving one unit of data packet is denoted as $E_{r} (i) = E_{elec}$ . Note that the above transmission and receiving energy models assume a contention free MAC protocol, where interferences from simultaneous transmission can be avoided.

A CH, which also collects environment sensing data, receives data messages from LNs within the cluster and sends all the data to the BS after performing a certain type of data processing (such as data aggregation and data compression). We use a constant $E_{p}$ to represent the energy spent in processing each unit of received or sensed data. We assume that the CH performs complete data aggregation, that is, an input of two k-bit messages produces an output of one k-bit message after aggregation. Furthermore, we use a parameter α, $0 < α \leq 1$ , to denote the data compression ratio: an input of k bits results in an output of $α \cdot k$ bits after compression.
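The energy model above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in our experiments: the function names (`tx_energy`, `rx_energy`, `ch_energy`) are hypothetical, and the constants are the values of Table 1 converted to joules per bit.

```python
# Illustrative constants (Table 1 values converted to joules per bit).
E_ELEC = 50e-9        # electronics energy, J/bit
EPS_FS = 10e-12       # free-space amplifier coefficient, J/bit/m^2
EPS_MP = 0.0013e-12   # multipath amplifier coefficient, J/bit/m^4
E_P = 5e-9            # processing energy, J/bit


def tx_energy(d, intercluster):
    """Energy to transmit one unit of data over distance d (metres), as in (1)."""
    if intercluster:
        return E_ELEC + EPS_MP * d ** 4   # multipath model, d^4 loss
    return E_ELEC + EPS_FS * d ** 2       # free-space model, d^2 loss


def rx_energy():
    """Energy to receive one unit of data: radio electronics only."""
    return E_ELEC


def ch_energy(num_lns, d_bs, alpha):
    """Energy spent in one round by a CH that serves num_lns LNs."""
    receive = num_lns * rx_energy()
    process = (num_lns + 1) * E_P           # received units plus the CH's own unit
    send = alpha * tx_energy(d_bs, True)    # one aggregated unit compressed to alpha
    return receive + process + send
```

For instance, `ch_energy(3, 100.0, 0.8)` charges a CH for three receptions, four processed units, and one compressed transmission over 100 m to the BS.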

3.2. Problem Formulation

The problem of determining the optimal number and location of CHs for minimum TEC in sensor networks is formulated as follows. We consider a WSN where n sensor nodes have been deployed in a bounded $L \times L$ ( $m^{2}$ ) square region and a single BS is located at ( $x_{BS}, y_{BS}$ ), somewhere inside or outside the network region. The location of each sensor $v_{i}$ , $i \in \{0, 1, \dots, n - 1\}$ , is denoted as ( $x_{i}, y_{i}$ ). We assume a one-hop communication model for both intracluster (from LNs to their associated CHs) and intercluster (from CHs to the BS) communication: the transmission energy within each cluster is calculated using the fs model, while the mp model is used from CHs to the BS. We consider two different sensor deployment scenarios: (i) uniform node distribution and (ii) general node distribution. The CH optimization problem is to strategically designate an appropriate subset of sensor nodes in the network as CHs, each of which forms a cluster with its neighbor nodes, such that the TEC for the transmission of each unit of data message from all LNs to CHs and to the BS per round is minimized. Here, the “round” is defined as a time period during which every sensor in the network sends one unit of data message to the BS through its associated CH. The optimization objective is to determine the optimal number and location of CHs in the given sensor network for minimum TEC.

We consider the following general conditions or assumptions in our problem formulation.

(i) All sensors are predeployed and have constrained energy supply.

(ii) The BS is also predeployed and has unlimited energy supply.

(iii) The network is static, that is, neither the sensors nor the BS has mobility once deployed.

(iv) The total number of sensors is known.

(v) Each CH forms exactly one cluster, and besides data processing, also performs the same task of environmental sensing and data collection as a regular sensor node.

(vi) There exists a contention free MAC protocol for wireless communication.

We consider the energy consumption for data transmission of each LN, and for data receiving, processing, and transmission of each CH. Since the energy cost for environment sensing is generally much less than communication and processing tasks, we do not consider sensing energy cost here. Obviously, the TEC depends on the network distribution, the number and location of CHs, and the compression ratio α at CHs.
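To make the objective concrete, the following sketch evaluates the TEC per round of one given clustering directly from node coordinates. It is an illustration under stated assumptions: the function name `total_energy`, the dictionary layout of `clusters`, and the default parameter values (taken from Table 1, in joules per bit) are choices made for this example.

```python
import math


def total_energy(nodes, bs, clusters, alpha=0.8, e_elec=50e-9,
                 eps_fs=10e-12, eps_mp=0.0013e-12, e_p=5e-9):
    """TEC per round of one clustering.

    nodes:    list of (x, y) sensor coordinates in metres
    bs:       (x, y) coordinate of the base station
    clusters: dict mapping a CH index to the list of LN indices it serves
    """
    total = 0.0
    for ch, members in clusters.items():
        for ln in members:
            d = math.dist(nodes[ln], nodes[ch])
            total += e_elec + eps_fs * d ** 2    # LN transmits (fs model)
            total += e_elec                      # CH receives
        total += (len(members) + 1) * e_p        # CH processes received and own data
        d_bs = math.dist(nodes[ch], bs)
        total += alpha * (e_elec + eps_mp * d_bs ** 4)  # compressed aggregate (mp model)
    return total
```

A clustering with a single CH at index 0 serving the LN at index 1 would be passed as `clusters={0: [1]}`.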

4. Optimizing Number and Location of CHs

4.1. Analytical Derivation for Uniform Distribution

When CHs perform data processing or are responsible for certain intracluster management duties, uniform distribution of sensors across the network is usually a goal for setting up the network, which can also leverage data delay [27]. In LEACH and other similar clustering algorithms, the expected number k of CHs per round is considered as a predetermined system parameter [3, 4, 12, 16]. Here, we shall rigorously derive an analytical formula for calculating the optimal value of k to achieve the minimum TEC of data transfer from LNs to the BS through their corresponding CHs. The optimal k value determined by our approach can be used to guide the execution of the clustering algorithms that require such information.

We first calculate the expected distances from an LN to its CH and from a CH to the BS using an approach similar to the one used in [8]. Since the CHs are uniformly distributed in an $L \times L$ ( $m^{2}$ ) sensor network region, the expected square area covered by each cluster with the CH deployed at $(x_{CH}, y_{CH})$ can be calculated as $\sqrt{L^{2} / k} \times \sqrt{L^{2} / k}$ based on Voronoi tessellation. Furthermore, the LNs are also uniformly and independently deployed in each cluster. Measuring coordinates from a corner of the cluster square, we have $E [x_{LN}] = E [x_{CH}] = E [y_{LN}] = E [y_{CH}] = (1 / 2) \sqrt{L^{2} / k}$ , and $E [{(x_{LN})}^{2}] = E [{(x_{CH})}^{2}] = E [{(y_{LN})}^{2}] = E [{(y_{CH})}^{2}] = \frac{1}{3 k} L^{2}$ . Since the LN and CH coordinates are independent, the expectation of each cross term factorizes, and the average squared distance from an LN to its CH within a cluster can be calculated as

\begin{matrix} r^{2} & = E [{(x_{LN} - x_{CH})}^{2} + {(y_{LN} - y_{CH})}^{2}] \\ & = E [{(x_{LN})}^{2}] - 2 E [x_{LN}] E [x_{CH}] + E [{(x_{CH})}^{2}] \\ & \quad + E [{(y_{LN})}^{2}] - 2 E [y_{LN}] E [y_{CH}] + E [{(y_{CH})}^{2}] \\ & = \frac{1}{3 k} L^{2} . \end{matrix}

(2)

Without loss of generality, we suppose that the BS location $(x_{BS}, y_{BS})$ is fixed at $(0,0)$ , a corner of the network region, to simplify calculation. For a coarse grained analysis, we assume identical expected distance from CHs to the BS. Therefore, the average squared distance from a CH to the BS is similarly given by

\begin{matrix} R^{2} = E [{(x_{CH} - x_{BS})}^{2} + {(y_{CH} - y_{BS})}^{2}] = \frac{2}{3} L^{2} . \end{matrix}

(3)
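The two averages in (2) and (3) can be checked numerically. The Monte Carlo sketch below is only a sanity check under the same assumptions as the derivation (independent, uniformly distributed coordinates, and the BS at the origin); the function names, sample size, and seeds are arbitrary choices for this illustration.

```python
import random


def mc_sq_dist_in_cluster(side, samples=100_000, seed=1):
    """Average squared distance between two independent uniform points
    in a side x side square (LN and CH within one cluster)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(samples):
        dx = rng.uniform(0, side) - rng.uniform(0, side)
        dy = rng.uniform(0, side) - rng.uniform(0, side)
        acc += dx * dx + dy * dy
    return acc / samples


def mc_sq_dist_to_origin(L, samples=100_000, seed=2):
    """Average squared distance from a uniform point in [0, L]^2 to (0, 0)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(samples):
        x, y = rng.uniform(0, L), rng.uniform(0, L)
        acc += x * x + y * y
    return acc / samples


L, k = 200.0, 4
side = (L * L / k) ** 0.5
print(mc_sq_dist_in_cluster(side), L * L / (3 * k))   # estimate vs. (2)
print(mc_sq_dist_to_origin(L), 2 * L * L / 3)         # estimate vs. (3)
```

With these sample sizes the estimates should agree with the closed forms to within about one percent.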

The TEC per round, denoted by $E_{Tot}$ , is the sum of the energy consumption $E_{LN}$ of all LNs for data transmission and the energy consumption $E_{CH}$ of all CHs for data receiving, processing, and transmission in one round, which can be defined as

\begin{matrix} E_{Tot} = E_{LN} + E_{CH} . \end{matrix}

(4)

Since $E_{LN}$ only contains transmission energy cost $E_{t}$ and the total number of LNs in the network is $n - k$ , $E_{LN}$ can be estimated as

\begin{matrix} E_{LN} = (n - k) E_{t} = (n - k) (E_{elec} + ϵ_{fs} r^{2}) . \end{matrix}

(5)

Similarly, $E_{CH}$ is the total energy cost for the transfer of one unit of data from each CH to the BS in one round, which includes the energy cost $E_{r}$ for receiving, $E_{p}$ for processing, and $E_{t}$ for transmission. Each of $n - k$ LNs transfers one unit of data to its corresponding CH, which performs processing (aggregation and compression) on the received data and its own sensing data, and sends the compressed aggregated result to the BS. Since there are total n units of input data (including $n - k$ units of data received from LNs and k units of data collected by k CHs themselves), the total energy consumed by k CHs is defined by

\begin{matrix} E_{CH} & = (n - k) E_{r} + n E_{p} + α k E_{t} \\ & = (n - k) E_{elec} + n E_{p} + α k (E_{elec} + ϵ_{mp} R^{4}) . \end{matrix}

(6)

Using (2), (3), (4), (5), and (6), we obtain the TEC per round $E_{Tot}$ as follows:

\begin{matrix} E_{Tot}  =  (2 n - 2 k + α k) E_{elec} + n E_{p} + (n - k) ϵ_{fs} \frac{L^{2}}{3 k} + α k ϵ_{mp} \frac{4 L^{4}}{9} . \end{matrix}

(7)
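As an intermediate step, shown here only to make the substitution explicit, inserting $r^{2} = L^{2} / (3 k)$ from (2) into (5) and $R^{4} = (R^{2})^{2} = 4 L^{4} / 9$ from (3) into (6) gives

\begin{matrix} E_{Tot} & = (n - k) \left( E_{elec} + ϵ_{fs} \frac{L^{2}}{3 k} \right) + (n - k) E_{elec} + n E_{p} + α k \left( E_{elec} + ϵ_{mp} \frac{4 L^{4}}{9} \right) , \end{matrix}

and collecting the $E_{elec}$ terms, $(n - k) + (n - k) + α k = 2 n - 2 k + α k$ , yields (7). Note that $R^{4}$ is taken as the square of the average squared distance $R^{2}$ , in line with the coarse grained analysis above.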

The TEC per round $E_{Tot}$ can be minimized by selecting an optimal value of k, which is obtained by setting the first derivative of (7) with respect to k to zero. The optimal number k of CHs can then be calculated as (the negative solution is ignored)

\begin{matrix} k = \sqrt{\frac{3 n L^{2} ϵ_{fs}}{9 (α - 2) E_{elec} + 4 α L^{4} ϵ_{mp}}} . \end{matrix}

(8)

We further verify that the second derivative of (7) with respect to k is positive for all $k > 0$ . Therefore, we conclude that the value of k defined in (8) results in the minimum TEC in WSNs with uniform node distribution. Once the optimal number of CHs is obtained, their locations can be determined based on Voronoi tessellation among uniformly distributed sensors.
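For completeness, the two derivatives of (7) with respect to k, treating k as a continuous variable, can be written out as follows (a worked step added for illustration):

\begin{matrix} \frac{\partial E_{Tot}}{\partial k} & = (α - 2) E_{elec} - \frac{n ϵ_{fs} L^{2}}{3 k^{2}} + \frac{4 α ϵ_{mp} L^{4}}{9} = 0 , \\ \frac{\partial^{2} E_{Tot}}{\partial k^{2}} & = \frac{2 n ϵ_{fs} L^{2}}{3 k^{3}} > 0 \quad (k > 0) . \end{matrix}

Solving the first line for $k^{2}$ gives (8), which has a real solution only when $9 (α - 2) E_{elec} + 4 α L^{4} ϵ_{mp} > 0$ . In practice, the continuous value is rounded to an integer number of CHs.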

4.2. Algorithm Design for General Distribution

Though uniform distribution is suited for achieving energy balance, it may not always be feasible for practical sensor network applications, especially for those deployed in large and harsh environments not accessible to humans. In such environments, sensors may be dropped from the air or deployed by other appropriate means, which could lead to a more general node distribution.

We develop a heuristic algorithm, Distance-based Crowdedness Clustering (DCC), based on a cutoff distance and the concept of crowdedness to solve the CH optimization problem in WSNs under general node distribution. Here, the “cutoff distance” is the threshold that decides which cluster an LN belongs to: if the distance from an LN to the CH is within the cutoff distance, the LN is considered to be located inside this cluster; otherwise, it is not. The concept of “crowdedness”, which is closely related to the number of neighbor nodes of each sensor node, is used to describe the density of sensor nodes in a given region.

In the DCC algorithm, we first calculate the distances $d_{i, j}$ ( $v_{i}, v_{j} \in V$ ) between all pairs of sensor nodes, each of which is selected in turn as the cutoff distance. For a given cutoff distance, two sensors are neighbors if the distance between them does not exceed the cutoff; we designate the sensor with the largest number of neighbor nodes as a CH and form a cluster of this CH with all its unclustered neighbors. We repeatedly designate the sensor with the most neighbor nodes among the remaining unclustered sensors as a CH, using the same cutoff distance, until no sensor is left unclustered. We calculate the TEC using (7) in Section 4.1 for all cutoff distances, from which, the minimum one is selected as the final result. The pseudocode of DCC is given in Algorithm 1. Since there are $O (n^{2})$ candidate cutoff distances and each of them requires $O (n^{2})$ time for neighbor identification and cluster formation, the complexity of this algorithm is $O (n^{4})$ .

Algorithm 1: Distance-based Crowdedness Clustering.

Input: a sensor network $G = (V, E)$ with n sensor nodes randomly deployed in an $L \times L$ ( $m^{2}$ ) square region and one BS deployed inside or outside the region.

Output: the optimal number k and location of CHs with minimum TEC.

(1) Calculate all-pair distances $d_{i, j}$ , for $v_{i}, v_{j} \in V$ , in an array $A_{d}$ ;

(2) Initialize minimum TEC $TEC_{\min} = + \infty$ ;

(3) for all distances $d_{i, j} \in A_{d}$ do

(4)     Set cutoff distance $d_{cut} = d_{i, j}$ ;

(5)     Set $v_{m}$ as a neighbor of $v_{n}$ if $d_{m, n} \leq d_{cut}$ , for all $v_{m}, v_{n} \in V$ with $m \neq n$ ;

(6)     Sort all $v \in V$ according to the number of neighbors in a decreasing order and place them in an array $A_{v}$ ;

(7)     Insert all $v \in V$ in an unclustered sensor queue $Q_{u}$ ;

(8)     Initialize a clustered sensor queue $Q_{c} = \emptyset$ ;

(9)     Initialize the number of clusters $n_{clusters} = 0$ ;

(10)    while $Q_{u} \neq \emptyset$ do

(11)        Retrieve the first $v_{k} \in Q_{u}$ from $A_{v}$ and designate it as a CH;

(12)        Form a cluster $C_{k}$ of $v_{k}$ and its neighbors $v_{l} \in Q_{u}$ ;

(13)        Insert all $v \in C_{k}$ in $Q_{c}$ ;

(14)        Remove all $v \in C_{k}$ from $Q_{u}$ ;

(15)        $n_{clusters}$ ++;

(16)    Calculate the TEC;

(17)    if $TEC_{\min} >$ TEC then

(18)        $TEC_{\min}$ = TEC;

(19)        k = $n_{clusters}$ and record the current set of CHs;

(20) return k and the locations of the recorded CHs.
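A compact Python sketch of Algorithm 1 is given below for illustration; it is not the C++ implementation evaluated in Section 5. Two assumptions are made explicit: the TEC of each candidate clustering is evaluated from the actual node coordinates through the energy model (1) and the sums (4), (5), and (6), and ties in the neighbor count are broken by node index. The function names and default parameters (Table 1 values in joules per bit) are hypothetical.

```python
import math


def tec(nodes, bs, clusters, alpha, e_elec, eps_fs, eps_mp, e_p):
    """TEC per round of a clustering {CH index: [LN indices]}."""
    total = 0.0
    for ch, members in clusters.items():
        for ln in members:
            d = math.dist(nodes[ln], nodes[ch])
            total += 2 * e_elec + eps_fs * d ** 2       # LN transmits, CH receives
        total += (len(members) + 1) * e_p               # CH processes all units
        total += alpha * (e_elec + eps_mp * math.dist(nodes[ch], bs) ** 4)
    return total


def dcc(nodes, bs, alpha=0.8, e_elec=50e-9, eps_fs=10e-12,
        eps_mp=0.0013e-12, e_p=5e-9):
    n = len(nodes)
    dist = [[math.dist(a, b) for b in nodes] for a in nodes]
    # Steps 1-3: every distinct pairwise distance is a candidate cutoff.
    cutoffs = sorted({dist[i][j] for i in range(n) for j in range(i + 1, n)}) or [0.0]
    best_tec, best_clusters = math.inf, None
    for d_cut in cutoffs:
        # Step 5: neighbor lists under the current cutoff.
        neighbors = [[j for j in range(n) if j != i and dist[i][j] <= d_cut]
                     for i in range(n)]
        # Step 6: most crowded sensors first (stable sort keeps index order on ties).
        order = sorted(range(n), key=lambda i: len(neighbors[i]), reverse=True)
        unclustered = set(range(n))
        clusters = {}
        # Steps 10-15: greedily form clusters until every sensor is covered.
        for v in order:
            if v not in unclustered:
                continue
            members = [u for u in neighbors[v] if u in unclustered]
            clusters[v] = members
            unclustered.discard(v)
            unclustered.difference_update(members)
        # Steps 16-19: keep the clustering with the smallest TEC.
        energy = tec(nodes, bs, clusters, alpha, e_elec, eps_fs, eps_mp, e_p)
        if energy < best_tec:
            best_tec, best_clusters = energy, clusters
    return best_clusters, best_tec
```

The number of CHs is `len(best_clusters)`, and the CH locations are the coordinates of the dictionary keys. The neighbor lists are rebuilt from scratch for every cutoff, which keeps the sketch short at the expense of speed.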

5. Performance Evaluation

5.1. Implementation and Experimental Settings

The proposed DCC algorithm is implemented in C++ and runs on a Windows XP desktop equipped with a 3.0 GHz CPU and 2 Gbytes memory. We conduct an extensive set of experiments using a wide variety of simulated sensor networks, in which two deployment scenarios are considered: uniform and general distributions. We generate these simulation datasets by varying the size of network regions and the number of initially deployed sensor nodes. The parameters used in the testing sensor networks and the sensor radio characteristics of the energy cost models for wireless communication [6] are tabulated in Table 1. For testing purposes, we consider a fixed value 0.8 for the compression ratio α, which has an impact on the clustering process.

Table 1

WSN communication parameters.

Parameter	Value
$L \times L$	$200 \times 200$ (m²)
$E_{elec}$	50 (nJ/bit)
$ϵ_{fs}$	10 (pJ/bit/m²)
$ϵ_{mp}$	0.0013 (pJ/bit/m⁴)
$E_{p}$	5 (nJ/bit/signal)
α	0.8

5.2. Case Study for Uniform Distribution

We first investigate the optimization problem on the number and location of CHs in a sensor network within a square region of $200 \times 200$ (m²) under uniform distribution. Figure 1 plots the TEC optimization curve that is calculated by (7) in the network of $n = 200$ sensor nodes in response to the number k of CHs varying from 1 to 50.

Figure 1

Analytical calculation of TEC per round with $n = 200$ .

From (8) in Section 4.1, we obtain the optimal number of CHs to be $k = 6$ , which is consistent with the optimal one observed in Figure 1. The TEC increases as the number of CHs moves away from the optimal point. We further study a case with a larger WSN of $n = 900$ sensor nodes. Again, the optimal number 13 observed in Figure 2 is consistent with the one produced by (8). The unimodal property of the TEC optimization curves justifies the correctness of our derivation for the optimal number of CHs in WSNs under uniform node distribution.
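The two optimal values quoted above can be reproduced with a few lines of Python. The sketch below evaluates (8) with the parameters of Table 1 (converted to joules per bit) and cross-checks the result against a brute-force search over integer k using (7); the function names are illustrative only.

```python
import math


def optimal_num_chs(n, L, alpha, e_elec, eps_fs, eps_mp):
    """Continuous minimizer of (7), as given by (8)."""
    denom = 9 * (alpha - 2) * e_elec + 4 * alpha * L ** 4 * eps_mp
    if denom <= 0:
        raise ValueError("(8) has no real solution for these parameters")
    return math.sqrt(3 * n * L ** 2 * eps_fs / denom)


def tec_uniform(k, n, L, alpha, e_elec, eps_fs, eps_mp, e_p):
    """TEC per round under uniform distribution, as given by (7)."""
    return ((2 * n - 2 * k + alpha * k) * e_elec + n * e_p
            + (n - k) * eps_fs * L ** 2 / (3 * k)
            + alpha * k * eps_mp * 4 * L ** 4 / 9)


# Table 1 values in SI units (joules per bit, metres).
PARAMS = dict(L=200.0, alpha=0.8, e_elec=50e-9, eps_fs=10e-12, eps_mp=0.0013e-12)

for n in (200, 900):
    k_star = optimal_num_chs(n, **PARAMS)
    k_best = min(range(1, 51), key=lambda k: tec_uniform(k, n, e_p=5e-9, **PARAMS))
    print(n, k_star, k_best)
```

The continuous solutions are roughly 6.3 and 13.3, and the integer search over k from 1 to 50 returns 6 and 13, respectively.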

Figure 2

Analytical calculation of TEC per round with $n = 900$ .

To evaluate the robustness of our solution, we perform more experiments on networks under uniform distribution with 10 different problem cases from small to large ones: in the first case, the total number of initial sensor nodes is set to be 10; in the remaining 9 cases, it increases from 100 to 900 nodes at an interval of 100 nodes. The optimization results, that is, the optimal number of CHs, the expected distance from an LN to its associated CH, and the corresponding minimum TEC in each case, are plotted in Figure 3. We observe that the expected distance from an LN to its CH varies from 32 m to 116 m and it decreases when the optimal number of CHs increases.
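As a worked check of this range (a sketch assuming that the optimal k is rounded to the nearest integer and that $L = 200$ m as in Table 1), (2) gives $r = L / \sqrt{3 k}$ , and (8) yields $k \approx 1.4$ for $n = 10$ and $k \approx 13.3$ for $n = 900$ :

\begin{matrix} k = 1 : & r = \frac{200}{\sqrt{3}} \approx 115.5 \text{ m} , \\ k = 13 : & r = \frac{200}{\sqrt{39}} \approx 32.0 \text{ m} , \end{matrix}

which agrees with the two ends of the observed range up to rounding.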

Figure 3

Analytical calculation of minimum TEC per round in 10 cases ranging from small to large scales.

5.3. Case Study for General Distribution

To better illustrate the optimization process of DCC in WSNs under general node distribution, we calculate and plot the TEC per round under different cutoff distances at each optimization step for a network of $n = 200$ sensor nodes deployed in the same region, as shown in Figure 4. This three-dimensional optimization curve also features a unimodal property: the TEC is minimized with an optimal number of CHs at a certain cutoff distance.

Figure 4

TEC per round at each optimization step using DCC algorithm.

For performance comparison, we adapt the classical k-means clustering algorithm to our problem and compare its performance with that of the proposed DCC algorithm using the same set of 10 problem cases previously considered for uniform distribution. We first determine the optimal number k of CHs and the corresponding minimum TEC using the proposed DCC algorithm, and then use k-means algorithm to perform clustering based on the k value obtained by the DCC algorithm, referred to as deterministic k-means algorithm. For visual comparison, we plot the performance measurements of TEC produced by these two algorithms in Figure 5. The results produced by the DCC algorithm outperform those produced by the deterministic k-means algorithm in all 10 cases we studied, which shows the performance superiority of the proposed DCC algorithm.

Figure 5

Performance comparison between DCC and deterministic k-means algorithms in 10 cases ranging from small to large scales.

For illustration purposes, we lay out the node distribution and clustering of the sensor network with 100 sensor nodes in Figure 6, in which the unclustered solid node denotes the BS. The clustering results obtained by the DCC algorithm are marked by the solid lines, while the results obtained by the deterministic k-means algorithm are marked by the dashed lines. The DCC algorithm produces 20 clusters in this instance.

Figure 6

Visualization of the clustering in a network of 100 nodes using DCC and deterministic k-means algorithms: the results of DCC are marked by solid lines and the results of deterministic k-means are marked by dashed lines.

We further compare DCC with a clustering method completely based on the classical k-means algorithm, in which we run the k-means algorithm on the n sensor nodes n times, varying the parameter k from 1 to n, and select the clustering result with the minimum energy cost. We refer to this clustering method as adaptive k-means algorithm. Two WSNs of 200 and 900 sensor nodes, respectively, are tested using adaptive k-means algorithm, and their partial optimization curves are plotted in Figures 7 and 8. The remaining TEC values, which are not plotted, increase continuously as k increases.
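The adaptive k-means baseline can be sketched as follows. Several details are assumptions made for this illustration rather than a description of our implementation: plain Lloyd iterations with a fixed random seed, the sensor closest to each centroid serving as the CH, and the TEC evaluated from actual node coordinates with the Table 1 parameters in joules per bit. All function names are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd iterations; returns the cluster label of every point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(centers[c])  # empty cluster keeps its center
        if new_centers == centers:
            break
        centers = new_centers
    return assign


def to_clusters(points, assign, k):
    """Pick the sensor nearest to each group's centroid as its CH."""
    clusters = {}
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if not members:
            continue
        cx = sum(points[i][0] for i in members) / len(members)
        cy = sum(points[i][1] for i in members) / len(members)
        ch = min(members, key=lambda i: math.dist(points[i], (cx, cy)))
        clusters[ch] = [i for i in members if i != ch]
    return clusters


def tec(points, bs, clusters, alpha=0.8, e_elec=50e-9, eps_fs=10e-12,
        eps_mp=0.0013e-12, e_p=5e-9):
    total = 0.0
    for ch, members in clusters.items():
        for ln in members:
            total += 2 * e_elec + eps_fs * math.dist(points[ln], points[ch]) ** 2
        total += (len(members) + 1) * e_p
        total += alpha * (e_elec + eps_mp * math.dist(points[ch], bs) ** 4)
    return total


def adaptive_kmeans(points, bs):
    """Run k-means for k = 1..n and keep the clustering with minimum TEC."""
    best_tec, best_clusters = math.inf, None
    for k in range(1, len(points) + 1):
        clusters = to_clusters(points, kmeans(points, k), k)
        energy = tec(points, bs, clusters)
        if energy < best_tec:
            best_tec, best_clusters = energy, clusters
    return best_clusters, best_tec
```

Under the same assumptions, the deterministic k-means variant corresponds to a single call `to_clusters(points, kmeans(points, k), k)` with the value of k obtained by DCC.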

Figure 7

TEC per round for each k value using adaptive k-means algorithm with $n = 200$ .

Figure 8

TEC per round for each k value using adaptive k-means algorithm with $n = 900$ .

We observe some small random variations in the performance curves produced by the adaptive k-means algorithm. These variations are mainly caused by two factors: (i) the k-means algorithm may be trapped in local optima because it considers only the distances from LNs to CHs, whereas the TEC depends on the distances both from LNs to CHs and from CHs to the BS; (ii) as is well known, the quality of the final solution produced by the k-means algorithm also depends on the initial (randomly selected) set of CHs. We apply the adaptive k-means algorithm to all 10 cases of different problem sizes, and plot the minimum TEC and the optimal number of CHs produced by the DCC, adaptive k-means, and deterministic k-means algorithms in Figure 9. We observe that DCC achieves the best performance among the three algorithms and that the adaptive k-means algorithm outperforms the deterministic k-means algorithm.
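Factor (i) can be made concrete by placing the two objectives side by side. The expressions below are a schematic comparison only: the symbols x_i, c_j, C_j, b, alpha, beta, and E_0 are illustrative assumptions and are not the notation of the energy model used in this paper. They rely solely on the fact that the free space model scales with the square of the distance and the multipath model with its fourth power.

```latex
% Schematic comparison; all symbols are illustrative assumptions.
% x_i: position of LN i;  c_j: position of CH j;  C_j: members of cluster j;
% b: position of the BS;  \alpha, \beta > 0: amplifier coefficients;
% E_0: terms that do not depend on any distance.
\begin{align}
  J_{k\text{-means}} &= \sum_{j=1}^{k} \sum_{i \in C_j} \lVert x_i - c_j \rVert^{2}, \\
  E_{\mathrm{TEC}}   &= E_0 + \sum_{j=1}^{k} \Bigl( \alpha \sum_{i \in C_j} \lVert x_i - c_j \rVert^{2}
                        + \beta \, \lVert c_j - b \rVert^{4} \Bigr).
\end{align}
```

The k-means objective contains no counterpart of the CH-to-BS term, so a clustering that minimizes the first expression need not minimize the second.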

Figure 9

Performance comparison among DCC, adaptive k-means, and deterministic k-means algorithms in 10 cases ranging from small to large scales.

Using the same network instance of 100 sensor nodes, we visually compare the clustering results obtained by the DCC algorithm and the adaptive k-means algorithm, as shown in Figure 10. The 20 clusters marked by solid lines are obtained by the DCC algorithm (the same DCC clustering as in Figure 6), and the 4 clusters marked by dashed lines are obtained by the adaptive k-means algorithm. In Figure 10, we observe that the clustering produced by the DCC algorithm reflects local sensor density better than those produced by the k-means-based algorithms.

Figure 10

Visualization of the clustering in a network of 100 nodes using DCC and adaptive k-means algorithms: the results of DCC are marked by solid lines and the results of adaptive k-means are marked by dashed lines.

6. Conclusion and Future Work

We investigated the problem of selecting a subset of sensor nodes as CHs in WSNs to achieve minimum TEC. We considered two energy dissipation models: the free space model for intracluster communication and the multipath model for intercluster communication. We derived an analytical formula to determine the optimal number and location of CHs in WSNs under uniform distribution and proposed a heuristic clustering algorithm based on distance and crowdedness to optimize the number and location of CHs in WSNs under general node distribution. The simulation results illustrated the performance superiority of the proposed solution over clustering schemes based on the classical k-means algorithm.

In the present work, we considered only one-hop communication for both intracluster and intercluster data transfer under uniform and general node distributions. In future work, we will consider multihop routing methods and the balance of sensor nodes across clusters. We are also interested in deriving an analytical performance bound on the energy cost of clustering WSNs under complex and general distributions.

Acknowledgments

This research is sponsored by the National Science Foundation under Grant no. CNS-0721980 and by Oak Ridge National Laboratory, US Department of Energy, under contract no. PO 4000056349 with the University of Memphis.
