Abstract
In this paper, we utilize clustering to organize wireless sensors into an energy-efficient hierarchy. We propose a Medium-contention based Energy-efficient DIstributed Clustering (MEDIC) scheme, through which sensors self-organize themselves into energy-efficient clusters by bidding for cluster headship. This scheme is based on a new criterion that can be used by each sensor node to make a distributed decision on whether electing to be a cluster head or a non-head member, which is a fully distributed approach. Although MEDIC uses only local information, it achieves better performance in terms of effective lifetime and its Data/Energy Ratio is 25% higher than native LEACH (Low-Energy Adaptive Clustering Hierarchy), which relies on other routing algorithms to access global information. A complementary exponential data correlation model is also introduced to simulate different data aggregation effect.
Introduction
A wireless sensor network (WSN) can be thought of as an ad hoc network consisting of sensors linked by a wireless medium to perform distributed sensing tasks. Thanks to recent advances in semiconductor technology, the sensors nodes can be made small in size and cheap for mass production. Due to their low-power antenna and limited energy reserves in the form of chemical battery, the sensor nodes can communicate with each other in a short range and work for a relatively short duration.
WSNs share many communication technologies with ad hoc networks, but there are some vital differences such as dense deployment and energy constraint [1], thus the protocols developed for traditional wireless ad hoc networks are not necessarily well-suited to the unique features of WSNs. When a wireless sensor may have to operate for a relatively long duration on a tiny battery, energy efficiency becomes a major concern.
A variety of “power-aware” routing protocols have been proposed to address this problem. In one school of thought [2–4], the traditional Shortest Path First strategy is replaced by the Least Energy First routing, i.e., a multihop route is preferred to a single-hop one if only the multiple short-distance relays cost less energy than a single long-distance transmission. For example, “Minimum Transmission Energy” (MTE) routing [3], [4] was proposed in place of traditional “minimum hops routing”. Another school of thought is that nodes are clustered so that a hierarchy is formed [5–7]. Based on the observations on cellular networks [8], it would be advisable to partition the nodes into clusters for reasons such as spatial reuse, less update cost, less routing information, and less data transmission.
Another dispute in clustering research is whether a cluster head can be elected within each cluster. Some researchers [6], [7], [9] argue that it is unreasonable to have a cluster head because every node has a similar energy constraint and the cluster head will consume energy much faster. Their methodology breaks the information exchange is achieved by demand-based operations. This approach does have some advantage when the traffic is mostly within the cluster; however, when the major traffic in WSN is directed from the sensor nodes to the base station, i.e., of the intercluster type, the headless structure suffers from cumbersome intercluster information exchange.
On the other hand, the extra burden of the cluster head can be mitigated by rotating the headship among the members. The rotation can also take advantage of the relaxation effect [10], which indicates frequently reducing the current drawn from the battery enables the battery to recover a portion of its lost capacity and hence lengthens the battery lifetime. In addition, the cluster head can perform data fusion and reduce the data sent back to the base station. LEACH (Low-Energy Adaptive Clustering Hierarchy) [11], an example of the latter, school, can extend network lifetime by an order of magnitude compared with general-purpose multihop approaches. In conclusion, the characteristics of WSN prefer a hierarchical structure with cluster heads.
This paper builds on the work described in [12] by giving a detailed description and analysis of Medium-contention based Energy-efficient Distributed Clustering (MEDIC). The rest of this paper is organized as follows. Section II reviews LEACH and wireless medium access. Section IV-A introduces the data correlation model that our research is based on. In Section IV, we make data-centric analysis of energy consumption in WSN and propose a new criterion that MEDIC bases the self-electing decision on. MEDIC is described in Section V and the simulations are given in Section VI. Section VII concludes this paper.
Related Work
In this section, we provide some preliminaries needed for further discussion.
Leach
LEACH uses the CDMA-TDMA hybrid communication scheme. Each cluster has its own Spread Spectrum code so that the interference between clusters is minimized. For intracluster communications, TDMA slots are assigned for each member to minimize media contention. The operation of LEACH is divided into rounds. At the beginning of each round, cluster heads are elected and other nodes join them as members so that N nodes are partitioned into c clusters. When a cluster is formed, the cluster head creates and broadcasts a time schedule to its members. As shown in Fig. 1, each member is assigned a time slot per frame to send its data to its cluster head, and then the cluster head performs data aggregation and sends the resulting data back to the base station. Compared with multihop routing schemes, LEACH shows an outstanding energy efficiency, which is referred to as clustering energy gain in the following.

A generic MAC.
However, the cluster formation in LEACH is based on global information. To access such information, other routing schemes are required. In this sense, LEACH is only a semi-distributed protocol for WSN. Another problem with LEACH is the random head election that cannot guarantee that the desired number of cluster heads be elected or the elected heads be evenly positioned. In this paper, we are concerned about optimizing the cluster formation using only local information. The intuition behind the proposed Medium-contention based Energy-efficient Distributed Clustering (MEDIC) is using medium contention to keep the cluster size within an ideal range. By exploiting medium contention to simulate an auction for the cluster headship, we also eliminate the need for location information.
The problems of LEACH are also addressed by other researchers. HEED [13] corrects the problem that none or not enough cluster heads are elected by doubling the probability of electing to be a cluster head for those nodes that are not covered by any cluster heads. In [14], Qin and Zimmermann proposed a voting-based clustering algorithm (VCA) for energy-efficient data dissemination in quasi-stationary sensor networks. Their approach lets sensors vote for their neighbors to elect suitable cluster heads. Connectivity based k-hop clustering [15] proposed to combine two known approaches into a single clustering algorithm which considers connectivity as a primary criterion and lower ID as a secondary criterion for selecting cluster heads. Their clustering goal is to minimize the number of clusters, which results in dominating sets of smaller sizes (this is important for applications in broadcasting and Bluetooth formation). Bandyopadhyay and Coyle in [16] extend clustering to multiple levels, generating a hierarchy of clusterheads and observed that the energy savings increase with the number of levels in the hierarchy. They also used stochastic geometry to derive solutions for the values of parameters of our algorithm that minimize the total energy spent in the network when all sensors report data through the cluster heads to the processing center. Weighted Clustering Algorithm (WCA) [17] took into consideration the ideal degree, the transmission power, the mobility, and the battery power of a mobile node and tried to keep the number of nodes in a cluster around a pre-defined threshould to facilitate the optimal operation of the medium access control (MAC) protocol, The on-demand execution of WCA aimed to maintain the stability of the network, thus lowering the computation and communication costs associated with it. In [18], Two-Phase Clustering (TPC) partitions the network into clusters in phase I, each with a cluster head, forming a direct link between the cluster member and the cluster head. In phase II, each cluster member searches for a neighbor closer than the cluster head within the cluster to set up an energy-saving data relay link. The sensors use either the direct link or the data relay link for their sensed data forwarding depending on the requirements specified by the users or applications. Other work on clustering also tries to distribute the clustering formation [19], [20]. Our approach is independent and orthogonal to these advances in clustering and thus may be used to further improve the energy efficiency.
In this paper, we consider a generic MAC that is compatible with the basic access mechanism described in 802.11 DCF [21]. As illustrated in Fig. 1, after the channel is sensed idle for greater than or equal to a DIFS (Distributed InterFrame Spacing) period, the transmitting node generates a random backoff timer chosen uniformly from the range [0, w-1], where w is the size of the contention window. In a binary exponential backoff scheme, the value of w is reset to CW min after each successful transmission, and doubled after each unsuccessful transmission, up to CW max (maximum contention window). The backoff timer is decremented as long as the channel is sensed idle, stopped when a transmission is detected, and reactivated when the channel is sensed idled again for a DIFS period. After the backoff timer reaches zero, the node starts transmission.
The collisions of packets in the contention-based MAC generally degrade channel utilization and increase energy consumption, which motivates establishing transmission schedules to allow nodes to communicate without collisions. In NAMA (Node Activation Multiple Access) [22] and TRAMA (Traffic-Adaptive Medium Access protocol) [23], a distributed election scheme is used to determine which node can transmit at a particular time slot. From this point of view, LEACH is also a schedule-based scheme, in which cluster formation is a random-access period to establish a scheduled-access period collision-free (Fig. 3).
Auction Theory
Auctions are heavily studied as a market clearing mechanism to equate demand and supply [24]. In WSN, the cluster headship is a scarce resource, and hence, it would help to view the process of electing a cluster head among nodes in WSNs as an auction for the headship. Auctions can be classified according to several distinct criteria. For example, they can be classified into open and sealed-bid auctions, based on whether all bids are publicly observable. Open auctions can be further classified into ascending and descending price auction. The ascending auction, also called English auction, starts at a low price and bids have to be increasing. The ascending stops when no bidder is willing to increase his bid above the highest standing bid. The bidder with the highest bid wins the auction and pays the highest bid. The English auction is the best-known format of auctions, its drawback is that a bidder can take a dominating strategy to win the bid, that is as long as his bid is higher than the second highest bid, this bidder can win the bid, even when the highest bid is still much lower than the true value of the auctioned object. In comparison, the descending price auction, also known as Dutch auctions, starts at a high price that continuously decreases on an automated clock. The auction ends when one of the bidders stops the clock. This bidder wins the object and pays the price at which the clock stopped. The aforementioned dominating strategy cannot be used in Dutch auctions, therefore, the highest bid in the Dutch auction tends to reflect the true value of the resource. However, our interest in the Dutch auction is for another advantage—its time efficiency. Note that in the Dutch auction, the highest bidder always speaks first, and the auction ends right after its bid. On the other hand, the English auction could start with a lower bid, and when this does happen, it might take several rounds before a bidder dominates. Therefore, we design MEDIC based on the Dutch auction.
Radio Energy Consumption
The following model is adopted from [11] where perfect power control is assumed. To transmit l bits over distance d, the sender's radio expends
and the receiver's radio expends
E elec is the unit energy consumed by the electronics to process one bit of message, ε fs and ε mp are the amplifier factor for free-space and multi-path models, respectively, and d0 is the reference distance to determine which model to use. The values of these communication energy parameters are set as in Table 1.
Communication Energy Parameters
Random election has been widely used in algorithms for wireless sensor networks for its simplicity and convenience [11]. However, there are two fundamental drawbacks in random election:
where c is the desired number of clusters and Ci(t) is the indicator function determining whether or not node i has been a cluster head in the most recent (r mod (N/c)) rounds. The more general estimate of Pi(t) is given by
where Ei(t) is the current energy (i.e. remaining battery capacity) of node i and
Essentially, N in (3) and Etotal in (4) are global information, which is only accessible via other routing schemes. Then the probability of “n heads are elected” is
The distribution of the number of elected heads is listed in Table 2. Obviously, too few (Fig. 2(c)) and too many (Fig. 2(d)) elected heads would damage the energy efficiency.

100 nodes elect 5 heads (Heads marked by pentagrams). (a) Five heads are elected and evenly distributed. (b) Five heads are elected and clump in the right semicircle. (c) Only one head is elected. (d) Nine heads are elected.

Time line showing LEACH's frame structure.
Outcome of 100 Nodes Electing 5 Heads
Moreover, in the case of “no elected head” whose probability is listed in row 1, all the nodes have to communicate directly with the base station, in which case all the clustering energy gain is lost. When the number of elected heads is too few, for example, only one head is elected, the head may be exhausted by the tremendous data sent to it. In such cases, the energy efficiency is tremendously compromised.
Another problem introduced by the random head selection is that the sensor locations are not taken into consideration. Obviously, the even layout of heads would favor energy efficiency (Fig. 2(a)). When heads are randomly selected as in LEACH, elected heads sometimes clump together as shown in Fig. 2(b), which leads to unnecessary energy waste.
In this section, we first introduce a data correlation model, and then based on this model, make data-centric analysis of energy consumption in WSN. A new criterion is proposed for clustering, which is the theoretical basis of Medium-contention based Energy-efficient Distributed Clustering (MEDIC). For easy reference, the variables used in the following discussion are listed in Table 3.
Data Correlation Model
The data collected by neighboring sensors have a lot of redundancy, thus, [11] assumes a perfect data correlation that all individual signals from members of the same cluster can be combined into a single representative signal. Nevertheless, this assumption cannot hold when the cluster size increases to some extent. Therefore, we develop a complementary exponential data correlation model based on the observations in distributed data compression [25], [26].
Considering the phenomenon of interest as a random process, the correlation between data collected by two sensors is generally a decreasing function of the distance r between them. After the data aggregation removes most of the redundancy, the residue can be assumed to be an increasing function of r. Based on the above observation, the data aggregation effect is modeled as below.
Table of Variables
Table of Variables
Suppose there are M
k
non-head members in cluster k(k = 1,2,3, …, c), the ith member (i = 1,2,3, …, M
k
) collects l bits and sends them back to its head k at distance r
ki
, the head expends 2lE
DA
Joules on the data aggregation of the 2l bits (l bits collected by itself and another l bits by its ith member), where E
DA
is set as 5nJ/bit as in [11] and listed in Table 1. The resulting data is assumed of l(1 + η
ki
) bits, where η
ki
is data aggregation residue ratio and assumed to be complementary exponential, specifically,
where α is a small positive real number whose magnitude depends on specific phenomenon of interest. For example, the light, acoustic, seismic, and thermal signals often show a strong correlation at short distance, and thus, α will have smaller values for such data. Since η is a monotonically decreasing function of α and r, η approaches zero for smaller α and r. This model can approach the perfect-data-correlation assumpion in [3], [4] by increasing α, thus, different scenarios can easily be set up by varying α.
Clustering has been widely used in pattern recognition, and we use it to obtain the energy-efficient organization for WSN. From the data-centric view, the data collected by a node can be sent back directly to the base station or relayed by a cluster head. The first case occurs if this node is a cluster head; the data collected by head k is data-aggregated (with the data collected by its members) and sent back to the base station. Thus, the energy cost for each bit of data collected by head k is
where d k is the distance between head k and the base station.
For the second case, consider non-head member ki, the ith sensor in cluster k, with distance r
ki
to its cluster head, member ki sends its data to head k, and then head k performs data aggregation on the data and sends the resulting data to the base station. Thus, the energy cost for each bit of data collected by non-head member ki is
where η(r ki ) is the data aggregation residue ratio introduced in Section IV-A.
Considering all c clusters, the overall cost is
where M k is the number of non-head members in cluster k.
Taking E[J total ] the expected value of the overall energy cost as the objective function, the original problem is translated into an objective function clustering. If a central control scheme is possible, an iterative algorithm can be run at the base station to minimized E[J total ]. For example, Fuzzy c-Means is utilized in [27] to minimize a Euclidean-distance-based functional representing the energy cost in Wireless Personal Area Networks. However, since WSNs are working in ad hoc mode, the clustering decision must be distributed to each sensor node. Thus, our goal is using only local information to achieve energy-efficient clustering.
When a node finds no cluster head around, it should naturally elect to be a cluster head. However, if a node is close to a cluster head, there could be some energy gain or loss if it joins that cluster. In the following, we will discuss how a node decided to be a cluster head when it finds a cluster head in its communication range. The energy gain diminishes when the distance between the non-head member and the head increases. Consequently, the energy gain approaches zero at some critical distance, termed as influence range. To determine the influence range, consider a node i with a head k at distance r. The node could choose to be a non-head member or a head, which would consequently cost J
CM
or J
CH
as in (10) and (9). Naturally, the decision should be based on the comparison of J
CM
and J
CH
as
i.e., the decision rule for each sensor is:
We call this criterion as the local energy efficiency criterion, because it is based on only the local information. Substituting (9) and (10) into (12), we obtain
The influence range can be obtained by equating two sides though a closed form may be unavailable. Obviously, the cluster radius R
c
has to be much smaller than the influence range because the energy gain is so low at the outer ring that a new cluster can be formed. Although this criterion is too complex to be used in real applications, it promotes using R
c
instead of c as the clustering objective parameter to guide the election. Denote the areas occupied by the whole WSN and the cluster by S
N
and S
c
respectively,
Assume S
N
and S
c
are both circular, R
c
is related to c by
where R is the radius of S N . Although it is mathematically equivalent to partition nodes into c clusters or to organize nodes into clusters with radius R c , the former is definitely a global approach, which leads to dependence on the global information. Thus, the latter is more suitable for a distributed algorithm.
As indicated in (16), it is equivalent to determine c or R c . Here, we try to analytically determine the optimal value of c using the introduced models. LEACH can only determine a rough range C opt ε [1, 6] for a similar 100-node network [11], while our analysis predicts the optimal value of R c in simulation with satisfying accuracy.
The typical scenario is that N nodes are distributed uniformly in a circular region with radius R. There are c clusters with one cluster head and n-1 non-head members within each cluster. n is the average number of cluster members and related to c by
Based on (11), the average total energy cost can be approximated by
where
Since all nodes are independently deployed, r and d are independent, thus, (20) can be written as
We estimate the expected values in (19) and (21) as follows. Assuming the cluster head is at the center of mass of the cluster,
Where ρ
c
(r,θ) is the node distribution density. Since the nodes are assumed to be uniformly distributed, ρ
c
(r,ρ) is a constant given by
Substituting (23) and (16) into (22),
Similarly,
Since E[d4] is a function of R and irrelevant to c, we keep it in the further derivation.
The optimal value of c can be obtained by setting
where
Since it is impossible to solve (28) algebraically, we turn to the numerical solution. For example, the base station is located at (r
BS
, θ
BS
) = (125, 0) and N − 100, R = 50m in our experiments, we can evaluate (27) as
In Fig. 4(a)(b), we plot

Plot of
Note that
We are also interested in the relation of N to C opt . In Fig. 5(a)(b), we plot c opt over N for α = 0.001 and α = 0.05 respectively. These figures show that C opt is an increasing function of N, which indicates that the clustering objective parameter (c or R c ) should be adjusted adaptively if N varies.

Plot of c opt vs. N for R = 50m and (r BS , θ BS ) = (125,0). (a) α = 0.001. (b) α = 0.05.
Medium-contention based Energy-efficient Distributed Clustering (MEDIC) is designed to replace the cluster formation occurring at the beginning of each round in LEACH. We design MEDIC based on the Dutch auction for its time efficiency. Note that in the Dutch auction, the highest bidder always speaks first, and the auction ends right after its bid. On the other hand, the English auction could start with a lower bid, and when this does happen, it might take several rounds before a bidder dominates.
In MEDIC, there is no global broadcast. As shown in Fig. 6, each node first broadcasts its vital information at the maximum radio power level so that the knowledge is spread as widely as possible. Such “maximum-power” broadcasts are not frequent in MEDIC, which helps saving energy. The vital information may include nodes' energy, location, etc., though only the energy information is needed by MEDIC. Then, each node counts its neighbors and broadcasts the number of its neighbors at an adjusted power level corresponding to the cluster radius R c . The cluster radius R c is an important system parameter for energy efficiency. As shown in Section IV-D, given a specific type of application, R c is mainly determined by the node density. In MEDIC, each node should choose an appropriate R c according to the neighbor count in its transmission range, which is a good estimator of the local density.

Flow chart of a node in MEDIC.
If a node's headship potential qualifies as a head compared to its neighbors', it will try to claim the headship by broadcasting locally, which can be viewed as placing a bid for the headship. A node's “neighbors” are defined as the nearby nodes within distance R c from that node. Due to the possible contention for the headship, such bids could fail, which is indicated by the collision of “headship claims.” Using the modified MAC described below, the bidders will contend with each other until a node with satisfactory potential wins. By doing so, the head-to-be expels other possible heads in its neighborhood, and in consequence, the clusters with desired size are formed.
The headship potential is an important parameter, which replaces the self-electing probability used in native LEACH. As discussed in [11], the node's energy is important to determine its potential because the headship can be rotated among nodes by assigning more potential to the nodes with higher energy. In addition, we propose taking the number of neighbors into consideration, because the energy gain is prominent only in the neighborhood of the head as shown in Section IV-C and thus it is energy-efficient to let the node with more neighbors win the headship.
Based on these considerations, the qualification conditions are set as below. For any node, let N denote the set of its neighbors, E(i) and B(i) be the energy and the number of neighbors of the ith neighbor respectively, i ε N. The thresholds are set as the linear combination of the maximum and mean value of corresponding parameters as in (35) and (36) so that the thresholds are adapted to the current distribution of parameters and take values between the maximum and mean.
The conditions can be relaxed by decreasing γ1, γ2, where γ1, γ2 ε [0, 1]. Since there is no closed-form objective function, it is difficult to determine optimal γ1, γ2 analytically. Fortunately, our experiments show that the performance is not sensitive to the setting of γ1, γ2. Thus, we simply choose a smaller value for γ1 and a larger value for γ2 as γ1 = 0, γ2 = 0.8, because we want to emphasize the position condition in order to achieve energy efficiency and relax the energy condition in order to accept more nodes into the headship auction.
Depending on their conditions, the nodes classify themselves into three categories shown in Fig. 7. Each node can classify itself into one of the three categories by comparing itself to its neighbors. The nodes meeting both conditions consider themselves in Category-A, meeting one of the conditions indicates Category-B, and Category-C for the rest which fail both conditions. Note that DIFS B = DIFS A + CW min in Fig. 8, Category-B bidders have to wait CW min longer than Category-A to ascertain there are no Category-A bidders in their neighborhoods. That is, no matter which slot Category-A nodes randomly choose to send out the claim, there is no chance for a Category-B node to access the channel if there is one Category-A node in its neighborhood. The extreme case that no heads are elected is avoided by permitting Category-B bidders into the headship auction because it is impossible that there are only Category-C nodes in the neighborhood.

Categories of bidders.

The Medium Access Control used in MEDIC.
Once a node successfully sends out the “headship claim,” its neighbors must join it by sending “Request to join.” Since these requests can be eavesdropped by their neighbors, their neighbors can correspondingly correct their numbers of unclustered neighbors. If a node finds all its neighbors are clustered, it can elect to be a cluster head by sending out a “headship claim.” Those nodes outside the neighborhood of existing cluster heads cannot join any clusters. When the public channel is idle again, which indicates there is no node in its neighborhood trying to join existing clusters, another round of auction will begin until all nodes are clustered.
In this section, we compare the performance of MEDIC and LEACH using Matlab simulations. 100 nodes, each with 2J initial energy, were evenly distributed in a circular region with diameter 100m, and the base station was located at (125m, 0). Because currently available sensor nodes have no problem covering such an area, we assume every node can communicate directly to one another and the base station. This is because we want to find the optimal cluster size without constraint on the communication range. We ran LEACH and MEDIC over 1000 random network topologies for each c or R c and took the average of the collected data. In each simulation, nodes were redeployed uniform randomly in the area and ran until more than 95% nodes were dead. Since the steady state is much longer than the cluster formation phase, the time and energy used in the cluster formation is generally two orders of magnitude less than in the steady state, whose effect is safe to neglect.
MEDIC vs. LEACH
In this case, MEDIC is compared to native LEACH for nearly perfect data correlation (α = 0.001). For such small α, the amount of residue data is negligible, therefore, this setting is equivalent to the perfect data compression assumption used in [11]. Fig 9(a) and (b) plot the received data in the base station over time and the energy dissipated, respectively, while (o) and (d) plot the number of survival nodes whose batteries are not completely exhausted over time and per given amount of received data in the base station, respectively. To find the best c for LEACH, we compared the maximum transportation, the total data delivered back to the base station during a simulation, for different c. The simulations show that the maximum transportation is maximized at c ≈ 7 for LEACH in this case. Similarly, we found the performance of MEDIC is best at R c ≈ 40m (See Fig. 12 and Fig. 13).

MEDIC vs. LEACH. (a) Amount of data received at the base station over time. (b) Amount of data received at the base station per given amount of energy. (c) Number of survival nodes over time. (d) Number of survival nodes per amount of data received in the base station.
At first glance, LEACH seems to have a longer lifetime than MEDIC as shown in Fig. 9(a). However, a further study of Fig. 10 reveals that LEACH cannot guarantee the data delivery during the later phase. Figure 10 clearly shows that the throughput of LEACH starts to degrade shortly and continues to decrease to virtually zero. The reason is that the ill result of random election (e.g. too few heads are elected) often puts a tremendous burden on the heads whose energy is already low during the later phase. After the heads are exhausted quickly, the cluster members remain idle during the rest of that round, which seems to extend the lifetime. Therefore, we define the effective lifetime as when the data loss remains below 10 percent. Figure 10 shows that MEDIC extends the effective lifetime by about 3200s.

MEDIC vs. LEACH. Node-to-basestation throughput over time.
Another good measurement of energy efficiency is the ratio of data transporation over energy consumption, termed as Data/Energy Ratio (DER), which is indicated by the slope in Fig. 9(b). A higher slope implies that the corresponding scheme can transport more data with the given amount of energy dissipation. Figure 9(b) shows MEDIC increased DER by about 25%. The energy efficiency improvement is achieved by successfully electing the required number of cluster heads and evenly distributing them among the whole sensor networks. In MEDIC, once a node successfully wins the bid for cluster headship, it also eliminates its neighbor's possibility to be a cluster head. And the nodes outside its influence range still have a chance to be a cluster head. In this way, MEDIC avoids electing too many or too few cluster heads in the simulation, and the elected cluster heads are evenly laid out. The mean and std of cluster heads are plotted in Fig. 11. The higher variation in the head election of LEACH indicates the clustering in the LEACH led to a worse result, which is the main reason for its damaged energy efficency.

The elected number of heads (Average and std).

Data of LEACH.

Data of MEDIC at α = 0.001.
The analysis in section IV-D indicates that R c should be adapted to the decreasing node density. Since the death of nodes decreases the node density, we expect the optimal R c to decrease accordingly. However, since MEDIC remarkably extends the effective lifetime, the number of survival nodes does not decrease visibly during most of the network lifetime. Therefore, we need not adapt R c to keep energy efficiency.
In this case, MEDIC is evaluated for different α. We ran 1000 simulations at different R c with α = 0.05 to determine optimal R c . Figure 14 shows that the performance of MEDIC is optimal at around R c − 10m, which is far from R c = 40m when α = 0.001. The reason that the smaller clusters are formed is that the influence range shrinks when the data correlation decreases. These values of R c agree well with the analysis in section IV-D for both values of α. This shows the advantage of MEDIC over the original LEACH; the inherent distributed nature of MEDIC makes it more adaptive to the different applications. This also shows the advantage of our data correlation model; we can easily fit the simulation scenarios for the phenomena of interest by varying α.

Data of MEDIC at α = 0.05.
The previous clustering researches often take a global approach, which is appropriate for global optimization. However, when a distributed clustering is desired, the already-answered questions such as “How many clusters should the nodes be partitioned into?” have to be translated into a distributed version, that is, “What's the appropriate cluster size?”, because it is easier for a node to know its cluster size than the number of clusters in the whole network. In this paper, we take a fully distributed approach to energy efficiency for WSN. Motivated by the local energy efficiency criterion, we propose using the cluster size instead of the number of clusters as the clustering objective parameter in clustering. Furthermore, we utilize the medium contention to implement the headship auction to keep the cluster size within an ideal range. As shown by the simulations, although the proposed MEDIC uses only local information, it achieves better energy efficiency than native LEACH in terms of Data/Energy Ratio and an effective lifetime. The simulations also show that the optimal cluster radius obtained from the experiments agrees well with the analysis of optimal clustering, which indicates that the performance of our distributed clustering is close to that of the global optimal one.
Footnotes
Acknowledgment
This work was supported by the U.S. Office of Naval Research (ONR) Young Investigator Program Award under Grant N00014-03-1-0466.
The second author would like to thank his Research Assistant Mr. Xiang Hong for helping with part of the simulations.
