Abstract
In view of the unbalanced energy consumption of traditional cluster-based sensor networks, this paper proposes a type of uneven clustering protocol in data collection. The network is divided into inner and outer regions according to the distance between nodes and the base station. The inner region is consisted of several layers and nodes in outer region are deployed in grids of different sizes. Sensor data is collected by nodes in outer region and then is be transmitted to the inner region. Nodes in the inner region do data fusion and forward data from the lower layer to the higher one. Simulation results show that, compared with MTP and CDFUD, the proposed algorithm performs well in balance of energy consumption and could effectively prolong the network lifetime.
1. Introduction
Due to the dense deployment of sensor networks, sensory data tended to have larger redundancy in time and space which not only increase the cost of transmission and processing but also affect the expansion of its application [1, 2]. In the age of big data and ubiquitous network, it has been one of the key technologies to further improve the execution efficiency as well as the quality of sensory data with the help of data fusion and data aggregation in wireless sensor networks (WSNs for short).
On the other hand, energy consumption problem is also one of the hotspots in WSNs. How to achieve efficient energy management and optimization on the sensor nodes with weak computation and communication capabilities as well as limited power reserve is also a key problem which could greatly improve the network performance.
For most of the existing data fusion algorithms, they always focus on the optimization of energy consumption in communication between nodes but ignore the cost of energy consumption in data fusion in the dense deployment wireless sensor networks [3]. In addition, the large amount and various types of sensor data have greatly increase the overhead of calculating and processing of the node. Therefore, calculating become another source of energy consumption [4].
Based on the regional cluster network structure, this paper analyzes the energy consumption during data transmission and fusion process and put forward a type of energy-balanced data fusion method which is efficiently decreasing the energy consumption of the whole network.
The rest of the paper is organized as follows. The related works and the network model are described in Sections 2 and 3, respectively. In Section 4, we analyse the energy consumption of nodes in inner region and outer region. Experimental results are shown in the fifth section and the conclusion is provided in the last section.
2. Related Works
There are two types of data gathering methods in WSNs: the routing-driven method and the fusion-driven one [5]. In routing-driven method, sensor data is transmitted hop by hop through the shortest path and data from different sources will be fused. However, the fusion process depends only on the data dependency in fusion-driven method and the efficiency of data gathering of this method has nothing to do with node deployment [6]. In this paper, a type of routing-driven data gathering method based on uneven clustering and regional division is proposed.
LEACH [7] is the first data gathering architecture for wireless sensor networks that achieves low energy dissipation and latency without sacrificing application-specific quality [8]. It uses a clustering architecture where each node in the cluster sends its data to a local cluster head. However, this mode is responsible for collecting data from all sensors in the cluster and sends them to the receiving end without data fusion, which not only reduces the bandwidth utilization but also increases energy consumption.
PEGASIS is an enhancement over LEACH [9]. Nodes are organized to form a chain, so that they need to communicate only with their closest neighbors. PEGASIS avoids cluster formation and only uses one node in a chain to transmit to the sink node instead of using multiple nodes. It reduces the energy consumption on transmission per round as the power draining is spread uniformly over all nodes. Nevertheless, PEGASIS presents a big delay for the most distant node in the chain, even if the clustering overhead is avoided.
Another kind of typical energy aware data gathering protocol is MTP (Multi-Tier Trace-Back Protocol) [10]. It selects one node with the most sufficient energy as well as the shortest distance to the base station (BS for short) as the relay node which could communicate with the BS directly. Other nodes could only transmit and fuse their data to the parent node in the upper layer. If the cluster head in MTP is far away from the BS, it will consume more energy during data gathering, fusing, and transmitting. On the other hand, according to MTP, nodes in the upper layer will also forward its data to their parent node even if the parent node is just in the lower layer. It not only increases the transmission cost but also reduces the reliability of the whole network.
Yue et al. proposed an uneven clustering algorithm based on grid, namely, CDFUD [11, 12] (Clustering Data Fusion Algorithm Based on Uneven Division). Sizes of the grids are different from each other according to their distance to the BS and the node with the maximal residual energy in each grid is chosen as the cluster head. Similar to LEACH, the cluster head in each gird transmit the data of its members as well as itself to BS with one hop. The main disadvantage of CDFUD is that the cluster head consumes energy rapidly which causes unbalance energy consumption in the network.
K-Means [13] Data Relay Clustering algorithm is developed to group the sensor nodes for energy efficient data communication. K arbitrary points are picked as the cluster head by the sink node. Cluster members are obtained for each CH based on distance metric. Then, in each CH, data aggregation process is done for limiting the energy spent in transmission. Although it reduces the communication overhead, it also reduces the number of nodes transmitting data to sink node and unable to collect enough information which we need.
TMMDF [14] eliminates negligent errors by Dixon method to obtain a valid observation data and uses the triangle module operator to assign weights of each sensor data. Finally, it gets the fusion estimate values. Although data delay of this method is greatly increased, it is not an efficient method for large-scale data processing.
Based on the study above, an energy-balanced uneven clustering protocol (EUCP) is proposed in this paper. According to the distance between nodes and BS, it divides regions into clusters with different sizes. And the size of each cluster as well as the number of nodes is related to the energy consumption of each grid. Simulation results show that EUCP could effectively prolong the network lifetime and balance the whole network energy consumption.
3. Method Description
3.1. Network Model
As we know, in two-dimensional space, the communication and sensing area of one node is a circular whose center is the node itself [15]. Therefore, in EUCP, we assume that the shape of the network is also a circular. Moreover, it has been proved that clustered structure is suitable for data fusion in WSNs [16], so it is still be adopted in EUCP. However, in cluster-based WSNs, nodes which are far from BS undertake minor data fusion mission (the nodes on boundary of the network even do not need to fuse data), while nodes which are close to BS often receive and process more data. So, in EUCP, we adopt the hierarchical structure to design the whole network for data transmission and fusion.
The attenuation model of wireless signal can be divided into free-space model and multipath fading model [17], as shown in Figure 1. A sensor node consumes energy when it is generating local data, receiving data, transmitting data, or in standby mode [18]. The energy for generating one bit of data and the standby energy consumed by one node are assumed to be the same for all nodes which can always be ignored. For energy used in receiving and transmitting, we adopt the first order radio model described in [19]. We assume that

Energy consumption model for one node.
In EUCP, all the nodes are distributed uniformly in a circle area with radius D (

Network structure.
As shown in Figure 2, in the inner region, the distance between nodes and BS, is smaller than
In wireless network, signal quality is determined mainly by base station antennas. In general, we divide the circle region into three sector regions with covering angle
3.2. Circular Division in Inner Region
In EUCP,

Distribution of nodes with maximum density.

Distribution of nodes with minimum density.
Obviously, when the density of nodes is higher than
Without loss of generality, we assume

A similar structure of binary tree in the inner region.
Similarly, the width of kth layer of inner region satisfies
Widths of all inner rings satisfy
In inner region, we group every two nodes in ith layer together and they choose one node in the

The longest distance in adjacent ring.
3.3. Division of Outer Region Based on Uneven Clustering
In EUCP, nodes in outer region are far from BS, and the distance between them is longer than
First, we divide the outer region into l annular regions, called “outer rings” and number them as layer

Subregions in outer region.
From the analysis above, it is obvious that, in EUCP, each node in the subregion should communicate with each other in one hop. Taking the outermost subregion for example, the maximum distance of any two nodes is just the length of diagonals of this subregion. We assume the length of diagonals is
Similar to LEACH, in the initial stage of clustering, nodes in subregion need to broadcast packets containing their ID number, residual energy, and coordinates
After the first round of data fusion and transmission, the cluster heads will be reselected. The priority of node i in subregion is defined as p, as shown in the following:
4. Analysis of Energy Consumption
4.1. Energy Consumption of Nodes in Inner Region
As mentioned above, a similar structure of binary tree is built in inner region whose root is the base station. Each node (except the leaf nodes) needs to fuse data collected by itself and its two child nodes before uploading, while leaf nodes collect the data from the corresponding cluster heads in outer region and transmit them to their parent nodes in
For a node in ith layer of the inner region,
By using mathematic induction, we obtain
Thus, for each node in
f is defined as the energy consumption of data fusion per bit and
For nodes in kth layer, there is no need to do data fusion. So the energy consumption
since the number of nodes in each layer of the inner region satisfies geometric progression whose common ratio is 2. Based on the above analysis, the total energy consumption
4.2. Energy Consumption of Nodes in Outer Region
4.2.1. Energy Consumption of Data Fusion for the Cluster Head
We take a subregion in
And the energy consumption on data fusion
Therefore, for the cluster head in the
4.2.2. Energy Consumption of Transmission for the Cluster Head and Its Members
Since nodes are uniformly distributed, for each subregion in the
In the equation,
In addition, each cluster head needs to transmit the fused data to the upper cluster head. These data are collected by cluster head itself as well as its members and uploaded by the cluster head in the lower layer. So the energy consumption of transmission for a cluster head in the
For simplicity, we define
4.2.3. Energy Consumption of Receiving for the Cluster Head
For the cluster head in the
While for the cluster head in
Therefore, in one round time of data collection and fusion, the total energy consumption of nodes in outer region (defined as
That is,
To balance energy consumption, it is assumed that the energy consumption of nodes in each subregion of outer region is approximately equal as follows:
In (30), we have
As for the multihop wireless sensor networks, it is well known that clusters near the center could receive more data than the clusters which are away from the center. Therefore, we have
To satisfy (30), we need
That is,
Through analysis of (34), we obtain
5. Experimental Results and Analysis
To analyze the balance of energy consumption as well as the network lifetime during data collection process, we use the java language to build the data collection model and then put the initial values for each parameter into the program to calculate the values of energy consumption and their variance. This paper compares EUCP with MTP and CDFUD algorithm, respectively.
5.1. Performance of Data Collection in Inner Region
The width
Parameter values of the inner region.
We set the fusion rate η to be 0.25, 0.5, 0.75, and 1.0, respectively, to analyze energy consumption on EUCP and MTP (including MTP1, MTP2, MTP3, and MTP4). MT
According to MTP, during the ith round, nodes which is not in the ith layer need to fuse and transmit data to nodes in the ith layer hop by hop, while nodes in the ith layer transmit all of the data to a node with the maximal residual energy of this layer which will then directly transmits these data to BS.
As shown in Figure 8, for the same data fusion rate η, energy consumption of MTP1 is a little more than EUCP. However, with the increasing of rounds, energy consumption of MTP is significantly higher than EUCP. In MTP2, a node which has maximal residual energy in layer 2 communicates with BS directly. But before this, nodes in layer 1 should transmit data to the nodes in layer 2 which is far from BS. This mode will obviously cause the waste of energy, while in EUCP, nodes in layer 1 transmit data to BS after having collected all the data from other layers which could keep equal energy consumption in each round of data fusion as well as data forwarding and could prolong network lifetime.

Energy consumption of EUCP and MTP in one round of data fusion and transmission.
The inner region is divided into 4 layers and the energy consumption of each layer is

Energy consumption in each layer of MTP and EUCP (

Energy consumption in each layer of MTP and EUCP (

Energy consumption in each layer of MTP and EUCP (

Energy consumption in each layer of MTP and EUCP (
It is easy to know that, for different values of η, energy consumption of MTP4 is always the highest one. And the relationship between energy consumption of these algorithms is
As shown in Figures 9–12, the differences in energy consumption between each layer are small in EUCP. However, for MTP, with the increasing of rounds, differences in energy consumption between each layer become larger and larger.
Figure 13 shows that variances of energy consumption are small in EUCP, MTP1, and MTP2. While with the increasing of rounds, variance of energy consumption of MTP become larger than that of EUCP.

Variance of energy consumption of EUCP and MTP.
5.2. Performance of Data Collection in Outer Region
We compare the energy consumption in outer region of EUCP and CDFUD algorithm. The outer region is also divided into 4 layers, whose widths are
Parameter values of the outer region.
As shown in Figure 14,

Energy consumption of EUCP and CDFUD in outer region.
As shown in Figure 15, for EUCP, the differences of energy consumption between two adjacent layers are around zero. While for CDFUD, the value is about 920 times of EUCP which verifies well-balanced energy consumption of EUCP in outer region.

Differences of energy consumption of EUCP and CDFUD in outer region.
As shown in Figure 16, for different value of η, the total energy consumption of EUCP is always far less than CDFUD. For transmitting the same amount of data, energy consumption of EUCP is only 6% of CDFUD.

Total energy consumption of EUCP and CDFUD.
As shown in Table 3, energy consumption variances of EUCP are less than 2.7 with different η. On the contrary, the value of EUCP is 9296.1. Because in CDFUD, no matter how far is the node from BS, the cluster head will directly transmit data to base station without data fusion, this will inevitably generate mass of redundant data and increase the energy consumption on sending and receiving.
Variances of energy consumption.
6. Conclusion
A type of energy-balanced uneven clustering protocol is proposed in this paper. Sensor network is divided into two regions and the inner is further divided into clusters with different sizes. Simulation results show that EUCP could not only prolong the network lifetime but also balance the whole network energy consumption.
In the future, the expansion of clustering in the outer region will be analyzed. And the residual energy should not be the only criterion for selecting the cluster header in our future work. Moreover, the cluster head rotation strategy also needs to be considered.
Footnotes
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The subject is sponsored by the National Natural Science Foundation of China (61202355), Research Fund for the Doctoral Program of Higher Education of China (20123223120006), China Postdoctoral Science Foundation (2013M531394), Natural Science Foundation of Jiangsu Province (BK2012436), Jiangsu Provincial Research Scheme of Natural Science for Higher Education Institutions (14KJB520029), Postdoctoral Foundation of Jiangsu Province (1202034C), Open Project of Provincial Key Laboratory for Computer Information Processing Technology of Soochow University (KJS1327), and the Project funded by Priority Academic Program Development of Jiangsu Higher Education Institutions (Information and Communication, YX002001).
