Associated Clustering Strategy for Wireless Sensor Network

Abstract

We consider the soil moisture monitoring problem and propose an associated clustering strategy for wireless sensor networks (WSN) based on spatiotemporal data correlation. The strategy ensures that the nodes within each cluster share a strong data correlation, which lets the cluster head perform data fusion more efficiently. As a result, the energy of each node is saved and the lifetime of the whole sensor network is extended. In the associated clustering strategy, clusters are divided according to the correlation characteristics of the node data, which are obtained from a dynamic soil moisture model and a correlation characteristics model derived through correlation coefficient analysis. Simulation results show that the proposed strategy works well for soil moisture measurement. Moreover, compared with traditional random clustering, the associated clustering strategy achieves a higher data correlation within each cluster and therefore allows more efficient data fusion at the cluster heads.

1. Introduction

Improvements in embedded system design, sensor node design, and low-power wireless communication have made large-scale wireless sensor networks (WSN) an attractive solution for many applications [1]. A WSN is large-scale and high-density, independent of infrastructure, self-organizing, and adaptive in its network topology, which gives it a distinct advantage in many applications. When used in industrial or agricultural fields [2], a WSN can provide cost-effective data monitoring along with several further advantages, such as concealment, ease of deployment, timeliness of data, reliability, and high coverage density.

Clustering [3] is a network management method that divides the network nodes into several disjoint subsets according to certain rules. The cluster head is responsible for collecting the data within its cluster and forwarding it to the base station. Data integration at the cluster head reduces redundant data in the network and, consequently, saves energy and prolongs the network lifetime. Existing clustering strategies, such as LEACH, GAF, TEEN, and PEGASIS, mostly form clusters according to the distribution of cluster heads, the distance between nodes, the remaining energy, and the network topology. Data correlation within a cluster is rarely a major consideration. In practical applications, however, data from different nodes are generally correlated [4]. Traditional random clustering yields poor data correlation within a cluster, which leads to inefficient data integration and thus to massive data redundancy in the network. WSN clustering based on data correlation has therefore become an active research topic. Partitioning correlated nodes into the same cluster allows efficient data integration at the cluster head, which reduces network traffic and saves network resources and energy.

The main contribution of this work is to exploit spatiotemporal correlation to divide a WSN into clusters. The inherent correlation of a specific application is taken as the basis of clustering. We take soil moisture measurement as an example and establish a general associated clustering strategy, which is also suitable for industrial applications when the soil moisture model is replaced by the particular industrial model. Based on the Rodríguez-Iturbe soil moisture model, we establish a dynamic model of soil moisture in a greenhouse. We then provide a clustering strategy based on spatiotemporal data correlation, which makes the nodes in a cluster share a strong correlation so that data integration is more efficient, consequently saving energy and extending the network lifetime.

2. Related Works

Several methods have been developed to analyze the spatial and temporal correlation characteristics in WSN. In [5], variograms are used to analyze spatial correlation. In [6], spatial correlation is used for scheduling so as to achieve energy-efficient data aggregation. In [7], error-bounded data compression using spatial-temporal data correlation is provided. However, these studies mostly derive data correlation from a single dimension, spatial or temporal; besides, the correlation is not quantified. In particular, spatial correlation is still treated as equivalent to the correlation of physical locations rather than the spatial correlation of the data. In [8, 9], the linear correlation between n-dimensional random variables and the definition of the linear correlation coefficient of multiple variables are studied.

In [10], temporal and distance correlation are both used as the clustering basis to reduce network traffic and thereby save energy. A similar clustering method is mentioned in [11], which takes only the spatial correlation into account. In [12], a novel clustering algorithm based on the correlation of sensor data is proposed. Its key idea is to express the data redundancy of a WSN as a formalized data correlation, thereby considering clustering from the perspective of data dependency. In [13], Yang et al. defined the correlation of soil moisture between different vertical depths by correlation analysis and R-type hierarchical clustering analysis.

In the study of soil moisture models, Rodríguez-Iturbe et al. [14] proposed several typical dynamic stochastic models of soil moisture and the corresponding probability density functions of soil moisture. The Rodríguez-Iturbe model is the one that considers the dependence of the random input and output of soil moisture more completely. The main factors of soil moisture, such as rainfall, vegetation, and soil, are expressed in a quantified model. Further discussion of the spatial and temporal variability of soil moisture is given in [15]. However, this model takes rainfall as the main factor of soil moisture, which makes a fine-grained soil moisture analysis model difficult to achieve. For soil moisture detection in a greenhouse, irrigation replaces rainfall as the main source of soil moisture. In [16, 17], sprinkler irrigation, an irrigation method widely used in greenhouses, is analyzed. However, the model there depends only on some parameter coefficients.

In this paper, a dynamic soil moisture model is established that considers the main factors affecting soil moisture in a greenhouse. A spatiotemporal correlation characteristics model is then derived from the soil moisture model. Finally, an associated clustering strategy based on data correlation is proposed, which implements efficient data integration within each cluster and reduces network traffic, so as to achieve an energy-balanced and efficient WSN.

The remainder of the paper is organized as follows. A soil moisture correlation characteristics model, based on a dynamic soil moisture model and an irrigation model, is established in Section 3. Section 4 proposes an associated clustering strategy using the idea of a correlation clustering pedigree chart. Simulation results of the clustering algorithm and verification of the correlation model are presented in Section 5. Finally, conclusions and further work are given in Section 6.

3. Data Correlation Characteristics

3.1. Soil Moisture Model

Soil moisture is affected by many factors, such as climate, rainfall, irrigation, and soil structure, so it is difficult to establish an accurate soil moisture model that takes all of them into account. Based on the classic Rodríguez-Iturbe soil moisture dynamic model, we establish a soil moisture model specifically for the greenhouse, which takes irrigation as the major source of soil moisture. The model gives a complete expression of the soil moisture input and output terms. Ignoring the effects of terrain, it provides a quantitative description of several major factors, such as irrigation, vegetation, evaporation, and leakage, and takes the upper and lower bounds of soil water capacity into consideration. The soil moisture model is described by the following water balance equation:

\begin{matrix} n Z_{r} \frac{d s (u, t)}{d t} = (1 - ϕ) I (u, t) - K_{s} \cdot \frac{s (u, t) - s_{1}}{1 - s_{1}}, s > s_{1}, \\ n Z_{r} \frac{d s (u, t)}{d t} = (1 - ϕ) I (u, t) - V \cdot s (u, t), s_{0} < s < s_{1}, \end{matrix}

(1)

where $s_{1}$ is the upper bound of soil water capacity and $s_{0}$ is the soil drought coefficient. Overflow and leakage are the key factors associated with the current soil moisture when the extreme situation $s > s_{1}$ occurs. $s (u, t)$ is the soil moisture level at time t and location u, n is the soil porosity, and $Z_{r}$ is the depth of the root zone; the product $n Z_{r}$ represents the capacity of the soil to store water. $I (u, t)$ is the intensity of the irrigation process and $(1 - ϕ)$ is the net irrigation coefficient (usually 0.4~0.6), which is determined by the plant species and the condition of the vegetation. $K_{s}$ is the soil hydraulic conductivity. The sum of evapotranspiration, leakage, and runoff losses is associated with the current soil moisture and is represented by $V \cdot s (u, t)$ , in which V is the soil water loss coefficient, depending on the vegetation coefficient $K_{c}$ .
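As an illustration only, the water balance equation (1) can be integrated numerically with an explicit Euler step. The Python sketch below is not part of the original study: the function names, the step size, and every default parameter value (n, Z_r, ϕ, K_s, V, s_1) are hypothetical placeholders, and the second branch of (1) is applied for all s ≤ s_1 as a simplifying assumption.

```python
def soil_moisture_step(s, irrigation, dt, n=0.4, z_r=0.3, phi=0.5,
                       k_s=0.2, v=0.05, s1=0.9):
    """Advance soil moisture s by one explicit Euler step of equation (1).

    All default parameter values are illustrative assumptions.
    """
    inflow = (1.0 - phi) * irrigation          # net irrigation (1 - phi) * I
    if s > s1:
        # Saturated branch: loss driven by hydraulic conductivity K_s.
        loss = k_s * (s - s1) / (1.0 - s1)
    else:
        # Normal branch: loss proportional to the current moisture, V * s.
        loss = v * s
    return s + dt * (inflow - loss) / (n * z_r)


def simulate(s_init, irrigation_series, dt):
    """Return the moisture trace for a sequence of irrigation intensities."""
    trace = [s_init]
    for irrigation in irrigation_series:
        trace.append(soil_moisture_step(trace[-1], irrigation, dt))
    return trace


if __name__ == "__main__":
    # Ten steps without irrigation: the moisture decays step by step.
    print(simulate(0.5, [0.0] * 10, 0.1))
```

With no irrigation the trace decays geometrically, which is the behavior the loss term $V \cdot s (u, t)$ describes; a smaller step size would be needed if the parameters made the loss rate large.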

3.2. Irrigation Model

Assume a greenhouse environment that uses sprinkler irrigation, with sprayers uniformly distributed over the crop area. As shown in Figure 1, the distance between adjacent sprayers is D, and each sprayer uniformly waters a circle of radius R. We assume that no area is irrigated by three sprayers at the same time, and thus we have $2 R > D > \sqrt{3} R$ .

Figure 1

Greenhouse irrigation environment.

In Figure 1, the lower left point of the monitoring area is taken as the coordinate origin, with sprayers located at $(m D, n D) (m, n = 0, 1, 2, \dots)$ . For a point $(x, y)$ in the area, its irrigation coverage degree C is the number of sprayers whose irrigation range covers the point (that is, whose distance to $(x, y)$ is smaller than R). Consequently, the irrigation intensity is expressed as $C (x, y) \cdot X$ , in which X is the irrigation intensity of a single sprayer.

Therefore, the irrigation model of point $(x, y)$ is represented as follows:

\begin{matrix} I (x, y, t) = C \cdot X \cdot t . \end{matrix}

(2)
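The coverage degree and the irrigation model (2) can be sketched in a few lines of Python. This is a minimal illustration under the grid layout described above (sprayers at $(m D, n D)$ with non-negative integers m and n); the function and argument names are hypothetical.

```python
import math


def coverage_degree(x, y, d, r):
    """Count the sprayers at (m*d, n*d), m, n = 0, 1, 2, ..., closer than r to (x, y)."""
    # Only grid indices whose sprayer could lie within distance r are visited.
    m_lo = max(0, math.floor((x - r) / d))
    m_hi = math.ceil((x + r) / d)
    n_lo = max(0, math.floor((y - r) / d))
    n_hi = math.ceil((y + r) / d)
    count = 0
    for m in range(m_lo, m_hi + 1):
        for n in range(n_lo, n_hi + 1):
            if math.hypot(x - m * d, y - n * d) < r:
                count += 1
    return count


def irrigation(x, y, t, d, r, x_single):
    """Equation (2): I(x, y, t) = C(x, y) * X * t."""
    return coverage_degree(x, y, d, r) * x_single * t


if __name__ == "__main__":
    # Midpoint between two neighboring sprayers with D = 1.85 and R = 1.
    print(coverage_degree(0.925, 0.0, 1.85, 1.0))
```

With $D = 1.85$ and $R = 1$, the midpoint between two neighboring sprayers is covered twice, while the center of a grid cell lies about 1.31 from each corner and is not covered at all.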

3.3. Correlation Characteristics Model

Based on the soil moisture model and the irrigation model, we can derive a spatiotemporal correlation characteristics model of soil moisture, which indicates the relation between each influencing factor and the correlation characteristics. Accordingly, it can better guide the clustering strategy when the network is deployed.

The correlation coefficient measures the linear correlation of two variables. Correspondingly, we use it to describe the correlation of soil moisture at two different points. The correlation coefficient $cor (S_{A} (t), S_{B} (t + h))$ characterizes the degree of correlation of soil moisture at point A and point B. The larger $| cor (S_{A} (t), S_{B} (t + h)) |$ is, the stronger the correlation of soil moisture at point A and point B is.

The correlation coefficient is defined as $cor (S_{A}, S_{B}) = (cov (S_{A}, S_{B}) / (\sqrt{σ^{2} (S_{A})} \cdot \sqrt{σ^{2} (S_{B})}))$ , which is calculated from the covariance and the variances of soil moisture.
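For sampled data, the correlation coefficient with a time lag h can be estimated directly from two moisture series. The Python sketch below is only an illustration: it assumes equally spaced samples of equal length and an integer lag counted in samples, and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Sample correlation coefficient: covariance over the product of standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(s_a, s_b, h):
    """Estimate cor(S_A(t), S_B(t + h)) for an integer lag h >= 0."""
    if h < 0:
        raise ValueError("lag must be non-negative")
    if h == 0:
        return pearson(s_a, s_b)
    # Pair sample t of A with sample t + h of B on the overlapping range.
    return pearson(s_a[:-h], s_b[h:])


if __name__ == "__main__":
    print(lagged_correlation([1, 2, 3, 4, 5], [2, 4, 6, 8, 10], 1))
```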

The soil moisture model in (1) can be normalized as

\begin{matrix} \frac{d S_{A} (t)}{d t} = - a_{A} S_{A} (t) + b_{A} I_{A} (t), \end{matrix}

(3)

where $a_{A} = {[V / (n Z_{r})]}_{A}$ is the water loss coefficient of point A, and $b_{A} = {[(1 - ϕ) / (n Z_{r})]}_{A}$ is the net irrigation coefficient, associated with the interception characteristics of the vegetation.

Accordingly, the soil moisture of point A is represented as

\begin{matrix} S_{A} (t) = e^{- a_{A} t} (\int e^{a_{A} t} b_{A} I_{A} (t) d t + c_{1}) = \int e^{- a_{A} ω} b_{A} I_{A} (t - ω) d ω . \end{matrix}

(4)
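The step from (3) to (4) is the standard integrating-factor solution of a first-order linear equation. A short derivation is sketched below under two assumptions that the text does not state explicitly: $a_{A} > 0$, and the influence of the initial state has decayed, so that the integration can start at $- \infty$.

```latex
\begin{aligned}
\frac{d}{dt}\left[ e^{a_{A} t} S_{A}(t) \right]
  &= e^{a_{A} t} \left( \frac{d S_{A}(t)}{dt} + a_{A} S_{A}(t) \right)
   = b_{A} e^{a_{A} t} I_{A}(t), \\
e^{a_{A} t} S_{A}(t)
  &= \int_{-\infty}^{t} b_{A} e^{a_{A} \tau} I_{A}(\tau) \, d\tau, \\
S_{A}(t)
  &= \int_{0}^{\infty} e^{-a_{A} \omega} \, b_{A} \, I_{A}(t - \omega) \, d\omega
  \qquad (\omega = t - \tau).
\end{aligned}
```

The last line shows that the soil moisture is an exponentially weighted sum of past irrigation, with recent irrigation weighted most heavily.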

The expectation of soil moisture at point A is calculated as

\begin{matrix} E [S_{A} (t)] = \frac{b_{A}}{a_{A}} E [I (t)] = \frac{b_{A}}{a_{A}} \frac{π R^{2}}{D^{2}} \cdot X \cdot t . \end{matrix}

(5)
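The factor $π R^{2} / D^{2}$ in (5) can be read as the mean irrigation coverage degree: each sprayer waters an area $π R^{2}$ and there is one sprayer per grid cell of area $D^{2}$. The Python sketch below checks this reading by Monte Carlo sampling. It is an illustration only and assumes an interior grid cell with neighbors on all sides and $R \leq D$; the function name, sample count, and seed are hypothetical.

```python
import math
import random


def mean_coverage(d, r, samples=50000, seed=0):
    """Monte Carlo estimate of the mean coverage degree over one grid cell.

    Assumes an interior cell (sprayers on all sides) and r <= d, so only the
    16 sprayers with indices -1..2 in each direction can reach the cell.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        x = rng.uniform(0.0, d)
        y = rng.uniform(0.0, d)
        for m in range(-1, 3):
            for n in range(-1, 3):
                if math.hypot(x - m * d, y - n * d) < r:
                    total += 1
    return total / samples


if __name__ == "__main__":
    d, r = 1.85, 1.0
    print("estimate:", mean_coverage(d, r))
    print("pi R^2 / D^2:", math.pi * r ** 2 / d ** 2)
```

For $R = 1$ and $D = 1.85$ the closed-form value is about 0.918, and the sampled estimate should agree with it up to sampling noise.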

The covariance of soil moisture at point A and point B is

\begin{array}{l} C_{A B} (h) = \iint b_{A} b_{B} e^{- a_{A} u - a_{B} v} \, cov (I_{A} (t - u), I_{B} (t + h - v)) \, d u \, d v \\ = b_{A} b_{B} \cdot X^{2} \times (C_{A} C_{B} + \frac{b_{A} b_{B}}{a_{A} a_{B}} ρ^{2}) \\ \times \left\{ \frac{2 e^{- a_{B} h}}{(a_{A} + a_{B}) (ρ - a_{A})} - \frac{2 e^{- h}}{(ρ^{2} - a_{B}^{2})} \right\} , \end{array}

(6)

where $C_{A}$ is the irrigation coverage degree of point A and $ρ = (π R^{2} / D^{2})$ describes the distribution density of irrigation sprayers.

According to the calculation of correlation coefficient, the spatiotemporal correlation characteristics model of soil moisture is proposed as follows:

\begin{array}{l} cor (S_{A} (t), S_{B} (t + h)) \\ = \frac{n_{A} n_{B} - (b_{A} b_{B} / a_{A} a_{B}) ρ^{2}}{\sqrt{| C_{A}^{2} - {((b_{A} / a_{A}) ρ)}^{2} |} \sqrt{| C_{B}^{2} - {((b_{B} / a_{B}) ρ)}^{2} |}} \\ \cdot \sqrt{a_{A} a_{B} (ρ + a_{A}) (ρ + a_{B})} \\ \times \left\{ \frac{2 e^{- a_{B} h}}{(a_{A} + a_{B}) (ρ - a_{A})} - \frac{2 e^{- h}}{(ρ^{2} - a_{B}^{2})} \right\} . \end{array}

(7)

This correlation characteristics model approximates the real data correlation and can therefore be used to describe the correlation characteristics of soil moisture.

4. Associated Clustering Strategy

According to the correlation characteristics calculated in (7), the associated clustering strategy uses cluster analysis to divide associated sets into clusters.

Clustering is the process of partitioning data into different clusters or classes. Objects within one cluster are highly similar, while objects in different clusters are highly dissimilar. It is a statistical analysis technique that divides objects into relatively homogeneous groups (clusters). The clustering analysis process is shown in Figure 2. In this paper, data correlation is the basis of the clustering process.

Figure 2

Clustering analysis process.

The associated set partition algorithm is initialized by making each node a single associated set $G_{i}$ ( $v_{i} \in G_{i}, i \leq N$ , in which $v_{i}$ is node i and N is the number of sensor nodes). The correlation characteristic of two associated sets $G_{i}$ and $G_{j}$ is defined as the minimum correlation between any node in $G_{i}$ and any node in $G_{j}$ . During initialization, each set contains a single node, so the bivariate correlation between the two nodes represents the correlation characteristic of the two sets. The clustering process is as follows: select the two sets (for example, $G_{m}$ and $G_{n}$ ) with the largest correlation among all the associated sets, merge them into one set ( $G_{m}$ or $G_{n}$ ), and delete the other set. After that, update the correlation characteristics between the remaining associated sets and the newly generated set. Repeat the process until the number of associated sets is no larger than the expected number of clusters K.

Algorithm 1 describes the process of the associated clustering strategy. Compared with the random clustering strategy, the associated clustering strategy guarantees that the nodes placed in the same cluster have a greater data correlation. Thus, more efficient data aggregation can be done at the cluster head.

Algorithm 1: Associated set partition algorithm.

Input: N is the number of sensor nodes, K is the expected number of clusters, data $[N] [2]$ is the coordinate data of the N sensor nodes.

Output: $G [i]$ : the associated sets obtained from the associated set partition algorithm.

/* Initialization: create the associated sets and the initial correlation between them */
FOR i IN $[1 : 1 : N]$
{
    /* $G [i]$ is the set of nodes contained in the associated set marked with i */
    InitGroup( $v_{i}, G [i]$ ); // initialize $G [i]$ with the single node $v_{i}$
    FOR j IN $[1 : 1 : i - 1]$
        /* during initialization, the correlation $cor [i] [j]$ between $G [i]$ and $G [j]$ is the correlation between nodes $v_{i}$ and $v_{j}$ , calculated by (7) */
        InitCor( $cor [i] [j]$ );
}
/* Clustering process: merge the associated sets until their number is no larger than K */
WHILE ( $Group > K$ ) // Group is the number of associated sets
{
    FindMaxCor( $G [m], G [n]$ ); // select the two sets $G [m]$ and $G [n]$ with the largest correlation among all the associated sets
    Merge( $G [m], G [n], G [m]$ ); // merge the two sets into the one set $G [m]$
    Delete( $G [n]$ ); // delete $G [n]$ , so Group decreases by one
    /* update the correlation between the newly generated $G [m]$ and every other set: $cor [k] [m]$ becomes the smaller of $cor [m] [k]$ and $cor [n] [k]$ */
    FOR k IN $[1 : 1 : N]$
        IF (Exist( $G [k]$ ) && ( $k != m$ ))
            UpdateCor( $cor [k] [m]$ , $\min (cor [m] [k], cor [n] [k])$ );
} // end of WHILE
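The steps of Algorithm 1 can be sketched in runnable form as follows. This Python version is an illustration rather than the implementation used in the simulation: it assumes that the pairwise node correlations are already available as a symmetric matrix (for example, evaluated from the correlation model), it uses 0-based node indices, and the function and variable names are hypothetical.

```python
def associated_set_partition(cor, k):
    """Merge singleton associated sets until at most k sets remain.

    cor: symmetric N x N list of lists with pairwise node correlations.
    Returns a list of associated sets, each a sorted list of node indices.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(cor)
    groups = {i: [i] for i in range(n)}   # set id -> member nodes
    link = [row[:] for row in cor]        # working copy of set-to-set correlation
    while len(groups) > k:
        # Find the pair of existing sets with the largest correlation.
        ids = sorted(groups)
        best = None
        for pos, m in enumerate(ids):
            for j in ids[pos + 1:]:
                if best is None or link[m][j] > best[0]:
                    best = (link[m][j], m, j)
        _, m, j = best
        # Merge set j into set m and delete set j.
        groups[m].extend(groups.pop(j))
        # The correlation to the merged set is the smaller of the two old values.
        for other in groups:
            if other != m:
                merged = min(link[m][other], link[j][other])
                link[m][other] = link[other][m] = merged
    return [sorted(members) for _, members in sorted(groups.items())]


if __name__ == "__main__":
    example = [
        [1.0, 0.9, 0.1, 0.2],
        [0.9, 1.0, 0.3, 0.1],
        [0.1, 0.3, 1.0, 0.8],
        [0.2, 0.1, 0.8, 1.0],
    ]
    print(associated_set_partition(example, 2))
```

Taking the minimum when two sets are merged corresponds to complete-linkage agglomerative clustering on a similarity measure, so every pair of nodes inside a set is at least as correlated as the value at which the set was formed.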

5. Simulation Results

To verify the effectiveness and performance of the associated clustering strategy, a simulation was carried out in an experimental environment with a monitoring area of 4 m × 6 m. Fifty sensor nodes with known locations were deployed randomly in the area. For the soil moisture application, the sprayers in the greenhouse have irrigation radius $R = 1$ and spacing $D = 1.85$ . The distribution of nodes, sprayers, and crops in the monitoring area is shown in Figure 3. The lower left point of the area is taken as the coordinate origin. Crop 1 is planted in the region x: $[0,2]$ , where the coefficients are $a = 0.07$ , $b = 0.001$ ; crop 2 is planted in the region x: $[2.5,4]$ , where the coefficients are $a = 0.4$ , $b = 0.002$ ; the region between crop 1 and crop 2 is open, with coefficients $a = 0.2$ , $b = 0.01$ .

Figure 3

Object distribution in the monitoring area.

5.1. Effectiveness of Soil Moisture Correlation Characteristics

An ideal data correlation characteristics model should approximately describe the real data correlation. A model with this property can effectively illustrate the spatiotemporal correlation of soil moisture. Figure 4 shows the verification of the soil moisture correlation characteristics model.

Figure 4

Effectiveness verification of correlation model.

In Figure 4, the correlations between node 1 and nodes $1,2, \dots, 20$ are taken as an example, and the real data correlation is compared with the correlation calculated by the correlation model to verify the effectiveness of the soil moisture correlation characteristics model. Each node holds a series of 5 values as its collected data. The real data correlation is calculated by the corrcoef function in Matlab. In Figure 4, the calculated correlation is the correlation given by the correlation model, and the real correlation is the real data correlation. As Figure 4 shows, the spatiotemporal correlation characteristics model in (7) approximates the real data correlation effectively. The model can therefore be used to describe the correlation characteristics of soil moisture.

5.2. Associated Clustering

Following Algorithm 1, with the soil moisture correlation characteristics calculated by (7), we use the randomly generated coordinate data of the sensor nodes, data $[N] [2]$ , and the expected cluster number K as input and execute associated clustering in the experimental environment of Figure 3. The clustering result is shown in Figure 5.

Figure 5

Associated clustering based on data correlation.

As known from (7), the main factors of the soil moisture correlation characteristics are the soil and vegetation coefficients a and b and the irrigation coverage degree C. Under the experimental settings, the coefficients a, b, and C can be obtained from the coordinate data. The expected cluster number K determines the granularity of the clusters. Therefore, the clustering result depends on the node locations and the expected cluster number.

5.3. Comparison with Random Clustering

To verify that the associated clustering strategy based on spatiotemporal data correlation makes the nodes within each cluster share a greater data correlation, we select a random clustering protocol, the LEACH clustering algorithm, for comparison. Figure 6 shows the result of LEACH clustering. In LEACH, cluster heads are generated randomly, and every other node joins a cluster according to its distance to the cluster head. As a result, LEACH tends to cluster nodes that are physically close to each other.

Figure 6

Result of LEACH clustering.
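The distance-based behavior of the baseline can be illustrated with a simplified Python sketch. It is not the full LEACH protocol, which elects heads probabilistically in each round: here, as an assumption, exactly k heads are drawn uniformly at random and every other node joins its nearest head. The function name, arguments, and seed are hypothetical.

```python
import math
import random


def random_head_clustering(coords, k, seed=0):
    """Pick k random cluster heads and attach each other node to its nearest head.

    coords: list of (x, y) node positions. Returns {head index: [member indices]}.
    """
    rng = random.Random(seed)
    heads = rng.sample(range(len(coords)), k)
    clusters = {head: [head] for head in heads}
    for i, (x, y) in enumerate(coords):
        if i in clusters:
            continue  # heads already belong to their own cluster
        nearest = min(
            heads,
            key=lambda h: math.hypot(x - coords[h][0], y - coords[h][1]),
        )
        clusters[nearest].append(i)
    return clusters


if __name__ == "__main__":
    nodes = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (2.0, 2.0)]
    print(random_head_clustering(nodes, 2, seed=1))
```

Because membership depends only on geometry, two nodes on either side of a crop boundary can end up in the same cluster even when their moisture dynamics differ.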

To compare the data correlation within a cluster, we use the multivariate linear correlation coefficient $ρ_{X} = ((n - r + 1 - | R_{r} |) / (n - 1))$ from [9] to measure the degree of correlation of the node data within a cluster, in which $ρ_{X}$ is the linear correlation coefficient of the n-dimensional random variable $X = (x_{1}, x_{2}, \dots, x_{n})$ , R is the correlation coefficient matrix of X, r is the rank of R, and $| R_{r} |$ is the largest nonzero minor of R.

Taking the data of each node as a random variable $X_{i}$ , we can calculate the multivariate linear correlation coefficient $ρ_{X}$ of each cluster, which is the cluster correlation. The cluster correlations of the clusters generated by the associated clustering strategy and by LEACH clustering are compared in Figure 7.
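The cluster correlation can be sketched in Python for the simplest case. The code below assumes that the correlation matrix R has full rank ($r = n$), so that $| R_{r} |$ is the determinant of R and $ρ_{X} = (1 - | R |) / (n - 1)$. The rank-deficient case, which arises for instance when a cluster has more nodes than samples per node, is deliberately not handled. All function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Sample correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for c in range(col, n):
                a[row][c] -= factor * a[col][c]
    return det


def cluster_correlation(series):
    """rho_X = (1 - det R) / (n - 1), valid only when R has full rank (assumption)."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two node series")
    r_matrix = [[pearson(series[i], series[j]) for j in range(n)] for i in range(n)]
    det = determinant(r_matrix)
    if abs(det) < 1e-12:
        raise ValueError("rank-deficient R is not handled by this sketch")
    return (1.0 - det) / (n - 1)


if __name__ == "__main__":
    print(cluster_correlation([[1, 2, 3, 4], [1, 3, 2, 4]]))
```

For two nodes the expression reduces to the squared pairwise correlation coefficient, which gives a quick sanity check of the formula.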

Figure 7

Correlation comparison between associated clustering and LEACH.

As shown in Figure 7, compared with LEACH clustering, the associated clustering strategy generally achieves a higher data correlation within each cluster. In this experimental environment, the average intracluster data correlation obtained by the associated clustering strategy is 0.6760, which is higher than the 0.4331 obtained by LEACH clustering. Therefore, the associated clustering strategy places nodes with higher data correlation into the same cluster, which enables efficient data integration at the cluster head and thereby saves energy and extends the network lifetime.

6. Conclusions and Further Work

In this paper, the clustering problem of WSN is studied. For the application of soil moisture measurement, we established a dynamic soil moisture model and, after correlation coefficient analysis of the model, proposed a soil moisture correlation characteristics model, which represents the correlation of sensor data and serves as the basis of clustering. Finally, an associated clustering strategy based on spatiotemporal correlation characteristics is proposed. The strategy places nodes with high correlation into the same cluster, which lets the cluster head perform data fusion more efficiently. Thus the energy of each node is saved and the lifetime of the whole sensor network is extended. The strategy is also suitable for other industrial or agricultural applications when a particular industrial model replaces the soil moisture model.

In practical scenarios, the distance between nodes affects the energy consumption of a WSN: during data transmission, the longer the distance, the more energy is consumed. In further work, we will therefore take both the associated clustering strategy and the distance factor into consideration in order to optimize the clustering strategy of WSN.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grants No. 61170245 and No. 61103242.

References

1. Rajaravivarma, Yang. An overview of wireless sensor network and applications. Proceedings of the 35th Southeastern Symposium on System Theory, March 2003, pp. 432–436.

2. Yang. Design and implementation of intelligent urban irrigation system: ZigBee modules for sensor networking. Proceedings of the IEEE 2nd International Conference on Software Engineering and Service Science (ICSESS '11), July 2011, pp. 461–464. Scopus: 2-s2.0-80052481384. DOI: 10.1109/ICSESS.2011.5982353.

3. Bian, Zhang, Zhao. Research on clustering routing algorithms in wireless sensor networks. Proceedings of the International Conference on Intelligent Computation Technology and Automation (ICICTA '10), May 2010, pp. 1110–1113. Scopus: 2-s2.0-77955681885. DOI: 10.1109/ICICTA.2010.343.

4. Masoum, Meratnia, Havinga P. J. M. Analysis of the impact of data correlation on adaptive sampling in wireless sensor networks. Proceedings of the 9th International Conference on Networked Sensing Systems (INSS '12), June 2012, pp. 1–2.

5. Geostatistical space-time modeling for temperature estimation. Proceedings of the 1st International Conference on Agro-Geoinformatics (Agro-Geoinformatics '12), August 2012, pp. 1–5.

6. Kondo, Kanzaki, Hara, Nishio. Energy-efficient data gathering using sleep scheduling and spatial correlation based on data distribution in wireless sensor networks. Proceedings of the International Conference on Network-Based Information Systems (NBiS '11), September 2011, pp. 194–201. Scopus: 2-s2.0-80455178508. DOI: 10.1109/NBiS.2011.36.

7. M.-H., Lin C.-C., Chuang C.-C., Chang R.-I. Error-bounded data compression using data, temporal and spatial correlations in wireless sensor networks. Proceedings of the 2nd International Conference on Multimedia Information Networking and Security (MINES '10), November 2010, pp. 111–115. Scopus: 2-s2.0-78751476641. DOI: 10.1109/MINES.2010.31.

8. Zhang S.-Z., Wang S.-M. The correlation coefficient matrix and multi linear correlation analysis. University Mathematics, 2011, 27(1), pp. 195–198.

9. Kun J. F., Chun L. Z. Research on linear correlation of multi-dimensional random variables. University Mathematics, 2008, 24(3), pp. 144–147.

10. Jiang, Jin, Wang. Prediction or not? An energy-efficient framework for clustering-based data collection in wireless sensor networks. IEEE Transactions on Parallel and Distributed Systems, 2011, 22(6), pp. 1064–1071. Scopus: 2-s2.0-79955521337. DOI: 10.1109/TPDS.2010.174.

11. Chen, Yang, Xie. A clustering approximation mechanism based on data spatial correlation in wireless sensor networks. Proceedings of the 9th Annual Wireless Telecommunications Symposium (WTS '10), April 2010, pp. 1–7. Scopus: 2-s2.0-77954356637. DOI: 10.1109/WTS.2010.5479626.

12. Yeo M.-H., Lee M. S., Lee S.-J., Yoo J.-S. Data correlation-based clustering in sensor networks. Proceedings of the International Symposium on Computer Science and its Applications (CSA '08), October 2008, pp. 332–337.

13. Yang, Wang, Sun. Soil moisture content sensors placement based on the vertical variety law. Transactions of the Chinese Society of Agricultural Machinery, 2008, 39(5), pp. 104–107. Scopus: 2-s2.0-45449104760.

14. Rodríguez-Iturbe, Isham, Cox D. R., Manfreda, Porporato. Space-time modeling of soil moisture: stochastic rainfall forcing with heterogeneous vegetation. Water Resources Research, 2006, 42(6), W06D05. Scopus: 2-s2.0-33746623626. DOI: 10.1029/2005WR004497.

15. Isham, Cox D. R., Rodríguez-Iturbe, Porporato, Manfreda. Representation of space-time variability of soil moisture. Proceedings of the Royal Society A, 2005, 461(2064), pp. 4035–4055. Scopus: 2-s2.0-33745473760. DOI: 10.1098/rspa.2005.1568.

16. Jun L. H., Hong G. S. Mathematical modeling of soil moisture infiltration under sprinkler irrigation. Journal of Irrigation and Drainage, 2006, 25(2), pp. 15–19.

17. Zhu P.-Y., Song J.-C., Zhang G.-S. Calculus model in the design of sprinkler irrigation system. Journal of Southeast University for Nationalities (Natural Science Edition), 2005, 31(4), pp. 495–497.