Abstract
Fault detection is a method of crucial importance to guarantee Quality of Service (QoS) in industrial wireless sensor network (IWSN). Timely and accurate detection for fault nodes can increase the robustness of IWSN. Therefore, in order to reduce the network detection load and improve the system scalability, a cluster-based fault detection (CFD) algorithm is proposed. In this algorithm, the IWSN is clustered based on Low Energy Adaptive Clustering Hierarchy (LEACH) protocol at the beginning, and then the reliability of all cluster head nodes is verified. In each cluster, the cluster head is responsible for fault detection in it. In addition, the introduction of node credibility mechanism enables CFD algorithm to tolerate transient faults. Simulation results demonstrate that CFD algorithm has better performances in terms of network energy consumption and detection accuracy.
1. Introduction
With the development of Microelectromechanical System (MEMS), wireless sensor networks (WSNs) have and will play a vital role in our daily lives [1]. Humans have relied on wired sensor for years for simple tasks such as temperature monitoring and for complex tasks such as monitoring life signs in hospital patients. However, wireless sensor networks provide unforseen applications in this new field of design [2]. From military applications such as battlefield mapping and target surveillance to creating context-aware homes where sensors can monitor safety and provide automated services tailored to the individual user, the number of applications is endless. Recent advances in wireless sensing technology encourage the further optimization and improvement of the product development and service provision processes [3]. Industrial wireless sensor network (IWSN) is an emerging class of WSN that faces specific constraints linked to the particularities of the industrial processes. In this term, industrial wireless sensor network faces several challenges such as the reliability and robustness in harsh environments as well as the ability to properly execute and achieve the goal [4].
As one of wireless networks, industrial wireless sensor network is composed of amount of low-cost microsensors in a self-organizing way. Generally, since sensors are deployed randomly in uncontrolled environment or in hostile watched areas, sensor nodes are vulnerable to being unreliable [5, 6]. The fault nodes may produce inaccurate observed data values thereby weakening or evanishing their own functions more seriously leading to an entire system unusable [7]. Therefore, it is very significant to effectively realize fault management for WSN especially for IWSN, because it enables IWSN to detect automatically and troubleshoot nodes and reduce network energy consumption as far as possible to assure high Quality of Service (QoS) [8, 9].
Similar to other wireless sensor networks, the types of fault in IWSN can be classified into energy fault, node fault, and communication fault [10]. Energy fault denotes the node failure due to energy exhaust or damage of energy supply unit when the node works normally. Node fault [11] includes catastrophic fault and parametric fault. The former is caused by hardware damage of nodes and the latter is produced by wrong detection results because of inaccurate data collected by nodes. Communication fault derives from loss, destruction, or congestion of data during transmission caused by the interference of the external environment. This paper is mainly focused on the communication fault.
The IWSN is often envisioned to be deployed in large scale industrial applications [12]. In those applications, the IWSN may consist of hundreds or potentially thousands of sensor nodes. Such a large network usually cannot efficiently operate without some structuring [13, 14]. For this reason, clustering has been a way to introduce some hierarchical structure in the network, which has the side effect of creating nodes that can be considered more critical than others [15]. This is due to the fact that cluster head node serves as the central hubs or super peers for node management functions such as communication, organization, and security. In a clustered network, each sensor node communicates with the cluster head to which it is assigned [16]. It is the cluster head's job to route all received communications towards a destination or network sink. As a result, cluster heads are of increased importance to the proper functioning of the IWSN. From this fact, it is apparent that the compromise or loss of a cluster head will have a greater impact on the overall effectiveness and lifetime of the network [17]. Furthermore, the operations of sensors are seriously affected by their environments; for example, the industrial production environment is complicated and the sensors are interfered critically. How to detect the fault in IWSN with high detection accuracy and low false alarm rate is a challenge [18]. So a novel fault detection algorithm based on clustering in industrial wireless sensor networks is proposed to satisfy this new request in this paper.
2. Related Works
2.1. Background
The deployment and the arrangement of WSN are extremely challenging tasks, which become even more challenging in industrial applications. The environment where IWSNs are deployed in order to monitor environment or production processes is extremely dynamic; it can depend on the specific product, the phase of life of product, and the kind of service provision considered. In fact, each kind of product or phase of life has different requirements and imposes different constraints on the monitoring system. One of the challenges to face is the fault detection.
When the IWSN is deployed inside a factory to assess the production process quality, the designer has to deal with the fault detection produced by the production machines. In this case, the IWSN has to be deployed and calibrated not only to guarantee the correct assessment of the production process but also, above all, not to break down. The same logic holds for IWSNs used to monitor electricity, water, and gas consumption. Often nodes of the IWSN are immersed into the goods inside containers or any transportation means. For this kind of environments, the fault detection has to be carefully investigated in order to determine the most efficient and effective way to make nodes operate smoothly.
Recently, most researches on fault detection methods on WSNs have received much attention. The mainstream fault detection methods can be classified into centralized policies and distributed policies [19]. In centralized policies, control center adopts active detection model to collect node information and recognizes the fault nodes in WSN by comparing current state with history information of node. Control center is a base station or a center administrator and has infinite resource. MANNA fault management model is the origin of centralized policies which sets a special management station outside of WSN to control and manage the network to execute the complicated operation that internal network cannot complete to finish fault detection by polling [20]. This algorithm has a simple structure, clear division, and specific location. However, management nodes with heavy load are liable to become bottlenecks. Once the failure appears, the entire network will collapse. Meanwhile, the delay of internode communication is prolonged, which led to low efficiency. Moreover, the communication overhead of nodes is great, and thus it adds network energy consumption [21].
Distributed policies divide the network into several areas and distribute fault management tasks into each area uniformly. There is a management node in each area, taking charge of monitoring and detecting the fault nodes in this area or detecting fault with other management nodes collaboratively. The management node just monitors only a few nodes in the network and thus it saves communication energy consumption and reduces system response time when events occur as well. Xu et al. [22] review various distributed fault detection algorithms in detail, including Bayesian algorithm for fault-tolerant event detection [23], Weighted-Median Based Distributed Fault Detection algorithms [24], and double-neighborhood median algorithm [25]. According to comparison and analysis, Han et al. discuss the difference between various algorithms in terms of performance and energy. Among all distributed fault detection algorithms, DFD algorithm [26] is a typical detection algorithm. Its main idea is to compare [27] and judge the data which were collected by all neighborhood nodes to make sure whether the state of the node is well. Furthermore, the final state of nodes is confirmed and the fault detection of network nodes is realized according to node state diffusion mechanism. At present, many scholars study and modify CFD algorithm. The authors in [28] consider that DFD algorithm is harsh to the final judgment for normal node, so they do some improvement. However, the improved algorithm does not solve the problem of large energy consumption of the nodes radically. If there is not a good (GD) node during the algorithm running, the system can not detect the final state of the node. Based on tree topology of Zigbee, the authors in [29] propose a TreeDFD algorithm with the advantages in energy consumption and performance detection [30]. However, TreeDFD needs to construct detection tree so it is not suitable for large-size WSN. Based on DFD algorithm, the authors take full advantage of the characteristics of multisensor networks and improve the detection accuracy according to auxiliary state judgment for associations between nodes. However, this algorithm is only suitable for WSN with sparse nodes [31]. The authors in [32] propose a cluster-based sensor fault detection algorithm, clustering the network by LEACH routing protocol at first and detecting the node fault by DFD algorithm [33]. However, it does not provide a detailed method to assure the reliability of the cluster head node [34]. If the collected data [35] by cluster head node is wrong, it will have an effect on fault detection results. In addition, when neighborhood nodes are fewer and fault probability is high, CFD can reduce the fault detection accuracy rapidly.
2.2. Contributions
Aiming at the disadvantages of large computing redundancy and fast energy consumption in typical distributed fault detection algorithms, a simple and low-energy algorithm, named cluster-based fault detection (CFD) algorithm, is proposed in IWSN. To the best of our knowledge, this is the first paper to consider a fault detection algorithm based on clustering in IWSN. The main contributions of CFD algorithm are as follows:
After clustering the network, the cluster head node takes charge of the main fault detection. To save the communication overhead to a large extent, the member nodes in cluster do not have to communicate with all their neighborhood nodes, but they have to exchange data with the cluster head node. Judging the reliability of the cluster head node by collecting the observed value of cluster member nodes and cluster head node will prevent the fault cluster head nodes from influencing the detection results. In view of the influence of transient fault, a credibility value is set for each node to realize the fault tolerance mechanism, decrease the misjudgment ratio, and improve the detection accuracy.
2.3. Paper Organization
The rest of the paper is organized as follows. Section 3 designs the CFD algorithm and gives the algorithm description. Section 4 presents the simulation results and analyzes the algorithm. Section 5 concludes the paper.
3. CFD Algorithm Design
In IWSN, all senor nodes are immobile with the same transmission radius. The nodes in cluster use one hop communication mechanism and the cluster head nodes are not neighborhood nodes to each other. Since sensor nodes are deployed densely in IWSN, the nodes that are close in distance have certain spatial similarity at the same time. Therefore, the cluster head nodes have time-space relativity. Fault can occur at different levels in IWSN, such as physical level, hardware level, system software level, and middleware level [23]. This paper focuses on parametric fault of the nodes and allows the nodes to transmit, receive, and process information when fault occurs.
3.1. Definition
Definition 1 (node fault probability).
It is denoted as the rate in which the node has transient faults or permanent faults in IWSN.
Definition 2 (neighborhood node).
The nodes in the communication range of one hop are neighborhood nodes to each other. All neighborhood nodes of cluster head node
Definition 3 (observed value).
Current true values collected by sensor nodes include temperature value, humidity value, and pressure value.
Definition 4 (test value).
A binary value (0/1) is generated by any two neighborhood nodes according to their collected data.
Definition 5 (credibility value).
During network lifetime, all sensor nodes have an original credibility value, and the current credibility value is given by the original value minus 1. Once the current credibility value decreases to 0, the node is considered as a distrust node.
Definition 6 (node state).
The nodes are classified into fault nodes and normal nodes, and the node with transient faults in credibility range is also named as normal node. The normal node is denoted as
3.2. Algorithm Description
Based on the model discussed above, cluster-based fault detection algorithm in IWSN is mainly composed of the following four stages.
(1) Clustering algorithm: in this paper, the widely used LEACH protocol is used to cluster the whole network, where each cluster includes a cluster head node and several member nodes.
(2) The reliability of the cluster head nodes: judge the reliability of cluster head node by comparing the observed values of member nodes and head node in cluster. If the cluster head node is a fault node, the network needs to be clustered again. Let the member nodes of cluster head node
Compare the observed values of all nodes in the same cluster with the median value represented by (2). We can obtain a compared value-based set
Define the average value of set D as the threshold. Consider
If
(3) Fault detection for the nodes in cluster: by the previous stage, the cluster head node is judged as a trouble-free node. Then, referring to the cluster head node, the member nodes in this cluster can check their own states by threshold test directly. At time
Although the proposed algorithm realizes the fault detection mechanism in cluster by exchanging messages between cluster head node and member nodes under the condition of assurance that the cluster head node is trouble-free, it cannot ensure that the messages of cluster head node are absolutely right when messages exchanged. Once the cluster head node has transient faults, the cluster head node will be judged wrongly; thus, the detection accuracy of the whole network will also be affected.
(4) All clusters detect faults simultaneously and go to next clustering stage after stabilization stage to balance the network energy consumption.
CFD Algorithm
Step 1
If else
Step 2
If if else if if the node is in the credibility range, else the member node is not reliable;
Step 3
Else if if the cluster head node is in the credibility range, else the cluster head node is unreliable;
3.3. Performance Analysis
The proposed fault detection algorithm in this paper has the following characteristics.
3.3.1. Energy Consumption
According to the energy consumption model of WSN, we can see that the energy consumption of the node mainly contains three parts: the energy consumption of data acquisition, data processing, and wireless communication. As shown in Figure 1, the energy consumption of communication determines the lifetime of network. If the energy consumption of communication is greater, the energy of the node dissipates faster; otherwise, it dissipates slower. The lifetime of network is defined as the time of death of the last node in this network; thus, we need to balance the energy consumption of each node in network. Generally, the energy consumption of communication of nodes is determined by the times of data exchanging, so that of fault detection is also mainly determined by the times of data exchanging during detection.

Energy consumption model of network node in WSN.
Figure 1 shows that, in transmit state, the energy of nodes is consumed more than that in receive state and idle state. In idle state, the node would constantly monitor the wireless channel. So in this state the consumed energy is nearly the same as that in receive state.
By the typical DFD algorithm to detect faults each node communicates with all neighborhood nodes around to make sure the original state makes a final judgment according to the original state and spreads the judgment result to its neighborhood nodes at last. In spite of high detection accuracy, DFD also consumes much energy. As shown in Figure 2(a), the nodes distribute randomly in a certain watched area, where A and B are neighborhood nodes for each other. We need to compare A with B when detecting faults for A. Similarly, when we detect faults for B, we need to compare B with A once again. Obviously, this method causes redundancy and increases the network overhead.

Node distribution.
However, in CFD algorithm, we cluster the network to select a head node based on LEACH protocol. Each neighborhood node of cluster head node communicates with it one time when judging its normality at beginning. Since the credibility judgment of cluster head node just happens during clustering, the energy consumption can be negligible. In the stage of fault detection, with all the nodes in the same cluster, as shown in Figure 2(b), the fault detection cannot be completed until the member nodes communicate with A one time, where A is the cluster head node. Therefore, compared with DFD algorithm, CFD effectively avoids the repetitive computation, reduces the data communication times, and thus has an obvious advantage in network energy consumption.
3.3.2. Fault Tolerance
The typical DFD algorithm detects the transient fault due to the similarity of the observed test values of the nodes at a certain moment. However, CFD algorithm introduces the credibility value to realize the tolerance for transient fault. Although both DFD and CFD have better fault tolerance, the paper pays more attention to the method of saving network energy overhead. If the node has frequent transient faults, it should be marked as an unuseful node and should be omitted from detection computation. Based on efficiency and energy considerations, that method which introduced the credibility value has better performance.
3.4. Application Example
Let us assume that the original credibility value of all nodes is
The observed values of nodes.
We sort the observed values of each node as
As shown in Tables 2, 3, and 4, the node
Fault detection analysis for member node in first round.
Fault detection analysis for member node in second round.
Fault detection analysis for member node in third round.
In another case, if the cluster head node has a transient fault, the member node does not keep a record to deal with the credibility value of the cluster head node. Then, the next test continues. If the cluster head node is judged as a fault node, the network needs to be clustered again.
We can see from Table 5 that the node
Fault detection analysis for cluster head node.
4. Simulation and Analysis
4.1. Simulation Environment
The CFD scheme will be applied in a real industrial wireless sensor network system. It is expensive to run schemes on the hardware of the system, so the feasibility and accuracy of the schemes should be verified before being applied. Therefore, simulation becomes the best alternative way of testing, evaluating, and verifying. Furthermore, the DFD algorithm is an efficient algorithm to detect the sensor fault in wireless sensor networks, so, in this paper, the DFD is implemented to compare with the CFD algorithm. We programmed the CFD algorithm and DFD algorithm with Visual C++ and Matlab. Two hundred nodes are randomly deployed in the network. The network paremeters used in the simulation are shown in Table 6.
Network parameters.
4.2. Simulation Results and Analysis
First, we simulate the detection accuracy and false alarm rate of two algorithms when the node failure rate is set as 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, and 0.40, respectively. Detection accuracy and false alarm rate of sensors are two important metrics in evaluating performance of algorithm. Detection accuracy is the probability of detected fault sensors to all fault nodes. False alarm rate is the probability of the number of normal sensor nodes judged as fault nodes to the total amount of all normal sensor nodes.
Figures 3 and 4 show that the node fault detection accuracy and fault alarm rate of DFD and CFD algorithms change with sensor fault probability. When sensor fault probability is 0.05, both algorithms have higher fault detection accuracy. When sensor fault probability is beyond 0.05, fault detection accuracy of CFD algorithm outperforms DFD, but fault alarm rate is reverse. In particular, when sensor fault probability is 0.15, fault detection accuracy of CFD reaches 0.96, while fault alarm rate is only 0.01. This is because of the fact that as the fault probability in DFD algorithm increases, the number of normal nodes becomes fewer and fewer; detection accuracy is affected easily by inaccurate data of fault nodes so that the normal nodes are judged as fault nodes.

Relationship between detection accuracy and sensor fault probability.

Relationship between false alarm rate and sensor fault probability.
From the above description, the relationship of sensor fault probability (SFP), detection accuracy (DA), and false alarm rate (FAR) between CFD algorithm and DFD algorithm is shown in Table 7.
Values of sensor fault probability, detection accuracy, and false alarm rate in CFD and DFD.
Furthermore, we test the average transmission delay of system changing with sensor fault probability when the credibility value of the node is 5. The average transmission delay is the time to detect the fault under different sensor fault probability.
As shown in Figure 5, when the sensor fault probability is between 0.1 and 0.4, the average delay of DFD algorithm increases monotonically and that of CFD algorithm grows slowly. Due to the node credibility mechanism, once there are fault nodes, CFD algorithm will inform the base station to repair or discard the fault nodes to prevent these nodes from participating in the next computation, so the delay of node fault detection shortens effectively. Therefore, CFD algorithm outperforms DFD algorithm in terms of the average delay. At last, under different degree of average neighborhood node, we compare the energy consumption of the two algorithms. We use the wireless communication model in [24], where the parameters are as follows: the original energy is 1 J; energy consumption coefficient of radio is set to 50 nJ/bit; energy consumption coefficients of power amplification circuit are set to 10 pJ/bit/m2 and 0.0013 pJ/bit/m4, respectively; energy consumption of data fusion is set to 5 nJ/bit.

Relationship between average delay and sensor fault probability.
As shown in Figure 6, the average energy consumption of CFD algorithm is lower than that of DFD algorithm regardless of the number of average neighborhood nodes. When the average node degree is 5, the average energy consumption of DFD algorithm is almost double of that of CFD algorithm. As the node degree increases, the average energy consumption of DFD algorithm increases fast due to the burden of repetitive computation. However, based on the cluster-based topology, we can finish the node fault detection, while not all nodes have to communicate with their neighborhood nodes. Simulations prove that CFD algorithm has obvious advantage in terms of average network energy consumption.

Average network energy consumption under different average neighborhood node degrees.
5. Conclusion
The method of cluster-based fault detection can reduce network detection load, improve system scalability, and hold higher practical value. In the stage of verifying credibility of the cluster head node, selecting median to sample can well reflect the true value of sample which cannot be accurately represented by the average value. In the stage of fault detection, the introduction of node credibility mechanism can avoid additional burden due to mistake diffusion. However, the proposed CFD algorithm is based on node fault model without considering link fault. Therefore, CFD algorithm will judge the normal node as a fault node if the fault is caused by link failure between nodes. Therefore, we should further consider the impact of types of fault and event boundary area on detection performances in future work.
Footnotes
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This paper is sponsored by the New Century Program for Excellent Talents of the Ministry of Education of China, Liaoning province innovation group project (LT2011005), the Shenyang Ligong University Computer Science and Technology Key Discipline Open Foundation (2012 and 2013), Liaoning fourth batch of distinguished professor project (2014), and Liaoning BaiQianWan Talents Program (2014921042).
