A Framework and Classification for Fault Detection Approaches in Wireless Sensor Networks with an Energy Efficiency Perspective

Abstract

Wireless Sensor Networks (WSNs) are increasingly considered a key enabling technology for realising the Internet of Things (IoT) vision. With the long-term goal of designing fault-tolerant IoT systems, this paper proposes a fault detection framework for WSNs from the perspective of energy efficiency, to facilitate the design of fault detection methods and the evaluation of their energy efficiency. Following the same design principle as the fault detection framework, the paper proposes a classification of fault detection approaches. The classification is applied to a number of fault detection approaches to compare several characteristics, namely, energy efficiency, correlation model, evaluation method, and detection accuracy. The design guidelines given in this paper aim to provide insight into the better design of energy-efficient detection approaches for resource-constrained WSNs.

1. Introduction

The new paradigm of the Internet of Things (IoT) envisions a computing era beyond the traditional desktop, in which devices (as well as any kind of object) will be increasingly connected, ubiquitous, dynamic, adaptive, and even embedded, so that we will encounter and use them in a variety of contexts, sometimes without even being aware of it. With the rapid technological development of sensors, Wireless Sensor Networks (WSNs) are becoming a key enabling technology for realising the IoT vision [1]. A WSN is a network formed by a large number of sensor nodes, each equipped with a sensor to detect and monitor physical phenomena such as light, heat, and pressure. Compared with wired solutions, WSNs are easier to deploy and offer greater flexibility. As a result, WSNs are regarded as a key information-gathering method for building the information and communication infrastructure of future IoT systems.

The correctness of sensor data is crucial to WSN applications, and false data may cause severe problems in critical applications. However, faults are inevitable, and WSNs are prone to faults [2] caused by abnormal software or hardware behavior, poor communication link quality, or battery depletion. Data from real deployments show that sensor data quality is often unsatisfactory. Szewczyk et al. [3] classified 3% to 60% of the data from each sensor as faulty in a deployment at Great Duck Island. Tolle et al. [4] also found that only 49% of the collected data could be used for meaningful interpretation in a sensor network examining the microclimate surrounding a redwood tree.

(1) Challenges in Fault Detection in WSNs. Before faults can be handled to meet application requirements, we must first determine whether a fault has occurred, a task usually called fault detection. Compared with traditional wired networks, fault detection in WSNs is considerably more challenging, mainly in the following aspects.

Resource Constraints. Traditional monitoring and polling are not appropriate for detecting faults in resource-constrained WSNs. Periodic polling across the whole network quickly depletes the sensors' power supplies because wireless communication consumes a great deal of energy. It is also rarely possible or practical for humans to maintain sensor nodes in the wild, especially those deployed in harsh areas.

Lack of Well-Defined Models. The complexity and uncertainty of the monitored environment make it difficult to obtain a well-defined model of the natural phenomenon being observed. Consequently, most WSN applications lack empirical models of sensor behavior, sensor readings, and sensor faults.

Network Dynamics. Most WSNs involve a large number of sensor nodes distributed across a large area. Wireless multihop communication is easily affected by the external environment. Dynamic network conditions, such as temporary link breakdowns and frequently changing topologies, increase the complexity of fault detection.

(2) Energy Efficiency. Resource constraints distinguish WSNs from other networks. There are two typical ways to prolong the lifetime of a WSN: reducing the energy consumption of its tasks, and harvesting energy from the external environment. Here we focus on the former, namely reducing the energy consumption of fault detection tasks. In fault-prone WSNs, fault detection is indispensable, and the energy efficiency of a fault detection approach strongly affects the performance of the network. A fault detection approach should not add a significant energy burden to the main monitoring tasks of WSN applications; it must be energy-efficient enough to be deployed in real WSNs. Therefore, in addition to detection accuracy, the design of a fault detection approach must carefully account for energy efficiency.

Energy consumption mainly comes from wireless communication and computation, and the former consumes much more energy than the latter. The number of messages exchanged is therefore closely related to the energy consumption of a WSN. It is difficult to count the number of messages exchanged directly, since it depends on the details of the detection method itself, on the network topology, and on the number of sensor nodes. Nevertheless, we can determine when and where message exchange takes place.

(3) Related Work. Many fault detection techniques for WSNs have been proposed in recent years, and energy-efficient approaches are especially attractive when deploying real WSN applications. While the current literature discusses detection approaches in different ways, it is hard to find work that explicitly discusses where message exchange takes place during fault detection and how it affects the energy efficiency of a fault detection approach.

Yu et al. [5] investigate the three-phase fault management process, that is, fault detection, diagnosis, and recovery. They discuss explicit and implicit detection, centralized and distributed approaches, neighbor coordination, clustering, and distributed detection techniques. Paradis and Han [6] also survey fault management in WSNs, describing fault prevention, detection, isolation, identification, and recovery techniques separately.

Mahapatro and Khilar [2] adopt a fault type model from [7] and provide their own taxonomy of fault detection techniques. They discuss both centralized and distributed fault diagnosis approaches. In particular, they classify distributed approaches into several categories: from an architectural viewpoint, hierarchical detection, node self-detection, and clustering-based approaches; from a decision-making viewpoint, test-based approaches, neighbor coordination approaches, soft-computing approaches, watchdog approaches, and probabilistic approaches; and, finally, diagnosis in the event detection domain. Notably, neighbor coordination in [2] concerns majority voting and weighted majority voting, rather than only the coordination between neighbors discussed in [5].

Sharma et al. [8] classify fault detection methods into four categories: rule-based methods, estimation methods, time-series-analysis-based methods, and learning-based methods. Jurdak et al. [9] present a model covering different types of WSN anomalies. They describe a set of anomaly detection strategies, divide them into centralized, distributed, and hybrid architectures, and provide design guidelines for anomaly detection strategies. Rodrigues et al. [10] comparatively evaluate fault diagnosis tools for WSNs using a comparison framework whose dimensions are architectural, functional, and dynamic aspects.

Ding [11] classifies fault diagnosis schemes into the following four categories and gives the schematic framework of each scheme from a system view.

(i) Hardware redundancy-based schemes replicate process components with identical redundant hardware and check whether the output of a process component differs from that of its redundant counterpart.

(ii) Signal processing-based schemes process signals to extract the symptoms of faults, using time domain functions, frequency domain functions, and statistical methods, to achieve fault diagnosis.

(iii) Plausibility test-based schemes check whether the process components obey certain physical laws in order to detect faults.

(iv) Software/analytical redundancy-based schemes, that is, model-based fault diagnosis, are the focus of [11]. Based on well-established process modeling techniques, a process model describes quantitative or qualitative characteristics of the process behavior. Analogously to hardware redundancy, these schemes reconstruct the process behavior online and detect faults by comparing the output signals of the measured process with their estimates determined by the process model.

Liu et al. [12] propose a self-learning sensor fault detection framework that emphasizes implementation details, such as threads and databases. The framework is more an architecture description specific to self-learning detection approaches and is not general enough to describe other types of fault detection approaches.

(4) Contribution and Outline of the Paper. This paper focuses on fault detection approaches of WSNs from the perspective of energy efficiency. The contributions of the paper are as follows.

A Fault Detection Framework. We abstract the common tasks of fault detection approaches and propose a fault detection framework from a function view, instead of a system view [11] or an architecture view [12], to cover as many fault detection approaches as possible. Moreover, instead of only describing how fault detection decisions are computed, the proposed framework adopts an energy efficiency perspective and explicitly identifies where message exchange occurs in the fault detection process, since communication consumes much more energy than computation. The proposed framework can facilitate the design of fault detection methods and the evaluation of their energy efficiency.

A Classification of Fault Detection Approaches. We propose a classification of fault detection approaches based on the proposed fault detection framework. For better and easier evaluation of energy efficiency, the classification also focuses on identifying the position of message exchanging in fault detection approaches.

Evaluation of Fault Detection Approaches. We investigate a number of existing fault detection approaches and classify them into four categories according to the proposed classification. After describing the main idea of each approach, we evaluate the energy efficiency, correlation model, evaluation method, and detection accuracy of existing detection approaches. We also give advice for designing energy-efficient fault detection approaches.

In the rest of the paper, we first illustrate our proposed fault detection framework (Section 2). Then, in Section 3, we propose a classification of fault detection approaches. In Section 4 we compare fault detection approaches according to their characteristics. In Section 5 we sum up some guidelines for energy-efficient fault detection. Section 6 concludes the paper.

2. Fault Detection Framework

In [11], Ding gives schematic frameworks, from a system view, for the fault diagnosis schemes in his classification, that is, hardware redundancy-based, signal processing-based, plausibility test-based, and model-based schemes. These schematic descriptions depict the differences between the fault diagnosis schemes. However, they are very similar, especially for the latter three schemes. Ding regards residual generation in model-based schemes as an extended plausibility test, with the process input-output behavior as the plausibility criterion. The symptoms adopted by signal processing-based schemes can also serve as the basis of process models in model-based schemes.

As we can see, the differences between the frameworks described in [11] are small, so it is possible to define a more general framework covering different types of fault detection approaches. Furthermore, neither Ding [11] nor Liu et al. [12] considers the energy efficiency of fault detection approaches in their frameworks. Here we define, from a function view, a fault detection framework general enough to cover as many fault detection approaches as possible, in order to facilitate their design. The framework also incorporates energy efficiency to ease the identification and evaluation of the energy efficiency of fault detection approaches.

We consider fault detection as a decision-making problem and focus on its functional components from the perspective of energy efficiency. In WSNs, wireless communication consumes most of the energy, so the energy consumed by fault detection depends largely on information collection. To evaluate the energy efficiency of a fault detection approach, we therefore need to find out where and when information collection takes place. We followed two design principles for the fault detection framework: first, identifying where communication, the most energy-consuming part, takes place during fault detection; and second, abstracting the fault detection process into several tasks.

Based on an investigation of fault detection approaches in the current literature, we propose the fault detection framework shown in Figure 1, with the position of information collection identified. Blocks and edges drawn with dashed lines are optional. The three major components of this model are model establishment, information collection, and decision making, which we describe in detail below.

Figure 1: Fault detection framework.

2.1. Model Establishment

Fault detection decisions are usually made based on an assumed correlation model that describes the relationships between sensor readings or other system attributes. Here we categorize the typical correlation models found in the literature.

(i) Spatial correlation models assume a relationship between the readings of sensor nodes within a certain physical spatial range, such as a neighborhood [13–15] or a cluster [16], or within a logical spatial range, such as a group of trusted sensors [17]. Spatially correlated sensor readings are typically assumed to have similar values.

(ii) Temporal correlation models assume a relationship between the sensor reading at timestep $n$ and those at previous timesteps.

(iii) Phenomenon-related correlation models assume a relationship between certain phenomenon-related parameters. This may take the form of a threshold on some sensor readings [18, 19] or of a relationship between different kinds of sensor readings or other system attributes, for example, between sensor readings and energy level [20] or between a set of state attributes and root causes [21].
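The three kinds of correlation model can be illustrated with the simple direct-comparison checks below. This is a minimal Python sketch, not taken from any of the cited approaches: comparing against the neighborhood median, and the parameters `tol`, `max_step`, `low`, and `high`, are illustrative assumptions.

```python
from statistics import median
from typing import Sequence


def spatial_check(own: float, neighbor_readings: Sequence[float], tol: float) -> bool:
    """Spatial model: suspect if the reading deviates from the neighborhood median by more than tol."""
    return abs(own - median(neighbor_readings)) > tol


def temporal_check(current: float, previous: float, max_step: float) -> bool:
    """Temporal model: suspect if the reading jumps by more than max_step since the previous timestep."""
    return abs(current - previous) > max_step


def phenomenon_check(reading: float, low: float, high: float) -> bool:
    """Phenomenon-related model: suspect if the reading lies outside a plausible range."""
    return not (low <= reading <= high)
```

For example, a reading of 27.0 among neighbors reporting 20.1, 19.8, and 20.3 fails the spatial check with `tol=2.0`, although it may still pass a loose phenomenon range such as -10 to 50.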

Traditional fault detection mainly focuses on establishing a system model based on physical properties. The system model usually covers the temporal and phenomenon-related correlation models described in this paper. Spatial correlation models are seldom considered in the system model, since the monitored system is usually centralized. To achieve better performance, most approaches [13–15, 17, 22, 23] consider both spatial and temporal correlations instead of only one kind. Note that not all approaches give their correlation models explicitly. In general, a correlation model can be established from ground truth, empirically, or through learning. The learning process, usually based on probabilistic and statistical methods and state-space analysis, can either use existing training data or collect training data during fault detection. We call these two mechanisms offline learning and online learning, respectively. Obviously, the latter requires extra information collection. Based on the correlation models and the model establishment mechanism, the fault detection approach collects the required information and makes detection decisions.
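The difference between offline and online learning can be sketched with a very simple model: the mean and standard deviation of a sensor's readings, used with a hypothetical `k`-sigma threshold. In the sketch below, offline learning computes the estimates in one pass over stored training data, while online learning (here, Welford's algorithm, an assumed choice) updates them as readings arrive; when those readings come from other nodes, each update implies extra information collection.

```python
import math
from statistics import mean, pstdev
from typing import Sequence, Tuple


def learn_offline(training: Sequence[float]) -> Tuple[float, float]:
    """Offline learning: estimate (mean, population std) from stored training data."""
    return mean(training), pstdev(training)


class OnlineModel:
    """Online learning: update the same estimates one reading at a time (Welford's algorithm)."""

    def __init__(self) -> None:
        self.n = 0
        self.mu = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mu
        self.mu += delta / self.n
        self.m2 += delta * (x - self.mu)

    @property
    def std(self) -> float:
        # population standard deviation, to match pstdev above
        return math.sqrt(self.m2 / self.n) if self.n > 0 else 0.0


def is_faulty(x: float, mu: float, std: float, k: float = 3.0) -> bool:
    """Direct comparison against the learned threshold mu +/- k*std."""
    return abs(x - mu) > k * std
```

Both paths yield the same model; they differ only in when, and from where, the training data are gathered.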

2.2. Information Collection

The main objective of our proposed framework is to identify where information collection, the most energy-consuming part of fault detection in WSNs, takes place. As analyzed in Section 2.1, information is collected either for making fault detection decisions or for establishing the correlation model. During information collection, three aspects need to be considered.

(i) How to design messages: the content and size of messages are the two main concerns here. Together with the assumed correlation models, the message content forms the main input to the decision-making phase, and it is closely related to the correlation model. Most approaches collect sensor readings, while some collect other system attributes, such as transmission time [24], energy level [20], or a set of state attributes [21]. Some approaches also collect probabilistic decisions [25]. As for message size, whether an efficient coding mechanism is used to describe the collected information compactly also affects the energy consumed by communication. There is a tradeoff between compact message size and comprehensive content.

(ii) How to exchange messages: wireless sensor nodes may communicate with their counterparts in different ways. Two typical patterns are adopted for message exchange: two-way request-response and one-way broadcasting. The former usually uses pairwise query-based messages in hierarchical topologies, while the latter is more common in flat topologies, where messages are sent without being requested.

(iii) Who receives messages: the range of information collection depends greatly on the correlation model. Approaches that adopt spatial correlation models usually collect information according to that spatial correlation; for instance, messages can be exchanged between neighbors, between a pair of nodes, among nodes within the same cluster, or between a central sensor node and other sensor nodes. Approaches that adopt temporal correlation models collect information from previous timesteps.
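To make the cost of the two exchange patterns concrete, the sketch below counts transmissions per detection round. It assumes, for illustration only, that one local broadcast counts as a single transmission regardless of how many neighbors overhear it, and that request-response costs one request plus one response per queried child; it ignores reception cost, retransmissions, and multihop forwarding.

```python
from typing import Dict, List, Sequence


def broadcast_messages(nodes: Sequence[int], rounds: int = 1) -> int:
    """One-way broadcasting in a flat topology: every node sends one local
    broadcast per round, overheard by all of its one-hop neighbors."""
    return len(nodes) * rounds


def request_response_messages(children: Dict[int, List[int]], rounds: int = 1) -> int:
    """Two-way request-response in a hierarchical topology: every parent sends
    one request to each child and receives one response per round."""
    queried = sum(len(kids) for kids in children.values())
    return 2 * queried * rounds
```

Under these assumptions, two broadcast rounds in a five-node flat network cost 10 transmissions, while a cluster head querying its four members once costs 8.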

2.3. Decision Making

To decide whether there is a fault, fault detection approaches need input for their calculations. As mentioned in the previous sections, one part of the input, for example, sensor readings, system attributes, or relevant probabilities, is collected from the contents of exchanged messages. Another important part of the input is the established correlation models, which form the context in which the fault detection approach runs. With these models and the collected information, fault detection approaches make their detection decisions either by direct comparison against predefined or estimated thresholds or by indirect inference. Statistical methods are commonly adopted to estimate thresholds, and the inference process usually involves iterated information collection with regard to the correlation models until it converges.

3. Approach Classification

As mentioned in Section 1, there are several existing classifications of fault detection approaches in WSNs [2, 5, 6, 8, 9]. However, it is not easy to design a classification that covers all detection approaches. Moreover, most existing classifications focus on architectural aspects and decision-making methods and do not consider energy efficiency in detail. According to the fault detection framework in Section 2, information collection takes place during decision making and model establishment. Specifically, online learning-based model establishment and inference-based decision making involve extra message exchange. Following the same design principle as the fault detection framework, we propose a classification of fault detection approaches with the position of information collection identified. We classify current fault detection approaches in WSNs into the following categories according to the learning method used in model establishment, if any, and the calculation method adopted in decision making.

(i) Non-Inference Non-Learning (NINL) approaches adopt direct comparison with predefined models, which may be based on ground truth or established empirically.

(ii) Non-Inference OFfline-Learning (NIOFL) approaches adopt direct comparison with models learned offline.

(iii) Non-Inference ONline-Learning (NIONL) approaches adopt direct comparison with models learned online.

(iv) Inference Non-Learning (INL) approaches adopt indirect inference with predefined models, which may be based on ground truth or established empirically.

(v) Inference OFfline-Learning (IOFL) approaches adopt indirect inference with models learned offline.

(vi) Inference ONline-Learning (IONL) approaches adopt indirect inference with models learned online.
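The classification can be captured as a small data structure. In the sketch below, the names `Learning`, `classify`, and `extra_exchange_phases` are hypothetical; the last function simply encodes the observation above that online learning and inference are the two sources of extra message exchange.

```python
from enum import Enum
from typing import List


class Learning(Enum):
    NONE = "NL"      # predefined model (ground truth or empirical)
    OFFLINE = "OFL"  # model learned from existing training data
    ONLINE = "ONL"   # model learned from data collected during detection


def classify(uses_inference: bool, learning: Learning) -> str:
    """Map the two classification dimensions to a category label such as 'IOFL'."""
    prefix = "I" if uses_inference else "NI"
    return prefix + learning.value


def extra_exchange_phases(uses_inference: bool, learning: Learning) -> List[str]:
    """Phases that involve message exchange beyond collecting the data under test."""
    phases: List[str] = []
    if learning is Learning.ONLINE:
        phases.append("model establishment")
    if uses_inference:
        phases.append("decision making")
    return phases
```

An IONL approach is thus expected to pay for extra communication in both phases, whereas an NINL approach pays in neither.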

As mentioned in Section 2.3, inference depends on the assumed models. If a fault detection approach makes its detection decisions without inference, it has no need to adopt a learning mechanism during model establishment. This makes the Non-Inference OFfline-Learning (NIOFL) and Non-Inference ONline-Learning (NIONL) categories impractical. We therefore apply the proposed classification to distributed fault detection approaches found in the literature and organize them into the following four subsections.

3.1. Non-Inference Non-Learning

Venkataraman et al. [18] deal with permanent faults due to energy depletion in order to maintain connectivity within a cluster. They define two kinds of messages that every node in a cluster sends to its parent and child nodes: a hello_msg, containing location, energy, and node ID, which indicates the existence of a node; and a fail_report_msg, sent by a node whose energy is about to be exhausted, which triggers the failure recovery process. Energy exhaustion is detected simply by checking the current energy level.
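A NINL check of this kind is a direct comparison with a predefined model, as the following Python sketch illustrates. The message fields follow the contents described above, but the class names and the `energy_threshold` parameter standing in for "about to be exhausted" are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class HelloMsg:
    node_id: int
    location: Tuple[float, float]
    energy: float


@dataclass
class FailReportMsg:
    node_id: int


def periodic_message(node_id: int, location: Tuple[float, float], energy: float,
                     energy_threshold: float) -> Union[HelloMsg, FailReportMsg]:
    # Direct comparison of the current energy level with a predefined threshold:
    # no learning and no inference, hence NINL.
    if energy <= energy_threshold:
        return FailReportMsg(node_id)  # would trigger failure recovery at the receivers
    return HelloMsg(node_id, location, energy)
```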

Taleb et al. [19] consider faults in which a node has died or cannot provide any data. They adopt the De Bruijn graph to construct multilayer clusters. A cluster head detects faulty leaf nodes by sending test packets within the cluster and comparing the test results with the expected values.

3.2. Inference Non-Learning

Chen et al. [13] adopt majority voting techniques to handle hardware-level faults, including calibration systematic error, random noise error, and complete malfunction, under the assumption that faulty sensor nodes can still process and communicate data. Each sensor regularly sends its readings to its neighbors so that it can check the differences between its own reading and those of its neighbors. Two predefined thresholds $\theta_1$ and $\theta_2$ are used to generate a test result $c_{ij}$ indicating whether sensors $S_i$ and $S_j$ are in different statuses. If $d_{ij}^{t}$, the difference between the readings of $S_i$ and $S_j$ at time $t$, is larger than $\theta_1$, and $\Delta d_{ij}^{\Delta t_l}$, the difference between $d_{ij}^{t_{l+1}}$ and $d_{ij}^{t_l}$, is larger than $\theta_2$, then the test result $c_{ij}$ is set to $1$, meaning that $S_i$ and $S_j$ are likely in different statuses. If the status of $S_i$ is likely the same as those of most of its neighbors, $S_i$ sets its tendency value to $T_i = LG$; otherwise $T_i = LF$. After sending $T_i$ to its neighbors, $S_i$ decides whether it is faulty ($T_i = FT$) or fault-free ($T_i = GD$) according to its test results with respect to its LG neighbors. This approach places no constraints on the topology and achieves high detection accuracy, but it requires a sufficient number of neighbors and incurs higher communication overhead due to several rounds of message exchange.
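The two broadcast rounds of this scheme can be sketched as follows. This is a simplified, hypothetical rendering of the description above rather than the authors' code: absolute differences are compared with $\theta_1$ and $\theta_2$, a node is LG when at least half of its neighbors agree with it, and the final step marks a node GD when agreements minus disagreements with its LG neighbors reach $\lceil |N(S_i)|/2 \rceil$. The sketch collapses any further rounds, so every other node is simply marked FT.

```python
import math
from typing import Dict, List, Tuple

Readings = Dict[int, float]
Neighbors = Dict[int, List[int]]
PairResults = Dict[Tuple[int, int], int]


def compute_tests(prev: Readings, curr: Readings, neighbors: Neighbors,
                  theta1: float, theta2: float) -> PairResults:
    """After neighbors broadcast readings: c[(i, j)] = 1 if S_i and S_j likely differ."""
    c: PairResults = {}
    for i, nbrs in neighbors.items():
        for j in nbrs:
            d_now = curr[i] - curr[j]      # d_ij at t_{l+1}
            d_before = prev[i] - prev[j]   # d_ij at t_l
            delta = d_now - d_before       # change of d_ij over Δt_l
            c[(i, j)] = int(abs(d_now) > theta1 and abs(delta) > theta2)
    return c


def tendencies(c: PairResults, neighbors: Neighbors) -> Dict[int, str]:
    """LG if at least half of the neighbors agree with S_i (c_ij = 0), else LF."""
    t = {}
    for i, nbrs in neighbors.items():
        agree = sum(1 - c[(i, j)] for j in nbrs)
        t[i] = "LG" if agree >= math.ceil(len(nbrs) / 2) else "LF"
    return t


def final_status(c: PairResults, neighbors: Neighbors, t: Dict[int, str]) -> Dict[int, str]:
    """After tendencies are broadcast: simplified GD/FT decision using LG neighbors only."""
    status = {}
    for i, nbrs in neighbors.items():
        score = sum(1 - 2 * c[(i, j)] for j in nbrs if t[j] == "LG")
        status[i] = "GD" if score >= math.ceil(len(nbrs) / 2) else "FT"
    return status
```

In a fully connected five-node neighborhood where one node suddenly reports 30.0 while the others stay near 20.0, that node becomes LF and then FT, and the remaining four end up GD.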

Building on [13], Jiang [15] improves the decision criterion for detecting a faulty sensor node: for a node and its possibly normal neighbors, that is, its LG neighbors, if the number of test results indicating a fault within this neighborhood exceeds the number of test results indicating normal behavior, the status of the node is faulty (FT). The improved approach reduces the required number of neighbors without decreasing detection accuracy.
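Jiang's modified rule can be expressed as a replacement for the final decision step. The sketch below is an illustration under stated assumptions: it treats each $c_{ij} = 1$ with an LG neighbor as a test result indicating a fault, each $c_{ij} = 0$ as one indicating normal behavior, and resolves ties as fault-free; the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def jiang_status(i: int, neighbors: Dict[int, List[int]],
                 c: Dict[Tuple[int, int], int], tendency: Dict[int, str]) -> str:
    """Final decision for S_i (sketch): majority of test results against its LG neighbors."""
    lg = [j for j in neighbors[i] if tendency[j] == "LG"]
    fault_votes = sum(c[(i, j)] for j in lg)   # c_ij = 1: result indicating a fault
    normal_votes = len(lg) - fault_votes       # c_ij = 0: result indicating normal
    return "FT" if fault_votes > normal_votes else "GD"
```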

Lee and Choi [14] detect faulty sensor nodes by comparing the differences between the readings of neighboring nodes and disseminating the local decision made at each node. Specifically, they adopt threshold tests and decision aggregation to complete the fault detection. They also use time redundancy to tolerate transient faults in communication and sensor readings.

De [26] designs a faulty sensor reading detection algorithm based on weighted voting, using both distance and reliability as weights. The reliability is derived from a localization error detection algorithm that exchanges two-way request-reply messages between neighbors: a node sends a hello or dummy message to its neighbors, and each neighbor answers with a reply message containing calculated relative position information. In this way, every node learns its position and confidence level. Afterwards, a weighted voting algorithm for detecting faulty sensor readings runs, exploiting the confidence (reliability) data from the previous algorithm together with distance. This approach has no specific requirement on node degree, but it is specific to long-thin topologies.
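A distance- and reliability-weighted vote can be sketched as below. The combination `reliability / distance` and the agreement tolerance `tol` are hypothetical choices for illustration; the source states only that both distance and reliability are used as weights.

```python
from typing import List, Tuple


def weighted_vote_faulty(own: float, neighbors: List[Tuple[float, float, float]],
                         tol: float) -> bool:
    """neighbors: (reading, distance, reliability in [0, 1]) for each neighbor.

    A neighbor agrees if its reading lies within tol of ours; closer and more
    reliable neighbors carry more weight. The reading is judged faulty when the
    weight of disagreeing neighbors exceeds that of agreeing ones."""
    agree = disagree = 0.0
    for reading, distance, reliability in neighbors:
        w = reliability / max(distance, 1e-9)  # guard against zero distance
        if abs(reading - own) <= tol:
            agree += w
        else:
            disagree += w
    return disagree > agree
```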

Kim and Prabhakaran [27] propose non-history-based and history-based fault detection methods for a Body Sensor Network (BSN) to detect faulty sensors and sensor readings. The non-history-based approach is used for obtaining a sufficient amount of data entries or for verifying the relative position of body joints. The history-based method has a lower false alarm rate. First, all sensor readings, including faulty node readings as well as abnormal motion readings, are divided into multiple motion groups using Gaussian Mixture Model clustering. Second, the method computes the posterior probability of each sensor's input vector and its nearest cluster set to detect abnormal behavior.

Farruggia and Vitabile [23] detect faulty sensors and correct their corrupted data by exploiting the assumption that sensor readings are spatiotemporally correlated. They use a Markov Random Field (MRF) to identify the sensors that work properly and, in combination with a Locally Weighted Regression model, correct the readings of damaged sensors, using a sensor model trained from the data of working sensors and their neighbors according to the degree of average correlation.

3.3. Inference Offline-Learning

Zhuang et al. [25] use training data to learn the parameters of a divergence function and joint probability distributions in order to establish correlation models. They design three detection algorithms: centralized detection, distributed simultaneous detection with pure decisions, and distributed collective detection with probabilistic decisions. The first two use sensor readings to compute the mutual divergence value. The distributed collective detection method produces probabilistic decision results and offers higher accuracy. Every node sends its initial decision, as a uniform distribution, to its neighbors; in this way, a sensor obtains $N$ samples from the distribution, updates its own distribution, and then calculates its probability of being faulty. If this probability is higher than a certain threshold, the sensor node is considered faulty.

Dereszynski and Dietterich [22] present a method that exploits the spatial and temporal correlations in the data to distinguish sensor failures from valid observations. Sensor data obtained from the SensorScope project provide background knowledge for distinguishing data anomalies. They set up a Bayesian network to describe the spatial relationships between sensors and extend it to a dynamic Bayesian network to describe temporal correlations. They also incorporate a sensor model describing the state and the observation of each sensor. They then infer the most likely state of the sensors from spatially and temporally correlated observations, that is, the current observations and those of the immediate past.

Gao et al. [28] introduce a fault detection method based on the Hidden Markov Random Field (HMRF) model that exploits spatial correlations. They use measurements collected during a certain period in the initial stage of network deployment as training data to obtain the coefficients for estimation under the HMRF model. Each node uses its own measurement, the coefficients, and its neighbors' readings to estimate its measurement. The difference between the current and estimated measurements is then checked against a threshold. Furthermore, majority voting weighted by confidence is used to improve the accuracy of the results.

Warriach et al. [29] focus on detecting faults in sensor readings, including outliers, spikes, stuck-at faults, and high noise or variance. They describe a fault detection approach that combines three methods, namely, rule-based, learning-based, and estimation-based methods. The rule-based method exploits domain and expert knowledge to construct heuristic rules for identifying faults using a histogram method. The estimation method, Linear Least-Squares Estimation, uses spatial and temporal correlations to predict the normal behavior of a sensor and identify faulty measurements. Finally, the learning-based method is designed for WSN applications whose data may not be spatiotemporally correlated. It uses training data to infer a model of faulty sensor readings, such as a Hidden Markov Model or a neural network, and statistically determines whether a reading is faulty.
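The estimation step can be illustrated with an ordinary least-squares line fitted on training pairs. This sketch is a simplification: it predicts a sensor's reading from a single spatially correlated neighbor, whereas the method described above uses both spatial and temporal correlations; the function names and the tolerance `tol` are hypothetical.

```python
from typing import Sequence, Tuple


def fit_least_squares(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y ~ a + b*x from training pairs (neighbor reading x, own reading y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("neighbor readings must vary to fit a slope")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def estimate_faulty(own: float, neighbor: float, a: float, b: float, tol: float) -> bool:
    """Flag the reading if it deviates from the linear prediction by more than tol."""
    return abs(own - (a + b * neighbor)) > tol
```

With training pairs that follow $y = 1 + 2x$ exactly, the fit recovers $a = 1$ and $b = 2$, so a reading of 40.0 when the neighbor reports 15.0 (prediction 31.0) would be flagged with `tol=1.0`.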

Nie et al. [20] present a fault detection framework that deduces the root cause of failures without adding any additional network burden. All sensed data are sent to the base station, where they are checked using a self-learning failure knowledge library built from the relationships between the sensed data and the failures in the sensor network.

Ma et al. [21] consider faults including node crashes, traffic contention, and routing loops. They present an in-network diagnosis approach named Local-Diagnosis (LD2) based on distributed evidence fusion operations. It uses a Naive Bayesian Model to encode the probabilistic correlation between a set of state attributes and root causes, with parameter values learned from historical data. Every node forwards its own evidence through a fusion tree within a local area, and Dempster-Shafer theory is used to fuse the evidence.

Salem et al. [30] present a fault detection approach for healthcare applications in medical WSNs. The method adopts a decision tree algorithm to detect abnormal records and, for those abnormal records, uses linear regression to predict the measurement value. The J48 decision tree model and the regressor coefficients are learned during the training phase. A threshold test on the difference between the predicted and the current value is then used to distinguish faulty sensor readings from patient health degradation. The method relies on the fact that physiological measurements are correlated.

Lo et al. [31] model the sensor monitoring system as a linear dynamical system. They divide sensor nodes into arbitrary groups and detect faults based on group testing and Kalman filtering. In their experiments on real bridge vibration sensing data, they consider faults including spike, nonlinear transduction, mean drift, and excessive noise. They also use measured data to learn parameters in the dynamical model for Kalman filtering.
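Kalman filtering for fault detection can be sketched in one dimension. The block below is not the model of Lo et al., which uses group testing over a linear dynamical system; it assumes a scalar random-walk state with hypothetical process and measurement noise variances `q` and `r`, and flags a measurement when its normalized innovation exceeds a gate (three standard deviations by default). Flagged measurements are excluded from the state update so that a fault does not corrupt later estimates.

```python
import math
from typing import List


def kalman_fault_flags(z: List[float], q: float, r: float,
                       gate: float = 3.0, p0: float = 1.0) -> List[bool]:
    """Scalar Kalman filter over readings z; returns one fault flag per reading."""
    x, p = z[0], p0                 # initialize the state estimate from the first reading
    flags: List[bool] = []
    for zk in z:
        p = p + q                   # predict: random walk keeps x, inflates its variance
        s = p + r                   # innovation variance
        innov = zk - x              # innovation (measurement residual)
        faulty = abs(innov) / math.sqrt(s) > gate
        flags.append(faulty)
        if not faulty:              # update only with measurements that pass the gate
            k = p / s               # Kalman gain
            x = x + k * innov
            p = (1 - k) * p
    return flags
```

For readings hovering around 20.0 with a single spike to 35.0, only the spike is flagged when `q=0.01` and `r=0.04`.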

Lau et al. [24] propose a centralized hardware fault detection technique based on a naïve Bayes framework. The nodes send their readings to the sink, and the sink extracts the end-to-end transmission time. In the training phase, they estimate the conditional and marginal probabilities of the transmission time with maximum likelihood estimation to build the naïve Bayes classifier used in the subsequent test phase. They then compare the mode of the transmission time with the normal conditional probability and analyze the last five transmission times with the naïve Bayes classifier to determine the status of the network and identify the faulty sensors. They analyze the performance of the approach under three different network traffic conditions for a 100-node WSN with one to five faulty nodes.
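A naïve Bayes classifier over transmission times can be sketched as follows. Maximum-likelihood estimates here are relative frequencies of discretised transmission times. The 10 ms bins, the labels, and the training values are hypothetical; add-one smoothing is an addition of this sketch (a pure ML estimate would give unseen bins zero probability); and the mode comparison step of [24] is omitted.

```python
import math
from collections import Counter

def bin_of(t, width=10):
    """Discretise a transmission time (ms) into a bin index (hypothetical width)."""
    return int(t // width)

def train(samples):
    """samples: list of (transmission_time, label). Returns priors and likelihoods.

    Probabilities are relative frequencies (maximum-likelihood estimates) with
    add-one smoothing; one extra slot covers bins never seen in training.
    """
    label_counts = Counter(label for _, label in samples)
    bin_counts = {lab: Counter() for lab in label_counts}
    for t, lab in samples:
        bin_counts[lab][bin_of(t)] += 1
    bins = {bin_of(t) for t, _ in samples}
    n = len(samples)
    priors = {lab: c / n for lab, c in label_counts.items()}
    likelihood = {
        lab: {b: (bin_counts[lab][b] + 1) / (label_counts[lab] + len(bins) + 1) for b in bins}
        for lab in label_counts
    }
    unseen = {lab: 1 / (label_counts[lab] + len(bins) + 1) for lab in label_counts}
    return priors, likelihood, unseen

def classify(window, priors, likelihood, unseen):
    """Label a window of recent transmission times (e.g. the last five)."""
    scores = {}
    for lab, prior in priors.items():
        score = math.log(prior)
        for t in window:  # naive assumption: times independent given the label
            score += math.log(likelihood[lab].get(bin_of(t), unseen[lab]))
        scores[lab] = score
    return max(scores, key=scores.get)

# Hypothetical training set of (transmission time in ms, status)
model = train([(22, "normal"), (25, "normal"), (27, "normal"), (31, "normal"),
               (85, "faulty"), (88, "faulty"), (92, "faulty")])
```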

3.4. Inference Online-Learning

Miao et al. [32] deploy a fault detection algorithm in GreenOrbs to detect ingress drops, routing failures, link failures, and node failures based on temporal and spatial correlations between system metrics. Temporal detection looks for sudden changes in the correlation graph of a node, while spatial detection discovers pattern differences among the graphs of similar nodes. Each node periodically sends 22 metrics along with its sensor readings to the base station, and correlation graphs are constructed for each time window. A longer time window yields higher detection accuracy at the cost of a longer detection delay.
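The temporal check can be sketched as follows: build a correlation graph over a node's metrics in each window, then compare consecutive graphs. The metric names, the Pearson correlation with threshold tau, and the Jaccard similarity test are assumptions of this sketch and not necessarily the choices made in [32]; spatial detection is omitted.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant metric carries no correlation information
    return sxy / math.sqrt(sxx * syy)

def correlation_graph(window, tau=0.8):
    """window: dict metric_name -> samples in one time window.
    Returns the set of metric pairs whose |correlation| >= tau (the edges)."""
    return {
        (a, b)
        for a, b in combinations(sorted(window), 2)
        if abs(pearson(window[a], window[b])) >= tau
    }

def sudden_change(prev_graph, cur_graph, min_similarity=0.5):
    """Temporal check: flag when edge sets of consecutive windows diverge."""
    union = prev_graph | cur_graph
    if not union:
        return False
    jaccard = len(prev_graph & cur_graph) / len(union)
    return jaccard < min_similarity

# Two consecutive windows of hypothetical per-node metrics
w1 = {"tx": [1, 2, 3, 4, 5], "rx": [2, 4, 6, 8, 10], "retries": [5, 4, 3, 2, 1]}
w2 = {"tx": [1, 2, 3, 4, 5], "rx": [3, 1, 4, 1, 5], "retries": [5, 4, 3, 2, 1]}
```

In the second window "rx" decouples from the other metrics, two of the three edges disappear, and the similarity drops to 1/3, which the check reports as a sudden change.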

Ni and Pottie [17] design a two-phase modular fault detection framework comprising four modules: blind modeling, trusted sensor selection, model reevaluation, and sensor evaluation. They first use prior knowledge of the phenomenon's behavior to determine the parameters of the Hierarchical Bayesian Space Time (HBST) model adopted for sensor data modeling. Based on this model, they use maximum a posteriori (MAP) selection to identify a trusted group of sensors for evaluating the received data, and they reevaluate the HBST model using the selected sensors. Finally, they check whether each sensor reading lies within the calculated bounds. The approach is evaluated by injecting two types of faults, outlier and stuck-at, as defined in [8, 33]. The results show that this HBST model-based approach achieves better detection accuracy than their previous linear autoregressive model-based approach [34].
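Fault injection gives an evaluation ground truth: faults are added to a clean trace and the detector's flags are compared with the injected labels. The sketch below injects outlier and stuck-at faults and applies a simple bounds check. The rate, magnitude, and bound half-width are hypothetical, and the bounds are taken from the clean trace itself as a stand-in for HBST-derived bounds, which are not reproduced here.

```python
import random

def inject_outliers(trace, rate, magnitude, rng):
    """Add isolated spikes to a fraction `rate` of samples; return (trace, labels)."""
    faulty = list(trace)
    labels = [False] * len(trace)
    for i in range(len(trace)):
        if rng.random() < rate:
            faulty[i] = trace[i] + magnitude
            labels[i] = True
    return faulty, labels

def inject_stuck_at(trace, start, length, value=None):
    """Hold the reading constant for `length` samples starting at `start`."""
    faulty = list(trace)
    labels = [False] * len(trace)
    stuck = trace[start] if value is None else value
    for i in range(start, min(start + length, len(trace))):
        faulty[i] = stuck
        labels[i] = True
    return faulty, labels

def bounds_check(readings, predicted, half_width):
    """Flag readings outside predicted +/- half_width (hypothetical bounds)."""
    return [abs(z - p) > half_width for z, p in zip(readings, predicted)]

clean = [20 + 0.1 * i for i in range(50)]  # hypothetical smooth temperature trace
noisy, truth = inject_outliers(clean, rate=0.1, magnitude=15.0, rng=random.Random(1))
flags = bounds_check(noisy, clean, half_width=5.0)
```

A short stuck-at segment stays close to the predicted values and passes this simple bounds check, which illustrates why the two fault types stress a detector differently.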

Kamal et al. [35] propose a two-phase framework called Packet-Level Attestation (PLA). In the learning phase, spatial correlations among sensor readings are established and possible verifying nodes (PVNs) are chosen. In the operational phase, a sensor node sends its reading to its verifying nodes, which are one-hop neighbors. Each verifying node checks by comparison whether the data are fault-free and then forwards the data packet to the sink after appending one bit that indicates the result of its individual check.
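The attestation step can be sketched as below. The tolerance-based comparison at each verifying node and the sink's majority rule are assumptions made for illustration, and the learning phase that selects the PVNs is omitted.

```python
from dataclasses import dataclass, field

@dataclass
class Packet:
    source: int
    reading: float
    attestation_bits: list = field(default_factory=list)  # one bit per verifier

def verify(packet, own_reading, tolerance):
    """A one-hop verifying node compares the reading with its own and appends one bit.

    The tolerance test is a hypothetical stand-in for the learned spatial correlation.
    """
    packet.attestation_bits.append(abs(packet.reading - own_reading) <= tolerance)
    return packet

def sink_accepts(packet):
    """Sink-side decision (hypothetical rule): accept if most verifiers agree."""
    bits = packet.attestation_bits
    return bool(bits) and sum(bits) > len(bits) / 2
```

Because each verifier adds a single bit, the attestation overhead per packet grows by only one bit per verifying node.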

Fang et al. [36] design a two-tiered data validation framework with a two-phase, in-network, hierarchical, demand-based, adaptive fault detection (DAFD) method. During the learning phase, a tier-one (local) model is established in each node and a tier-two (spatial) model between neighboring nodes. The local model is learned by ordinary least squares (OLS) estimation to describe the correlation between temperature and humidity, and the spatial correlation model is learned from the sensor readings of one-hop neighbors. The operational phase uses these two models to check sensor readings and identify faulty data, and uses feedback from the spatial model to update the local model. They also design an adaptive spatial validation selection mechanism that uses either group voting or singular spatial validation to detect faulty data. The approach shows good detection accuracy in an evaluation that considers short, constant, noise, and drift faults.
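A minimal sketch of the two tiers follows, assuming a linear humidity-temperature relation for the local model and a simple rule that chooses group voting when at least three neighbors are available. The residual bound, tolerance, neighbor-count rule, and training values are hypothetical and not taken from [36].

```python
def ols_fit(temps, hums):
    """Tier one: fit humidity = b0 + b1 * temperature by ordinary least squares."""
    n = len(temps)
    mt, mh = sum(temps) / n, sum(hums) / n
    sxx = sum((t - mt) ** 2 for t in temps)
    sxy = sum((t - mt) * (h - mh) for t, h in zip(temps, hums))
    b1 = sxy / sxx
    return mh - b1 * mt, b1

def local_check(temp, hum, model, max_residual):
    """Flag the (temperature, humidity) pair if it disagrees with the local model."""
    b0, b1 = model
    return abs(hum - (b0 + b1 * temp)) > max_residual

def spatial_check(reading, neighbor_readings, tolerance):
    """Tier two: group voting with >= 3 neighbors, else singular validation.

    Returns True if the reading is consistent with its neighbors, None if no
    neighbor is available (hypothetical selection rule).
    """
    if not neighbor_readings:
        return None
    agree = [abs(reading - r) <= tolerance for r in neighbor_readings]
    if len(neighbor_readings) >= 3:
        return sum(agree) > len(agree) / 2  # group voting
    return agree[0]  # singular validation against one neighbor

# Hypothetical training data: humidity falls as temperature rises
model = ols_fit([10, 15, 20, 25], [60, 50, 40, 30])
```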

Nguyen et al. [37] classify sensor data faults into discontinuous and continuous faults. Their detection approach considers both spatial and temporal correlation: a sensor's current reading is compared with a value computed by a neighbor voting technique and with an expected value predicted by an AutoRegressive Moving Average (ARMA) model, a time series analysis model. They use maximum likelihood (ML) estimation to learn the ARMA parameters from training data before deployment, and these parameters can also be updated at runtime through online learning. Whether a reading is correct is decided from either the intersection or the union of the results of the two techniques.
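The combination of the two checks can be sketched as follows. For brevity, an AR(1) predictor replaces the full ARMA model, and the neighbor-voted value is taken as the median of the neighbors' readings; both simplifications, and all parameter values, are assumptions of this sketch.

```python
from statistics import median

def ar1_predict(history, phi, mean):
    """One-step AR(1) prediction around a mean (simplified stand-in for ARMA)."""
    return mean + phi * (history[-1] - mean)

def neighbor_vote(reading, neighbor_readings, tolerance):
    """Spatial check: faulty if the reading disagrees with the neighbors' median."""
    return abs(reading - median(neighbor_readings)) > tolerance

def decide(reading, history, neighbor_readings, phi, mean,
           tol_time, tol_space, mode="intersection"):
    """Combine the temporal and spatial verdicts; True means the reading is faulty."""
    temporal_faulty = abs(reading - ar1_predict(history, phi, mean)) > tol_time
    spatial_faulty = neighbor_vote(reading, neighbor_readings, tol_space)
    if mode == "intersection":
        return temporal_faulty and spatial_faulty  # both checks must agree
    if mode == "union":
        return temporal_faulty or spatial_faulty  # either check suffices
    raise ValueError(f"unknown mode: {mode}")
```

The intersection mode is conservative and reduces false alarms (for example, when a real event changes the readings of a whole neighborhood), whereas the union mode catches more faults at the cost of more false alarms.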

4. Comparison of Fault Detection Approaches

Table 1 compares the approaches described in Section 3 according to the proposed classification. The category of an approach gives a rough indication of its energy efficiency. As discussed, both online learning-based model establishment and inference-based decision making involve extra message exchange. Consequently, NINL approaches are the most energy-efficient, IONL approaches are the least energy-efficient, and INL and IOFL approaches fall in between.
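The ordering argued above can be written down as a tiny scoring rule. This is only a sketch of the reasoning, not a measurement: it assumes that inference-based decision making and online learning each add one source of runtime message exchange and that offline learning adds none.

```python
# Each category is described by two features (an assumption of this sketch):
# inference-based decision making and online learning.
CATEGORIES = {
    "NINL": {"inference": False, "online_learning": False},
    "INL": {"inference": True, "online_learning": False},
    "IOFL": {"inference": True, "online_learning": False},  # learning done offline
    "IONL": {"inference": True, "online_learning": True},
}

def extra_message_sources(category):
    """Number of features that cause extra message exchange at runtime."""
    features = CATEGORIES[category]
    return int(features["inference"]) + int(features["online_learning"])

# Fewer message-exchange sources: expected to be more energy-efficient.
# sorted() is stable, so INL and IOFL (equal scores) keep their listed order.
ranking = sorted(CATEGORIES, key=extra_message_sources)
```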

Table 1

Comparison of fault detection approaches.

Category	Author	Paper	Year	Correlation: spatial	Correlation: temporal	Correlation: phenomenon-related	Evaluation: simulation	Evaluation: fault injection	Evaluation: real dataset validation	Evaluation: testbed	Detection accuracy	False alarm rate
NINL	Venkataraman et al.	[18]	2007			✓	✓				N/A	N/A
NINL	Taleb et al.	[19]	2010			✓	✓				N/A	N/A
INL	Chen et al.	[13]	2006	✓	✓		✓				>0.97	<0.0025
INL	Lee and Choi	[14]	2008	✓	✓		✓				>0.91	<0.1
INL	Jiang	[15]	2009	✓	✓		✓				>0.94	N/A
INL	De	[26]	2009	✓			✓				>0.8	<0.2
INL	Kim and Prabhakaran	[27]	2011	✓			✓				>0.73	N/A
INL	Farruggia and Vitabile	[23]	2013	✓	✓				✓		>0.96	<0.01
IOFL	Zhuang et al.	[25]	2009	✓	✓		✓				1	N/A
IOFL	Dereszynski and Dietterich	[22]	2011	✓	✓			✓	✓		>0.70	0.04–0.07
IOFL	Gao et al.	[28]	2012	✓			✓				>0.8	<0.38
IOFL	Warriach et al.	[29]	2012	✓	✓			✓	✓		>0.96	N/A
IOFL	Nie et al.	[20]	2012			✓		✓		✓	>0.9	>0.35
IOFL	Ma et al.	[21]	2012			✓		✓		✓	>0.86	<0.16
IOFL	Salem et al.	[30]	2013	✓	✓				✓		1	0.074
IOFL	Lo et al.	[31]	2013		✓				✓		>0.80	<0.02
IOFL	Lau et al.	[24]	2014			✓	✓				>0.6	<0.05
IONL	Miao et al.	[32]	2011	✓	✓				✓		N/A	N/A
IONL	Ni and Pottie	[17]	2012	✓	✓			✓	✓		>0.965	<0.023
IONL	Kamal et al.	[35]	2013	✓			✓	✓	✓	✓	0.88–0.98	0.052
IONL	Fang et al.	[36]	2013	✓		✓			✓		>0.926	<0.014
IONL	Nguyen et al.	[37]	2013	✓	✓			✓	✓		0.8–0.92	0.05–0.26

The detection accuracy and false alarm rate values are extracted from the data reported in the investigated papers. Table 1 also explicitly identifies the evaluation method behind these figures: simulation, fault injection, real dataset validation, or testbed. Although differences in evaluation methods limit how precisely the approaches can be compared, a relative comparison is still possible before a further comparison under a common evaluation method is carried out. In the following, we discuss separately the relationship between detection accuracy and each of the other listed characteristics, that is, category/energy efficiency, correlation model, and evaluation method.
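For reference, the two metrics are commonly computed as below: detection accuracy as the fraction of faulty nodes (or readings) that are detected, and false alarm rate as the fraction of fault-free ones wrongly flagged. This is a sketch of these common definitions; individual surveyed papers may define them slightly differently, which is one reason the figures in Table 1 support only a relative comparison.

```python
def detection_metrics(truth, predicted):
    """Detection accuracy and false alarm rate from per-node (or per-reading) labels.

    truth, predicted: sequences of booleans, True meaning "faulty".
    detection accuracy = detected faulty / actual faulty
    false alarm rate   = fault-free flagged as faulty / actual fault-free
    """
    tp = sum(t and p for t, p in zip(truth, predicted))
    fp = sum((not t) and p for t, p in zip(truth, predicted))
    faulty = sum(truth)
    healthy = len(truth) - faulty
    da = tp / faulty if faulty else float("nan")
    far = fp / healthy if healthy else float("nan")
    return da, far
```

For example, with 4 faulty and 6 fault-free nodes, detecting 3 of the faulty ones while wrongly flagging 1 healthy node gives a detection accuracy of 0.75 and a false alarm rate of about 0.167.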

Category/Energy Efficiency versus Detection Accuracy. Most fault detection approaches are inference based. Without a comprehensive understanding of the phenomenon under monitoring, it is difficult for NINL approaches to make precise detection decisions simply by direct comparison against an existing model, which also explains why few approaches belong to NINL. Most IONL approaches achieve better detection accuracy and lower false alarm rates because they make decisions adaptively. For INL and IOFL approaches, the figures in Table 1 alone do not show which category makes more accurate detection decisions, since both the correlation model and the evaluation method that an approach adopts affect its detection accuracy.

Correlation Model versus Detection Accuracy. As Table 1 shows, approaches that adopt more than one kind of correlation model usually achieve better detection accuracy than those considering only one. Nevertheless, the more correlation models are considered, the more complex the computation of the detection decision becomes. In most cases, correlation models are application-specific, especially for approaches that adopt phenomenon-related models.

Evaluation Method versus Detection Accuracy. There are four typical means of evaluating the detection accuracy of fault detection approaches: simulation, fault injection, real dataset validation, and testbed. Simulation usually yields better detection accuracy, as simulated environments are more idealized and easier to establish. Real dataset validation is another commonly adopted method, but it is static, since the approach is evaluated only against previously recorded data. Fault injection is especially useful when the detection approach is designed to detect specific faults. A testbed is the closest to a real network environment but also the most costly to set up, and only a few approaches are evaluated on one.

5. Discussion

Our investigation highlights several guidelines that WSN developers could take into consideration when designing energy-efficient fault detection approaches.

(i) Before designing a fault detection approach, it is necessary to gain a comprehensive understanding of the phenomenon under monitoring and to check whether an existing system model can be used as part of the correlation model.

(ii) Bayesian networks and Markov random fields are two commonly adopted probabilistic graphical models for representing spatial and temporal correlations.

(iii) When spatial correlation is considered, the detection accuracy depends on the number of neighbors of each sensor node.

(iv) The deployment environments of distributed WSNs involve many uncertainties, which makes it difficult for WSN applications to establish precise correlation models. Online learning seems the most promising way to acquire precise and adaptive models, but it is costly in terms of extra message exchange. Unless the WSN application is highly critical and detection accuracy matters more than energy efficiency, offline learning is the more energy-efficient choice.

(v) In theory, detection accuracy can be improved with a more precise correlation model and more collected information. However, it is impractical to pursue detection accuracy at the expense of energy efficiency. Establishing more precise correlation models from existing datasets is more feasible.

(vi) As discussed in Section 2.2, what to send in messages, how to exchange messages, and who receives messages are the three major aspects that directly affect the efficiency of information collection. If the WSN architecture is centralized, the overhead of multihop communication should also be considered.

(vii) Besides correlation models, threshold computation is also important in decision making. Statistical and probabilistic methods, including root mean square (RMS), hypothesis testing, the likelihood ratio (LR), the Neyman-Pearson criterion, the maximum a posteriori (MAP) criterion, and Bayes' criterion, are commonly adopted for threshold computation.

(viii) Convergence conditions are essential for the inference process during decision making, and they depend strongly on the adopted correlation model.

(ix) We have not found any existing approach that considers mobility, and only a few, such as [14], consider transient faults. Fault detection design needs to account for more dynamics in order to handle mobility and transient faults.
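Guidelines (iii) and (vii) can be illustrated with two small calculations. The sketch below assumes that each neighbor is faulty independently with the same probability and that a reading is judged by strict majority voting among the neighbors, and it uses a Gaussian test statistic for the Neyman-Pearson threshold. These modelling choices are assumptions for illustration and are not taken from any of the surveyed approaches.

```python
from math import comb
from statistics import NormalDist

def majority_correct_probability(n_neighbors, p_faulty):
    """Probability that a strict majority of n independent neighbors is fault-free.

    Assumes each neighbor is faulty independently with probability p_faulty;
    ties (possible for even n) count as an incorrect decision.
    """
    need = n_neighbors // 2 + 1
    return sum(
        comb(n_neighbors, k) * (1 - p_faulty) ** k * p_faulty ** (n_neighbors - k)
        for k in range(need, n_neighbors + 1)
    )

def neyman_pearson_threshold(mu0, sigma, alpha):
    """Threshold on a Gaussian statistic so that P(false alarm) = alpha.

    Under the fault-free hypothesis the statistic ~ N(mu0, sigma^2);
    a fault is declared when the statistic exceeds the returned value.
    """
    return NormalDist(mu0, sigma).inv_cdf(1 - alpha)
```

With p_faulty = 0.1, the chance of a correct majority rises from 0.9 with one neighbor to 0.972 with three. If neighbors are more likely faulty than not, however, adding neighbors makes the vote worse, so the dependence on neighbor count cuts both ways. For alpha = 0.05 and a standard normal statistic, the Neyman-Pearson threshold is about 1.645.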

6. Conclusion

Considering the resource limitations of WSNs, in this paper we have proposed a fault detection framework from the perspective of energy efficiency. In particular, we have focused on message exchange, as it is the most energy-consuming activity in the network. Following the same design principle as the fault detection framework, we have proposed a classification for fault detection approaches, which we then used to classify and compare existing approaches. Based on the data provided by the investigated papers, we mainly compared the energy efficiency, correlation model, evaluation method, and detection accuracy of fault detection approaches. The comparison and the resulting design guidelines aim to identify the characteristics of different existing approaches and to facilitate the design of energy-efficient fault detection in WSNs. While investigating existing detection approaches, we noticed that event detection and anomaly detection in WSNs adopt mechanisms similar to those used for fault detection. In future work, we plan to extend our detection framework and classification to these fields.


Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was carried out with the support of the China Scholarship Council during Yue Zhang's visit to DTU in the context of the IDEA4CPS project, and was also supported by the National Natural Science Foundation of China (no. 61361136002), the National Science and Technology Major Project (no. 2014ZX01038-101-001), the Shanghai Committee of Science and Technology, China (no. 14511100400), and the National Trusted Embedded Software Engineering Technology Research Center, China (no. 2012FU125X15).

References

1. Wireless Sensor Networks Project Team, IEC Market Strategy Board. Internet of Things: Wireless Sensor Networks (White Paper). July 2015. http://www.iec.ch/whitepaper/pdf/iecWP-internetofthings-LR-en.pdf

2. Mahapatro, P. M. Khilar. Fault diagnosis in wireless sensor networks: a survey. IEEE Communications Surveys and Tutorials, vol. 15, no. 4, pp. 2000–2026, 2013. doi:10.1109/surv.2013.030713.00062

3. Szewczyk, Mainwaring, Polastre, Anderson, Culler. An analysis of a large scale habitat monitoring application. Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems (SenSys '04), Baltimore, Md, USA, November 2004. ACM, pp. 214–226. doi:10.1145/1031495.1031521

4. Tolle, Polastre, Szewczyk, Culler, Turner, Burgess, Dawson, Buonadonna, Gay, Hong. A macroscope in the redwoods. Proceedings of the 3rd ACM International Conference on Embedded Networked Sensor Systems (SenSys '05), San Diego, Calif, USA, November 2005. ACM, pp. 51–63. doi:10.1145/1098918.1098925

5. Mokhtar, Merabti. A survey on fault management in wireless sensor networks. Proceedings of the 8th Annual Postgraduate Symposium, 2007.

6. Paradis, Han. A survey of fault management in wireless sensor networks. Journal of Network and Systems Management, vol. 15, no. 2, pp. 171–190, 2007. doi:10.1007/s10922-007-9062-0

7. Barborak, Dahbura, Malek. The consensus problem in fault-tolerant computing. ACM Computing Surveys, vol. 25, no. 2, pp. 171–220, 1993. doi:10.1145/152610.152612

8. A. B. Sharma, Golubchik, Govindan. Sensor faults: detection methods and prevalence in real-world datasets. ACM Transactions on Sensor Networks, vol. 6, no. 3, article 23, pp. 1–39, June 2010. doi:10.1145/1754414.1754419

9. Jurdak, X. R. Wang, Obst, Valencia. Wireless sensor network anomalies: diagnosis and detection strategies. In Intelligence-Based Systems Engineering, chapter 12, pp. 309–325. Springer, 2011. doi:10.1007/978-3-642-17931-0_12

10. Rodrigues, Camilo, J. S. Silva, Boavida. Diagnostic tools for wireless sensor networks: a comparative survey. Journal of Network and Systems Management, vol. 21, no. 3, pp. 408–452, 2013. doi:10.1007/s10922-012-9240-6

11. S. X. Ding. Model-based Fault Diagnosis Techniques: Design Schemes, Algorithms and Tools. Springer, 2013.

12. Liu, Yang, Wang. A self-learning sensor fault detection framework for industry monitoring IoT. Mathematical Problems in Engineering, vol. 2013, article 712028, 8 pages, 2013. doi:10.1155/2013/712028

13. Chen, Kher, Somani. Distributed fault detection of wireless sensor networks. Proceedings of the Workshop on Dependability Issues in Wireless Ad Hoc Networks and Sensor Networks (DIWANS '06), Los Angeles, Calif, USA, September 2006. ACM, pp. 65–72. doi:10.1145/1160972.1160985

14. M.-H. Lee, Y.-H. Choi. Fault detection of wireless sensor networks. Computer Communications, vol. 31, no. 14, pp. 3469–3475, 2008. doi:10.1016/j.comcom.2008.06.014

15. Jiang. A new method for node fault detection in wireless sensor networks. Sensors, vol. 9, no. 2, pp. 1282–1294, 2009. doi:10.3390/s90201282

16. Khazaei, Barati, Movaghar. Improvement of fault detection in wireless sensor networks. Proceedings of the ISECS International Colloquium on Computing, Communication, Control, and Management (CCCM '09), Sanya, China, August 2009, pp. 644–646. doi:10.1109/cccm.2009.5267508

17. Ni, Pottie. Sensor network data fault detection with maximum a posteriori selection and bayesian modeling. ACM Transactions on Sensor Networks, vol. 8, no. 3, article 23, 2012. doi:10.1145/2240092.2240097

18. Venkataraman, Emmanuel, Thambipillai. A cluster-based approach to fault detection and recovery in wireless sensor networks. Proceedings of the 4th IEEE International Symposium on Wireless Communication Systems 2007 (ISWCS '07), Trondheim, Norway, October 2007, pp. 35–39. doi:10.1109/iswcs.2007.4392297

19. A. A. Taleb, Mathew, D. K. Pradhan. Fault diagnosis in multi layered de Bruijn based architectures for sensor networks. Proceedings of the 8th IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM '10), Mannheim, Germany, April 2010, pp. 456–461. doi:10.1109/PERCOMW.2010.5470627

20. Nie. Passive diagnosis for WSNs using data traces. Proceedings of the 8th IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS '12), Hangzhou, China, May 2012. IEEE, pp. 273–280. doi:10.1109/dcoss.2012.63

21. Ma, Liu, Miao, Liu. Sherlock is around: detecting network failures with local evidence fusion. Proceedings of the IEEE Conference on Computer Communications (INFOCOM '12), Orlando, Fla, USA, March 2012. IEEE, pp. 792–800. doi:10.1109/infcom.2012.6195826

22. E. W. Dereszynski, T. G. Dietterich. Spatiotemporal models for data-anomaly detection in dynamic environmental monitoring campaigns. ACM Transactions on Sensor Networks, vol. 8, no. 1, article 3, 36 pages, 2011. doi:10.1145/1993042.1993045

23. Farruggia, Vitabile. A novel approach for faulty sensor detection and data correction in wireless sensor network. Proceedings of the IEEE 8th International Conference on Broadband and Wireless Computing, Communication and Applications (BWCCA '13), Compiègne, France, October 2013. IEEE, pp. 36–42. doi:10.1109/bwcca.2013.15

24. B. C. P. Lau, E. W. M., T. W. S. Chow. Probabilistic fault detector for wireless sensor network. Expert Systems with Applications, vol. 41, no. 8, pp. 3703–3711, 2014. doi:10.1016/j.eswa.2013.11.034

25. Zhuang, Wang, Shang. Distributed faulty sensor detection. Proceedings of the IEEE Global Telecommunications Conference (GLOBECOM '09), Honolulu, Hawaii, USA, December 2009, pp. 1–6. doi:10.1109/glocom.2009.5425702

26. De. A distributed algorithm for localization error detection-correction, use in in-network faulty reading detection: applicability in long-thin wireless sensor networks. Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC '09), Budapest, Hungary, April 2009, pp. 1–6. doi:10.1109/wcnc.2009.4917497

27. D.-J. Kim, Prabhakaran. Motion fault detection and isolation in body sensor networks. Proceedings of the 9th IEEE International Conference on Pervasive Computing and Communications (PerCom '11), Seattle, Wash, USA, March 2011. IEEE, pp. 147–155. doi:10.1109/percom.2011.5767579

28. Gao, Wang, Zhang. HMRF-based distributed fault detection for wireless sensor networks. Proceedings of the IEEE Global Communications Conference (GLOBECOM '12), Anaheim, Calif, USA, December 2012. IEEE, pp. 640–644. doi:10.1109/glocom.2012.6503185

29. E. U. Warriach, T. A. Nguyen, Aiello, Tei. A hybrid fault detection approach for context-aware wireless sensor networks. Proceedings of the 9th IEEE International Conference on Mobile AdHoc and Sensor Systems (MASS '12), Las Vegas, Nev, USA, October 2012. IEEE, pp. 281–289. doi:10.1109/mass.2012.6502527

30. Salem, Guerassimov, Mehaoua, Marcus, Furht. Sensor fault and patient anomaly detection and classification in medical wireless sensor networks. Proceedings of the IEEE International Conference on Communications (ICC '13), Budapest, Hungary, June 2013, pp. 4373–4378. doi:10.1109/icc.2013.6655254

31. Lo, Liu, J. P. Lynch, A. C. Gilbert. Efficient sensor fault detection using combinatorial group testing. Proceedings of the 9th IEEE International Conference on Distributed Computing in Sensor Systems (DCoSS '13), Cambridge, Mass, USA, May 2013. IEEE, pp. 199–206. doi:10.1109/dcoss.2013.57

32. Miao, Liu, Papadias. Agnostic diagnosis: discovering silent failures in wireless sensor networks. Proceedings of the IEEE International Conference on Computer Communications (INFOCOM '11), Shanghai, China, April 2011, pp. 1548–1556. doi:10.1109/infcom.2011.5934945

33. Ramanathan, M. N. H. Chehade, Balzano, Nair, Zahedi, Kohler, Pottie, Hansen, Srivastava. Sensor network data fault types. ACM Transactions on Sensor Networks, vol. 5, no. 3, pp. 1–29, 2009. doi:10.1145/1525856.1525863

34. Ni, Pottie. Bayesian selection of non-faulty sensors. Proceedings of the IEEE International Symposium on Information Theory (ISIT '07), Nice, France, June 2007, pp. 616–620. doi:10.1109/isit.2007.4557293

35. A. R. M. Kamal, Bleakley, Dobson. Packet-level attestation (PLA): a framework for in-network sensor data reliability. ACM Transactions on Sensor Networks, vol. 9, no. 2, article 19, 2013. doi:10.1145/2422966.2422976

36. Fang, Dobson, Hughes. An error-free data collection method exploiting hierarchical physical models of wireless sensor networks. Proceedings of the 10th ACM Symposium on Performance Evaluation of Wireless Ad Hoc, Sensor, & Ubiquitous Networks (PE-WASUN '13), Barcelona, Spain, 2013, pp. 81–88. doi:10.1145/2507248.2507255

37. T. A. Nguyen, Bucur, Aiello, Tei. Applying time series analysis and neighbourhood voting in a decentralised approach for fault detection and classification in WSNs. Proceedings of the 4th Symposium on Information and Communication Technology (SoICT '13), Da Nang, Vietnam, December 2013. ACM, pp. 234–241. doi:10.1145/2542050.2542080