
Abstract

In existing anomaly detection approaches, a sensor node that cannot decide by itself whether its data are normal often turns to its neighbors for a further decision. However, previous works treat a neighbor's opinion as strictly normal or anomalous and do not consider the neighbor's uncertainty about the data of the node. In this paper, we propose the SLAD (subjective logic-based anomaly detection) framework. It redefines the notion of opinion from subjective logic theory, which takes uncertainty into account. Furthermore, it fuses the opinions of the neighbors to obtain a quantitative anomaly score for the data. Simulation results show that the SLAD framework improves anomaly detection performance compared with previous works.

1. Introduction

Recently, wireless sensor networks (WSNs) have been widely used in military surveillance, traffic monitoring, habitat monitoring, object tracking, and so forth [1, 2]. Such networks deploy many sensor nodes with sensing, data processing, and wireless communication capabilities in the monitoring area. Sensor nodes are resource-constrained and susceptible to interference from the environment, so their sensing data are often unreliable. Potential sources of anomalous data in WSNs are classified into three categories: faults (errors), events, and malicious attacks [3, 4]. When sensor nodes fail, their sensing data are faulty data [5]. As the amount of faulty data increases, it strongly affects the results of user queries. Thus, faulty data should be eliminated or corrected. When some event happens, the sensing data of the nodes in the area are informational data, which differ from the normal data. They should be reported to the user for further decision. The third potential source of anomalous data is attacks, which are beyond the scope of this paper. Anomaly detection is considered as a solution to detect faulty data and informational data.

In existing anomaly detection approaches, a sensor node that cannot decide by itself whether its data are normal turns to its neighbors for a further decision. In this process, existing solutions, including voting algorithms [6, 7] and aggregation frameworks [8–10] that detect anomalies while aggregating data, restrict neighbors' opinions to just normal or anomalous. However, no neighbor can always say that the data of the node are absolutely normal or anomalous; previous works neglect this aspect, which we call uncertainty. Taking into account the degree to which a neighbor considers the data normal or anomalous describes the view of the neighbors more realistically. Consequently, the performance of anomaly detection can be improved.

In this paper, we propose the SLAD (subjective logic-based anomaly detection) framework, which takes uncertainty into account, to improve the performance of anomaly detection. It includes three phases: preprocessing, self-monitoring, and cooperant detecting. Preprocessing runs on the sink, and self-monitoring executes on each node. After these two phases, a sensor node sends its suspicious data to its neighbors for further determination. The third phase is the key of our framework.

The central element of SLAD is ESLB (extended subjective logic-based algorithm), which is the key of the third phase mentioned above. Before going into the details of ESLB, we first propose SLB (subjective logic-based algorithm), which describes our work in its basic form. In SLB, each neighbor gives a quantitative opinion about the suspicious data based on subjective logic theory. After fusing the opinions of all the neighbors, SLB obtains the quantitative anomaly score, which expresses the degree to which the suspicious data are considered an anomaly. We extend SLB to ESLB in order to avoid the impact of those neighbors whose own data are suspicious, to effectively distinguish the faulty data from the informational data, and to take the historical spatial correlations of the node and its neighbors into account.

The main contributions of this paper are as follows.

(i) We propose the SLAD framework, which takes into account the uncertainty of neighbors about the data of the node. It redefines the notion of opinion from subjective logic theory and can describe the view of neighbors on the data of the node more realistically.

(ii) We present the SLB and ESLB algorithms. SLB fuses all the opinions of neighbors about the data of the node to get the quantitative anomaly score of the data. We extend SLB to ESLB to improve the performance further.

(iii) We construct experiments to verify the detection performance of the proposed framework. Simulation results show that the SLAD framework is effective and improves anomaly detection performance compared with the previous approaches.

The rest of the paper is organized as follows. Section 2 summarizes the related work. Section 3 presents preliminary concepts. Our framework SLAD is introduced in Section 4. Section 5 gives the SLB algorithm and its extended algorithm ESLB. Section 6 discusses some problems that are not covered in the preceding sections. Section 7 describes the experimental setup and evaluates the performance of the framework on a realistic data set. Finally, Section 8 concludes the paper.

2. Related Work

Many efforts have been made in recent years to detect anomalies in wireless sensor networks. We briefly survey the recent research relevant to our work as follows.

The first category involves the voting algorithm and its improved variants. The authors of [6] propose the majority voting algorithm. If some node v is aware that its sensing data x may be anomalous, it sends x to all its one-hop neighbors. Each neighbor $v^{'}$ compares x with its own sensing data $x^{'}$ . If the difference is less than the threshold, $v^{'}$ casts a positive vote for v; otherwise it casts a negative vote. Node v collects all the votes of its neighbors and makes the determination. If the number of positive votes exceeds the number of negative votes, x is considered normal; otherwise it is considered anomalous. Based on the majority voting algorithm [6], [7] proposes a weighted voting algorithm, which gives greater weights to the neighbors that are closer to the node. The authors of [11] discuss how to detect faulty (erroneous) data in WSNs. They use the extended Jaccard coefficient to compute the similarity degree between sensor nodes and assign different levels to the nodes to set up the correlation network. They present an efficient two-phase voting algorithm called TrustVoting to determine whether the data are faulty. However, the algorithms mentioned above restrict neighbors' opinions to just normal or anomalous, whereas taking into account the degree to which neighbors consider the data normal or anomalous can improve detection performance [4].
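As an illustration, the majority voting rule of [6] described above can be sketched in Python as follows. This is a minimal sketch and not the implementation from [6]; the function name `majority_vote`, the list of neighbor readings, and the use of the absolute difference for the comparison are assumptions made for the example.

```python
def majority_vote(x, neighbor_readings, threshold):
    """Return True if x is judged normal by its one-hop neighbors.

    Each neighbor casts a positive vote when its own reading differs
    from x by less than the threshold, and a negative vote otherwise.
    x is normal only if positive votes strictly outnumber negative ones.
    """
    positive = sum(1 for x_n in neighbor_readings if abs(x - x_n) < threshold)
    negative = len(neighbor_readings) - positive
    return positive > negative
```

Note that every vote is binary: a neighbor whose reading differs from x by just under the threshold counts exactly as much as one whose reading is identical to x.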

The second category detects anomalies in the process of aggregating data in the network. The authors of [8] propose a robust aggregation framework, which performs similarity tests among sensor nodes to classify a particular node as an anomaly. It returns the aggregate results excluding the anomalies, which are also maintained and sent to the users. Furthermore, the authors of [9] define the minimum support MinSupp, which is the minimum number of sensor nodes needed to prove that the data of a node are normal or anomalous. For a node holding anomalous data, if MinSupp nodes have data similar to its own, it is determined that some event has happened; otherwise the data are faulty. On this basis, [10] presents an in-network anomaly detection framework based on a position-sensitive hash function. It balances the load of the network and, using comparison pruning methods, ensures detection performance and energy efficiency. The authors of [12] introduce the PAO framework to detect anomalies in WSNs reliably and efficiently; it can operate over multiple window types and in exact or approximate mode to suit a variety of application requirements. However, the outputs of the similarity tests in all these frameworks are also only yes or no, depending on a preset threshold, and do not provide a quantitative determination, which is similar to the voting algorithms.

The third category regards the sensing data of the nodes as time series data to some extent. The authors of [13, 14] construct autoregressive (AR) models for sensor nodes. After establishing its AR model, every sensor node sends the coefficients of the model to the sink, and the sink estimates approximate values of the sensor nodes in the following rounds without getting real data from them. This greatly reduces the number of messages sent in the network. Once the data are no longer predictable from the AR models, the reason may be either that the models no longer fit the data or that anomalous data have arisen. In the former case, the AR models need to be reconstructed and the process mentioned above repeated. Otherwise, the anomalous data are identified so that they can be eliminated or corrected. The authors of [13] use two thresholds to distinguish the two cases. However, the approach relies only on the predefined thresholds and does not employ the spatial correlations among sensor nodes. Taking spatial correlations into account makes it possible to use neighbors' opinions fully and to achieve better anomaly detection performance.

From the related works above, we can conclude that providing quantitative opinions is very important for anomaly detection after self-monitoring on each node in WSNs. In subjective logic theory, subjects express subjective beliefs about the truth of objects with a degree of uncertainty and indicate subjective belief ownership whenever required [15, 16]. Subjective logic provides a quantitative evaluation of the trust degree of the object. From this perspective, judgment among adjacent nodes in WSNs is similar to trust evaluation. We therefore bring subjective logic theory into anomaly detection in WSNs, using it to provide quantitative neighbors' opinions about the suspicious data of the node.

In addition, the authors of [17–19] use machine learning techniques to detect anomalies in WSNs, which differs from our solution. Because machine learning techniques are resource intensive and difficult to implement on sensor nodes, early studies such as [17] run their algorithms on the gateway (or sink). The authors of [17] identify anomalies in critical gas monitoring in an underground coal mine using an offline echo state network. Subsequent research tries to make it possible to run the algorithms on sensor nodes. The authors of [18] compare and classify the input signals against online-learned prototypes at the node level and then send the classification results to a fusion center for further processing. Based on [17], the authors of [19] propose a general anomaly detection framework that unifies fault and event detection. It runs on sensor nodes, distinguishes faults from events, and improves detection performance. The focus of [18, 19] is how to select appropriate machine learning techniques and then reduce their complexity so that the algorithms are suitable to run on nodes. This differs from our solution, whose difficulty is how to provide quantitative neighbors' opinions to improve detection performance.

3. Preliminaries

Suppose that a sensor network is modeled as an undirected connected graph $𝔾 = (𝕍, 𝔼)$ , where 𝕍 is the set of all sensor nodes (including n sensor nodes $v_{1}, \dots, v_{n}$ and one sink $v_{0}$ , denoted as $𝕍 = V_{n} \cup v_{0}$ ) and 𝔼 is the set of the edges. An anomaly is defined as a measurement that significantly deviates from the normal pattern of the sensing data [3]. Generally, the anomaly mentioned in this paper includes fault (error) and event, and the anomalous data includes the faulty (erroneous) data and the informational data, respectively.

Because the data of sensor nodes can be regarded as time series data [13, 14], we construct an AR model on each node. Suppose that the data of node $v_{i}$ at time t can be described by $AR (p)$ as $x_{i t} = \sum_{k = 1}^{p} φ_{k} x_{i (t - k)} + ɛ$ , where $x_{i (t - k)}$ is the data of $v_{i}$ at time $t - k (1 \leq k \leq p)$ , $φ_{k}$ is the corresponding coefficient of $x_{i (t - k)}$ , and ɛ is the random error, which follows a normal distribution with mean 0 and variance $σ^{2}$ . Then, given $Φ = {[φ_{1} \dots φ_{p}]}^{'}$ and $X_{t} = {[x_{1 t} \dots x_{n t}]}^{'}$ , we can get $\hat{Φ}$ and ${\hat{X}}_{t}$ , where $\hat{Φ}$ is the linear minimum-variance unbiased estimate of Φ and ${\hat{X}}_{t}$ is the unbiased estimate of $X_{t}$ :

\begin{matrix} \hat{Φ} & = {[{\hat{φ}}_{1} \dots {\hat{φ}}_{p}]}^{'} = {(Y^{'} Y)}^{- 1} Y^{'} Z, \\ {\hat{X}}_{t} & = {\hat{φ}}_{1} X_{t - 1} + \dots + {\hat{φ}}_{p} X_{t - p}, \end{matrix}

(1)

where $Y = {[X_{j} \dots X_{j - p + 1}]}_{j = p \dots M - 1}$ , $X_{j} = {[x_{1 j} \dots x_{n j}]}^{'}$ , and $Z = {[X_{p + 1} \dots X_{M}]}^{'}$ . Finally, given the confidence level 1-α, the confidence interval of the estimate ${\hat{X}}_{t}$ is

\begin{matrix} ({\hat{X}}_{t} \pm t_{α / 2} (M - 2 p) \cdot \hat{σ} \sqrt{1 + Y_{0} {(Y^{'} Y)}^{- 1} Y_{0}^{'}}) . \end{matrix}

(2)
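For a single node's series, the least-squares estimate ${(Y^{'} Y)}^{- 1} Y^{'} Z$ in (1) can be sketched in plain Python as below. This is an illustrative simplification and not the authors' implementation: it fits one series rather than the stacked vectors $X_{j}$ , it does not compute the confidence interval (2), and the names `fit_ar` and `predict_next` are hypothetical.

```python
def fit_ar(series, p):
    """Least-squares estimate of the AR(p) coefficients for one series."""
    # Each row holds the p previous values (most recent first); the
    # target is the value that follows them.
    rows = [[series[t - k] for k in range(1, p + 1)]
            for t in range(p, len(series))]
    targets = [series[t] for t in range(p, len(series))]
    # Normal equations (Y'Y) phi = Y'Z.
    a = [[sum(row[i] * row[j] for row in rows) for j in range(p)]
         for i in range(p)]
    b = [sum(row[i] * z for row, z in zip(rows, targets)) for i in range(p)]
    # Gaussian elimination with partial pivoting; a singular Y'Y
    # (for example, a constant series with p > 1) raises ZeroDivisionError.
    for col in range(p):
        pivot = max(range(col, p), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for idx in range(col + 1, p):
            factor = a[idx][col] / a[col][col]
            for c in range(col, p):
                a[idx][c] -= factor * a[col][c]
            b[idx] -= factor * b[col]
    # Back substitution.
    phi = [0.0] * p
    for i in range(p - 1, -1, -1):
        tail = sum(a[i][j] * phi[j] for j in range(i + 1, p))
        phi[i] = (b[i] - tail) / a[i][i]
    return phi


def predict_next(series, phi):
    """One-step prediction: phi_1 * x_{t-1} + ... + phi_p * x_{t-p}."""
    return sum(c * series[-k] for k, c in enumerate(phi, start=1))
```

On a noise-free series generated by a known AR(2) recurrence, `fit_ar` recovers the generating coefficients up to rounding error.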

We make the following assumptions about our framework.

(1) The wireless sensor network is static, and the topology does not change during the network lifetime.

(2) All sensor nodes are homogeneous and have the same energy and capabilities, and there is only one sink, which has unlimited energy.

(3) Sensor nodes are deployed densely; that is, if some event happens in the network, adjacent sensor nodes (one-hop neighbors) can monitor it at the same time. The situation can be extended to deployments that are not dense, which is discussed in Section 6.

4. SLAD Framework

The SLAD framework consists of three phases: preprocessing, self-monitoring, and cooperant detecting. The preprocessing phase is executed on the sink, self-monitoring runs on each node, and cooperant detecting is a semidistributed algorithm, that is, it runs on both the sink and the sensor nodes.

In the first phase, all sensor nodes collect N rounds of data and transmit them to the sink. The sink constructs autoregressive models $AR (p)$ and uses least squares to estimate the coefficients $φ_{k} (1 \leq k \leq p)$ . The error ɛ is estimated from the first M rounds of data. Using the latest p rounds of data and the coefficients $φ_{k}$ , we get the estimated values of the nodes. After that, using the last N-M rounds of data, we get the confidence interval $(\hat{X} \pm c_{i t})$ under the given confidence level 1-α.

For each node $v_{i}$ , if its data $x_{i t}$ at time t is within its confidence interval $({\hat{x}}_{i t} \pm c_{i t})$ , it is considered normal; otherwise it is considered anomalous. However, this computation runs on each node, and if it were carried out at every round, the computational complexity would be high enough to consume too much energy. Consequently, a simple approximation is used, as follows. Using $φ_{k}$ , each node predicts the latest N-M rounds of data and compares them with the real data to get the average of the confidence intervals over those N-M rounds, which is set as the approximate confidence interval $({\hat{x}}_{i t} \pm τ_{i})$ at the given confidence level. This greatly reduces the computational complexity on each node. The sink sends each node a message containing the p coefficients of its AR model and its approximate confidence interval.

In the second phase, each node uses the p coefficients of its AR model and the most recent p rounds of data to predict the data of the current round. If the difference between the predicted data and the real data is less than the threshold τ, SLAD considers the data normal. Otherwise, the data are regarded as suspicious and need to be examined further by the adjacent neighbors. Note that if the data are considered normal, the node does not compute the confidence interval. However, when v considers $x_{t}$ to be suspicious, it computes $({\hat{x}}_{t} \pm c_{t})$ at 1-α. It then sends a message containing $x_{t}$ and $({\hat{x}}_{t} \pm c_{t})$ to all its one-hop neighbors.
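The self-monitoring check described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `is_suspicious` is hypothetical, `history` is assumed to hold past readings with the oldest first, and the confidence interval that is computed for suspicious data is left out.

```python
def is_suspicious(history, phi, x_t, tau):
    """Return True if reading x_t should be sent to neighbors.

    history: past readings of the node, oldest first (at least len(phi)).
    phi:     AR coefficients [phi_1, ..., phi_p] received from the sink.
    tau:     approximate threshold received from the sink.
    """
    # phi_k multiplies the reading k rounds back, so history[-k].
    predicted = sum(c * history[-k] for k, c in enumerate(phi, start=1))
    # Data are normal only if the prediction error is below tau.
    return abs(x_t - predicted) >= tau
```

Only the readings flagged by this check trigger communication with the neighbors, which keeps the per-round cost on each node to p multiplications and one comparison.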

In the third phase, a sensor node whose data are suspicious sends the data to all its neighbors, and each neighbor produces its opinion about the suspicious data. SLAD fuses all the neighbors' opinions and computes the expectation of the consensus opinion, which gives the anomaly score of the suspicious data. If the anomaly score exceeds the threshold, the suspicious data are anomalous; otherwise they are normal. Additionally, to avoid the impact of opinions from neighbors whose own sensing data are suspicious, SLAD removes those opinions from the consensus opinion. In order to take the historical spatial correlations of the node and its neighbors into account, SLAD computes the neighbors' opinions in another way. Because faulty data and informational data are treated differently, SLAD adopts the following approach. The suspicious data, if anomalous, are first marked as faulty data. When the faulty data of the sensor nodes at this round have all been sent to the sink, the sink distinguishes faulty data from informational data by employing the spatial correlations of adjacent nodes. The third phase is the fundamental step of the SLAD framework and is discussed in detail in Section 5.

5. Subjective Logic-Based Algorithms

In WSNs, no neighbor can always say that the data of the node are absolutely normal or anomalous; previous works neglect this aspect, which we call uncertainty. Subjective logic theory, on the other hand, is suitable for modeling situations that involve uncertainty. This motivates us to use subjective logic theory in anomaly detection to improve the detection performance.

Before detailing the subjective logic-based algorithms, it is necessary to address three problems: the expressiveness of neighbors' opinions, the value assignment of neighbors' opinions, and the consensus of neighbors' opinions. Using the solutions to these problems, we propose SLB and its extension ESLB.

5.1. Expressiveness of Neighbors' Opinions

Definition 1.

Given sensor network $𝔾 = (𝕍, 𝔼), v, v_{i} \in 𝕍, (v, v_{i}) \in 𝔼$ , the opinion of the neighbor $v_{i}$ about the sensing data of node v is defined as follows:

\begin{matrix} ω_{v}^{v_{i}} = (s_{v}^{v_{i}}, d_{v}^{v_{i}}, u_{v}^{v_{i}}, a_{v}^{v_{i}}), s_{v}^{v_{i}} + d_{v}^{v_{i}} + u_{v}^{v_{i}} = 1, \end{matrix}

(3)

where $s_{v}^{v_{i}}$ is the degree of belief with which neighbor $v_{i}$ considers the data of node v to be normal, $d_{v}^{v_{i}}$ is the degree of disbelief, that is, the degree to which $v_{i}$ considers the data of node v to be anomalous, $u_{v}^{v_{i}}$ is the degree of uncertainty of $v_{i}$ about whether the data of node v are normal or anomalous, and $a_{v}^{v_{i}}$ is the base rate (i.e., the a priori probability) with which $v_{i}$ regards the data of node v as normal or anomalous.

Definition 1 defines neighbor $v_{i}$ 's opinion about the data of node v. $s_{v}^{v_{i}}, d_{v}^{v_{i}}$ , and $u_{v}^{v_{i}}$ are combined to express the opinion fully. The next problem is how to determine the opinion $ω_{v}^{v_{i}}$ of neighbor $v_{i}$ about the data of node v.

5.2. Value Assignment of Neighbors' Opinions

In this section, we discuss how to determine a neighbor's opinion $ω_{v}^{v_{i}}$ . We compute the similarity degree and the difference degree of node v and $v_{i}$ and use them as $s_{v}^{v_{i}}$ and $d_{v}^{v_{i}}$ , respectively. It is worth mentioning that the sum of $s_{v}^{v_{i}}$ and $d_{v}^{v_{i}}$ may exceed one with this method. In that case, the sum should be scaled down to no more than one, as required by subjective logic theory. $u_{v}^{v_{i}}$ equals one minus the sum of $s_{v}^{v_{i}}$ and $d_{v}^{v_{i}}$ .

To compute these degrees, we take advantage of the observation that, because the sampling rates of the nodes in WSNs are high, the data of the nodes change smoothly most of the time and change nonsmoothly only occasionally. The data trends have already been taken into account when constructing the AR model. So we use only the data at the current round to determine neighbors' opinions while the data are changing smoothly. Only when the data are changing nonsmoothly do we use several rounds of data to get the neighbors' opinions. The data trends of the nodes can be obtained from historical data.

The detailed opinion $ω_{v}^{v_{i}}$ of neighbor $v_{i}$ about the data of node v is determined as follows.

(1) If the data are changing smoothly,

s_{v}^{v_{i}} = \left\{ \begin{matrix} \frac{x_{i}}{x}, & x_{i} \leq x, \\ \frac{x}{x_{i}}, & x < x_{i}, \end{matrix} \right. \qquad d_{v}^{v_{i}} = \frac{2 \cdot | x_{i} - x |}{x_{i} + x},

(4)

where $x_{i}$ and x are the data of node $v_{i}$ and node v, respectively, at the current round. If $s_{v}^{v_{i}} + d_{v}^{v_{i}} > 1$ , the sum is scaled down to no more than one. $u_{v}^{v_{i}} = 1 - s_{v}^{v_{i}} - d_{v}^{v_{i}}$ . $a_{v}^{v_{i}}$ is the prior probability of $v_{i}$ 's opinion about v's data, that is, the expectation of the prior opinion. Initially it is set to 0.5; that is, $v_{i}$ considers the data of v equally likely to be normal or anomalous.

(2) If the data are changing nonsmoothly,

\begin{gathered} s_{v}^{v_{i}} = \frac{X_{i} \cdot X}{{‖ X_{i} ‖}^{2} + {‖ X ‖}^{2} - X_{i} \cdot X}, \\ d_{v}^{v_{i}} = \sum_{j = 1}^{l} \frac{2 \cdot | X_{i} (j) - X (j) |}{l \cdot (X_{i} (j) + X (j))}, \end{gathered}

(5)

where, supposing the current round is l, $X_{i} = [x_{1 i} \dots x_{l i}]$ and $X = [x_{1} \dots x_{l}]$ are the data vectors of node $v_{i}$ and node v from round 1 to round l, $X_{i} (j)$ and $X (j)$ are the jth elements of $X_{i}$ and X, and l is the length of the vectors $X_{i}$ and X. If $s_{v}^{v_{i}} + d_{v}^{v_{i}} > 1$ , the sum is scaled down to no more than one. $u_{v}^{v_{i}} = 1 - s_{v}^{v_{i}} - d_{v}^{v_{i}}$ , and $a_{v}^{v_{i}}$ is the same as above.
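The raw similarity and difference degrees of (4) and (5) can be computed as in the sketch below. It assumes strictly positive readings, so that the ratios are well defined, and it deliberately stops before the scaling step, because the text does not specify how the sum is scaled down. The function names `smooth_degrees` and `nonsmooth_degrees` are hypothetical.

```python
def smooth_degrees(x_i, x):
    """Raw (s, d) of (4) from the current readings of v_i and v."""
    # For positive readings, min/max equals x_i/x or x/x_i as in (4).
    s = min(x_i, x) / max(x_i, x)
    d = 2 * abs(x_i - x) / (x_i + x)
    return s, d


def nonsmooth_degrees(X_i, X):
    """Raw (s, d) of (5) from the last l readings of v_i and v."""
    dot = sum(p * q for p, q in zip(X_i, X))
    norm_i_sq = sum(p * p for p in X_i)
    norm_sq = sum(q * q for q in X)
    # Similarity degree: X_i . X / (|X_i|^2 + |X|^2 - X_i . X).
    s = dot / (norm_i_sq + norm_sq - dot)
    # Difference degree: mean relative difference over the l rounds.
    l = len(X)
    d = sum(2 * abs(p - q) / (l * (p + q)) for p, q in zip(X_i, X))
    return s, d
```

For example, `smooth_degrees(8.0, 10.0)` gives s = 0.8 and d = 4/18, whose sum exceeds one, so the scaling step would apply to this pair.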

5.3. Consensus of Neighbors' Opinions

According to Lemma 2, the opinions of neighbors $v_{i}$ and $v_{j}$ about node v's data can be fused into a consensus, which is a new opinion about the proposition that node v's data are anomalous.

Lemma 2.

Given $v, v_{i}, v_{j} \in 𝕍, (v, v_{i}) \in 𝔼, (v, v_{j}) \in 𝔼$ , let $ω_{v}^{v_{i}} = (s_{v}^{v_{i}}, d_{v}^{v_{i}}, u_{v}^{v_{i}}, a_{v}^{v_{i}})$ and $ω_{v}^{v_{j}} = (s_{v}^{v_{j}}, d_{v}^{v_{j}}, u_{v}^{v_{j}}, a_{v}^{v_{j}})$ be the opinions of neighbors $v_{i}$ and $v_{j}$ about the data of node v. The consensus $ω_{v}^{v_{i}, v_{j}} = (s_{v}^{v_{i}, v_{j}}, d_{v}^{v_{i}, v_{j}}, u_{v}^{v_{i}, v_{j}}, a_{v}^{v_{i}, v_{j}})$ of the two neighbors' ( $v_{i}$ and $v_{j}$ ) opinions about the proposition that node v's data are anomalous can be computed as follows. Let $k = u_{v}^{v_{i}} + u_{v}^{v_{j}} - u_{v}^{v_{i}} u_{v}^{v_{j}}$ .

If $k \neq 0$ ,

\begin{matrix} s_{v}^{v_{i}, v_{j}} & = \frac{d_{v}^{v_{i}} u_{v}^{v_{j}} + d_{v}^{v_{j}} u_{v}^{v_{i}}}{k}, \\ d_{v}^{v_{i}, v_{j}} & = \frac{s_{v}^{v_{i}} u_{v}^{v_{j}} + s_{v}^{v_{j}} u_{v}^{v_{i}}}{k}, \\ u_{v}^{v_{i}, v_{j}} & = \frac{u_{v}^{v_{i}} u_{v}^{v_{j}}}{k}, \\ a_{v}^{v_{i}, v_{j}} & = \frac{(1 - a_{v}^{v_{i}}) (1 - u_{v}^{v_{i}}) u_{v}^{v_{j}} + (1 - a_{v}^{v_{j}}) (1 - u_{v}^{v_{j}}) u_{v}^{v_{i}}}{k - u_{v}^{v_{i}} u_{v}^{v_{j}}} . \end{matrix}

(6)

If $k = 0$ ,

\begin{matrix} s_{v}^{v_{i}, v_{j}} & = \frac{d_{v}^{v_{j}} + d_{v}^{v_{i}} γ}{γ + 1} \\ d_{v}^{v_{i}, v_{j}} & = \frac{s_{v}^{v_{j}} + s_{v}^{v_{i}} γ}{γ + 1} \\ u_{v}^{v_{i}, v_{j}} & = 0 \\ a_{v}^{v_{i}, v_{j}} & = \frac{(1 - a_{v}^{v_{j}}) + γ (1 - a_{v}^{v_{i}})}{γ + 1}, \end{matrix} \begin{matrix} γ = \lim (\frac{u_{v}^{v_{j}}}{u_{v}^{v_{i}}}) \end{matrix}

(7)

Proof.

From [15], we know that posteriori probabilities (ppdf) of binary events can be expressed as

\begin{matrix} f (p | r, t, a) = \frac{Γ (r + t + 2)}{Γ (r + 2 a) Γ (t + 2 (1 - a))} p^{r + 2 a - 1} {(1 - p)}^{t + 2 (1 - a) - 1}, \end{matrix}

(8)

where $0 \leq p \leq 1$ , $r \geq 0$ , $t \geq 0$ , and $0 < a < 1$ .

Here $r, t$ , and a represent positive evidence, negative evidence, and relative atomicity (base rate), respectively. The probability expectation value is $E (f (p)) = (r + 2 a) / (r + t + 2)$ .

Let $f (p | r_{v}^{v_{i}}, t_{v}^{v_{i}}, a_{v}^{v_{i}})$ and $f (p | r_{v}^{v_{j}}, t_{v}^{v_{j}}, a_{v}^{v_{j}})$ be the two ppdfs held by the neighbor nodes $v_{i}$ and $v_{j}$ , respectively, regarding the truth of the suspicious sensing data of node v. The combined ppdf $f (p | r_{v}^{v_{i}, v_{j}}, t_{v}^{v_{i}, v_{j}}, a_{v}^{v_{i}, v_{j}})$ is defined as in [15]:

\begin{gathered} r_{v}^{v_{i}, v_{j}} = r_{v}^{v_{i}} + r_{v}^{v_{j}}, \\ t_{v}^{v_{i}, v_{j}} = t_{v}^{v_{i}} + t_{v}^{v_{j}}, \\ a_{v}^{v_{i}, v_{j}} = \frac{a_{v}^{v_{i}} (r_{v}^{v_{i}} + t_{v}^{v_{i}}) + a_{v}^{v_{j}} (r_{v}^{v_{j}} + t_{v}^{v_{j}})}{r_{v}^{v_{i}} + t_{v}^{v_{i}} + r_{v}^{v_{j}} + t_{v}^{v_{j}}} . \end{gathered}

(9)

Let $ω = (s, d, u, a)$ be a neighbor node's opinion about the suspicious sensing data, and let $f (p | r, t, a)$ be the same neighbor node's probability estimate regarding the same data. Because $E (f (p)) = E (ω)$ , that is, $(r + 2 a) / (r + t + 2) = s + a u$ , and $s + d + u = 1$ , it follows that $r = 2 s / u$ and $t = 2 d / u$ , where $u \neq 0$ .

We now prove that the equations of the lemma are correct. Because we want the consensus about the proposition that node v's data are anomalous, we exchange r and t in (9) and obtain
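The step from the expectation equality to $r = 2 s / u$ and $t = 2 d / u$ can be spelled out as follows. This is a sketch of the usual argument, under the assumption (not stated explicitly in the text) that the equality $E (f (p)) = E (ω)$ is required to hold for every base rate a, so that the terms with and without a can be matched separately.

```latex
\begin{aligned}
\frac{r + 2a}{r + t + 2} &= s + a u \quad \text{for all } a
\;\Longrightarrow\; s = \frac{r}{r + t + 2}, \quad u = \frac{2}{r + t + 2}, \\
d &= 1 - s - u = \frac{t}{r + t + 2}, \\
r &= s\,(r + t + 2) = \frac{2 s}{u}, \qquad
t = d\,(r + t + 2) = \frac{2 d}{u} \qquad (u \neq 0).
\end{aligned}
```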

\begin{gathered} r_{v}^{v_{i}, v_{j}} = t_{v}^{v_{i}} + t_{v}^{v_{j}} = \frac{2 d_{v}^{v_{i}}}{u_{v}^{v_{i}}} + \frac{2 d_{v}^{v_{j}}}{u_{v}^{v_{j}}} = \frac{2 d_{v}^{v_{i}} u_{v}^{v_{j}} + 2 d_{v}^{v_{j}} u_{v}^{v_{i}}}{u_{v}^{v_{i}} u_{v}^{v_{j}}} = \frac{2 s_{v}^{v_{i}, v_{j}}}{u_{v}^{v_{i}, v_{j}}}, \end{gathered}

(10)

\begin{matrix} t_{v}^{v_{i}, v_{j}} = r_{v}^{v_{i}} + r_{v}^{v_{j}} = \frac{2 s_{v}^{v_{i}}}{u_{v}^{v_{i}}} + \frac{2 s_{v}^{v_{j}}}{u_{v}^{v_{j}}} = \frac{2 s_{v}^{v_{i}} u_{v}^{v_{j}} + 2 s_{v}^{v_{j}} u_{v}^{v_{i}}}{u_{v}^{v_{i}} u_{v}^{v_{j}}} = \frac{2 d_{v}^{v_{i}, v_{j}}}{u_{v}^{v_{i}, v_{j}}}, \end{matrix}

(11)

\begin{matrix} (10) ⟹ \frac{d_{v}^{v_{i}} u_{v}^{v_{j}} + d_{v}^{v_{j}} u_{v}^{v_{i}}}{u_{v}^{v_{i}} u_{v}^{v_{j}}} = \frac{s_{v}^{v_{i}, v_{j}}}{u_{v}^{v_{i}, v_{j}}}, \end{matrix}

(12)

\begin{matrix} (11) ⟹ \frac{s_{v}^{v_{i}} u_{v}^{v_{j}} + s_{v}^{v_{j}} u_{v}^{v_{i}}}{u_{v}^{v_{i}} u_{v}^{v_{j}}} = \frac{d_{v}^{v_{i}, v_{j}}}{u_{v}^{v_{i}, v_{j}}}, \end{matrix}

(13)

\begin{matrix} (12) + (13) ⟹ \frac{(s_{v}^{v_{i}} + d_{v}^{v_{i}}) u_{v}^{v_{j}} + (s_{v}^{v_{j}} + d_{v}^{v_{j}}) u_{v}^{v_{i}}}{u_{v}^{v_{i}} u_{v}^{v_{j}}} = \frac{s_{v}^{v_{i}, v_{j}} + d_{v}^{v_{i}, v_{j}}}{u_{v}^{v_{i}, v_{j}}}, \end{matrix}

(14)

\begin{matrix} (14) ⟹ \frac{(1 - u_{v}^{v_{i}}) u_{v}^{v_{j}} + (1 - u_{v}^{v_{j}}) u_{v}^{v_{i}}}{u_{v}^{v_{i}} u_{v}^{v_{j}}} = \frac{1 - u_{v}^{v_{i}, v_{j}}}{u_{v}^{v_{i}, v_{j}}}, \end{matrix}

(15)

\begin{gathered} (15) ⟹ 1 + \frac{u_{v}^{v_{j}} - u_{v}^{v_{i}} u_{v}^{v_{j}} + u_{v}^{v_{i}} - u_{v}^{v_{i}} u_{v}^{v_{j}}}{u_{v}^{v_{i}} u_{v}^{v_{j}}} = \frac{1}{u_{v}^{v_{i}, v_{j}}} . \end{gathered}

(16)

Let $k = u_{v}^{v_{i}} + u_{v}^{v_{j}} - u_{v}^{v_{i}} u_{v}^{v_{j}}$ .

If $k \neq 0$ ,

\begin{matrix} (16) ⟹ u_{v}^{v_{i}, v_{j}} = \frac{u_{v}^{v_{i}} u_{v}^{v_{j}}}{k} \end{matrix} .

(17)

Substituting (17) into (12), we get

\begin{matrix} \begin{matrix} s_{v}^{v_{i}, v_{j}} = \frac{d_{v}^{v_{i}} u_{v}^{v_{j}} + d_{v}^{v_{j}} u_{v}^{v_{i}}}{u_{v}^{v_{i}} u_{v}^{v_{j}}} \times \frac{u_{v}^{v_{i}} u_{v}^{v_{j}}}{k} = \frac{d_{v}^{v_{i}} u_{v}^{v_{j}} + d_{v}^{v_{j}} u_{v}^{v_{i}}}{k} \end{matrix} . \end{matrix}

(18)

Substituting (17) into (13), we obtain

\begin{matrix} d_{v}^{v_{i}, v_{j}} & = \frac{s_{v}^{v_{i}} u_{v}^{v_{j}} + s_{v}^{v_{j}} u_{v}^{v_{i}}}{u_{v}^{v_{i}} u_{v}^{v_{j}}} \times \frac{u_{v}^{v_{i}} u_{v}^{v_{j}}}{k} \\ & = \frac{s_{v}^{v_{i}} u_{v}^{v_{j}} + s_{v}^{v_{j}} u_{v}^{v_{i}}}{k} . \end{matrix}

(19)

\begin{matrix} a_{v}^{v_{i}, v_{j}} & = \frac{(1 - a_{v}^{v_{i}}) (r_{v}^{v_{i}} + t_{v}^{v_{i}}) + (1 - a_{v}^{v_{j}}) (r_{v}^{v_{j}} + t_{v}^{v_{j}})}{r_{v}^{v_{i}} + t_{v}^{v_{i}} + r_{v}^{v_{j}} + t_{v}^{v_{j}}} \\ & = \frac{(1 - a_{v}^{v_{i}}) (2 (s_{v}^{v_{i}} + d_{v}^{v_{i}}) / u_{v}^{v_{i}}) + (1 - a_{v}^{v_{j}}) (2 (s_{v}^{v_{j}} + d_{v}^{v_{j}}) / u_{v}^{v_{j}})}{2 (s_{v}^{v_{i}} + d_{v}^{v_{i}}) / u_{v}^{v_{i}} + 2 (s_{v}^{v_{j}} + d_{v}^{v_{j}}) / u_{v}^{v_{j}}} \\ & = \frac{(1 - a_{v}^{v_{i}}) (1 - u_{v}^{v_{i}}) u_{v}^{v_{j}} + (1 - a_{v}^{v_{j}}) (1 - u_{v}^{v_{j}}) u_{v}^{v_{i}}}{k - u_{v}^{v_{i}} u_{v}^{v_{j}}} . \end{matrix}

(20)

If $k = 0$ , let $γ = \lim (u_{v}^{v_{j}} / u_{v}^{v_{i}})$ ; equation (7) then follows in a similar way.

For simplicity, we write $ω_{v}^{v_{i}, v_{j}} = (s_{v}^{v_{i}, v_{j}}, d_{v}^{v_{i}, v_{j}}, u_{v}^{v_{i}, v_{j}}, a_{v}^{v_{i}, v_{j}})$ as $ω_{v}^{v_{i}, v_{j}} \equiv ω_{v}^{v_{i}} \bar{\oplus} ω_{v}^{v_{j}}$ , where $\bar{\oplus}$ is a new operator similar to the consensus operator of subjective logic. The expectation of the consensus of the neighbors' opinions about the data of node v gives the overall view of the neighbors about the data of v. Given the consensus opinion $ω_{v}^{v_{i}, v_{j}}$ , its expectation is $E (ω_{v}^{v_{i}, v_{j}}) = s_{v}^{v_{i}, v_{j}} + a_{v}^{v_{i}, v_{j}} u_{v}^{v_{i}, v_{j}}$ .

Example 3.

Suppose that the opinions of neighbors $v_{i}$ and $v_{j}$ about the data of node v are $ω_{1} = (0.7,0.2,0.1,0.5)$ and $ω_{2} = (0.8,0.1,0.1,0.5)$ at some round, respectively, then the consensus of the opinions is $ω_{1,2} = ω_{1} \bar{\oplus} ω_{2} = (s_{1,2}, d_{1,2}, u_{1,2}, a_{1,2}) = (0.158,0.789,0.053,0.50)$ , and the expectation is $E (ω_{1,2}) = s_{1,2} + a_{1,2} u_{1,2} = 0.158 + 0.5 \times 0.053 = 0.184$ .
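The numbers in Example 3 can be reproduced with a short sketch of the operator $\bar{\oplus}$ for the case $k \neq 0$ in (6). The function names `consensus` and `expectation` and the tuple layout `(s, d, u, a)` are assumptions made for this illustration; the case $k = 0$ in (7) and the case where the denominator of the base rate vanishes are not handled and raise an error instead.

```python
def consensus(w_i, w_j):
    """Fuse two opinions (s, d, u, a) following (6), case k != 0."""
    s_i, d_i, u_i, a_i = w_i
    s_j, d_j, u_j, a_j = w_j
    k = u_i + u_j - u_i * u_j
    a_den = k - u_i * u_j
    if k == 0 or a_den == 0:
        raise ValueError("case not covered by (6)")
    # Belief in 'anomalous' is built from the neighbors' disbelief in
    # 'normal', and vice versa, as in (6).
    s = (d_i * u_j + d_j * u_i) / k
    d = (s_i * u_j + s_j * u_i) / k
    u = (u_i * u_j) / k
    a = ((1 - a_i) * (1 - u_i) * u_j + (1 - a_j) * (1 - u_j) * u_i) / a_den
    return s, d, u, a


def expectation(w):
    """Expectation E(w) = s + a * u of an opinion."""
    s, _, u, a = w
    return s + a * u
```

Calling `consensus((0.7, 0.2, 0.1, 0.5), (0.8, 0.1, 0.1, 0.5))` returns approximately (0.158, 0.789, 0.053, 0.5), and `expectation` of that result is approximately 0.184, matching the example.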

Each node in a WSN usually has many neighbors, so we need to fuse the opinions of all neighbors into the consensus opinion. Suppose that node v has m neighbors and that their opinions about the data of v are $ω_{v}^{v_{1}} = (s_{v}^{v_{1}}, d_{v}^{v_{1}}, u_{v}^{v_{1}}, a_{v}^{v_{1}}), \dots, ω_{v}^{v_{m}} = (s_{v}^{v_{m}}, d_{v}^{v_{m}}, u_{v}^{v_{m}}, a_{v}^{v_{m}})$ . To get the overall view of the neighbors about v's data, we fuse all the neighbors' opinions, denoted as $ω_{v}^{v_{1}, v_{2}, \dots, v_{m}} \equiv ω_{v}^{v_{1}} \bar{\oplus} ω_{v}^{v_{2}} \bar{\oplus} \dots \bar{\oplus} ω_{v}^{v_{m}}$ , that is, $ω_{v} = (s_{v}, d_{v}, u_{v}, a_{v})$ . The consensus is computed recursively by use of Theorem 4.

Theorem 4.

Given i neighbors $v_{1}, v_{2}, \dots, v_{i}$ of node v whose opinions about the data of v are $ω_{v}^{v_{1}} = (s_{v}^{v_{1}}, d_{v}^{v_{1}}, u_{v}^{v_{1}}, a_{v}^{v_{1}}), \dots, ω_{v}^{v_{i}} = (s_{v}^{v_{i}}, d_{v}^{v_{i}}, u_{v}^{v_{i}}, a_{v}^{v_{i}})$ , the consensus $ω_{v}^{v_{1}, \dots, v_{i}}$ of their opinions about the proposition that node v's data are anomalous can be computed as follows:

\begin{matrix} ω_{v}^{v_{1}, \dots, v_{i}} = ω_{v}^{v_{1}, \dots, v_{i - 1}} \bar{\oplus} ω_{v}^{v_{i}} = ω_{v}^{v_{1}} \bar{\oplus} ω_{v}^{v_{2}, \dots, v_{i}} (2 \leq i \leq m) . \end{matrix}

(21)

Proof.

We prove the theorem by mathematical induction.

(1) If $i = 2$ , $ω_{v}^{v_{1}, v_{2}} = ω_{v}^{v_{1}} \bar{\oplus} ω_{v}^{v_{2}}$ , which shows that (21) is true.

(2) Suppose that (21) is true for $i = k$ ; that is,

\begin{matrix} ω_{v}^{v_{1}, \dots, v_{k}} = ω_{v}^{v_{1}, \dots, v_{k - 1}} \bar{\oplus} ω_{v}^{v_{k}} = ω_{v}^{v_{1}} \bar{\oplus} ω_{v}^{v_{2}, \dots, v_{k}}, \end{matrix}

(22)

we need to prove that (21) is true while $i = k + 1$ ; that is,

\begin{matrix} ω_{v}^{v_{1}, \dots, v_{k + 1}} = ω_{v}^{v_{1}, \dots, v_{k}} \bar{\oplus} ω_{v}^{v_{k + 1}} = ω_{v}^{v_{1}} \bar{\oplus} ω_{v}^{v_{2}, \dots, v_{k + 1}} . \end{matrix}

(23)

It is equivalent to

\begin{matrix} s_{v}^{v_{1}, \dots, v_{k + 1}} & = s_{v}^{v_{1}, \dots, v_{k}} \bar{\oplus} s_{v}^{v_{k + 1}} = s_{v}^{v_{1}} \bar{\oplus} s_{v}^{v_{2}, \dots, v_{k + 1}}, \\ d_{v}^{v_{1}, \dots, v_{k + 1}} & = d_{v}^{v_{1}, \dots, v_{k}} \bar{\oplus} d_{v}^{v_{k + 1}} = d_{v}^{v_{1}} \bar{\oplus} d_{v}^{v_{2}, \dots, v_{k + 1}}, \\ u_{v}^{v_{1}, \dots, v_{k + 1}} & = u_{v}^{v_{1}, \dots, v_{k}} \bar{\oplus} u_{v}^{v_{k + 1}} = u_{v}^{v_{1}} \bar{\oplus} u_{v}^{v_{2}, \dots, v_{k + 1}}, \\ a_{v}^{v_{1}, \dots, v_{k + 1}} & = a_{v}^{v_{1}, \dots, v_{k}} \bar{\oplus} a_{v}^{v_{k + 1}} = a_{v}^{v_{1}} \bar{\oplus} a_{v}^{v_{2}, \dots, v_{k + 1}} . \end{matrix}

(24)

(i)

\begin{matrix} s_{v}^{v_{1}, \dots, v_{k}} \bar{\oplus} s_{v}^{v_{k + 1}} = \frac{d_{v}^{v_{1}, \dots, v_{k}} u_{v}^{v_{k + 1}} + d_{v}^{v_{k + 1}} u_{v}^{v_{1}, \dots, v_{k}}}{u_{v}^{v_{1}, \dots, v_{k}} + u_{v}^{v_{k + 1}} - u_{v}^{v_{1}, \dots, v_{k}} u_{v}^{v_{k + 1}}} \end{matrix}

(25)

Since $ω_{v}^{v_{1}, \dots, v_{k}} = ω_{v}^{v_{1}, \dots, v_{k - 1}} \bar{\oplus} ω_{v}^{v_{k}} = ω_{v}^{v_{1}} \bar{\oplus} ω_{v}^{v_{2}, \dots, v_{k}}$ , we can get the following:

\begin{matrix} (25) ⟹ \frac{d_{v}^{v_{1}} u_{v}^{v_{2}, \dots, v_{k}} u_{v}^{v_{k + 1}} + u_{v}^{v_{1}} d_{v}^{v_{2}, \dots, v_{k}} u_{v}^{v_{k + 1}} + u_{v}^{v_{1}} u_{v}^{v_{2}, \dots, v_{k}} d_{v}^{v_{k + 1}}}{u_{v}^{v_{1}} u_{v}^{v_{2}, \dots, v_{k}} + u_{v}^{v_{1}} u_{v}^{v_{k + 1}} + u_{v}^{v_{2}, \dots, v_{k}} u_{v}^{v_{k + 1}} - 2 u_{v}^{v_{1}} u_{v}^{v_{2}, \dots, v_{k}} u_{v}^{v_{k + 1}}} , \end{matrix}

(26)

(ii)

\begin{matrix} s_{v}^{v_{1}} \bar{\oplus} s_{v}^{v_{2}, \dots, v_{k + 1}} = \frac{d_{v}^{v_{1}} u_{v}^{v_{2}, \dots, v_{k + 1}} + d_{v}^{v_{2}, \dots, v_{k + 1}} u_{v}^{v_{1}}}{u_{v}^{v_{1}} + u_{v}^{v_{2}, \dots, v_{k + 1}} - u_{v}^{v_{1}} u_{v}^{v_{2}, \dots, v_{k + 1}}}, \end{matrix}

(27)

\begin{matrix} (27) ⟹ \frac{d_{v}^{v_{1}} u_{v}^{v_{2}, \dots, v_{k}} u_{v}^{v_{k + 1}}  +  u_{v}^{v_{1}} d_{v}^{v_{2}, \dots, v_{k}} u_{v}^{v_{k + 1}}  +  u_{v}^{v_{1}} u_{v}^{v_{2}, \dots, v_{k}} d_{v}^{v_{k + 1}}}{u_{v}^{v_{1}} u_{v}^{v_{2}, \dots, v_{k}}  +  u_{v}^{v_{1}} u_{v}^{v_{k + 1}}  +  u_{v}^{v_{2}, \dots, v_{k}} u_{v}^{v_{k + 1}}  -  2 u_{v}^{v_{1}} u_{v}^{v_{2}, \dots, v_{k}} u_{v}^{v_{k + 1}}} . \end{matrix}

(28)

Expressions (26) and (28) are equal; that is, $s_{v}^{v_{1}, \dots, v_{k + 1}} = s_{v}^{v_{1}, \dots, v_{k}} \bar{\oplus} s_{v}^{v_{k + 1}} = s_{v}^{v_{1}} \bar{\oplus} s_{v}^{v_{2}, \dots, v_{k + 1}}$ .

The other components ( $d, u$ , and a) can be proved in the same way. So (21) is true for $i = k + 1$ .

The above procedure shows that (21) is true for every i with $2 \leq i \leq m$ . That is, the theorem holds:

\begin{matrix} ω_{v}^{v_{1}, \dots, v_{i}} = ω_{v}^{v_{1}, \dots, v_{i - 1}} \bar{\oplus} ω_{v}^{v_{i}} = ω_{v}^{v_{1}} \bar{\oplus} ω_{v}^{v_{2}, \dots, v_{i}} (2 \leq i \leq m) \end{matrix}

(29)

Given m neighbors $v_{1}, v_{2}, \dots, v_{m}$ of node v whose opinions about the data of v are $ω_{v}^{v_{1}} = (s_{v}^{v_{1}}, d_{v}^{v_{1}}, u_{v}^{v_{1}}, a_{v}^{v_{1}}), \dots, ω_{v}^{v_{m}} = (s_{v}^{v_{m}}, d_{v}^{v_{m}}, u_{v}^{v_{m}}, a_{v}^{v_{m}})$ , the consensus of all the neighbors' opinions can be obtained through Theorem 4. The expectation of the consensus is then $E (ω_{v}) = s_{v} + a_{v} u_{v}$ , where $s_{v} = s_{v}^{v_{1}, \dots, v_{m}}, u_{v} = u_{v}^{v_{1}, \dots, v_{m}}, a_{v} = a_{v}^{v_{1}, \dots, v_{m}}$ . The anomaly score of node v's data is defined according to the expectation $E (ω_{v})$ .

Definition 5.

Suppose that the consensus of all the neighbors' opinions about node v's data is $ω_{v}$ and the expectation of the consensus is $E (ω_{v})$ , then the anomaly score of node v is defined as follows:

\begin{matrix} A S_{v} = E (ω_{v}) . \end{matrix}

(30)

One special case needs to be addressed. If node v has only one neighbor, Lemma 2 cannot be applied directly. To handle this case, we introduce an imaginary neighbor that holds the opinion $ω = (0,0, 1,0.5)$ and takes part in the consensus with the real neighbor. Thus, we can still get the consensus according to Lemma 2.

In the following sections, we present two algorithms that further determine whether suspicious data are normal or anomalous. The notations used to describe the algorithms are shown in Table 1.

Table 1

Notations used in the algorithms.

Notation	Description
m	Number of node v's neighbors
$V_{neighbor}$	Set of node v's neighbors, $\{v_{1}, \dots, v_{m}\}$
x	Suspicious sensing data of node v
X	Suspicious vector data of node v
r	Current round
$D_{neighbor}$	Sensing data of $V_{neighbor}$ at round r, $\{x_{1}, \dots, x_{m}\}$
$V D_{neighbor}$	Vector data of $V_{neighbor}$ from round $r - l + 1$ to round r, $\{X_{1}, \dots, X_{m}\}$
$X^{'}$	Historical vector data of node v
$V D_{neighbor}^{'}$	Historical vector data of $V_{neighbor}$, $\{X_{1}^{'}, \dots, X_{m}^{'}\}$
$F_{x}$	Indicates whether x is normal, faulty, or informational data: $F_{x} = 0$, x is normal; $F_{x} = 1$, x is faulty data; $F_{x} = 2$, x is informational data
$A S_{v}$	Anomaly score of node v
θ, thre	Predefined thresholds, discussed in Section 6
$Corr (x, x_{i})$	Spatial correlation between x and $x_{i}$, which can be computed using, for example, the extended Jaccard coefficient or the correlation coefficient

5.4. SLB Algorithm

Based on the discussion above, the subjective logic-based (SLB) algorithm proceeds as follows. It is executed among the node and its neighbors. Suppose that node v has m neighbors $v_{1}, v_{2}, \dots, v_{m}$ . Depending on whether the suspicious data of node v are changing smoothly or nonsmoothly, each neighbor $v_{i}$ gives its opinion $ω_{v}^{v_{i}}$ about the data of node v (Lines 1–10). Using Theorem 4, we get the consensus opinion $ω_{v}$ of all the neighbors of node v (Line 11). The expectation of the consensus opinion is obtained through the equation $E (ω_{v}) = s_{v} + a_{v} u_{v}$ (Line 12), and the anomaly score $A S_{v}$ is then obtained through Definition 5 (Line 13). If the anomaly score does not exceed the predefined threshold θ, the suspicious data of node v are considered normal; otherwise, they are considered anomalous (Lines 14–18) (Algorithm 1).

Algorithm 1: Subjective logic-based (SLB) algorithm.

Input: $V_{neighbor}$ , $x, X, D_{neighbor}, V D_{neighbor}$ ;

Output: $F_{x}$ ;

(1) if r is at the time of data changing smoothly

(2) for $1 \leq i \leq m$

(3) compute the opinion $ω_{v}^{v_{i}} = (s_{v}^{v_{i}}, d_{v}^{v_{i}}, u_{v}^{v_{i}}, a_{v}^{v_{i}})$ of neighbor $v_{i}$ about v by use of (4)

(4) end for

(5) end if

(6) if r is at the time of data changing nonsmoothly

(7) for $1 \leq i \leq m$

(8) compute the opinion $ω_{v}^{v_{i}} = (s_{v}^{v_{i}}, d_{v}^{v_{i}}, u_{v}^{v_{i}}, a_{v}^{v_{i}})$ of $v_{i}$ about v by use of (5)

(9) end for

(10) end if

(11) get the consensus opinion $ω_{v}$ of all the neighbors $v_{1}, v_{2}, \dots, v_{m}$ about node v

(12) compute the expectation $E (ω_{v})$ of the consensus opinion

(13) get the anomaly score $A S_{v}$ of node v

(14) if $A S_{v} \leq θ$

(15) x is normal data, $F_{x} = 0$ ;

(16) else

(17) x is anomalous data, $F_{x} = 1$ //here we do not distinguish faulty data from informational data

(18) end if

(19) return $F_{x}$ ;
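Lines 11–18 of Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The opinions are passed in as ready-made tuples (s, d, u, a), because Lines 1–10 rely on (4) and (5). The m-ary fusion generalizes the pattern visible in (25) and (28), assumes every uncertainty is positive, and assumes the common base rate 0.5 that SLB uses. A single neighbor is paired with the imaginary opinion (0, 0, 1, 0.5). The names `fuse` and `slb` and the threshold value 0.5 are hypothetical.

```python
from math import prod

VACUOUS = (0.0, 0.0, 1.0, 0.5)  # imaginary neighbor used when m = 1


def fuse(opinions, base_rate=0.5):
    """Consensus of m opinions (s, d, u, a), each with uncertainty u > 0."""
    if len(opinions) == 1:
        opinions = [opinions[0], VACUOUS]
    us = [w[2] for w in opinions]
    # others[i] = product of every uncertainty except the i-th one
    others = [prod(us[:i] + us[i + 1:]) for i in range(len(us))]
    denom = sum(others) - (len(us) - 1) * prod(us)
    s = sum(w[1] * o for w, o in zip(opinions, others)) / denom  # cf. (28)
    d = sum(w[0] * o for w, o in zip(opinions, others)) / denom  # assumed mirror
    u = prod(us) / denom
    return (s, d, u, base_rate)


def slb(opinions, theta):
    """Return (F_x, AS_v): F_x = 0 if x is normal, 1 if x is anomalous."""
    s, _, u, a = fuse(opinions)
    score = s + a * u  # AS_v = E(w_v), Definition 5
    return (0 if score <= theta else 1), score


flag, score = slb([(0.7, 0.2, 0.1, 0.5), (0.8, 0.1, 0.1, 0.5)], theta=0.5)
print(flag, round(score, 3))  # 0 0.184
```

With the two opinions of Example 3, the sketch returns the same expectation as the example, so the data would be treated as normal under the hypothetical threshold.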

5.5. ESLB Algorithm

The SLB algorithm fuses the opinions of all the neighbors about the data of the node to decide whether the data are normal or anomalous. However, it has the following disadvantages. (1) The opinions of neighbors whose own data are suspicious are also included in the judgement, which degrades the performance of anomaly detection. The effect becomes more severe as the proportion of anomalous data grows. (2) It does not distinguish faulty data from informational data. (3) The base rate a of all the neighbors' opinions is set to 0.5, which is not reasonable because it does not take the historical information of the node and its neighbors into account.

To overcome these disadvantages, we extend SLB to ESLB. For the first point, ESLB removes the opinions of those neighbors whose data are suspicious. For the second point, ESLB employs the correlations of anomalous data: if those data are spatially correlated, they are informational data; otherwise, they are faulty data. For the third point, we define a as follows, taking the historical information into account.

Suppose that $X'$ and $X_{i}^{'}$ are the latest l rounds of historical data of node v and neighbor $v_{i}$ in the pre-processing phase, and that the historical opinion of neighbor $v_{i}$ about node v's data is ${ω_{v}^{v_{i}}}^{'} = ({s_{v}^{v_{i}}}^{'}, {d_{v}^{v_{i}}}^{'}, {u_{v}^{v_{i}}}^{'}, {a_{v}^{v_{i}}}^{'})$ . We set the base rate of the historical opinion to 0.5; that is, ${a_{v}^{v_{i}}}^{'} = 0.5$ . Then we have the following definition.

Definition 6.

Given the historical opinion of neighbor $v_{i}$ about node v's data is ${ω_{v}^{v_{i}}}^{'}$ , base rate a of current opinion $ω_{v}^{v_{i}}$ of $v_{i}$ about v's data is defined as follows:

\begin{matrix} a_{v}^{v_{i}} = E ({ω_{v}^{v_{i}}}^{'}) . \end{matrix}

(31)

Theorem 7.

Suppose that historical opinion of neighbor $v_{i}$ about node v's data is ${ω_{v}^{v_{i}}}^{'}$ , then base rate a of current opinion of $v_{i}$ about v's data is $a_{v}^{v_{i}} = {s_{v}^{v_{i}}}^{'} + 0.5 {u_{v}^{v_{i}}}^{'}$ .

Proof.

From the definition of the expectation, we know that

\begin{matrix} E ({ω_{v}^{v_{i}}}^{'}) = {s_{v}^{v_{i}}}^{'} + {a_{v}^{v_{i}}}^{'} {u_{v}^{v_{i}}}^{'}, \end{matrix}

(32)

\begin{matrix} \left. \begin{matrix} (31) \\ (32) \\ {a_{v}^{v_{i}}}^{'} = 0.5 \end{matrix} \right\} ⟹ a_{v}^{v_{i}} = {s_{v}^{v_{i}}}^{'} + 0.5 {u_{v}^{v_{i}}}^{'} . \end{matrix}

(33)
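As a small worked instance of Theorem 7, the Python sketch below computes the base rate of a current opinion from a historical opinion. The function name `current_base_rate` and the sample historical opinion are hypothetical and serve only to illustrate the formula.

```python
def current_base_rate(hist_s, hist_u, hist_a=0.5):
    """Theorem 7: a = s' + a' * u', with a' fixed to 0.5 by default."""
    return hist_s + hist_a * hist_u


# Hypothetical historical opinion (s', d', u', a') = (0.6, 0.1, 0.3, 0.5)
a = current_base_rate(0.6, 0.3)
print(round(a, 2))  # 0.75
```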

We extend SLB to the ESLB algorithm as follows. If the data x are suspicious, node v turns to its neighbor set $V_{neighbor}$ for a further decision (Lines 1–3). Neighbors whose own data are suspicious do not provide opinions about the suspicious data of node v. We exclude these neighbors from the candidate set $V_{neighbor}$ and get the set $V_{tneighbor}$ of neighbors that provide opinions about the data of node v (Lines 4–8). For each neighbor $v_{k}$ in $V_{tneighbor}$ , the historical opinion ${ω_{v}^{v_{k}}}^{'}$ of $v_{k}$ about v's data is computed from $X'$ and $X_{k}^{'}$ , with ${a_{v}^{v_{k}}}^{'}$ set to 0.5 (Line 11). The current opinion $ω_{v}^{v_{k}}$ is computed as in the SLB algorithm, except for its base rate $a_{v}^{v_{k}}$ , which is computed through Theorem 7 (Line 12). The SLB algorithm is then called to decide whether x is normal (Line 15). If x is not normal, node v sends a message $M_{x}$ to the sink, which includes node v, the current round r, the data x, and the flag $F_{x}$ (Line 21). The sink receives all the messages at round r and further analyzes the neighbors that hold faulty data at this round. If x and $x_{i}$ are both faulty at the same time and are spatially correlated, they are informational data; otherwise, they are faulty data (Lines 24–32) (Algorithm 2).

Algorithm 2: Extended subjective logic-based (ESLB) algorithm.

Input: $V_{neighbor}, x, X, D_{neighbor}, V D_{neighbor}, X^{'}, V D_{neighbor}^{'}$ ;

Output: $F_{x} (, F_{x_{i}})$ ;

(1) for each node v

(2) if x is suspicious data

(3) node v turns to its neighbors $V_{neighbor}$ to further determine

(4) for $1 \leq i \leq m$

(5) if $x_{i}$ is suspicious data

(6) $v_{i}$ does not provide its opinion to node v, $V_{tneighbor} = V_{neighbor} - \{v_{i}\}$

(7) end if

(8) end for

(9) for $1 \leq k \leq m$

(10) if $v_{k} \in V_{tneighbor}$

(11) compute historical opinion ${ω_{v}^{v_{k}}}^{'}$ of $v_{k}$ about v by use of $X^{'}$ and $X_{k}^{'}$ , with ${a_{v}^{v_{k}}}^{'} = 0.5$

(12) call SLB Algorithm (Line 1–10) to compute current opinion $ω_{v}^{v_{k}}$ excluding $a_{v}^{v_{k}}$ , and $a_{v}^{v_{k}} = {s_{v}^{v_{k}}}^{'} + {a_{v}^{v_{k}}}^{'} {u_{v}^{v_{k}}}^{'}$

(13) end if

(14) end for

(15) call SLB Algorithm (Line 11–18) to get the result whether x is normal

(16) if x is normal

(17) $F_{x} = 0$

(18) else

(19) $F_{x} = 1$

(20) end if

(21) send message $M_{x}$ to sink, $M_{x} = \{v, r, x, F_{x}\}$

(22) end if

(23) end for

(24) sink receives all the messages at round r, and analyzes neighbors holding on faulty data at this round

//following executes on sink node

(25) if x and $x_{i}$ are faulty at the same time

(26) if $Corr (x, x_{i}) > thre$

(27) x and $x_{i}$ are informational data, $F_{x} = 2, F_{x_{i}} = 2$

(28) else

(29) x and $x_{i}$ are faulty data, $F_{x} = 1, F_{x_{i}} = 1$

(30) end if

(31) end if

(32) return $F_{x}, F_{x_{i}}$ ;
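The sink-side step (Lines 25–31) can be illustrated with the Python sketch below. It assumes that the spatial correlation is measured by the extended Jaccard coefficient, one of the options named in Table 1, over the vector data of the two nodes. The function names and the threshold value 0.9 are hypothetical.

```python
def extended_jaccard(x, y):
    """Extended Jaccard (Tanimoto) coefficient of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    # denom is zero only when both vectors are all zeros, i.e. identical
    return dot / denom if denom else 1.0


def classify_at_sink(x_vec, xi_vec, thre):
    """Return (F_x, F_xi) for two data items that were both flagged as faulty."""
    if extended_jaccard(x_vec, xi_vec) > thre:
        return 2, 2  # spatially correlated: informational data
    return 1, 1      # not correlated: faulty data


# Two nodes reporting a similar rise, and one node with unrelated readings
print(classify_at_sink([30, 31, 32], [30.5, 31, 31.5], thre=0.9))  # (2, 2)
print(classify_at_sink([30, 31, 32], [-5, 40, 2], thre=0.9))       # (1, 1)
```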

6. Discussion

Some issues need further explanation. First, the authors of [8, 9] point out that voting algorithms cannot deal with the situation in which an event is detected by sensor nodes that are not adjacent. Our framework, however, can handle this situation after a minor revision. For example, suppose that nodes $v_{i}$ and $v_{j}$ are not within the radio range of each other and that they detect the same event at some time. Let the impact range of events be $IR$ and the radio range be $CR$ , and let $h = ⌈ IR / CR ⌉$ . Our framework can still detect the event by computing the spatial correlation among h-hop neighbors. Because the computation is executed on the sink, it does not increase the energy consumption.

Second, in order to reduce the energy consumption, we use the idea proposed in [13] to construct and maintain AR models. (1) It avoids unnecessary data transmission: while the data of a node are normal, the node does not transmit them in the network, and the sink estimates them according to the AR models. (2) It reduces the computational complexity of constructing and maintaining AR models: the main computation is executed on the sink rather than on the sensor nodes. Please refer to [13] for more detail.

Third, although the thresholds, such as θ and $thre$ , are vital to SLAD, we do not study them in depth. We focus on how to quantify more realistically the opinions of the neighbors about a specific sensor node. In this paper, we set the thresholds according to historical experience. Better methods of choosing them could improve SLAD further.

7. Simulation Results

7.1. Experimental Setup

We implement our simulation experiments on the OMNET++ platform [20]. The topology and the sensing data come from the Intel Berkeley Research Lab data set [21]. 54 sensors are deployed in a lab of $400 * 700$ , and the locations of the sensor nodes are known in advance. In the experiments of Section 7.2, the radio range is set to 150. Section 7.3 shows the impact of different radio ranges on the detection performance. All the experiments assume that the radio links are reliable and do not fail. The sensing data have four attributes, but only temperature is used in our experiments. We use 1000 rounds of data as experimental data and use the initial 100 rounds of data to construct the $AR$ models.

When $AR (p)$ models are used to predict the sensing data in WSNs, the AR(3) model gives good estimates at a low maintenance cost [13, 14]. So we use AR(3) models on the nodes. With p set to 3, the AR model can be expressed as $X_{t} = φ_{1} X_{t - 1} + φ_{2} X_{t - 2} + φ_{3} X_{t - 3} + ɛ$ . In the beginning, we use the first 100 (N) rounds as the training data, among which the first 90 (M) rounds of data are used to estimate the coefficients of the AR model and the last 10 (N-M) rounds of data are used to determine the threshold τ.
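The role of the AR(3) model and of the threshold τ can be illustrated with the Python sketch below. The coefficient estimation itself is not shown. Two rules in the sketch are assumptions, since this section does not spell them out: τ is taken as the largest absolute one-step prediction error over the held-out rounds, and a reading is called suspicious when its prediction error exceeds τ. All function names and the sample values are hypothetical.

```python
def ar3_predict(history, phi):
    """One-step AR(3) prediction from the three most recent readings."""
    return phi[0] * history[-1] + phi[1] * history[-2] + phi[2] * history[-3]


def holdout_threshold(series, phi, m):
    """Largest absolute one-step error on series[m:] (assumed rule for tau).

    Requires m >= 3 so that three past readings are always available.
    """
    return max(abs(series[t] - ar3_predict(series[:t], phi))
               for t in range(m, len(series)))


def is_suspicious(history, reading, phi, tau):
    """Flag a reading whose prediction error exceeds tau (assumed test)."""
    return abs(reading - ar3_predict(history, phi)) > tau


# Toy data: a slowly rising temperature and already-estimated coefficients
phi = (1.0, 0.0, 0.0)
series = [20.0, 20.1, 20.2, 20.3, 20.5]
tau = holdout_threshold(series, phi, m=3)
print(is_suspicious(series, 25.0, phi, tau))  # True
print(is_suspicious(series, 20.6, phi, tau))  # False
```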

If the sensing data are changing nonsmoothly, the vector data are used to compute the neighbors' opinions. Computing the base rate of the neighbors with respect to the node (the historical information) also requires the vector data. So an appropriate length of vector data $(l)$ has to be selected. If l is too small, the vector cannot express the data trend; if l is too large, exchanging the sensing data consumes too much energy. Figure 1 shows the detection rate of the SLAD framework for different lengths of vector data. While l is below 5, the detection rate increases markedly with l. Once l reaches 5, the detection rate changes little as l increases further. Consequently, we set the length of vector data (l) to 5.

Figure 1

Impact of length on detection rate.

We randomly change some of the normal data into faulty data and define the faulty rate as the proportion of faulty data in the whole data. In the experiments, we compare the performance of different algorithms at various faulty rates, and each result is the mean of 20 executions.

7.2. Comparison of Detection Performance

In order to compare the anomaly detection performance of different algorithms, we define the detection rate, false detection rate, and undetection rate. In these definitions, the whole experimental data set is denoted as $W_{D}$ , the set of real faulty data as $F_{D}$ , and the set of data identified as faulty by an anomaly detection algorithm as $I_{D}$ .

Definition 8 (detection rate).

It is defined as the proportion of the real faulty data that are identified as faulty:

\begin{matrix} \mathrm{Detection\_rate} = \frac{| F_{D} \cap I_{D} |}{| F_{D} |} . \end{matrix}

(34)

Definition 9 (false detection rate).

It is defined as the number of normal data that are identified as faulty, in proportion to the number of real faulty data:

\begin{matrix} \mathrm{FalseDetection\_rate} = \frac{| (W_{D} - F_{D}) \cap I_{D} |}{| F_{D} |} . \end{matrix}

(35)

Definition 10 (undetection rate).

It is defined as the proportion of the real faulty data that are identified as normal:

\begin{matrix} \mathrm{UnDetection\_rate} = \frac{| F_{D} - (F_{D} \cap I_{D}) |}{| F_{D} |} . \end{matrix}

(36)
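Definitions 8 to 10 translate directly into set operations. The Python sketch below computes the three rates from sets of data identifiers; the function name and the toy sets are hypothetical. Because all three rates share the denominator $| F_{D} |$ , the detection rate and the undetection rate always sum to 1.

```python
def detection_metrics(whole, faulty, identified):
    """Rates of Definitions 8-10 for sets W_D, F_D, and I_D of data ids."""
    if not faulty:
        raise ValueError("F_D must not be empty")
    hits = faulty & identified                   # F_D intersect I_D
    false_alarms = (whole - faulty) & identified
    missed = faulty - hits
    n = len(faulty)
    return {
        "detection_rate": len(hits) / n,
        "false_detection_rate": len(false_alarms) / n,
        "undetection_rate": len(missed) / n,
    }


# Toy example: 10 data items, 4 truly faulty, 4 flagged by the detector
rates = detection_metrics(set(range(10)), {0, 1, 2, 3}, {0, 1, 2, 7})
print(rates)
# {'detection_rate': 0.75, 'false_detection_rate': 0.25, 'undetection_rate': 0.25}
```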

In this section, we compare the performance of the following algorithms. (1) MV (majority voting algorithm) [6]. (2) DWV (distance weight voting algorithm) [7]: it uses the Euclidean distance between sensor nodes as the weight, and the weight becomes smaller as the distance becomes larger. Please refer to Section 2 for the details of the $MV$ and $DWV$ algorithms. (3) VWV (value weight voting algorithm): unlike $DWV$ , it uses the distance between the data of the node and the data of its neighbors, that is, the difference of the data, as the weight. It considers that the neighbors whose data are closer to those of the node should have greater weights. (4) ASLB (autoregressive model and SLB): it combines autoregressive models with the subjective logic-based algorithm (SLB algorithm). (5) SLAD (subjective logic-based anomaly detection framework): it integrates the autoregressive model and the extended subjective logic-based algorithm (ESLB algorithm).

Figure 2 shows the detection rate of the five algorithms at different faulty rates. The detection rates of all the algorithms are greater than 0.8 when the faulty rate is low. $ASLB$ and $SLAD$ perform better than MV, $DWV$ , and $VWV$ , whose detection rates decrease sharply as the faulty rate increases. $ASLB$ keeps a high detection rate while the faulty rate is less than 0.4, but its detection rate decreases sharply once the faulty rate reaches 0.4 and keeps decreasing as the faulty rate increases further. In contrast, the detection rate of $SLAD$ remains greater than 0.9 even as the faulty rate increases, which is the best performance among the compared algorithms.

Figure 2

Detection rate of different algorithms.

Figure 3 presents the false detection rate of these algorithms at different faulty rates. The false detection rate of all the algorithms increases as the faulty rate becomes larger. The false detection rate of $MV$ , DWV, and $VWV$ stays within a certain range as the faulty rate increases, while that of $ASLB$ increases suddenly once the faulty rate reaches 0.4. $SLAD$ keeps its false detection rate no greater than 0.1, which is much lower than that of the other algorithms.

Figure 3

False detection rate of different algorithms.

We then study the impact of the faulty rate on the undetection rate of these algorithms (Figure 4). The undetection rate of $MV$ , DWV, and $VWV$ increases as the faulty rate increases, mirroring the drop in their detection rates. The undetection rate of $ASLB$ increases abruptly when the faulty rate reaches 0.4 and keeps rising as the faulty rate increases further. $SLAD$ preserves a very low undetection rate, which does not exceed 0.1 even when the faulty rate is high. Although it also increases with the faulty rate, the undetection rate of $SLAD$ is much lower than that of the other algorithms.

From the above figures, we note that the detection performance of $ASLB$ changes abruptly when the faulty rate is 0.4. The reason is as follows. When the faulty rate is 0.4, the neighbors whose sensing data are correct outnumber, on average, those whose data are anomalous, so the detection performance does not decline much. However, once the faulty rate is more than 0.4, the number of neighbors whose data are faulty is no less than the number whose data are normal. It is then hard for $ASLB$ to decide whether the suspicious data are normal, which results in significantly poorer detection performance.

We also draw the following conclusion from Figures 2, 3, and 4. The overall performance of $SLAD$ is much better than that of the other algorithms, and the performance of $ASLB$ is better than that of $MV$ , DWV, and $VWV$ when the faulty rate is low. The cause is the use of subjective logic. Using subjective logic, $ASLB$ and $SLAD$ fuse the quantitative opinions of neighbors, which avoids the problems that the other algorithms face. Because $MV$ , $DWV$ , $VWV$ , and $ASLB$ use the opinions of all the neighbors, the number of neighbors with faulty data may rise as the faulty rate increases, which harms the detection rate, false detection rate, and undetection rate. In contrast, $SLAD$ removes the neighbors whose data are suspicious before they provide their opinions and takes the historical spatial correlations of the nodes and their neighbors into account. So $SLAD$ performs significantly better than the other algorithms, especially when the faulty rate is high.

Figure 4

Undetection rate of different algorithms.

The above experiments consider only the case in which the network contains faulty data but no informational data. In the monitoring area, some events arise randomly. The anomalous data of the sensor nodes that detect an event are spatially correlated (i.e., they are informational data). The faulty rate, defined here as the proportion of informational data in the whole data, is set to 0.2. The experiment shows that the detection rate of the $SLAD$ framework for informational data exceeds 0.9, whereas those of $MV$ , $DWV$ , and $VWV$ are only about 0.7. The reason is that the $SLAD$ framework uses subjective logic to fuse the quantitative opinions of neighbors, which clearly improves the detection performance.

7.3. Impact of Radio Range on Detection Performance

In this section, we analyze the impact of the radio range on the detection rate, false detection rate, and undetection rate at different faulty rates. The number of neighbors affects the detection performance of the algorithm, and different radio ranges lead to different numbers of neighbors. We therefore examine the detection performance of the $SLAD$ framework under different radio ranges.

We set the faulty rate to 0.3, 0.4, and 0.5 in the experiments. Figures 5, 6, and 7 show the detection rate, false detection rate, and undetection rate of SLAD, respectively. These figures indicate that, as the faulty rate increases, the detection rate decreases while the false detection rate and undetection rate increase. They also show that, as the radio range becomes larger, the detection rate increases while the false detection rate and undetection rate decrease. The reason is that more neighbors provide opinions as the radio range increases.

Figure 5

Impact of radio range on detection rate.

Figure 6

Impact of radio range on false detection rate.

Figure 7

Impact of radio range on undetection rate.

8. Conclusions

In this paper, we present the SLAD framework, which considers the uncertainty of neighbors about the data of the node. It includes three phases: pre-processing, self-monitoring, and cooperant detecting. In the first phase, the sink constructs an AR model for each node. In the second phase, the AR models are used to check whether the sensing data are suspicious. The third phase, which is the key of our framework, presents two novel algorithms, SLB and ESLB. In SLB, each neighbor gives a quantitative opinion about the suspicious data based on subjective logic theory. After fusing the opinions of all the neighbors, SLB gets the expectation of the consensus opinion and the anomaly score, which expresses the degree to which the suspicious data are considered an anomaly. We extend SLB to ESLB in order to avoid the impact of those neighbors whose data are suspicious, to distinguish faulty data from informational data, and to take the historical spatial correlations of the node and its neighbors into account. Simulation results show that the SLAD framework improves the performance of anomaly detection compared with previous works.

However, SLAD can still be improved. We believe that the opinion of a neighbor that has a higher historical spatial correlation with the node should be given more weight. For example, suppose that node A and node B are the neighbors of node C, and that node A and node C are located in a room while node B is outside the room. Generally, the historical spatial correlation between node A and node C is higher than that between node B and node C. Thus, the opinion of node A about node C should be given more weight. Unfortunately, subjective logic, which is the foundation of SLAD, treats the opinions equally and cannot deal with this case. As preparatory work, we proposed an operator for subjective logic that can form the consensus of several neighbors' opinions with their weights in a fair way [22]. With the support of the new operator, we can map the historical spatial correlation to the weight of the opinion. In theory, we believe this will improve the anomaly detection performance of SLAD. It is our future work.

Acknowledgments

This work is supported by the National Science Foundation (61070056, 61033010), the National 863 High-tech Plan (2008AA01Z120), Program for New Century Excellent Talents in University, and the Research Funds of the Renmin University of China (10XNI018).

References

1. Kahn J. M., Katz R. H., Pister K. S. J. Next century challenges: mobile networking for “smart dust”. Proceedings of the 5th Annual ACM/IEEE International Conference on Mobile Computing and Networking, August 1999, Seattle, Wash, USA, 271–278.

2. Culler, Estrin, Srivastava. Overview of sensor networks. Computer, 2004, 37, 41–49.

3. Chandola, Banerjee, Kumar. Anomaly detection: a survey. ACM Computing Surveys, 2009, 41(3), 1–58. doi:10.1145/1541880.1541882

4. Zhang, Meratnia, Havinga. Outlier detection techniques for wireless sensor networks: a survey. IEEE Communications Surveys & Tutorials, 2010, 12(2), 159–170. doi:10.1109/SURV.2010.021510.00088

5. Jeffery, Alonso, Franklin M. J., Hong, Widom. Declarative support for sensor data cleaning. Proceedings of the 4th International Conference on Pervasive Computing, May 2006, Dublin, Ireland, 83–100.

6. Krishnamachari, Iyengar. Distributed Bayesian algorithms for fault-tolerant event region detection in wireless sensor networks. IEEE Transactions on Computers, 2004, 53(3), 241–250. doi:10.1109/TC.2004.1261832

7. Krasniewski, Varadharajan, Rabeler, Bagchi, Hu Y. C. TIBFIT: trust index based fault tolerance for arbitrary data faults in sensor networks. Proceedings of the International Conference on Dependable Systems and Networks, July 2005, Yokohama, Japan, 672–681. doi:10.1109/DSN.2005.92

8. Kotidis, Deligiannakis, Stoumpos. Robust management of outliers in sensor network aggregate queries. Proceedings of the 6th International ACM Workshop on Data Engineering for Wireless and Mobile Access, June 2007, Beijing, China, 17–24.

9. Deligiannakis, Kotidis, Vassalos, Stoumpos, Delis. Another outlier bites the dust: computing meaningful aggregates in sensor networks. Proceedings of the 25th IEEE International Conference on Data Engineering (ICDE '09), April 2009, Shanghai, China, 988–999. doi:10.1109/ICDE.2009.100

10. Giatrakos, Kotidis, Deligiannakis, Vassalos, Theodoridis. TACO: tunable approximate computation of outliers in wireless sensor networks. Proceedings of the International Conference on Management of Data (SIGMOD '10), June 2010, Indianapolis, Ind, USA, 279–290. doi:10.1145/1807167.1807199

11. Xiao X. Y., Peng W. C., Hung C. C., Lee W. C. Using sensor ranks for in-network detection of faulty readings in wireless sensor networks. Proceedings of the 6th International ACM Workshop on Data Engineering for Wireless and Mobile Access, June 2007, Beijing, China, 1–8.

12. Giatrakos, Kotidis, Deligiannakis. PAO: power-efficient attribution of outliers in wireless sensor networks. Proceedings of the 7th International Workshop on Data Management for Sensor Networks, September 2010, Singapore, 33–38.

13. Tulone, Madden. PAQ: time series forecasting for approximate query answering in sensor networks. Proceedings of the European Conference on Wireless Sensor Networks, February 2006, Zurich, Switzerland, 21–37.

14. Tulone. A resource-efficient time estimation for wireless sensor networks. Proceedings of the Joint Workshop on Foundations of Mobile Computing (DIALM-POMC '04), October 2004, Philadelphia, Pa, USA, 52–59.

15. Jøsang. A logic for uncertain probabilities. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2001, 9(3), 279–311.

16. Jøsang. Fission of opinions in subjective logic. Proceedings of the 12th International Conference on Information Fusion, July 2009, Seattle, Wash, USA, 1911–1918.

17. Obst, Wang X. R., Prokopenko. Using echo state networks for anomaly detection in underground coal mines. Proceedings of the ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN '08), April 2008, St. Louis, Mo, USA, 219–229. doi:10.1109/IPSN.2008.35

18. Wälchli. Efficient signal processing and anomaly detection in wireless sensor networks. Proceedings of the EvoWorkshops on Applications of Evolutionary Computing: EvoCOMNET, EvoENVIRONMENT, EvoFIN, EvoGAMES, EvoHOT, EvoIASP, EvoINTERACTION, EvoMUSART, EvoNUM, EvoSTOC, EvoTRANSLOG, April 2009, Tübingen, Germany, 81–86.

19. Chang, Terzis, Bonnet. Mote-based online anomaly detection using echo state networks. Proceedings of the 5th IEEE International Conference on Distributed Computing in Sensor Systems, June 2009, Marina Del Rey, Calif, USA, 72–86.

20. Varga. The OMNET++ discrete event simulation system. Proceedings of the European Simulation Multiconference, June 2001, Prague, Czech Republic, 319–324.

21. Intel Berkeley Research Lab. http://berkeley.intel-research.net/labdata/

22. Zhou, Shi, Liang. Using new fusion operations to improve trust expressiveness of subjective logic. Wuhan University Journal of Natural Sciences, 2011, 16(5), 376–382.

Subjective Logic-Based Anomaly Detection Framework in Wireless Sensor Networks
