A Novel Distributed Online Anomaly Detection Method in Resource-Constrained Wireless Sensor Networks

Abstract

In this paper, a novel distributed online anomaly detection method for resource-constrained WSNs is proposed. Firstly, the spatiotemporal correlation existing in the sensed data is exploited, and a series of single anomaly detectors is built in each distributed sensor node based on ensemble learning theory. Secondly, these trained detectors are broadcast to the member sensor nodes in the cluster, where each node combines them with its own trained detectors to build the initial ensemble detector. Thirdly, considering the resource constraints of WSNs, ensemble pruning based on biogeography-based optimization (BBO) is employed in the cluster head node to obtain an optimized subset of ensemble members. Further, the pruned ensemble detector, coded by a state matrix, is broadcast to each member sensor node for distributed online global anomaly detection. Finally, experiments on a real WSN dataset demonstrate the effectiveness of the proposed method.

1. Introduction

Wireless sensor networks (WSNs) integrate sensing, data processing, and wireless communication capabilities [1] and have received considerable attention for many types of applications. However, WSNs are highly susceptible to various kinds of interference and faults, such as hardware faults, electromagnetic interference, environmental factors, and network intrusion. Consequently, anomalous observations arise inevitably in WSNs. These unusual observations (i.e., anomalies or outliers) can generally be classified into two types: errors and events [2, 3]. The former refers to observations that deviate significantly from the true measurement, such as dirty data. Detecting and cleaning them in time saves limited memory and computation as well as expensive communication resources. The latter usually refers to an event that has occurred, such as a temperature change caused by a forest fire. Detecting such an event in time helps to take corresponding measures. With the wide application of WSNs, detecting these anomalous observations accurately and in time is an important task.

Although many anomaly detection methods based on data mining and machine learning are available, most of them do not take resource limitations into account and are not designed specifically for WSNs. Considering the limited resources (i.e., computation, memory, communication, and so on) of WSNs, developing a suitable anomaly detection method is an important and urgent task. Researchers have proposed some anomaly detection methods for WSNs [1, 4–7] that take resource limitations into account to some extent.

As the first of four research directions in the machine learning community, ensemble learning has attracted the attention of many researchers and has been used widely in different applications [8]. However, little work has been done on anomaly detection in WSNs. A large body of theoretical and empirical research has shown that combining the detection results of multiple individual detectors can improve the generalization performance noticeably. However, the original ensemble learning method usually needs to build and store multiple individual detectors, which incurs a large computation and storage cost and may not be appropriate for WSNs. A possible strategy is to select only part of the individual detectors to perform the anomaly detection. Consequently, ensemble pruning is a necessary strategy [9]: it can obtain better (or at least the same) performance compared to the initial ensemble while greatly decreasing the number of individual detectors.

Analyzing the spatiotemporal correlation of sensed data in WSNs and motivated by the online ensemble learning method, this paper proposes a distributed anomaly detection method for WSNs from the perspective of both model building and resource saving. Further, to mitigate the high communication requirements caused by broadcasting ensemble detectors, BBO based ensemble pruning is used to select optimized individual detectors and build a final ensemble detector that performs at least as well as the initial ensemble detector. The main contributions of this paper include the following:

(1) A distributed anomaly detection method for WSNs is proposed based on online ensemble learning.

(2) BBO based ensemble pruning is used to get the optimal subset for saving the limited storage and communication resources in WSNs.

(3) A state matrix encoding method is designed for the ensemble detector, which can decrease the communication and memory overhead significantly.

The rest of this paper is organized as follows. The related work is described in Section 2. Based on ensemble learning theory and BBO, our proposed anomaly detection method is presented in Section 3. Experiment analysis is provided in Section 4. Finally, conclusions and future work are presented in Section 5.

2. Related Work

To clearly analyze the motivation of the paper, the state of the art of three key aspects related to our paper is summarized, that is, anomaly detection in WSNs, online ensemble learning, and ensemble pruning.

2.1. Anomaly Detection Method and Classification in WSNs

With the rapid development and wide application of WSNs, a number of anomaly detection techniques for WSNs have been developed and surveyed from different perspectives. For example, [10] discussed the prioritization of various characteristics of WSNs, including the spatiotemporal and attribute correlations of sensed data, anomaly types, anomaly identification, anomaly scores, and so forth. A brief overview of the classification strategies for anomaly detection methods in WSNs deployed in harsh environments was also provided, which grouped anomaly detection methods into four types, that is, statistical-based techniques, nearest neighbor based techniques, clustering based techniques, and classification-based techniques. Based on the nature of sensor data, specific requirements, and limitations of WSNs, [1] provided a comprehensive overview of existing anomaly detection techniques specifically developed for WSNs. It presented a technique-based taxonomy and gave a comparative table that can be used as a guideline to select a suitable method for a specific application. For example, based on characteristics such as data types, anomaly types, and anomaly degree, statistical-based methods are further classified into parametric-based methods and nonparametric-based methods. Based on how the probability distribution model is built, classification-based methods are categorized as support vector machine-based methods, radial basis function neural network-based methods [11], and so on. The interested reader is referred to [5, 6, 12, 13] for more anomaly detection methods and taxonomies. The taxonomies of the aforementioned methods may overlap to some extent, and machine learning and computational intelligence-based techniques are undoubtedly an increasingly important research direction for complicated applications.
Moreover, although these methods have acceptable performance to some extent, the resource constraint was seldom or never taken into account. With the wide application of WSNs, this issue has attracted the attention of some researchers [14]. Another noticeable characteristic of the aforementioned methods is that only a single detector or model is trained. It is well known that a single model may fail to learn a complicated decision boundary on a complicated dataset. For sensed streaming data with a dynamic data distribution, a single model such as an artificial neural network can capture the whole profile only with difficulty or at high cost, which may lead to overfitting and degrade the generalization performance. Besides, concept drift [15] is a common phenomenon in datasets collected from WSNs, and a single model has difficulty dealing with such dynamic changes of the data distribution and providing a comprehensive detector. Moreover, updating a detector based on all available data is also hard in online learning.

2.2. Ensemble Learning Method

Ensemble learning is a computational intelligence method, and both theory and experiments have shown that combining the predictions of many individual detectors can enhance the generalization performance. Many ensemble learning methods have been used widely and successfully, such as Bagging [16, 17], Boosting [18, 19], Random Forest [20], and their online versions [21, 22]. Generally, an ensemble anomaly detector is constructed in two steps. Firstly, a number of base detectors are trained using the training dataset. Secondly, a combination strategy is designed to aggregate the results of the single detectors. For time-series datasets such as the sensed data in WSNs, learning a single model to profile the whole dataset is usually difficult or impossible. Generally, there are two ensemble patterns for handling streaming data, that is, horizontal ensemble and vertical ensemble. In the former, the latest n consecutive data chunks are used to train n base detectors, and a combination method is employed to build the ensemble detector that predicts the data in the yet-to-arrive chunk. The advantage of horizontal ensemble is that it can handle noisy data in the stream because the prediction for a newly arriving data chunk depends on the combination of different chunks. Even if noisy data deteriorates some chunks, the ensemble can still generate a relatively accurate prediction. The disadvantage of horizontal ensemble is that the streaming data changes continuously, and the information contained in previous chunks may become invalid, so that using these old concept models does not improve the overall prediction result. The latter pattern, vertical ensemble, uses only the newest chunk to build the ensemble model.
The advantage of vertical ensemble is that it uses different algorithms (heterogeneous ensemble) on the same dataset, or the same algorithm (homogeneous ensemble) on different subsets sampled from the chunk, to build the model, which can decrease the bias error between models. The disadvantage is that vertical ensemble assumes that the data chunk is error-free; in real situations, this precondition is usually hard to meet. Currently, because online ensemble learning can address the concept drift and noisy data problems in streaming data, ensemble learning has been used in anomaly detection for WSNs [23–25]. In this paper, after exploiting the spatiotemporal correlation existing in the sensed data in WSNs, a distributed method is proposed based on a horizontal ensemble and a vertical-like ensemble. Section 3 gives the detailed description.

2.3. Ensemble Pruning Based on Optimization Search Method

Although ensemble learning has many advantages, a nontrivial disadvantage is that it needs more memory and, especially, more communication resources to store and transmit multiple detectors in WSNs, which can drain energy quickly and is intolerable in WSNs. The principle of “many could be better than all” in the ensemble learning community [9] implies that combining all detectors may not be the best choice. Ensemble pruning, a necessary strategy for solving the resource-limitation problem [26], is therefore employed: it selects a subset of the initial ensemble that obtains better or at least equal detection performance compared to the original ensemble. The main advantage of ensemble pruning is that it greatly reduces the communication requirement. In WSNs, broadcasting relatively few detectors can save battery energy considerably. However, it is well known that pruning an ensemble of size N requires searching the space of $2^{N} - 1$ nonempty subensembles, which is an NP-complete problem. Hence, heuristic search approaches are used to find an appropriate subset. Biogeography-based optimization (BBO) [27, 28], a novel population-based global optimization method, has some features in common with existing optimization methods such as genetic algorithm (GA) and harmony search (HS) [29]. In this paper, BBO is used to obtain an optimal/suboptimal ensemble for reducing the communication cost. To the best of our knowledge, no paper has yet applied this new optimization method in the field of WSNs, and our study extends its application.

3. Proposed Method

Motivated by the increasing online ensemble learning methodology [25] and considering the resource limitation of sensor node in WSNs, we propose a distributed online anomaly detection method based on the ensemble learning. Further, BBO is used for ensemble pruning to decrease the communication and memory requirements.

3.1. Problem Statement of WSNs

In this paper, we assume that the WSN is deployed in an untouched area and that, to assure the quality of the sensed data, the sensor nodes are deployed densely. Besides, we assume that the sensor nodes are time synchronized, which is mainly for clarity of presentation rather than a limitation of our proposed method. Figure 1 shows a WSN, which consists of a large number of sensor nodes and a base station (BS) [30]. Generally, a WSN can be represented as a graph $G = (V, E)$, where $V = \{v_{1}, v_{2}, \dots, v_{| V |}\}$ is a finite set of vertices and $E = \{e_{1}, e_{2}, \dots, e_{| E |}\}$ is a finite set of edges. Each vertex $v_{i}$ ($i = 1, \dots, | V |$) refers to a sensor node, and each edge $e_{i}$ ($i = 1, \dots, | E |$) refers to a one-hop or multihop communication link between two sensors $v_{i}$ and $v_{j}$.

Figure 1: The considered WSN.

From Figure 1, it can be seen that clusters are formed based on the geographical positions of the nodes and their communication reachability. Here, we only consider one-hop communication among sensor nodes. Similarly, this assumption is mainly for clarity of presentation rather than a limitation of the communication capability of the sensors. In fact, our proposed method can easily be extended to multihop relaying communication. Besides, in order to describe our proposed anomaly detection method concisely, we consider a relatively small subnetwork of densely deployed sensor nodes, which forms a cluster $C_{i}$ consisting of one cluster head node ${CH}_{i}$ and a number of sensor nodes $N_{i, j}$, $j = 1, \dots, |C_{i}|$. For the whole WSN, $V = C_{1} \cup C_{2} \cup \dots \cup C_{n}$ and $C_{i} \cap C_{j} = \emptyset$ for $i \neq j$. All nodes in a cluster can reach each other by one-hop communication, and the communication between clusters depends on the direct links between cluster heads. In each cluster, the selection of the cluster head is randomized among all nodes in that cluster to avoid draining the energy of a single node.

For one cluster, $C_{i} = \{{CH}_{i}, N_{i, 1}, \dots, N_{i, m}\}$, which contains a cluster head ${CH}_{i}$ and its m spatially neighboring nodes ($N_{i, j}$, $j = 1, \dots, m$). Each sensor node in the subnetwork measures a data vector composed of multiple attribute values at every time interval $Δ t$. For the cluster head ${CH}_{i}$, the observation is $X^{i} = (x_{1}^{i}, x_{2}^{i}, \dots, x_{d}^{i})$, where d denotes the dimension. For the jth neighbor node, $N_{i, j}$, the observation is $X_{j}^{i} = (x_{j, 1}^{i}, x_{j, 2}^{i}, \dots, x_{j, d}^{i})$. Nodes in the cluster collect samples synchronously, and our proposed method identifies each new observation of each sensor node as normal or anomalous online.

3.2. Spatial and Temporal (Spatiotemporal) Correlation of Sensed Dataset

For the sensed dataset in a cluster, we first describe the spatiotemporal correlation, which is used later to build our proposed online ensemble detector.

The sensor dataset collected from WSNs is a time series dataset. A time series is a sequence of values $X = \{x (t), t = 1, \dots, n\}$ that follows a nonrandom order, where the n consecutive observations are collected at equal time intervals. Analyzing and learning from these observations [31] helps to understand the data trend over time, to build an appropriate detector based on temporal correlation, and to predict the label of newly arriving observations.

To obtain the detector, the foremost requirement is to achieve a stationary time series dataset. Some data processing methods are used to eliminate data trend and obtain a stationary time series dataset such as polynomial fitting, moving averages, differencing, and double exponential smoothing [32–34]. Considering the requirement of low computational complexity, a simple and efficient nonparametric technique (i.e., first differencing) is used to eliminate the temporal trend and obtain a stationary time series for dataset collected in WSNs, which can be formulated as:

$$ X^{'} = \{ x^{'} (s, t) = x (s, t) - x (s, t - 1) : t = 2, 3, \dots, n \}. \tag{1} $$
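As a minimal sketch of the first differencing in (1), assuming the readings of one node are held in a plain Python list (the function name `first_difference` and the sample readings are illustrative and not taken from the paper):

```python
def first_difference(series):
    """Return x'(t) = x(t) - x(t-1) for t = 2..n (one value fewer than the input)."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


# Hypothetical readings of one sensor with a slow upward trend.
readings = [20, 21, 23, 24, 26]
diffs = first_difference(readings)
print(diffs)
```

The differenced series fluctuates around a constant level even though the raw readings drift upward, which is the property the detector training relies on.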

Besides, the sensor nodes are usually deployed densely, so spatial redundancy exists. A dataset $X = \{x (s), s = 1, \dots, m\}$ is collected from the m sensor nodes in a cluster at one timestamp. This dataset helps to understand the spatial correlation structure of the data and to predict the data value at a nearby location. Spatial data may exhibit local dependency, that is, a similarity relationship among observations collected at adjacent locations in a local region. Usually, for a specified region, the observation of one sensor can be estimated by a linear weighted combination of the observations collected at its adjacent locations [32], which can be expressed as

$$ x (s_{i}) = λ_{1} x (s_{1}) + \dots + λ_{i - 1} x (s_{i - 1}) + λ_{i + 1} x (s_{i + 1}) + \dots + λ_{m} x (s_{m}), \tag{2} $$

where $\{s_{1}, \dots, s_{i - 1}, s_{i + 1}, \dots, s_{m}\}$ denotes the positions of the sensor nodes and $\{λ_{1}, \dots, λ_{i - 1}, λ_{i + 1}, \dots, λ_{m}\}$ denotes the weights of the observations, with $\sum_{k = 1, k \neq i}^{m} λ_{k} = 1$.
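The weighted combination in (2) can be illustrated with a short sketch. The function name `spatial_estimate`, the neighbor readings, and the weights below are assumptions chosen for illustration; the paper does not prescribe how the weights are obtained.

```python
def spatial_estimate(neighbor_values, weights):
    """Estimate x(s_i) as a weighted sum of neighbor readings; weights must sum to 1."""
    if len(neighbor_values) != len(weights):
        raise ValueError("one weight per neighbor is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * x for w, x in zip(weights, neighbor_values))


# Hypothetical readings of three adjacent nodes and their weights.
neighbors = [20.0, 22.0, 24.0]
weights = [0.5, 0.25, 0.25]
estimate = spatial_estimate(neighbors, weights)
print(estimate)
```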

Consequently, for sensed data collected in a local region, two reasonable assumptions are described as follows:

(1) The sensed data of adjacent nonfaulty sensor nodes are similar at the same timestamp.

(2) The sensed data of adjacent nonfaulty sensor nodes have a similar trend over time.

Motivated by the two assumptions and ensemble learning theory, a novel anomaly detection method is proposed in this paper. We will give the details in the following section.

3.3. Proposed Ensemble Learning Method of Anomaly Detection in WSNs

Spatiotemporal correlation exists among sensor data in a local region of WSNs. To describe the proposed distributed anomaly detection method clearly, we consider a relatively small component, that is, a cluster consisting of a few sensor nodes and a cluster head node. Ensemble pruning based on BBO is adopted to optimize the initial trained detector and mitigate the resource requirements. The optimized ensemble detector is used to identify global anomalous observations at each individual sensor in time. Our proposed method is shown in Figure 2.

Figure 2: Distributed ensemble anomaly detection method based on BBO pruning in WSNs.

The online anomaly detection method consists of three key procedures, that is, detector training, online detecting, and online detector updating. From Figure 2, it can be seen that our proposed method enables each distributed sensor node to judge globally, and in time, whether every newly arriving observation is normal or anomalous. Distributed detecting is employed to spread the load (communication, computation, and storage) evenly over the network and to prolong the lifetime of the whole network.

The whole procedure of proposed method is described as follows.

Step 1. Considering the temporal correlation within a certain time period, each sensor node $s_{i}$ trains a local ensemble detector using the history dataset collected over a time interval. In fact, using this initial local ensemble detector, each node can already determine locally whether a newly arriving observation is normal or anomalous.

Step 2. Each sensor node $s_{i}$ transmits its local ensemble detector, together with some related parameters such as the maximum value, minimum value, and mean of the training dataset, to the cluster head node and the other member sensor nodes.

Step 3. The cluster head node receives the local ensemble detectors from its member nodes and combines them with its own trained detector to build the initial global ensemble detector.

Step 4. The BBO method is applied in the cluster head to prune the initial global ensemble detector and obtain an acceptable final ensemble detector.

Step 5. The pruned ensemble detector, that is, the final ensemble detector, is broadcast to each member sensor node for online global anomaly detection.

Step 6. Each sensor node selectively retains test data for online updating based on a predefined sampling probability p.

Step 7. Once the updating condition is met, the procedure of retraining and detector updating is triggered.

This method scales well with an increasing number of nodes in WSNs due to its distributed processing nature. It has low communication requirements and does not need to transmit any actual observations between the cluster head node and its member sensor nodes, which saves communication resources significantly.

Next, we describe some of the important procedures mentioned above in detail. Further, considering the resource constraints of each sensor node in WSNs, some tricks are designed to reduce the communication and memory requirements.

3.3.1. Building the Initial Ensemble Detector

An initial ensemble detector is constructed in two steps. Firstly, a number of base detectors are trained sequentially on each sensor node in a cluster (including the cluster head node itself) based on the history dataset. Because the data distribution may change over time, previously trained detectors may be useless for future detection. Moreover, the limited memory of a sensor node prevents it from storing too many previous detectors. In practice, depending on the available memory, only the latest few detectors are kept to build the initial local ensemble of one sensor node. For example, for sensor node i, the sensed data is collected and divided into data chunks based on a time interval $Δ t$, which is determined by the actual monitoring process. Consequently, each node trains multiple individual detectors over time. In our paper, supposing that the n latest detectors are kept on each sensor node and that there are m nodes in one cluster, $n * m$ detectors in total are obtained for the initial ensemble. Secondly, each sensor node (including the cluster head node) broadcasts its n trained detectors in the cluster. Taking the cluster head as an example, after all $n * (m - 1)$ individual detectors have been received from its member nodes, the cluster head combines them with its own n trained detectors, and the initial ensemble (including $n * m$ individual detectors) is built in the cluster head node.
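The bookkeeping described above can be sketched as follows. This is only an illustration under stated assumptions: detectors are represented by string labels instead of trained models, and the helper names `keep_latest` and `build_initial_ensemble` are hypothetical.

```python
from collections import deque


def keep_latest(detectors, n):
    """Keep only the n most recently trained detectors of one node."""
    return deque(detectors, maxlen=n)


def build_initial_ensemble(per_node_detectors):
    """Concatenate the retained detectors of all m nodes into one list."""
    return [det for node in per_node_detectors for det in node]


n, m = 3, 4  # illustrative values: n detectors kept per node, m nodes per cluster
# Each node has trained five detectors over time (one per data chunk).
per_node = [
    keep_latest([f"node{i}_chunk{c}" for c in range(5)], n) for i in range(m)
]
initial_ensemble = build_initial_ensemble(per_node)
print(len(initial_ensemble))  # n * m detectors in the initial ensemble
```

A bounded `deque` drops the oldest detector automatically when a new one is appended, which mirrors the memory constraint on each node.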

Many techniques can be employed to combine the results of the individual detectors into the final detection result. The methods commonly used in the literature are majority vote (for classification problems) and weighted average (for regression problems). In our paper, the final ensemble detection result is calculated by (3), where $w_{i}$ denotes the weight coefficient; that is, $w_{i} = 1$ gives the simple average, and other values give a weighted average. For simplicity, the simple average strategy is employed in our paper to combine the final result:

$$ y_{\mathrm{fin}} (x) = \frac{1}{n * m} \sum_{i = 1}^{n * m} y_{i} (x) * w_{i}. \tag{3} $$
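A minimal sketch of the combination rule in (3) is given below. The threshold detectors and the decision threshold of 0.5 are assumptions for illustration only; the paper does not specify the form of the base detectors at this point.

```python
def ensemble_output(detectors, x, weights=None):
    """Combine detector outputs as in (3); weights default to 1 (simple average)."""
    if weights is None:
        weights = [1.0] * len(detectors)
    total = sum(w * det(x) for det, w in zip(detectors, weights))
    return total / len(detectors)


# Hypothetical threshold detectors: 1 means "anomalous", 0 means "normal".
detectors = [lambda x, t=t: 1 if abs(x) > t else 0 for t in (1.0, 2.0, 3.0, 4.0)]

score = ensemble_output(detectors, 2.5)  # two of the four detectors fire
is_anomalous = score >= 0.5  # the 0.5 decision threshold is an assumption
print(score, is_anomalous)
```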

3.3.2. Ensemble Pruning Based on BBO Search

To mitigate the expensive communication cost and high memory requirement induced by ensemble learning, inspired by the principle of “many could be better than all” in the ensemble learning community, the ensemble pruning is necessary.

Given an initial ensemble anomaly detector $E = \{{AD}_{1}, {AD}_{2}, \dots, {AD}_{n * m}\}$, where each ${AD}_{i}$ is a trained anomaly detector that can test whether an observation is anomalous, a combination method C, and a test dataset T, the goal of ensemble pruning is to find an optimal/suboptimal subset $E^{'} \subseteq E$ that minimizes the generalization error and obtains better or at least the same detection performance compared to E. Let $f_{i, j}$ ($i = 1, 2, \dots, m$, $j = 1, 2, \dots, n$) be the fitness values of the detection performance, such as true positive rate, false positive rate, accuracy, and so on. The fitness matrix F can then be defined as in (4) based on the results on the test data:

$$ F = \begin{bmatrix} f_{1,1} & f_{1,2} & \dots & f_{1, n} \\ f_{2,1} & f_{2,2} & \dots & f_{2, n} \\ \vdots & \vdots & \ddots & \vdots \\ f_{m, 1} & f_{m, 2} & \dots & f_{m, n} \end{bmatrix}. \tag{4} $$

The final fitness function can be defined as

$$ \text{Maximize} \left( \sum_{i = 1, \dots, m,\ j = 1, \dots, n}^{N^{'}} f_{i, j} \right), \quad \text{s.t. } N^{'} \leq m * n. \tag{5} $$

Here, the problem of ensemble pruning is to find the subset $E^{'}$, which is composed of some of the single detectors. Finding the optimal subset is computationally expensive. Biogeography-based optimization (BBO) is a novel optimization method and is employed to find an acceptable subensemble. We present only some key information about BBO here; the interested reader is referred to [28] for a detailed description.

BBO is a population-based global optimization method that shares some characteristics with existing evolutionary algorithms (EAs) such as genetic algorithm (GA), particle swarm optimization (PSO), and ant colony optimization (ACO). When it is used to search the solution domain for an optimal/suboptimal solution, operators are employed to share information among solutions, which makes BBO applicable to many of the problems to which GA and PSO are applied. The more distinctive differences between BBO and other EAs can be found in [27, 28].

The pseudo-code of ensemble pruning based on BBO is shown in Algorithm 1 [7]. Here H denotes a habitat, HSI (habitat suitability index) is the fitness, and SIV (suitability index variable) is a solution feature.

Algorithm 1: Ensemble pruning BBO (E, T).

Input: E, the initial ensemble anomaly detector; T, the maximum number of iterations

Output: $E^{'}$, the final ensemble anomaly detector

/* BBO parameter initialization */

Create a random set of habitats (populations) $\{H_{1}, H_{2}, \dots, H_{N}\}$;

Compute the corresponding fitness, that is, the HSI values;

/* Optimization search process */

While (T > 0)

  Compute immigration rate λ and emigration rate μ for each habitat based on HSI;

  /* Migration */

  Select $H_{i}$ with probability based on $λ_{i}$;

  If $H_{i}$ is selected

    Select $H_{j}$ with probability based on $μ_{j}$;

    If $H_{j}$ is selected

      Randomly select an SIV from $H_{j}$;

      Replace a random SIV in $H_{i}$ with the one from $H_{j}$;

    End if

  End if

  /* Mutation */

  Select an SIV in $H_{i}$ with probability based on the mutation rate η;

  If $H_{i}$ (SIV) is selected

    Replace $H_{i}$ (SIV) with a randomly generated SIV;

  End if

  Re-compute HSI values;

  T = T − 1;

End while

/* Ensemble pruning */

Get the final ensemble of anomaly detectors $E^{'}$ based on the habitat $H^{*}$ with acceptable HSI.
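Algorithm 1 can be sketched in Python as below. This is a simplified illustration rather than the implementation used in the paper, and it rests on several assumptions: a habitat is a 0/1 inclusion mask over the detectors, migration rates are linear in the HSI rank, mutation flips a bit with a fixed rate, the all-ones mask (the unpruned ensemble) is seeded into the population so that the result is never worse than the initial ensemble, and the fitness function `hsi` and the per-detector `quality` scores are toy stand-ins.

```python
import random


def bbo_prune(num_detectors, hsi, pop_size=20, iterations=50,
              mutation_rate=0.02, seed=0):
    """Search for a 0/1 inclusion mask over the detectors that maximizes hsi(mask)."""
    rng = random.Random(seed)
    # Seed the population with the full ensemble plus random masks.
    habitats = [[1] * num_detectors] + [
        [rng.randint(0, 1) for _ in range(num_detectors)]
        for _ in range(pop_size - 1)
    ]
    best = max(habitats, key=hsi)[:]
    for _ in range(iterations):
        # Rank habitats, best first: good habitats emigrate more and immigrate less.
        habitats.sort(key=hsi, reverse=True)
        mu = [(pop_size - rank) / pop_size for rank in range(pop_size)]  # emigration
        lam = [1.0 - rate for rate in mu]                                # immigration
        migrated = []
        for i, habitat in enumerate(habitats):
            child = habitat[:]
            for k in range(num_detectors):
                # Migration: copy SIV k from a habitat picked by roulette wheel on mu.
                if rng.random() < lam[i]:
                    j = rng.choices(range(pop_size), weights=mu)[0]
                    child[k] = habitats[j][k]
                # Mutation: flip the bit with a small fixed probability.
                if rng.random() < mutation_rate:
                    child[k] = 1 - child[k]
            if not any(child):  # keep the subensemble nonempty
                child[rng.randrange(num_detectors)] = 1
            migrated.append(child)
        habitats = migrated
        candidate = max(habitats, key=hsi)
        if hsi(candidate) > hsi(best):
            best = candidate[:]
    return best


# Toy fitness: mean of hypothetical per-detector accuracies of the selected members.
quality = [0.9, 0.4, 0.8, 0.3, 0.95, 0.5]


def hsi(mask):
    chosen = [q for q, bit in zip(quality, mask) if bit]
    return sum(chosen) / len(chosen) if chosen else 0.0


mask = bbo_prune(len(quality), hsi)
print(mask, hsi(mask))
```

In a real deployment the fitness would be computed from the detection results on test data, as in (4) and (5), instead of the toy `quality` list.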

3.3.3. Some Tricks Designed to Mitigate the Communication Requirement

In WSNs, the main cause of quick energy depletion is the radio communication among the sensor nodes. It is known that the cost of communicating one bit equals the cost of processing thousands of bits in sensors [35]. This means that most of the energy in a sensor node is consumed by radio communication rather than by collecting or processing data. Consequently, reducing the amount of communication decreases the power requirement and eventually lengthens the lifetime of the whole WSN.

The aforementioned method obviously has a relatively high communication overhead. Each sensor node transmits its local ensemble detector to the cluster head, and the final pruned global ensemble detector is broadcast back to each member sensor node. Some techniques are therefore used to reduce the communication overhead.

In fact, the distributed training/learning method transmits only the summary information of the trained local ensemble detector to the cluster head, which significantly decreases the communication cost compared to centralized anomaly detection, where all training data is sent to the cluster head to build the detector. Besides, after the pruned ensemble is obtained in the cluster head node, each member sensor node in the cluster has to obtain the pruned ensemble detector from the cluster head node. A straightforward method is to broadcast the pruned ensemble to the member sensor nodes. This is a commonly used strategy, but it does not make full use of the local ensemble detector information and costs more communication resources. Here, a state matrix P is designed in the cluster head; its element $p_{i, j}$ is defined by formula (6) and represents one single detector in the initial ensemble. Each local ensemble detector is then represented as a bit string, using one bit for each single detector. A detector is included in or excluded from the ensemble detector depending on the value of the corresponding bit; that is, 1 denotes that the single detector is included in the final ensemble, and 0 means that it is not included:

$$
p_{i, j} = \begin{cases} 1, & {AD}_{i, j} \in E^{'},\ i = 1, \dots, m,\ j = 1, \dots, n, \\ 0, & \text{otherwise}, \end{cases}
\qquad
P = \begin{array}{c|cccccccc}
 & 1 & 2 & \dots & i - 1 & i & i + 1 & \dots & n \\ \hline
S_{1} & 0 & 1 & \dots & 0 & 1 & 1 & \dots & 1 \\
S_{2} & 1 & 0 & \dots & 1 & 1 & 0 & \dots & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
S_{m} & 0 & 0 & \dots & 1 & 1 & 1 & \dots & 1
\end{array}
\tag{6}
$$

After the pruning procedure is finished, the cluster head broadcasts the state matrix P to its member sensor nodes. Each sensor node keeps the single detectors whose corresponding state element equals 1 and deletes the rest to build the pruned global ensemble detector. Employing the state matrix can save considerable energy. For example, after the ensemble pruning is finished, $N^{'}$ ($N^{'} \leq n * m$) individual detectors have to be broadcast in the cluster. If matrix P is not used, this costs $4 * N^{'} * d$ bytes of communication (supposing that each individual detector can be represented by d parameters and each parameter needs at least 4 bytes). If matrix P is introduced, each element of P needs only 1 bit to represent an individual detector, so only $m * n / 8$ bytes have to be broadcast. Suppose that one-third of the individual detectors are pruned (i.e., $N^{'} = 2 * n * m / 3$); then the ratio of the two costs is $(4 * n * m * d * 2 / 3) / (m * n / 8) = 64 d / 3 \approx 21.33 d$. By introducing ensemble pruning and the state matrix, the energy saving in the cluster head is significant and the lifetime of the WSN can be lengthened.
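The state matrix encoding and the cost comparison can be checked with a short sketch. The sizes m = 4, n = 6, d = 10, the matrix entries, and the helper names `pack_state_matrix` and `unpack_row` are illustrative assumptions; the matrix is chosen so that one-third of the 24 detectors are pruned, as in the example above.

```python
def pack_state_matrix(P):
    """Pack an m x n 0/1 matrix row by row into bytes, one bit per detector."""
    bits = [bit for row in P for bit in row]
    packed = bytearray((len(bits) + 7) // 8)
    for idx, bit in enumerate(bits):
        if bit:
            packed[idx // 8] |= 1 << (7 - idx % 8)
    return bytes(packed)


def unpack_row(packed, row, n):
    """Recover the n state bits of one node (one row of P) from the packed matrix."""
    positions = [row * n + j for j in range(n)]
    return [(packed[pos // 8] >> (7 - pos % 8)) & 1 for pos in positions]


m, n, d = 4, 6, 10  # illustrative sizes: nodes, detectors per node, parameters
P = [
    [0, 1, 0, 1, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [1, 1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1, 1],
]
packed = pack_state_matrix(P)

kept = sum(map(sum, P))        # N' = 16 = 2 * n * m / 3
cost_without = 4 * kept * d    # bytes if the detectors themselves are broadcast
cost_with = len(packed)        # bytes if only the state matrix is broadcast
print(cost_without, cost_with, cost_without / cost_with)
```

With these sizes the ratio is 640 / 3, that is, about 21.33 d for d = 10, in line with the estimate in the text.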

3.3.4. Online Update and Relearning

The distribution of the sensed data may change over time, so the detector must be updated. An online detector update is accompanied by a relearning procedure. A compromise strategy (i.e., the delay updating strategy [36]) suits this situation and saves computation, communication, and memory resources to some extent. Simply put, whether a newly arriving observation is saved and used to update the current detector is decided by a sampling probability p. Heuristic rules can guide the choice of its value; for example, if the dynamics are relatively stationary, a small p should be used; otherwise, a large p should be chosen. When the buffer of a sensor node has been completely replaced by new data, the online update is triggered and a new detector is trained. The pseudocode of the algorithm is given in Algorithm 2.

Algorithm 2: Online updating ( $E^{'}, p$ ).

Input: $E^{'}$ —Current pruned ensemble anomaly detector, p—Sampling probability

Output: $E^{*}$ —Updated pruned ensemble anomaly detector

For each sensor node

Retain the new observation with probability p;

If buffer is replaced completely by new observations

Train new detector and transmit its summary to cluster head;

$E^{*}$ = Ensemble_Pruning_BBO( $E^{'}, T$ )

Broadcast $E^{*}$ to the member sensor nodes for subsequent anomaly detection.
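The node-side part of Algorithm 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `DelayedUpdateBuffer` and the method `observe` are hypothetical, the buffer is assumed to be non-empty, and retained observations are assumed to overwrite the buffer slots in order. Training the new detector, sending its summary to the cluster head, and the BBO pruning itself are left to the caller.

```python
import random


class DelayedUpdateBuffer:
    """Per-node buffer for the delay updating strategy (illustrative names)."""

    def __init__(self, initial, p, rng=None):
        self.data = list(initial)      # observations behind the current detector
        self.p = p                     # sampling probability p
        self.rng = rng or random.Random()
        self.replaced = 0              # slots overwritten since the last retraining

    def observe(self, x):
        """Offer a new observation; return True when retraining is due."""
        if self.rng.random() >= self.p:
            return False               # observation is discarded
        self.data[self.replaced] = x   # assumption: overwrite slots in order
        self.replaced += 1
        if self.replaced < len(self.data):
            return False
        self.replaced = 0              # buffer fully replaced: trigger the update
        return True
```

When `observe` returns True, the node would train a new detector on `data` and transmit its summary to the cluster head, which then reruns the pruning and broadcasts the result. A small p therefore delays retraining, and a large p makes it more frequent, matching the heuristic described above.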

4. Experiments and Analysis

In this section, the dataset, the data preprocessing method, the experimental results, and their analysis are described in turn. Experiments were conducted on a PC with an Intel Core 2 Duo P7450 CPU at 2.13 GHz and 4 GB of memory, running Windows 7 Professional. The data processing was done partly in MATLAB 2010, and the algorithm described in Section 3 was implemented in Microsoft Visual C++.

4.1. Dataset and Data Preprocessing

The IBRL dataset [37] was used to validate the proposed method. It was collected from a WSN deployed in the Intel Berkeley Research Laboratory and is commonly used to evaluate the performance of existing models for WSNs [35, 36, 38–41]. The network consists of 54 Mica2Dot sensor nodes; Figure 3 shows the location of each node in the deployment (node locations are shown as black hexagons with their corresponding node IDs) [35]. The whole dataset was collected from 29/02/2004 to 05/04/2004. Four types of measurements, that is, light, temperature, humidity, and voltage, were recorded at 31 s intervals. Because the sensors were deployed inside a lab and the measured variables changed little over time (except light, which changes suddenly owing to its irregular nature and frequent on/off operation), this dataset is regarded as a static dataset by many researchers. In our experiments, to evaluate the proposed anomaly detection algorithm, artificial anomalies were created by randomly modifying some observations, an approach widely used in the literature [41].

Figure 3

Sensor nodes location in the IBRL deployment.

Since the proposed method adopts a cluster structure, one cluster (consisting of 4 sensor nodes, i.e., N7, N8, N9, and N10) and the data collected on 29/02/2004 were chosen. The data distribution can be seen in [7]. Here, only part of the observations (from 00:00:00 a.m. to 07:59:59 a.m.) of each sensor node is employed to evaluate the proposed method. The data trend is depicted in Figure 4.

Figure 4

The data (temperature, humidity) trend during 0:00:00 a.m.–7:59:59 a.m. on February 29, 2004.

Figure 4 shows that the data distributions of the nodes in a cluster are almost the same, which indicates that spatial correlation exists. There are some trivial differences; a careful analysis of the dataset shows that the main reason is missing data points, largely due to packet loss, which can also be seen in Figure 4. In our experiment, these missing observations are interpolated using the method described in Section 3.3. Figure 4 also shows a sudden peak/valley in the observations of each sensor, which implies that an event of interest may have occurred.

Suppose that $D = \{(x_{i}, y_{i})\}$ , $i = 1, 2, \dots, n$ , is a dataset used to train an anomaly detector. Here, $x_{i}$ is a vector of feature values and $y_{i}$ is the label indicating whether the given observation is normal or anomalous. Because the IBRL dataset regards all its observations as normal, anomalous data points were generated and inserted to evaluate the performance of the proposed method. For each sensor, 30 artificial anomalies were injected consecutively into the dataset in order to calculate the true positive rate (TPR), false positive rate (FPR), and detection accuracy (ACC). Without loss of generality, the anomalous data should follow a distribution different from that of the training dataset, while their ranges should overlap as much as possible. Besides, an anomalous event should be a small-probability event for a normal dataset collected by a nonfaulty sensor node. The anomalies were generated using a normal random number generator whose statistical characteristics deviate slightly from those of the normal data [41]. The detailed dataset information (including statistical parameters) of the selected sensor nodes is presented in Table 1.

Table 1

Detailed dataset information of the selected sensor nodes on 29/02/2004.

Node	Initial samples	Mean T	Mean H	Variance T	Variance H	Injected anomalies	Anomaly mean T	Anomaly mean H	Anomaly variance T	Anomaly variance H
N7	823	18.4154	40.9176	0.5238	1.4494	30	18.21	41.10	0.54	1.46
N8	548	17.9844	41.7123	0.5315	1.4612	30	17.75	41.95	0.55	1.48
N9	652	18.1140	42.6295	0.5288	1.4827	30	18.35	42.45	0.55	1.50
N10	620	18.1144	42.6215	0.5244	1.4191	30	18.33	42.47	0.54	1.43

T: temperature, H: humidity.
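The injection procedure can be sketched as follows. This is a hedged illustration rather than the generator actually used in the experiments: the function name `inject_anomalies`, its parameters, and the random choice of the insertion point are assumptions; the text states only that 30 consecutive anomalies are drawn from a normal distribution whose statistics deviate slightly from those of the normal data.

```python
import random


def inject_anomalies(normal, mean, var, count=30, start=None, rng=None):
    """Insert `count` consecutive Gaussian anomalies into a list of readings.

    Returns (values, labels), where label 1 marks an injected anomaly and
    label 0 marks an original (normal) observation.
    """
    rng = rng or random.Random()
    if start is None:
        # assumption: the insertion point is chosen uniformly at random
        start = rng.randrange(len(normal) + 1)
    std = var ** 0.5
    anomalies = [rng.gauss(mean, std) for _ in range(count)]
    values = normal[:start] + anomalies + normal[start:]
    labels = [0] * start + [1] * count + [0] * (len(normal) - start)
    return values, labels
```

For node N7 in Table 1, for example, the temperature anomalies would be drawn with `mean=18.21` and `var=0.54` and inserted into the 823 initial samples.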

4.2. Performance Evaluation Metrics and BBO Parameters

To evaluate the proposed method, commonly used performance metrics for anomaly detection are adopted: detection accuracy (ACC), true positive rate (TPR), and false positive (false alarm) rate (FPR). They are defined as follows:

\begin{matrix} ACC = \frac{(TP + TN)}{(TP + TN + FP + FN)}, \\ TPR = \frac{TP}{(TP + FN)}, \\ FPR = \frac{FP}{(FP + TN)}, \end{matrix}

(7)

where TP is the number of samples correctly predicted as anomalous, FP is the number of samples incorrectly predicted as anomalous, TN is the number of samples correctly predicted as normal, and FN is the number of samples incorrectly predicted as normal.
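As a concrete reading of formula (7), the following Python sketch computes the three metrics from label lists. The function names and the convention 1 = anomaly, 0 = normal are assumptions made for illustration; returning 0.0 when a denominator is zero is also a choice of this sketch, not something the paper specifies.

```python
def confusion_counts(y_true, y_pred):
    """Count (TP, FP, TN, FN), with 1 = anomaly and 0 = normal."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def detection_metrics(y_true, y_pred):
    """Return (ACC, TPR, FPR) as defined in formula (7)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    acc = (tp + tn) / (tp + tn + fp + fn)
    tpr = tp / (tp + fn) if tp + fn else 0.0   # guard: no anomalies present
    fpr = fp / (fp + tn) if fp + tn else 0.0   # guard: no normal samples present
    return acc, tpr, fpr
```

With true labels `[1, 1, 0, 0, 0, 0, 1, 0]` and predictions `[1, 0, 0, 1, 0, 0, 1, 0]`, the counts are TP = 2, FP = 1, TN = 4, and FN = 1, so ACC = 0.75, TPR = 2/3, and FPR = 0.2.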

BBO is employed to prune the initial ensemble. The migration model is the same as that presented in [27, 28], and the related parameters are set as follows.

The habitat (population) size is $S = 30$ ; the number of SIVs (suitability index variables) in each island is $n \in \{20, 40, 60, 80\}$ ; the maximum emigration rate and the maximum immigration rate are $E = 1$ and $I = 1$ ; the mutation rate is $η = 0.01$ ; and the elitism parameter is $ρ = 2$ . Here λ and μ denote the immigration rate and the emigration rate, respectively.

The HSI (habitat suitability index) plays the role of the fitness function in other population-based optimization algorithms. Here the HSI is evaluated by the F-measure (F-score), which considers both the precision and the recall of a binary classification problem:

\begin{array}{l} F\text{-measure} = \frac{(1 + β^{2}) * precision * recall}{β^{2} * precision + recall} \\ = \frac{(1 + β^{2}) * TP}{(1 + β^{2}) * TP + β^{2} * FN + FP} . \end{array}

(8)

The F-measure can be interpreted as a weighted harmonic mean of precision and recall; its best value is 1 and its worst is 0. β is a parameter that adjusts the relative importance of precision and recall, typically $β \in \{0.5, 1, 2\}$ . The value of the F-measure tends to be close to the smaller of precision and recall; that is, a large F-measure means that both precision and recall are large. Consequently, a good detector is analogous to a habitat with a high HSI and is included in the final ensemble detector, whereas a poor detector is analogous to a habitat with a low HSI and is discarded. In this paper, $β = 1$ is used.
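The two forms of formula (8) can be compared numerically with a small Python sketch. The function names are hypothetical, and returning 0.0 for an empty denominator is a convention of this sketch only.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-measure from confusion counts (second form of formula (8))."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def f_from_precision_recall(precision, recall, beta=1.0):
    """F-measure from precision and recall (first form of formula (8))."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0
```

For TP = 8, FP = 2, and FN = 4, precision is 0.8 and recall is 2/3; both functions give 16/22 (about 0.727) for $β = 1$ and 40/58 (about 0.690) for $β = 2$. With precision 1.0 and recall 0.1, the result for $β = 1$ is about 0.18, which illustrates how the F-measure stays close to the smaller of the two values.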

4.3. Results Presentation and Discussions

In the data mining and machine learning communities, SVM-based methods have been widely used for classification; they separate data belonging to different classes by fitting a hyperplane. The one-class SVM, a variation of this method, is especially favored for anomaly detection [42–44]. In this paper, it was used to train the base detectors. The dataset of each sensor node was divided into two parts: about 66% was used for training the local detector, and the remainder was used as the test set to evaluate the proposed method.

Online Bagging, a commonly used ensemble strategy, was used to build the initial ensemble detector. Our experiments have two goals. The first is to demonstrate the effectiveness of the proposed method based on ensemble learning theory. The second is to show that the pruned ensemble detector achieves performance at least comparable to that of the initial ensemble detector while reducing the resource requirement. Accordingly, three experiments were conducted: a local ensemble anomaly detector that considers only the temporal correlation of each sensor node, a global ensemble anomaly detector that considers the spatiotemporal correlation, and a global pruned ensemble anomaly detector based on BBO. The experimental results are shown in Tables 2, 3, and 4, respectively.

Table 2

Detection performance of local ensemble detector.

Ensemble size	N7 ACC	N7 TPR	N7 FPR	N8 ACC	N8 TPR	N8 FPR	N9 ACC	N9 TPR	N9 FPR	N10 ACC	N10 TPR	N10 FPR
5	0.8700	0.5833	0.1181	0.7900	0.3333	0.1809	0.8267	0.5000	0.1549	0.8267	0.5714	0.1608
10	0.8800	0.6667	0.1111	0.8033	0.3889	0.1702	0.8267	0.4375	0.1514	0.8333	0.6429	0.1573
15	0.8900	0.7500	0.1042	0.8167	0.5000	0.1631	0.8433	0.5000	0.1373	0.8600	0.7143	0.1329
20	0.8933	0.8333	0.1042	0.8200	0.5000	0.1596	0.8367	0.5000	0.1444	0.8567	0.7143	0.1364

Table 3

Detection performance of global ensemble detector [7].

Combined ensemble size	N7 ACC	N7 TPR	N7 FPR	N8 ACC	N8 TPR	N8 FPR	N9 ACC	N9 TPR	N9 FPR	N10 ACC	N10 TPR	N10 FPR
20	0.9467	0.8333	0.0486	0.9300	0.7778	0.0603	0.9467	0.7500	0.0423	0.9500	0.7857	0.0420
40	0.9700	0.7500	0.0208	0.9433	0.8333	0.0496	0.9710	0.8938	0.0246	0.9650	0.8929	0.0315
60	0.9700	0.8333	0.0243	0.9733	0.8889	0.0213	0.9800	0.9375	0.0176	0.9783	0.9357	0.0196
80	0.9817	0.9583	0.0174	0.9800	0.9444	0.0177	0.9767	0.9375	0.0211	0.9780	0.9714	0.0217

Table 4

Detection performance of global ensemble detector based on BBO pruning [7].

Ensemble size (BBO pruned)	N7 ACC	N7 TPR	N7 FPR	N8 ACC	N8 TPR	N8 FPR	N9 ACC	N9 TPR	N9 FPR	N10 ACC	N10 TPR	N10 FPR
14	0.9480	0.8000	0.0458	0.9327	0.7667	0.0567	0.9500	0.8125	0.0423	0.9533	0.8571	0.0420
23	0.9710	0.7750	0.0208	0.9447	0.8000	0.0461	0.9733	0.9250	0.0239	0.9697	0.9143	0.0276
27	0.9713	0.8500	0.0236	0.9683	0.8333	0.0230	0.9810	0.9563	0.0176	0.9797	0.9357	0.0182
32	0.9820	0.9750	0.0177	0.9750	0.8333	0.0160	0.9820	0.9500	0.0162	0.9830	0.9786	0.0168

Table 2 shows the performance of each sensor node under different ensemble sizes, without taking into account the spatial correlation of the sensed data in a cluster. Although the detection performance generally improves as the ensemble size increases (higher ACC and TPR and lower FPR indicate better performance), the overall performance is relatively low. The maximum detection accuracy is only 89.33%, most of the true positive rates are unacceptably low, and most of the false positive rates (FPR) are relatively high. All these results indicate that the performance of the local ensemble detector is poor. Table 3 shows the global detection performance of each sensor node. Here, after the local ensemble detectors were trained, the member nodes exchanged their local ensembles to form the global ensemble detector, and each member node used this global detector to test its local observations online. The results in Table 3 [7] are clearly better than those in Table 2. With the help of the neighbors' detectors, the detection results generally improve as the ensemble size increases.

To further optimize the performance of the proposed algorithm and save resources, ensemble pruning is applied to the global ensemble detector; Table 4 [7] shows the detection performance of the pruned global ensemble detector based on BBO.

Table 4 shows a more practical result: the size of the global ensemble decreases sharply, while the detection performance remains comparable to, and in most cases slightly better than, that of the initial global ensemble detector. As Table 5 shows, when the size of the initial ensemble reaches 80, 60% of the resource cost is saved. In our experiment, merely to validate the method, we set the ensemble size to 5, 10, 15, and 20 for each local ensemble detector, which may be small for practical applications. In fact, how many local detectors should be chosen is an open question that depends on many factors, such as the computation capability, communication cost, and memory usage of the sensor node and the required detection accuracy. In practical applications, a trade-off is commonly made.

Table 5

Resource cost saved by the BBO-pruned global ensemble detector.

Number	Initial ensemble size	Pruned ensemble size	Saving resource cost
1	20	14	30%
2	40	23	42.5%
3	60	27	55%
4	80	32	60%
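The percentages in Table 5 can be reproduced with the short Python sketch below. It assumes that the resource cost is proportional to the number of detectors in the ensemble, which is consistent with the tabulated values but is not stated explicitly in the paper; the function name `saving_rate` is hypothetical.

```python
def saving_rate(initial_size, pruned_size):
    """Percentage of detectors removed by pruning.

    Assumption: resource cost scales linearly with ensemble size.
    """
    return 100.0 * (initial_size - pruned_size) / initial_size


# (initial ensemble size, pruned ensemble size) pairs taken from Table 5
sizes = [(20, 14), (40, 23), (60, 27), (80, 32)]
rates = [saving_rate(initial, pruned) for initial, pruned in sizes]
```

Here `rates` evaluates to 30.0, 42.5, 55.0, and 60.0, matching the last column of Table 5.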

5. Conclusion and Future Work

By exploiting the spatiotemporal correlation in the sensed data of WSNs and motivated by the advantages of online ensemble learning, a distributed online ensemble anomaly detection method has been proposed. Because resources in WSNs are constrained, ensemble pruning based on BBO is employed to mitigate the high resource requirement and obtain an optimized detector that performs at least as well as the original one. The experimental results on a real dataset demonstrate that the proposed method is effective.

Because the diversity of base learners is a key factor in the performance of ensemble learning, as a possible extension of this work we plan to include diversity measures in the fitness function to improve detection performance. Besides, communication cost is the main cause of rapid energy depletion in sensor nodes, especially in the cluster head; the adaptive selection of the cluster head based on energy state will therefore be considered in future work to lengthen the lifetime of WSNs.

Footnotes

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported by the National Key Scientific Instrument and Equipment Development Project (2012YQ15008703), the Zhejiang Provincial Natural Science Foundation of China (LY13F020015), the Open Project of Top Key Discipline of Computer Software and Theory in Zhejiang Provincial (ZC323014100), National Science Foundation of China (61473182), Science and Technology Commission of Shanghai Municipality (11JC1404000, 14JC1402200), and Shanghai Rising-Star Program (13QA1401600).

References

1. Zhang, Meratnia, and Havinga, "Outlier detection techniques for wireless sensor networks: a survey," IEEE Communications Surveys and Tutorials, vol. 12, no. 2, pp. 159–170, 2010. DOI: 10.1109/surv.2010.021510.00088. Scopus: 2-s2.0-77955082590.

2. Zhang, N. A. S. Hamm, Meratnia, Stein, van de Voort, and P. J. M. Havinga, "Statistics-based outlier detection for wireless sensor networks," International Journal of Geographical Information Science, vol. 26, no. 8, pp. 1373–1392, 2012. DOI: 10.1080/13658816.2012.654493. Scopus: 2-s2.0-84864698431.

3. Peng and Q.-L. Han, "A novel event-triggered transmission scheme and L₂ control co-design for sampled-data control systems," IEEE Transactions on Automatic Control, vol. 58, no. 10, pp. 2620–2626, 2013.

4. Rajasegarar, Leckie, Palaniswami, and J. C. Bezdek, "Distributed anomaly detection in wireless sensor networks," in Proceedings of the 10th IEEE Singapore International Conference on Communication Systems (ICCS '06), pp. 1–5, IEEE, Singapore, October 2006. DOI: 10.1109/ICCS.2006.301508.

5. Rajasegarar, Leckie, and Palaniswami, "Anomaly detection in wireless sensor networks," IEEE Wireless Communications, vol. 15, no. 4, pp. 34–40, 2008. DOI: 10.1109/mwc.2008.4599219. Scopus: 2-s2.0-49749114376.

6. Xie, Han, Tian, and Parvin, "Anomaly detection in wireless sensor networks: a survey," Journal of Network and Computer Applications, vol. 34, no. 4, pp. 1302–1325, 2011. DOI: 10.1016/j.jnca.2011.03.004. Scopus: 2-s2.0-79956116601.

7. Ding and Fei, "Online anomaly detection method based on BBO ensemble pruning in wireless sensor networks," in Life System Modeling and Simulation, vol. 461 of Communications in Computer and Information Science, pp. 160–169, Springer, Berlin, Germany, 2014. DOI: 10.1007/978-3-662-45283-7_17.

8. T. G. Dietterich, "Machine-learning research: four current directions," AI Magazine, vol. 18, no. 4, pp. 97–136, 1997. Scopus: 2-s2.0-0031361611.

9. Z.-H. Zhou and Tang, "Ensembling neural networks: many could be better than all," Artificial Intelligence, vol. 137, no. 1-2, pp. 239–263, 2002. DOI: 10.1016/s0004-3702(02)00190-x. MR1906477. Scopus: 2-s2.0-0036567392.

10. Shahid, I. H. Naqvi, and S. B. Qaisar, "Characteristics and classification of outlier detection techniques for wireless sensor networks in harsh environments: a survey," Artificial Intelligence Review, vol. 137, pp. 1–36, 2012. DOI: 10.1007/s10462-012-9370-y. Scopus: 2-s2.0-84868681399.

11. Fei, "A fast multi-output RBF neural network construction method," Neurocomputing, vol. 73, no. 10–12, pp. 2196–2202, 2010. DOI: 10.1016/j.neucom.2010.01.014. Scopus: 2-s2.0-77952485893.

12. Gil, Santos, and Cardoso, "Dealing with outliers in wireless sensor networks: an oil refinery application," IEEE Transactions on Control Systems Technology, vol. 23, no. 4, pp. 1589–1596, 2014. DOI: 10.1109/tcst.2013.2288519. Scopus: 2-s2.0-84887933698.

13. M. A. Rassam, M. A. Maarof, and Zainal, "Adaptive and online data anomaly detection for wireless sensor systems," Knowledge-Based Systems, vol. 60, pp. 44–57, 2014. DOI: 10.1016/j.knosys.2014.01.003. Scopus: 2-s2.0-84896405910.

14. Rajasegarar, Gluhak, Ali Imran, Nati, Moshtaghi, Leckie, and Palaniswami, "Ellipsoidal neighbourhood outlier factor for distributed anomaly detection in resource constrained networks," Pattern Recognition, vol. 47, no. 9, pp. 2867–2879, 2014. DOI: 10.1016/j.patcog.2014.04.006. Scopus: 2-s2.0-84900803761.

15. Zhang, "Concept drift detection via competence models," Artificial Intelligence, vol. 209, pp. 11–28, 2014. DOI: 10.1016/j.artint.2014.01.001. MR3165892. Scopus: 2-s2.0-84893974910.

16. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996. Zbl 0858.68080. Scopus: 2-s2.0-0030211964.

17. Seguí, Igual, and Vitrià, "Bagged one-class classifiers in the presence of outliers," International Journal of Pattern Recognition and Artificial Intelligence, vol. 27, no. 5, Article ID 1350014, 2013. DOI: 10.1142/s0218001413500146. Scopus: 2-s2.0-84885145479.

18. Duffy and Helmbold, "Boosting methods for regression," Machine Learning, vol. 47, no. 2-3, pp. 153–200, 2002. DOI: 10.1023/a:1013685603443. Scopus: 2-s2.0-0036568038.

19. W.-C. Chang and C.-W. Cho, "Online boosting for vehicle detection," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 40, no. 3, pp. 892–902, 2010. DOI: 10.1109/tsmcb.2009.2032527. Scopus: 2-s2.0-77952581402.

20. Désir, Bernard, Petitjean, and Heutte, "One class random forests," Pattern Recognition, vol. 46, no. 12, pp. 3490–3506, 2013. DOI: 10.1016/j.patcog.2013.05.022. Scopus: 2-s2.0-84881077165.

21. Fern and Givan, "Online ensemble learning: an empirical study," Machine Learning, vol. 53, no. 1-2, pp. 71–109, 2003. DOI: 10.1023/a:1025619426553. Scopus: 2-s2.0-0141921552.

22. Bifet, Holmes, Pfahringer, and Gavaldà, "Improving adaptive bagging methods for evolving data streams," in Advances in Machine Learning, vol. 5828 of Lecture Notes in Computer Science, pp. 23–37, Springer, Berlin, Germany, 2009. DOI: 10.1007/978-3-642-05224-8_4.

23. D. I. Curiac and Volosencu, "Ensemble based sensing anomaly detection in wireless sensor networks," Expert Systems with Applications, vol. 39, no. 10, pp. 9087–9096, 2012. DOI: 10.1016/j.eswa.2012.02.036. Scopus: 2-s2.0-84859212083.

24. Zhou, "A novel system anomaly prediction system based on belief Markov model and ensemble classification," Mathematical Problems in Engineering, vol. 2013, Article ID 179390, 10 pages, 2013. DOI: 10.1155/2013/179390. Scopus: 2-s2.0-84884842001.

25. Chen, "Incremental learning from stream data," IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 1901–1914, 2011. DOI: 10.1109/TNN.2011.2171713. Scopus: 2-s2.0-83855162220.

26. Fei, "A novel forward gene selection algorithm for microarray data," Neurocomputing, vol. 133, pp. 446–458, 2014. DOI: 10.1016/j.neucom.2013.12.012. Scopus: 2-s2.0-84894581321.

27. "An analysis of the equilibrium of migration models for biogeography-based optimization," Information Sciences, vol. 180, no. 18, pp. 3444–3464, 2010. DOI: 10.1016/j.ins.2010.05.035. Zbl 1194.92073. Scopus: 2-s2.0-77958150944.

28. Simon, "Biogeography-based optimization," IEEE Transactions on Evolutionary Computation, vol. 12, no. 6, pp. 702–713, 2008. DOI: 10.1109/tevc.2008.919004. Scopus: 2-s2.0-57249115093.

29. Sheen, Anitha, and Sirisha, "Malware detection by pruning of parallel ensembles using harmony search," Pattern Recognition Letters, vol. 34, no. 14, pp. 1679–1686, 2013. DOI: 10.1016/j.patrec.2013.05.006. Scopus: 2-s2.0-84885665087.

30. Y.-Y. Zhang, H.-C. Chao, Chen, Shu, C.-H. Park, and M.-S. Park, "Outlier detection and countermeasure for hierarchical wireless sensor networks," IET Information Security, vol. 4, no. 4, pp. 361–373, 2010. DOI: 10.1049/iet-ifs.2009.0192. Scopus: 2-s2.0-78650330275.

31. Peng and M.-R. Fei, "An improved result on the stability of uncertain T-S fuzzy systems with interval time-varying delay," Fuzzy Sets and Systems, vol. 212, pp. 97–109, 2013. DOI: 10.1016/j.fss.2012.06.009. MR2996153. Scopus: 2-s2.0-84868532539.

32. Zhang, Observing the Unobservable: Distributed Online Outlier Detection in Wireless Sensor Networks, University of Twente, Enschede, The Netherlands, 2010.

33. Peng, Yue, and Fei, "Relaxed stability and stabilization conditions of networked fuzzy control systems subject to asynchronous grades of membership," IEEE Transactions on Fuzzy Systems, vol. 22, no. 5, pp. 1101–1112, 2014. DOI: 10.1109/tfuzz.2013.2281993.

34. Peng, M.-R. Fei, Tian, and Y.-P. Guan, "On hold or drop out-of-order packets in networked control systems," Information Sciences, vol. 268, pp. 436–446, 2014. DOI: 10.1016/j.ins.2013.08.003. MR3180791. Scopus: 2-s2.0-84897038995.

35. M. A. Rassam, Zainal, and M. A. Maarof, "An adaptive and efficient dimension reduction model for multivariate wireless sensor networks applications," Applied Soft Computing Journal, vol. 13, no. 4, pp. 1978–1996, 2013. DOI: 10.1016/j.asoc.2012.11.041. Scopus: 2-s2.0-84890442312.

36. Xie, Han, and H.-H. Chen, "Scalable hypergrid k-NN-based online anomaly detection in wireless sensor networks," IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 8, pp. 1661–1670, 2013. DOI: 10.1109/tpds.2012.261. Scopus: 2-s2.0-84880077006.

37. Intel Berkeley Research Lab (IBRL) dataset, 2004, http://db.csail.mit.edu/labdata/labdata.html

38. J. W. Branch, Giannella, Szymanski, Wolff, and Kargupta, "In-network outlier detection in wireless sensor networks," Knowledge and Information Systems, vol. 34, no. 1, pp. 23–54, 2013. DOI: 10.1007/s10115-011-0474-5. Scopus: 2-s2.0-84872353254.

39. Moshtaghi, T. C. Havens, J. C. Bezdek, Park, Leckie, Rajasegarar, J. M. Keller, and Palaniswami, "Clustering ellipses for anomaly detection," Pattern Recognition, vol. 44, no. 1, pp. 55–69, 2011. DOI: 10.1016/j.patcog.2010.07.024. Scopus: 2-s2.0-77956478694.

40. Rajasegarar, J. C. Bezdek, Leckie, and Palaniswami, "Elliptical anomalies in wireless sensor networks," ACM Transactions on Sensor Networks, vol. 6, no. 1, pp. 1–28, 2009. DOI: 10.1145/1653760.1653767. Scopus: 2-s2.0-75149159654.

41. M. A. Rassam, Zainal, and M. A. Maarof, "One-class principal component classifier for anomaly detection in wireless sensor network," in Proceedings of the 4th International Conference on Computational Aspects of Social Networks (CASoN '12), pp. 271–276, IEEE, São Carlos, Brazil, November 2012. DOI: 10.1109/CASoN.2012.6412414.

42. Sagha, Bayati, J. D. R. Millán, and Chavarriaga, "On-line anomaly detection and resilience in classifier ensembles," Pattern Recognition Letters, vol. 34, no. 15, pp. 1916–1927, 2013. DOI: 10.1016/j.patrec.2013.02.014. Scopus: 2-s2.0-84885079530.

43. Hejazi and Y. P. Singh, "One-class support vector machines approach to anomaly detection," Applied Artificial Intelligence, vol. 27, no. 5, pp. 351–366, 2013. DOI: 10.1080/08839514.2013.785791. Scopus: 2-s2.0-84878732872.

44. Zhang, Meratnia, and P. J. M. Havinga, "Distributed online outlier detection in wireless sensor networks using ellipsoidal support vector machine," Ad Hoc Networks, vol. 11, no. 3, pp. 1062–1074, 2013. DOI: 10.1016/j.adhoc.2012.11.001. Scopus: 2-s2.0-84875691648.
