Abstract
As wireless sensor networks (WSNs) gain popularity, traffic anomaly detection is becoming increasingly necessary for them. Because worms, attacks, intrusions, and other kinds of malicious behavior can be recognized through traffic analysis and anomaly detection, WSN traffic anomaly detection provides useful tools for timely reaction and appropriate prevention in network security. In this paper, we improve the exploitation of the GM(1,1) model to predict traffic and judge traffic anomalies in WSNs. Based on a systematic study of the characteristics of WSN traffic, the causes of abnormal WSN traffic, and the latest related research and development, we better exploit the GM(1,1) model following four guidelines: using a sliding window to determine the historical data for modeling, optimizing the initial value of the first-order grey differential equation, predicting traffic with the short step exponential weighted average method, and judging whether the traffic of the next moment is abnormal by Euclidean distance. Then, we propose a traffic anomaly detection algorithm for WSNs based on the improved exploitation of the GM(1,1) model. Simulation results and comparative analyses demonstrate that the proposed WSN traffic anomaly detection algorithm reduces the undetected rate and has better anomaly detection accuracy than traditional traffic anomaly detection algorithms.
1. Introduction
In recent years, developments in distributed computing and microelectromechanical systems have prompted the emergence of a variety of wireless sensor network (WSN) applications, such as military applications [1], home automation [2], smart building [3], health and medical applications [4], vehicle and target tracking [5], and industry domains [6, 7]. In general, a WSN is composed of a large number of densely deployed, battery-powered, low-power sensor nodes with sensing, processing, storage, and wireless communication capabilities [6]. Each sensor node consists of power, sensing, computing, and communication modules, and its main purpose is to monitor a certain phenomenon, such as a tracked object or environmental data [8].
As WSNs gain popularity, traffic anomaly detection is becoming increasingly necessary for them. In a WSN, traffic anomaly detection is a useful way to understand network behavior and to assess network performance and reliability, which contributes to effective and prompt troubleshooting and to resolving various issues. Over the past few years, traffic anomaly detection in WSN scenarios has become an increasingly active field of study. Furthermore, intrusions, attacks, worms, and other kinds of malicious behavior can be identified by traffic analysis and anomaly detection, so traffic anomaly detection in a WSN provides a sound basis for prevention and reaction in network security.
As is well known, traffic anomaly detection has been widely discussed for wired networks, and a variety of methods have been developed to detect abnormal traffic correctly. Because the traffic characteristics of traditional wired networks differ greatly from those of WSNs, anomaly detection methods designed for wired networks cannot be directly applied to WSNs. An obvious characteristic of a WSN is that node energy, storage capacity, and computing power are severely limited. Designing a WSN traffic anomaly detection algorithm under these constraints is therefore a huge challenge, and dealing with the application correlation (burst) and nonstationary characteristics of WSN traffic is another.
In this paper, we summarize the characteristics of traffic and the causes of abnormal traffic in a WSN. We classify the existing traffic anomaly detection models and methods for WSNs and compare them. The GM(1,1) model is efficient and has low computational complexity, so it is quite suitable for real-time traffic anomaly detection in WSNs, where node energy and computing capability are limited. We better exploit the GM(1,1) model following four guidelines: using a sliding window to determine the historical data for modeling, optimizing the initial value of the first-order grey differential equation, predicting traffic with the short step exponential weighted average method, and judging whether the traffic of the next moment is abnormal by Euclidean distance. Simulation results and comparative analyses indicate that the novel algorithm, which is based on the improved exploitation of the GM(1,1) model, has higher detection accuracy and better real-time performance than the traditional method.
The remainder of this paper is organized as follows. In Section 2, we briefly introduce the existing anomaly detection algorithms and make a comparative analysis of them. In Section 3, we analyze the characteristics of WSN traffic and the cause of WSN abnormal traffic in depth and introduce GM(1,1) model in detail. Then, we design four methods to improve exploitation of GM(1,1) model in Section 4. And a complete traffic anomaly detection algorithm for WSN is proposed in Section 5. In Section 6, we use Matlab to simulate this algorithm and the simulation results demonstrate that this algorithm can reduce the undetected rate and improve the detection accuracy. Section 7 concludes our paper.
2. Related Work
Research on traffic anomaly detection can be classified into three main directions, namely, detection based on feature and behavior, statistics-based detection, and intelligent detection based on machine learning and data mining. Here, we review these directions.
2.1. Detection Based on Feature and Behavior
The method based on flow characteristics and behavior detects abnormal traffic by looking for patterns in the network traffic data that match known anomalous traffic. This method, which takes network traffic or data packets as input, has good real-time performance and detection accuracy. It can not only detect network anomalies but also be used to analyze and ascertain their types.
However, due to this method's requirement for real-time comparison between the features of abnormal traffic and current traffic, the database of the characteristics of abnormal traffic is a vital factor restricting the detection accuracy. In this method, a huge feature database needs to be built and constantly updated, which will be a great challenge for wireless sensor networks with constrained computing and storage capacity.
In [9], Wang extracts profiles of the characteristic of sensor nodes and network behavior through wireless sensor network packet traffic, and then anomalies can be identified by monitoring behavior of nodes and network.
2.2. Detection Based on Statistics
Detection methods based on statistics, mainly CUSUM algorithms and wavelet analysis, do not require prior knowledge of the behavioral characteristics of nodes and the network. They directly calculate statistics of the input traffic data, such as the mean and variance, and then determine from the statistical deviation whether the traffic is abnormal.
In [10], a matrix-based multistatistics modified CUSUM algorithm (M-CUSUM) is proposed. By computing a ratio from the differences and absolute values of traffic between ingress and egress ports, it can inspect network flow in real time. A wavelet analysis-based real-time anomaly detection (WARAD) algorithm, proposed in [11], reversely collects the network traffic in real time and then utilizes the variance of the wavelet coefficients. This method not only improves the accuracy and timeliness of anomaly detection but also reduces the computational complexity of solving the Hurst values. Moreover, the variances of the wavelet coefficients at different levels yield the Hurst parameters of the different decomposition levels. Therefore, abnormalities can be determined simply by detecting a marked change in the variances of the wavelet coefficients at adjacent levels.
2.3. Intelligent Detection Based on Machine Learning and Data Mining
In this type of algorithm, anomaly detection is usually regarded as a clustering or classification problem, for which a machine learning model is established. Judgment is then made in real time. This intelligent approach includes many branches, such as the ARMA model, the Markov model, the support vector machine (SVM), Backpropagation (BP) Neural Networks, and the Immune-Genetic Algorithm.
In [12], a series of Markov models, including tree-indexed Markov chains, are applied to characterize the network behavior, and optimal decision rules and large deviations techniques are used to identify anomalies. A community intrusion detection system based on support vector machine (SVM) classification is presented by Tian et al. in [13]. In [14], the researchers put forward two new clustering algorithms, namely, the supervised improved competitive learning network (SICLN) and the improved competitive learning network (ICLN). In [15], an enhanced method to detect DDoS attacks is proposed, in which the parameters of the traffic matrix are optimized by a Genetic Algorithm (GA) in order to maximize the detection rate.
The preceding three subsections summarize the current mainstream methods for detecting traffic anomalies in WSNs. Table 1 presents the advantages and disadvantages of their performance.
Table 1: Performance of different detection methods.
Notations. G: good; B: bad; H: high; L: low; N: normal; R: relatively.
In Table 1, independence describes how well a detection method performs when it is applied alone to detect anomalies. Usually, the methods with bad or relatively bad independence serve as optimization and auxiliary methods [16]. The method based on feature and behavior requires a feature database to be built, which needs abundant data. The method based on the Markov model needs a Markov prediction model, which also requires a large amount of data. Similarly, the last three methods require plenty of data. Generally, complexity increases as detection accuracy improves. A detection method with low complexity and high accuracy is our research goal.
3. Theoretical Analysis
3.1. WSN Traffic Characteristics
On the whole, WSN traffic has two important properties, namely, imbalance and application correlation [16]:
(1) Imbalance. The imbalance is mainly reflected in the traffic of sensor nodes and convergence nodes. A large proportion of data is transferred from sensor nodes to convergence nodes, but only a small proportion of data, namely, control messages, needs to be transferred from convergence nodes to sensor nodes. Therefore, most of the data is aggregated at the base station and convergence nodes.
(2) Application correlation. The application correlation means that the network carries bursts of unexpected traffic. WSN traffic is tied to the application, which typically involves continuous time-driven reporting and periodic data queries, so the traffic is cyclical. When the network tracks a target and collects its data, the traffic increases sharply because a large amount of data needs to be transferred in a very short period of time.
3.2. Causes of WSN Traffic Anomaly
WSN nodes usually communicate by radio and are deployed in open areas, which not only makes them vulnerable to malicious damage by people but also brings about a series of security risks, such as the disclosure of information.
Frequent attack methods, including the resource depletion attack [17], the sinkhole attack, and the flooding attack, cause abnormal behavior in network traffic. The common attack methods at the different network layers, together with the anomalies they cause, are listed in Table 2. As we can see, almost all of the attacks cause a traffic anomaly. Monitoring network traffic therefore helps to judge whether an abnormality has happened and whether the network is under attack, which in turn supports taking appropriate defensive measures afterwards.
Table 2: Causes of WSN traffic anomalies and the resulting changes in traffic.
3.3. Definition of GM(1,1) Model
The grey systems theory, established by Julong Deng in 1982, is a methodology that focuses on the study of problems involving small samples and poor information. It deals with uncertain systems with partially known information by generating, excavating, and extracting useful information from what is available, so that the operational behaviors of systems and their laws of evolution can be correctly described and effectively monitored [18]. The grey model is abstracted from the grey system. The GM(1,1) model, the simplest grey model, represents a first-order differential equation in one variable. In the natural world, uncertain systems with small samples and poor information are common, which gives grey systems theory a wide range of applicability. The GM(1,1) model requires little data and little computation and gives accurate forecasts, so it is widely used in agriculture, forestry, water conservancy, energy, transportation, economy, and other fields. So far, however, no one has applied the GM(1,1) model to WSN traffic anomaly detection.
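Denote the original data sequence by

$$x^{(0)} = \left(x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(n)\right).$$

The 1-AGO (accumulated generating operation) formation is defined as

$$x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i), \quad k = 1, 2, \ldots, n.$$

According to GM(1,1), we can get the following first-order grey differential equation:

$$x^{(0)}(k) + a z^{(1)}(k) = b, \quad k = 2, 3, \ldots, n,$$

where $a$ is the development coefficient and $b$ is the grey input.
Before building a GM(1,1) model, a proper $\alpha$ value needs to be assigned to obtain a better background value

$$z^{(1)}(k) = \alpha x^{(1)}(k) + (1 - \alpha) x^{(1)}(k-1), \quad 0 \le \alpha \le 1.$$

Set

$$Y = \begin{bmatrix} x^{(0)}(2) \\ x^{(0)}(3) \\ \vdots \\ x^{(0)}(n) \end{bmatrix}, \qquad B = \begin{bmatrix} -z^{(1)}(2) & 1 \\ -z^{(1)}(3) & 1 \\ \vdots & \vdots \\ -z^{(1)}(n) & 1 \end{bmatrix};$$

then the least-squares estimate of the parameters is $[a, b]^{T} = (B^{T}B)^{-1}B^{T}Y$.
Solving the first-order grey differential equation with the initial condition $\hat{x}^{(1)}(1) = x^{(0)}(1)$, we can get the solution:

$$\hat{x}^{(1)}(k+1) = \left(x^{(0)}(1) - \frac{b}{a}\right)e^{-ak} + \frac{b}{a}, \quad k = 0, 1, 2, \ldots,$$

and the predicted value of the original sequence is restored by $\hat{x}^{(0)}(k+1) = \hat{x}^{(1)}(k+1) - \hat{x}^{(1)}(k)$.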
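The modeling procedure of this subsection can be sketched in a few lines of Python. This is a minimal illustration of the standard GM(1,1) formulation, not the authors' Matlab implementation; the function names `gm11_fit` and `gm11_predict` and the default `alpha=0.5` are assumptions made for the example, and the code assumes positive data with a nonzero development coefficient.

```python
import numpy as np

def gm11_fit(x0, alpha=0.5):
    """Estimate the GM(1,1) parameters (a, b) by least squares.

    alpha weights the background value; 0.5 is an assumed default.
    """
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                              # 1-AGO sequence
    z1 = alpha * x1[1:] + (1.0 - alpha) * x1[:-1]   # background values z1(2..n)
    B = np.column_stack((-z1, np.ones_like(z1)))
    Y = x0[1:]
    (a, b), *_ = np.linalg.lstsq(B, Y, rcond=None)
    return a, b

def gm11_predict(x0, steps, alpha=0.5):
    """Return the n fitted values followed by `steps` forecasts.

    Assumes a != 0 (a constant sequence would give a = 0).
    """
    x0 = np.asarray(x0, dtype=float)
    a, b = gm11_fit(x0, alpha)
    k = np.arange(len(x0) + steps)                  # k = 0 .. n+steps-1
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
    x0_hat = np.empty_like(x1_hat)
    x0_hat[0] = x0[0]
    x0_hat[1:] = np.diff(x1_hat)                    # inverse AGO
    return x0_hat

if __name__ == "__main__":
    history = [2.0, 2.2, 2.42, 2.662, 2.9282]       # made-up, roughly exponential data
    print(gm11_predict(history, steps=3)[-3:])      # three forecasts beyond the history
```

Only a handful of samples and one small least-squares solve are needed, which is consistent with the low computational cost claimed for the model.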
3.4. Prediction Steps of GM(1,1) Model
Step 1 (inspection and processing on the data sequence).
First, in order to guarantee the feasibility of the model, inspection and processing on the original data sequence are necessary.
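Denote the original data sequence by $x^{(0)} = \left(x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(n)\right)$ and compute the stepwise ratios

$$\lambda(k) = \frac{x^{(0)}(k-1)}{x^{(0)}(k)}, \quad k = 2, 3, \ldots, n.$$

If all stepwise ratios $\lambda(k)$ fall into the interval $\left(e^{-2/(n+1)}, e^{2/(n+1)}\right)$, the sequence can be used to build a GM(1,1) model. Otherwise, the sequence needs to be transformed first, for example by the translation $y^{(0)}(k) = x^{(0)}(k) + c$ with a suitable constant $c$, so that the stepwise ratios of the transformed sequence fall into this interval.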
Step 2 (build GM(1,1) model).
Based on the data sequence which has passed inspection, GM(1,1) model can be established according to (2).
Step 3 (model checking).
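(a) Residual test: set the residual as

$$\varepsilon(k) = \frac{x^{(0)}(k) - \hat{x}^{(0)}(k)}{x^{(0)}(k)}, \quad k = 1, 2, \ldots, n.$$

(b) Stepwise ratio deviation test: according to the stepwise ratio $\lambda(k) = x^{(0)}(k-1)/x^{(0)}(k)$ and the development coefficient $a$, compute the deviation

$$\rho(k) = 1 - \frac{1 - 0.5a}{1 + 0.5a}\,\lambda(k), \quad k = 2, 3, \ldots, n.$$

If $|\varepsilon(k)|$ and $|\rho(k)|$ are below the chosen threshold for all $k$, the model passes the test. By the commonly used criterion, values below 0.2 meet the general requirement and values below 0.1 meet the higher requirement.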
Step 4 (predicting).
Based on GM(1,1) model which has passed the test, according to (8), we can predict the future value.
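The inspection in Step 1 and the checks in Step 3 can be sketched as follows. The formulas are the textbook forms of the stepwise ratio test, the relative residual, and the stepwise ratio deviation; the function names, the unit shift step, and the default threshold of 0.2 are assumptions made for this illustration, and the data are assumed to be positive.

```python
import math

def stepwise_ratios(x0):
    """lambda(k) = x0(k-1) / x0(k) for k = 2..n (data assumed nonzero)."""
    return [x0[k - 1] / x0[k] for k in range(1, len(x0))]

def passes_ratio_check(x0):
    """True if every stepwise ratio lies in (e^(-2/(n+1)), e^(2/(n+1)))."""
    n = len(x0)
    lo, hi = math.exp(-2.0 / (n + 1)), math.exp(2.0 / (n + 1))
    return all(lo < r < hi for r in stepwise_ratios(x0))

def shift_until_valid(x0, step=1.0, max_iter=10_000):
    """Add a constant c (a multiple of `step`) until the ratio check passes."""
    c = 0.0
    for _ in range(max_iter):
        shifted = [v + c for v in x0]
        if all(v > 0 for v in shifted) and passes_ratio_check(shifted):
            return shifted, c
        c += step
    raise ValueError("no suitable shift found")

def relative_residuals(x0, x0_hat):
    """|x0(k) - x0_hat(k)| / |x0(k)| for each k."""
    return [abs(x - xh) / abs(x) for x, xh in zip(x0, x0_hat)]

def ratio_deviations(x0, a):
    """|1 - (1 - 0.5a) / (1 + 0.5a) * lambda(k)| for k = 2..n."""
    factor = (1 - 0.5 * a) / (1 + 0.5 * a)
    return [abs(1 - factor * r) for r in stepwise_ratios(x0)]

def model_passes(x0, x0_hat, a, threshold=0.2):
    """Both tests must stay below the (assumed) threshold."""
    return (max(relative_residuals(x0, x0_hat)) < threshold
            and max(ratio_deviations(x0, a)) < threshold)

if __name__ == "__main__":
    print(passes_ratio_check([10, 11, 12, 13, 14]))   # smooth data passes
    print(shift_until_valid([1, 5, 2, 8, 3]))          # bursty data needs a shift
```

If a shift `c` is applied before modeling, the same `c` has to be subtracted from the forecasts afterwards.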
3.5. Advantages of GM(1,1) Model
Applying the GM(1,1) model to detect traffic anomalies in WSNs has three main advantages:
(1) Modeling with GM(1,1) does not need a large amount of data. Only four pieces of data are needed to establish a GM(1,1) model, so the model can be used when historical data are scarce and the integrity of the sequence is poor.
(2) Using a differential equation to build the model can fully tap the essence of the system and yields higher accuracy.
(3) The computation is light, so the model is quite suitable for real-time traffic anomaly detection in WSNs, where node energy and computing capability are limited.
4. Improvement of Exploitation of GM(1,1) Model
4.1. Using a Sliding Window to Determine Historical Data for Modeling
The historical data used to build the GM(1,1) model and predict future data is quite short. In order to ensure the real-time performance and accuracy of the model, we design a fixed-size sliding window, which should be as short as possible while still giving high accuracy. Besides ensuring real-time performance, the window guarantees that only the latest historical data is used. Therefore, more accurate predictive data (a reasonable network traffic expectation) can be obtained.
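A fixed-size sliding window of this kind can be kept with a bounded double-ended queue. The sketch below is one possible realization; the class name `SlidingWindow` and its methods are hypothetical and are not taken from the paper.

```python
from collections import deque

class SlidingWindow:
    """Keeps only the newest `size` traffic samples."""

    def __init__(self, size):
        self._buf = deque(maxlen=size)   # the oldest sample is dropped automatically

    def push(self, value):
        self._buf.append(value)

    def ready(self):
        """True once the window holds `size` samples and a model can be built."""
        return len(self._buf) == self._buf.maxlen

    def values(self):
        return list(self._buf)

if __name__ == "__main__":
    window = SlidingWindow(5)
    for sample in [1, 2, 3, 4, 5, 6, 7]:
        window.push(sample)
    print(window.values())   # the five newest samples
```

Because the queue has a fixed length, memory use stays constant no matter how long the node runs.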
4.2. Optimizing Initial Value of One-Order Grey Differential Equation
In the traditional GM(1,1) model, the first piece of historical data is used as the initial condition of the first-order grey differential equation. In fact, however, new information carries more cognitive value than old information. Therefore, in order to make the GM(1,1) model more accurate, the last piece of historical data is used as the initial condition. That is to say, set the last piece of accumulated data, $x^{(1)}(n)$, as the initial condition, so that the solution becomes

$$\hat{x}^{(1)}(k+1) = \left(x^{(1)}(n) - \frac{b}{a}\right)e^{-a(k+1-n)} + \frac{b}{a},$$

where $a$ and $b$ are the parameters of the grey differential equation.
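A forecast that uses the newest accumulated value as the initial condition can be sketched as follows. The time response below, anchored at the last accumulated value, is our reading of this subsection rather than a formula quoted from the paper; the function name `gm11_forecast_last_init` and the default `alpha=0.5` are assumptions.

```python
import numpy as np

def gm11_forecast_last_init(x0, steps, alpha=0.5):
    """Forecast `steps` values with the response anchored at x1(n).

    Assumes positive data and a nonzero development coefficient a.
    """
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    x1 = np.cumsum(x0)                               # 1-AGO sequence
    z1 = alpha * x1[1:] + (1.0 - alpha) * x1[:-1]    # background values
    B = np.column_stack((-z1, np.ones_like(z1)))
    (a, b), *_ = np.linalg.lstsq(B, x0[1:], rcond=None)
    t = np.arange(n, n + steps + 1)                  # 1-based times n .. n+steps
    x1_hat = (x1[-1] - b / a) * np.exp(-a * (t - n)) + b / a
    return np.diff(x1_hat)                           # forecasts for n+1 .. n+steps

if __name__ == "__main__":
    print(gm11_forecast_last_init([2.0, 2.2, 2.42, 2.662, 2.9282], steps=3))
```

Only the initial condition changes; the parameters `a` and `b` are estimated exactly as in the traditional model.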
4.3. Making Traffic Prediction by Short Step Exponential Weighted Average Method
The short step exponential weighted average method, which consists of two parts, short step prediction and weighted averaging of the predicted traffic values, is a vital step in perceiving WSN traffic anomalies. To a certain degree, the method lowers the prediction accuracy, but it improves the capability of judging abnormal traffic.
Data at different times are correlated. The shorter the interval between two samples, the greater their relevance; conversely, the longer the interval, the smaller their relevance. Therefore, when several time series data are used as sample data to predict traffic, shorter-step forecasts are more accurate and longer-step forecasts are less accurate [16]. For GM(1,1) model, when
According to the analysis above, when
To make traffic anomalies easier to detect, the short step exponential weighted average method is applied to the prediction of normal traffic. It is shown in Figure 1 and described as follows:
(1) Use the data in the sliding window to establish the model, predict the following L steps, and save the predicted values in the corresponding positions of a timetable (each column corresponds to a different time).
(2) Produce the final determination value by taking the exponential weighted average of the L values in the same column of the timetable.

Figure 1: Exponential weighted average method.
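The timetable and the weighted average can be sketched as follows. The text does not state the weights, so the sketch assumes weights proportional to `beta ** j` for the (j+1)-step forecast, which favors shorter-step forecasts; `beta`, the function names, and the `predictor` callback are all hypothetical. The demonstration uses a simple persistence predictor as a stand-in, not GM(1,1).

```python
from collections import defaultdict

def exp_weights(L, beta=0.5):
    """Normalized weights; index 0 belongs to the 1-step forecast."""
    raw = [beta ** j for j in range(L)]
    total = sum(raw)
    return [w / total for w in raw]

def build_timetable(series, window, L, predictor):
    """timetable[T][j] holds the (j+1)-step forecast of sample index T."""
    timetable = defaultdict(dict)
    for t in range(window, len(series) + 1):
        history = series[t - window:t]            # newest `window` samples
        for j, value in enumerate(predictor(history, L)):
            timetable[t + j][j] = value           # forecast targets index t + j
    return timetable

def determination_value(column, L, beta=0.5):
    """Weighted average of one column, renormalized if it is not yet full."""
    weights = exp_weights(L, beta)
    norm = sum(weights[j] for j in column)
    return sum(weights[j] * v for j, v in column.items()) / norm

if __name__ == "__main__":
    def persistence(history, L):
        return [history[-1]] * L                  # stand-in predictor for the demo

    table = build_timetable([1, 2, 3, 4, 5, 6, 7, 8], window=5, L=3,
                            predictor=persistence)
    print(determination_value(table[7], L=3))     # combines forecasts made at t = 5, 6, 7
```

Averaging over forecasts made at different times smooths the final determination value, so a sudden change in the measured traffic stands out against it.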
4.4. Judging Whether the Traffic of the Next Moment Is Abnormal by Euclidean Distance
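The traditional judgment method often uses the relative error, but its effect is not ideal, so we propose the Euclidean distance method. Set two W-size data sequences as $X = (x_1, x_2, \ldots, x_W)$ and $Y = (y_1, y_2, \ldots, y_W)$; their Euclidean distance is

$$d(X, Y) = \sqrt{\sum_{i=1}^{W} (x_i - y_i)^2}.$$

If we consider the final determination value sequence as $X$ and the corresponding actual traffic sequence as $Y$, the traffic of the next moment is judged abnormal when $d(X, Y)$ exceeds the threshold $D$.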
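The distance-based judgment can be sketched as follows. The function names are hypothetical, and the threshold has to be chosen for the traffic at hand, as discussed in the simulations.

```python
import math

def euclidean_distance(xs, ys):
    """Euclidean distance between two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))

def is_abnormal(predicted, observed, threshold):
    """Flag traffic whose distance from the prediction exceeds the threshold."""
    return euclidean_distance(predicted, observed) > threshold

if __name__ == "__main__":
    expected = [1.0, 1.0, 1.0, 1.0, 1.0]      # W = 5 determination values (made up)
    measured = [1.0, 1.0, 1.0, 1.0, 4.0]      # a burst in the newest sample
    print(is_abnormal(expected, measured, threshold=2.5))
```

Unlike a per-sample relative error, the distance accumulates the deviation over the whole W-size window.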
5. Design and Implementation of Traffic Anomaly Detection in WSN
Based on the improved exploitation of the GM(1,1) model described in the last two sections, a complete anomaly detection algorithm for WSNs is designed. Furthermore, we introduce another traffic anomaly determination mechanism to assist anomaly detection. That is, the first detected abnormal traffic value is taken as a reference. Then, if the traffic keeps fluctuating around this reference value within the relative error threshold over the following continuous period, it is considered abnormal and a warning signal is sent.
The whole improved GM(1,1)-based traffic anomaly detection algorithm for WSN is described in Figure 2.

Figure 2: Flow chart of the whole proposed algorithm.
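The auxiliary determination mechanism described in this section can be sketched as follows. The function name and the parameters `rel_threshold` and `min_count` are assumptions; the text does not give their values or the length of the continuous period.

```python
def sustained_anomaly(reference, following, rel_threshold, min_count):
    """True if the next `min_count` samples all stay near the reference.

    reference: the first traffic value that was flagged as abnormal
    following: the samples observed after the reference
    """
    if reference == 0:
        raise ValueError("relative error is undefined for a zero reference")
    if len(following) < min_count:
        return False                      # not enough samples observed yet
    return all(abs(v - reference) / abs(reference) <= rel_threshold
               for v in following[:min_count])

if __name__ == "__main__":
    # Traffic stays close to the first abnormal value, so a warning is raised.
    print(sustained_anomaly(100.0, [102.0, 98.0, 101.0],
                            rel_threshold=0.05, min_count=3))
```

This check catches the case in which the prediction model gradually adapts to an ongoing anomaly and would otherwise stop reporting it.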
6. Simulations and Results Analysis
In this section, both simulated WSN traffic and a portion of real WSN traffic data are used to carry out simulations. The real data consist of humidity measurements collected over a 6-hour period at intervals of 5 seconds in 2010, gathered from the University of North Carolina. In both cases, we set the sliding window to 5 steps and the prediction length to 3 steps. As for the Euclidean distance threshold D, which depends on the properties of the WSN traffic, we take W as 5 and choose 0.05 for the simulated WSN traffic and 2.5 for the real WSN traffic. The simulation results are shown in Figures 3(c) and 4(c). For comparison, the results of the traditional GM(1,1)-based algorithm are shown in Figures 3(b) and 4(b).

Figure 3: Simulation results on simulated WSN traffic.

Figure 4: Simulation results on real WSN traffic.
From the simulation results, we can clearly see that the improved algorithm yields a smoother predictive curve, which reflects the "inertia" (stability) of normal traffic. Consequently, when an exception takes place, the model does not quickly adapt to the abnormality, which makes the abnormal traffic easier to detect. The delay mechanism also contributes to detecting the anomaly and sending out an alert. As shown in Figures 3 and 4, compared with the traditional method, the improved algorithm raises the correct detection rate considerably while the incorrect detection rate remains at a quite low level. Therefore, the improved GM(1,1)-based algorithm outperforms the traditional GM(1,1)-based algorithm.
To clarify the conclusion from some measures, true positive (TP), false positive (FP), true negative (TN), and false negative (FN) are defined and explained in Table 3. Actually, positive/negative means that the model predicts that the data is abnormal/normal and true/false means that the prediction is right/wrong.
Table 3: Definition of TP, FP, TN, and FN.
Now, we use the false positive rate (FPR) and the false negative rate (FNR) to measure the traditional and improved GM(1,1)-based algorithms. FPR and FNR are defined by the following formulas:

$$\mathrm{FPR} = \frac{FP}{FP + TN}, \qquad \mathrm{FNR} = \frac{FN}{TP + FN}.$$
Table 4: Detection capabilities of different algorithms.
The results show that, while the FPR stays at 0, the improved algorithm sharply lowers the FNR, that is, it reduces the undetected rate and thus improves the detection accuracy. Note that the results shown in Table 4 were obtained from our own simulations. Other implementations may yield slightly different values, but they all show the same trend.
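For reference, FPR and FNR can be computed from per-sample labels as in the sketch below, using the standard definitions FPR = FP / (FP + TN) and FNR = FN / (TP + FN). The function names and the example labels are made up for illustration and are not the data behind Table 4.

```python
def confusion_counts(predicted, actual):
    """Count TP, FP, TN, FN; True means 'abnormal' in both sequences."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    return tp, fp, tn, fn

def false_positive_rate(fp, tn):
    """Share of normal samples that were flagged as abnormal."""
    return fp / (fp + tn) if (fp + tn) else 0.0

def false_negative_rate(fn, tp):
    """Share of abnormal samples that were missed (the undetected rate)."""
    return fn / (fn + tp) if (fn + tp) else 0.0

if __name__ == "__main__":
    predicted = [True, False, True, False, False]   # made-up detector output
    actual = [True, True, False, False, False]      # made-up ground truth
    tp, fp, tn, fn = confusion_counts(predicted, actual)
    print(false_positive_rate(fp, tn), false_negative_rate(fn, tp))
```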
7. Conclusions
In this paper, we introduce the traffic anomaly detection technique in WSN and GM(1,1) model in detail. Then, through model improvements analysis and algorithm design, an improved GM(1,1)-based traffic anomaly detection algorithm for WSN is proposed. Finally, we use Matlab to simulate this algorithm and the simulation results demonstrate that this algorithm can reduce the undetected rate and improve the detection accuracy. In addition, this algorithm requires less computation and is efficient. So it is quite suitable for the real-time traffic anomaly detection of WSN in which the energy and capability in calculation of the node are limited.
Footnotes
Competing Interests
The authors declare that they have no competing interests.
Acknowledgments
This work is partly supported by the Chengdu Science and Technology Project (2014-HM01-00310-SF), the Information Technology Research Projects of Ministry of Transport of China (2014 364X14 040), and the National Natural Science Foundation of China (61104042 and 61273235).
