Sage Journals: Discover world-class research

Abstract

We investigate three aspects of dynamicity in ad hoc and wireless sensor networks and their impact on the efficiency of intrusion detection systems (IDSs). The first aspect is magnitude dynamicity, in which the IDS has to efficiently determine whether the changes occurring in the network are due to malicious behaviors or due to normal changes in user requirements. The second aspect is nature dynamicity, which occurs when a malicious node continuously switches its behavior between normal and anomalous to cause maximum network disruption without being detected by the IDS. The third aspect, named spatiotemporal dynamicity, occurs when a malicious node moves out of the IDS range before the latter can make an observation about its behavior. The first aspect is solved by defining a normal profile based on the invariants derived from the normal node behavior. The second aspect is handled by proposing an adaptive reputation fading strategy that allows fast redemption and fast capture of malicious nodes. The third aspect is solved by estimating the link duration between two nodes in a dynamic network topology, which allows choosing the appropriate monitoring period. We provide analytical studies and simulation experiments to demonstrate the efficiency of the proposed solutions.

1. Introduction

Multihop ad hoc wireless networks consist of nodes equipped with wireless interfaces, in which data are forwarded through multiple nodes to reach their intended destinations. They include many types of networks such as mobile ad hoc networks (MANETs) [1], wireless sensor networks (WSNs) [2], and vehicular ad hoc networks (VANETs) [3].

In the last decade, there has been substantial research in the area of security in ad hoc and wireless sensor networks [4, 5]. Security solutions have been designed to protect these networks against attacks such as selective forwarding, black hole, wormhole, sinkhole, and energy exhausting attacks. Prevention mechanisms like key management and authentication, which represent the first line of defense, are not sufficient on their own to provide an efficient security solution. Therefore, there is a need to deploy a second line of defense, namely, an intrusion detection system (IDS).

In general, intrusion detection systems follow one of two major approaches: misuse detection and anomaly detection [6]. Misuse detection performs signature analysis by comparing ongoing activities with patterns representing known attacks; activities that match are labeled as intrusions. The main limitation of the misuse approach is that it cannot detect new attacks. Anomaly detection, on the other hand, builds a profile of normal behavior and attempts to identify the patterns or activities that deviate from this profile. The main advantage of anomaly detection is that it can detect unknown attacks.

We consider the following detection model for ad hoc and wireless sensor networks. The IDS is implemented in a distributed manner: each node can act as a monitoring node that observes the behavior of its neighbors. Each observation lasts for a monitoring time interval of duration Δ, called the monitoring period. The IDS can judge whether the monitored node is normal or anomalous after one or multiple consecutive observations.

Although intrusion detection systems have received considerable attention in ad hoc and wireless sensor networks [7, 8], to the best of our knowledge, there are no studies on the impact of network dynamicity on IDS efficiency and how the IDS can react or adapt to these changes.

In this paper, we investigate the following three aspects of behavioral dynamicity that occur in the network and can negatively affect the IDS performance and efficiency.

(i)

Magnitude Dynamicity. Due to changes in user requirements, a node changes the rate at which it generates data. For instance, a legitimate user may want to increase or decrease the rate at which data are collected at the sink node. The challenge for the IDS here is to detect attacks efficiently while distinguishing between changes due to normal behavior and changes due to malicious attacks.

(ii)

Nature Dynamicity. In some detection models, a monitoring node has to observe the behavior of the monitored node during a set of consecutive monitoring periods before judging whether the monitored node is malicious. A monitored node might evade IDS detection and confuse the IDS by continuously switching its behavior between normal and anomalous. In this case, the malicious node strives to cause network disruption without being detected by the IDS.

(iii)

Spatiotemporal Dynamicity. The IDS detection mechanism is based on collecting a set of consecutive observations about the monitored node. An IDS is able to observe the behavior of the monitored node only if the latter stays within the monitoring node's transmission range for a duration exceeding Δ. Knowing this, a malicious node can evade IDS detection by moving around the network at a speed that prevents it from staying within the monitoring node's transmission range for longer than Δ.

In this paper, we propose a solution for each aspect of dynamicity mentioned above. The contributions of the paper are threefold. Firstly, the magnitude dynamicity aspect is solved by defining a normal profile based on the invariants derived from the normal node behavior. This is achieved by generating a dependency graph consisting of strongly correlated features and then deriving the high-level features from the graph. The high-level features are obtained by applying the maximal cliques algorithm with a divide-and-conquer strategy, followed by the maximum weighted spanning tree algorithm. Secondly, to handle the nature dynamicity aspect, we adopt the carrot and stick strategy (i.e., reward generously and punish severely) to prevent a malicious node from evading the IDS. To do so, we propose an adaptive reputation fading strategy that allows fast redemption and fast capture of malicious nodes. Thirdly, we use statistical analysis to estimate the link duration between two nodes in a dynamic network topology. Based on this estimation, the monitoring node chooses the appropriate monitoring period, which allows it to observe the monitored node's behavior.

The rest of the paper is organized as follows. In Section 2, we describe the normal profile construction and the feature selection method. Section 3 presents the adaptive reputation fading strategy. In Section 4, we analyze the link duration in a mobile wireless network and explain how the monitoring period is estimated. Finally, Section 5 concludes the paper.

2. Magnitude Dynamicity

2.1. Background

2.1.1. One-Feature Profile

In the one-feature profile, a single feature is used to describe and detect anomalous behavior. To detect malicious behavior in the network, a node can measure the features listed in Table 1 [9]. The disadvantage of this profile structure is that one feature must be assigned to each known attack, and the IDS has to measure each feature and check whether it has an anomalous value. As the number of attacks increases, detection becomes slower, and it slows down further as the rule set grows.

Table 1

Relation between attacks and features.

Feature	Attack
Packet sending rate	Energy exhausting attack
Packet dropping rate	Selective forwarding and black hole attacks
Packet receiving rate	Sinkhole attack
Packet sending power	Hello attack, wormhole attack

The one-feature profile might fail to distinguish between normal and anomalous behaviors. Figure 1 shows that using some features individually to describe normal behavior is misleading and might make the detection system falsely accuse a legitimate node of being malicious. Figure 1(a) depicts a tree-based wireless network rooted at the sink B and shows its normal traffic rates: the value above each link indicates the flow rate traversing this link, and each node measures the flow rate coming from its upstream neighbors. Figure 1(b) (resp., Figure 1(c)) shows the state of the network when nodes D, H, and K become compromised and start behaving maliciously by dropping some packets (resp., generating more packets). As D, H, and K reduce (resp., increase) their sending rate, their respective downstream neighbors I and L also reduce (resp., increase) their sending rate accordingly. As a result, node B will falsely accuse nodes I and L of performing a selective forwarding attack (resp., an energy exhaustion attack), leading to a high false positive rate.

Figure 1

Impact of feature choice on false positive rate.

2.1.2. Multifeature Profile

In the multifeature profile, we describe the normal behavior by a d-feature vector, where each element of the vector represents a feature. In this way, the IDS can determine whether several features together show an anomalous behavior. Experiments have shown that better detection accuracy can be obtained by combining related features than by using them individually [10]. If node B in the example of Figure 1 considers two features, (a) the flow entering the monitored node and (b) the flow leaving the monitored node, it will conclude that nodes I and L are just forwarding what they received from their upstream neighbors, and hence that they are not malicious.

Loo et al. [11] group the observed data into clusters and use a profile of 12 features to describe normal behavior. To check whether a test instance belongs to a given cluster, they measure the Euclidean distance between the test point and the centroid of the cluster; if this distance exceeds a threshold distance, the test point is considered anomalous. The following example shows that using the Euclidean distance between two d-feature profiles reduces detection accuracy. Let $(ft_{1}, ft_{2})$ be a profile vector such that each feature is used to detect one attack, with $ft_{1}$ and $ft_{2}$ taking values in $[0,10]$. Let $(10,10)$ be the centroid vector. The first and the second attacks are detected when $ft_{1} \leq 7$ and $ft_{2} \leq 6$, respectively. We take the distance between $(10,10)$ and $(7,6)$, which is $\sqrt{3^{2} + 4^{2}} = 5$, as the threshold distance. For the test vector $(6,10)$, the distance to the centroid is 4, which is lower than the threshold. The test point will therefore be considered normal, whereas the value of $ft_{1}$ alone indicates the occurrence of an attack. This example shows that aggregating features through the Euclidean distance results in a loss of detection accuracy.
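The counterexample above can be checked numerically. The following Python sketch uses only the numbers given in the text; the function and variable names are illustrative.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


centroid = (10, 10)
# Per-feature rules from the example: ft1 <= 7 signals attack 1, ft2 <= 6 signals attack 2.
threshold = euclidean(centroid, (7, 6))  # 5.0, used as the distance threshold

test_point = (6, 10)
dist = euclidean(centroid, test_point)  # 4.0
flagged_by_distance = dist > threshold  # False: the aggregated check says "normal"
flagged_by_feature = test_point[0] <= 7 or test_point[1] <= 6  # True: ft1 alone signals attack 1

print(threshold, dist, flagged_by_distance, flagged_by_feature)  # 5.0 4.0 False True
```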

In [9, 12], the normal profile of a monitored node i is defined by a q-feature vector $f_{i} = (ft_{i1}, \dots, ft_{iq})$. If a node monitors a set of n nodes, it forms a matrix $F = (f_{1}, \dots, f_{n})^{T}$. Both schemes assume that all feature vectors $f_{i}$ follow the same multivariate normal distribution with mean μ and variance-covariance matrix $M$. Node i is considered suspicious if the Mahalanobis distance between $f_{i}$ and the center of the set F is greater than a predefined threshold. The authors of both works use the orthogonalized Gnanadesikan-Kettenring estimator to find the center of the set F. Let $\hat{μ}$ and $\hat{M}$ denote the sample mean and the sample variance-covariance matrix of F, such that $\hat{μ} = (1/n) \sum_{i = 1}^{n} f_{i}$ and $\hat{M} = (1/(n - 1)) \sum_{i = 1}^{n} (f_{i} - \hat{μ})(f_{i} - \hat{μ})^{T}$. The Mahalanobis distance between $f_{i}$ and the vector $\hat{μ}$ is given by $\sqrt{(f_{i} - \hat{μ})^{T} \hat{M}^{-1} (f_{i} - \hat{μ})}$. The Mahalanobis distance differs from the Euclidean distance in that it takes into account the correlations between features. In [12], nodes are evaluated in terms of packet dropping rate, packet sending rate, forwarding delay time, and node readings. In [9], attacks are detected by monitoring the packet sending rate, packet dropping rate, packet mismatch rate, packet receiving rate, and received signal strength. As stated in [13], the works of [9, 12] are subject to two major criticisms: (1) the circumstances under which the assumption of a multivariate normal distribution holds are not explained, and (2) network features such as the packet sending, packet dropping, and packet receiving rates do not follow the normal distribution under a tree-based routing protocol.
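A minimal pure-Python sketch of the Mahalanobis distance follows, assuming the sample estimates $\hat{μ}$ and $\hat{M}$ defined above rather than the robust orthogonalized Gnanadesikan-Kettenring estimator used in [9, 12]. The toy data are hypothetical; they only show that two points at the same Euclidean distance from the center receive different Mahalanobis distances depending on whether they follow the correlation between the features.

```python
import math


def sample_mean(F):
    """mu_hat = (1/n) * sum_i f_i over the rows of F."""
    n = len(F)
    return [sum(row[k] for row in F) / n for k in range(len(F[0]))]


def sample_cov(F, mu):
    """M_hat = (1/(n-1)) * sum_i (f_i - mu)(f_i - mu)^T."""
    n, q = len(F), len(mu)
    return [[sum((row[a] - mu[a]) * (row[b] - mu[b]) for row in F) / (n - 1)
             for b in range(q)] for a in range(q)]


def solve(M, y):
    """Solve M x = y by Gaussian elimination with partial pivoting (M nonsingular)."""
    q = len(y)
    A = [list(M[r]) + [y[r]] for r in range(q)]  # augmented matrix [M | y]
    for c in range(q):
        p = max(range(c, q), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, q):
            factor = A[r][c] / A[c][c]
            for k in range(c, q + 1):
                A[r][k] -= factor * A[c][k]
    x = [0.0] * q
    for r in reversed(range(q)):
        x[r] = (A[r][q] - sum(A[r][k] * x[k] for k in range(r + 1, q))) / A[r][r]
    return x


def mahalanobis(f, mu, M):
    """sqrt((f - mu)^T M^{-1} (f - mu)), computed without forming M^{-1}."""
    diff = [a - b for a, b in zip(f, mu)]
    z = solve(M, diff)
    return math.sqrt(sum(d * w for d, w in zip(diff, z)))


# Toy profiles whose two features are positively correlated.
F = [(2, 2), (-2, -2), (1, -1), (-1, 1)]
mu = sample_mean(F)
M = sample_cov(F, mu)
along = mahalanobis((1, 1), mu, M)    # follows the correlation: about 0.612
across = mahalanobis((1, -1), mu, M)  # goes against it: about 1.225
```

Both test points lie at Euclidean distance $\sqrt{2}$ from the center, so a circular (Euclidean) boundary cannot separate them, whereas the Mahalanobis distance penalizes the point that breaks the correlation.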

2.2. Profile Construction Based on Strongly Correlated Features

Among distance measures, the Mahalanobis distance is a powerful technique because it takes the covariances into account, which yields elliptical decision boundaries in 2D space. The Euclidean distance, in contrast, builds circular boundaries and treats the features as having equal variances; the Mahalanobis distance is therefore more appropriate for multivariate data.

In this paper, we take a novel approach to select relevant features and construct the normal profile vector. We do not assume a multivariate normal distribution, and, unlike the Mahalanobis distance, which considers the correlations between all features, we feed only strongly correlated features to the distance measure.

In the training phase, we investigate the significant associations between features. We quantify the level of correlation between features with Pearson's correlation coefficient, which measures the strength of the linear association between them. Pearson's correlation coefficient between two feature vectors X and Y is defined by

\begin{matrix} ρ(X, Y) = \frac{\mathrm{COV}(X, Y)}{σ_{X} σ_{Y}} = \frac{E[(X - μ_{X})(Y - μ_{Y})]}{σ_{X} σ_{Y}}, \end{matrix}

(1)

where $μ_{X}$ (resp., $μ_{Y}$) and $σ_{X}$ (resp., $σ_{Y}$) are the mean and standard deviation of feature X (resp., feature Y). If $ρ(X, Y) = 1$, then X and Y have a perfect linear correlation. If $0.7 \leq ρ(X, Y) < 1$, X and Y have a strong linear correlation; if $0.5 \leq ρ(X, Y) < 0.7$, they have a modest linear correlation; and if $0 \leq ρ(X, Y) < 0.5$, they are said to have a weak linear correlation.

The Pearson correlation indicates to what extent variables show a linear relationship among them. Its values range from −1 to +1. The extreme value +1 (resp., −1) indicates a perfect direct, increasing (resp., inverse, decreasing) linear relationship. A very strong relationship between variables is reflected by values close to these limits ($-1 \leq ρ \leq -0.9$ or $0.9 \leq ρ \leq +1$) [14]. Pearson's correlation coefficient is 0 for independent variables. However, the converse is not true, since the coefficient captures only linear dependencies: a coefficient of 0 does not imply independence.
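Eq. (1) and the correlation bands above can be sketched in Python as follows. The helper names and the small tolerance used to label floating-point values near 1 as "perfect" are our own assumptions. The last usage line illustrates the caveat just mentioned: $Y = X^{2}$ is fully determined by X, yet ρ is 0.

```python
import math


def pearson(X, Y):
    """Pearson's rho per Eq. (1): E[(X - mu_X)(Y - mu_Y)] / (sigma_X * sigma_Y)."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in X) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in Y) / n)
    return cov / (sx * sy)


def strength(rho, tol=1e-9):
    """Band labels used in the text (which defines them for rho >= 0 only)."""
    if rho >= 1 - tol:
        return "perfect"
    if rho >= 0.7:
        return "strong"
    if rho >= 0.5:
        return "modest"
    if rho >= 0:
        return "weak"
    return "negative (not covered by the bands)"


X = [1, 2, 3, 4, 5]
print(strength(pearson(X, [2, 4, 6, 8, 10])))  # perfect
print(strength(pearson(X, [1, 3, 2, 5, 4])))   # strong (rho = 0.8)
Z = [-2, -1, 0, 1, 2]
print(strength(pearson(Z, [z * z for z in Z])))  # weak (rho = 0), although Y = Z**2 exactly
```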

In our approach, we first use the training dataset F, represented by an $n \times d$ matrix. F consists of n profile instances $f_{i}$, $i = 1, \dots, n$, where each $f_{i} = (ft_{i1}, \dots, ft_{id})$. From F, we construct a correlation matrix Ω. The latter is a $d \times d$ matrix, where $Ω_{ij} \in \mathbb{R}$ and $-1 \leq Ω_{ij} \leq +1$:

\begin{matrix} Ω = \begin{pmatrix} Ω_{11} & Ω_{12} & \dots & Ω_{1d} \\ Ω_{21} & Ω_{22} & \dots & Ω_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ Ω_{d1} & Ω_{d2} & \dots & Ω_{dd} \end{pmatrix}. \end{matrix}

(2)

We consider the set of d feature vectors $\{F_{1}, \dots, F_{d}\}$ such that $F_{i} = (ft_{1i}, \dots, ft_{ni})^{T}$. For each pair of features $(F_{i}, F_{j})$, we compute $Ω_{ij} = ρ(F_{i}, F_{j})$. Then, we derive a weighted graph $G = (V, E, w)$ from matrix Ω, defined as follows:

(i)

$V = \{v_{1}, \dots, v_{d}\}$, the set of vertices (features), where $|V| = d$;

(ii)

$E = \{(v_{i}, v_{j}) : i \neq j, \ Ω_{ij} \neq 0\}$, with $|E| = m$;

(iii)

$w (v_{i}, v_{j}) = w_{i j} = Ω_{i j}$ .

A subgraph $G^{[Th]} = (V^{[Th]}, E^{[Th]}, w^{[Th]})$ is then induced from the graph G, where $0 < Th \leq 1$ , by removing all the edges $(v_{i}, v_{j})$ whose $w_{i j} < Th$ . $G^{[Th]}$ is defined as follows:

(i)

$E^{[Th]} = \{(v_{i}, v_{j}) \in E : w_{ij} \geq Th\}$;

(ii)

$V^{[Th]} = \{x \in V : \exists y \in V \text{ such that } (x, y) \in E^{[Th]}\}$, with $|V^{[Th]}| \leq d$;

(iii)

$w_{i j}^{[Th]} = w_{i j}$ .

The graph $G^{[Th]}$ induced from G might be composed of a set of disjoint connected components. The closer $Th$ is to 1, the stronger the correlations retained in $G^{[Th]}$.
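The thresholding step and the search for connected components can be sketched as follows, with the graph represented as a dictionary mapping vertex pairs to weights; breadth-first search is our choice for illustration, and any graph traversal would do. Because only edges with $w_{ij} \geq Th > 0$ survive, negative correlations are discarded, and vertices left without edges drop out of $V^{[Th]}$.

```python
from collections import deque


def threshold_subgraph(weights, th):
    """E^[Th]: keep only the edges with w_ij >= th (0 < th <= 1)."""
    return {edge: w for edge, w in weights.items() if w >= th}


def connected_components(edges):
    """Vertex sets of the connected components of an undirected edge set (BFS).
    Vertices with no remaining edge are not in V^[Th] and so never appear."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            v = queue.popleft()
            component.add(v)
            for u in adj[v] - seen:
                seen.add(u)
                queue.append(u)
        components.append(component)
    return components


# Hypothetical weights between features 1..6.
weights = {(1, 2): 0.95, (2, 3): 0.92, (4, 5): 0.97, (1, 4): 0.40, (3, 6): 0.10}
strong_edges = threshold_subgraph(weights, 0.9)
components = connected_components(strong_edges)
print(components)  # [{1, 2, 3}, {4, 5}]
```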

We aim at finding the sets of features that increase and decrease together, in order to avoid the missed detection problem of [11]. The best way to do so is to extract from G the set of cliques composed of strongly correlated features. One of the widely adopted solutions [15] computes the maximal cliques of an arbitrary graph of d vertices in time $O(3^{d/3}) = O(1.44^{d})$. Instead of applying the maximal cliques algorithm on graph G, we propose to adopt a divide-and-conquer strategy by applying this algorithm on each connected component of the subgraph $G^{[Th]}$. A clique ${CL}_{i}^{[Th]} = (V_{i}^{[Th]}, E_{i}^{[Th]})$ ($i \geq 1$) of the graph $G^{[Th]}$ is defined by a set of vertices $V_{i}^{[Th]} \subseteq V^{[Th]}$ such that all pairs of vertices in $V_{i}^{[Th]}$ are adjacent. This strategy significantly reduces the computational complexity of finding maximal strongly correlated cliques. Let us consider that $G^{[Th]}$ is composed of d vertices belonging to M connected components, where each connected component $P_{i}$ ($i = 1, \dots, M$) is composed of $S_{i}$ vertices. There are α singleton vertices and β components with two vertices, and the remaining connected components are composed of more than two vertices. The computational complexity incurred by applying the maximal cliques algorithm on the whole graph is

\begin{matrix} 1.44^{d} = 1.44^{α + 2β + \sum_{j : S_{j} > 2} S_{j}} = 1.44^{α} \, 1.44^{2β} \prod_{j : S_{j} > 2} 1.44^{S_{j}}. \end{matrix}

(3)

By applying the same algorithm on each connected component of $G^{[Th]}$ separately, there is no need to process isolated vertices, and components with two vertices are cliques by definition; hence, the computational complexity becomes $\sum_{j : S_{j} > 2} 1.44^{S_{j}}$. Since each term of this sum is at least $1.44^{3} \approx 2.99$, the sum never exceeds the product in Eq. (3), and it is typically far smaller. Applying the divide-and-conquer strategy can therefore significantly reduce the running time of the algorithm and make it suitable for resource-constrained nodes.
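The divide-and-conquer clique search can be sketched as follows. The algorithm of [15] is not spelled out here; as an assumption, this sketch uses the Bron-Kerbosch algorithm with pivoting, a standard method for enumerating maximal cliques, and runs it separately on each connected component, skipping components of two vertices. The helpers `cost_whole` and `cost_divided` simply evaluate the two cost expressions above for given component sizes; the graph is hypothetical.

```python
def bron_kerbosch(R, P, X, adj, out):
    """Bron-Kerbosch with pivoting: append every maximal clique that extends R."""
    if not P and not X:
        out.append(R)
        return
    pivot = max(P | X, key=lambda u: len(adj[u] & P))
    for v in list(P - adj[pivot]):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, out)
        P = P - {v}
        X = X | {v}


def cliques_per_component(edges, components):
    """Run the clique search separately on each connected component of G^[Th]."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    cliques = []
    for comp in components:
        if len(comp) <= 2:  # a two-vertex component is already a clique
            cliques.append(set(comp))
            continue
        bron_kerbosch(set(), set(comp), set(), adj, cliques)
    return cliques


def cost_whole(sizes):
    """Eq. (3): 1.44^d for the whole graph, with d the sum of component sizes."""
    return 1.44 ** sum(sizes)


def cost_divided(sizes):
    """Sum of 1.44^{S_j} over the components with S_j > 2 only."""
    return sum(1.44 ** s for s in sizes if s > 2)


# Hypothetical thresholded graph with components {1, 2, 3, 4} and {5, 6}.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (5, 6)]
cliques = cliques_per_component(edges, [{1, 2, 3, 4}, {5, 6}])
print(cliques)  # the maximal cliques {1, 2, 3} and {3, 4} share feature 3
```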

Let ϕ be the set of edges belonging to the cliques in $G^{[Th]}$, with $|ϕ| = d'$. For each edge $(F_{l}, F_{k})$ that is the tth element of ϕ ($t = 1, \dots, d'$), we define a high-level feature $H_{t} = F_{l}/F_{k}$. From the training dataset F, we derive its high-level training dataset $HF$ as follows: for each d-profile vector $f_{i} \in F$, we derive its $d'$-profile high-level vector $g_{i} = (gt_{i1}, \dots, gt_{id'})$, such that $gt_{it} = ft_{il}/ft_{ik}$, provided that $ft_{ik} \neq 0$. If $ft_{ik} = 0$, the high-level vector $g_{i}$ is removed from the training dataset $HF$. This choice is justified by the fact that the stronger the correlation between $F_{l}$ and $F_{k}$, the more closely the data instances of $(F_{l}, F_{k})$ fall on the same straight line $F_{l} = aF_{k} + b$, where a is the slope and b is the intercept; the ratio $F_{l}/F_{k} = a + b/F_{k}$ then varies little when the traffic magnitude changes, as long as b is small relative to $F_{k}$.
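The ratio construction can be sketched as below, with features indexed from 0 and ϕ given as a list of $(l, k)$ pairs (both hypothetical conventions). In the toy data, the third profile doubles every feature of the first one, as a legitimate rate increase would, and its high-level vector is unchanged; the second profile has a zero denominator and is dropped.

```python
def high_level_dataset(F, phi):
    """Map each d-feature profile f_i to g_i = (f_il / f_ik for (l, k) in phi).
    Profiles with a zero denominator are dropped from HF, as described in the text."""
    HF = []
    for f in F:
        if any(f[k] == 0 for _, k in phi):
            continue
        HF.append([f[l] / f[k] for l, k in phi])
    return HF


# Hypothetical profiles with 3 low-level features and two strongly correlated pairs.
F = [[10, 5, 2], [8, 4, 0], [20, 10, 4]]
phi = [(0, 1), (0, 2)]
HF = high_level_dataset(F, phi)
print(HF)  # [[2.0, 5.0], [2.0, 5.0]]
```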

The high-level features belonging to the same clique ${CL}_{i}^{[Th]}$ are grouped into a single vector $ξ_{i}$. Assuming that K cliques are obtained from $G^{[Th]}$, the normal profile is defined as the set of vectors $ξ_{i}$ ($i = 1, \dots, K$). To further reduce the number of features in each vector $ξ_{i}$, we apply the maximum weighted spanning tree algorithm on each clique. To do so, we apply Kruskal's algorithm, originally used to obtain the minimum spanning tree, after negating the weight of each edge [16]. The high-level features whose edges do not belong to the tree are removed from the normal profile. The resulting profile is called the minimum normal profile. The time complexity of the maximum weighted spanning tree is $O(|E_{CL}| \log |E_{CL}|)$, where $|E_{CL}|$ is the number of edges in the clique. As $|E_{CL}| = |V_{CL}|(|V_{CL}| - 1)/2$, we have $\log |E_{CL}| = O(\log |V_{CL}|)$, and the time complexity becomes $O(|E_{CL}| \log |V_{CL}|)$. Like the maximal cliques algorithm, the maximum weighted spanning tree is only applied on cliques with more than two vertices. The use of the maximum weighted spanning tree is justified by the fact that all the low-level features of each clique in $G^{[Th]}$ are strongly correlated with each other. Hence, within a clique, if the tree already links X to Y and Y to Z through strongly correlated edges, the $(X, Z)$ edge carries largely redundant information and can be removed.
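Kruskal's algorithm on negated weights is equivalent to scanning the clique's edges in decreasing weight order and keeping every edge that joins two different subtrees. A compact Python sketch with a union-find structure follows; the clique and its weights are hypothetical.

```python
def max_spanning_tree(vertices, weights):
    """Kruskal on negated weights, i.e., edges taken in decreasing weight order.
    weights maps (u, v) -> w_uv for the clique's edges; returns the kept edges."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    tree = []
    for (u, v), _w in sorted(weights.items(), key=lambda item: -item[1]):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two different subtrees: keep it
            parent[ru] = rv
            tree.append((u, v))
    return tree


# Hypothetical 3-feature clique: the weakest edge (A, C) is dropped.
tree = max_spanning_tree(["A", "B", "C"], {("A", "B"): 0.95, ("B", "C"): 0.97, ("A", "C"): 0.91})
print(tree)  # [('B', 'C'), ('A', 'B')]
```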

To further illustrate the above method, we consider an example of seven network features, namely, $F_{1}$, $F_{2}$, $F_{3}$, $F_{4}$, $F_{5}$, $F_{6}$, and $F_{7}$. The correlation matrix Ω of these features is

\begin{matrix} Ω = \begin{array}{c|ccccccc} & F_{1} & F_{2} & F_{3} & F_{4} & F_{5} & F_{6} & F_{7} \\ \hline F_{1} & 1 & 0.93 & 0.97 & 0.25 & 0.73 & 0.82 & 0.98 \\ F_{2} & 0.93 & 1 & 0.99 & 0.81 & 0.54 & 0.62 & 0.94 \\ F_{3} & 0.97 & 0.99 & 1 & 0.73 & 0.87 & 0.43 & 0.95 \\ F_{4} & 0.25 & 0.81 & 0.73 & 1 & 0.98 & 0.52 & 0.71 \\ F_{5} & 0.73 & 0.54 & 0.87 & 0.98 & 1 & 0.78 & 0.60 \\ F_{6} & 0.82 & 0.62 & 0.43 & 0.52 & 0.78 & 1 & 0.53 \\ F_{7} & 0.98 & 0.94 & 0.95 & 0.71 & 0.60 & 0.53 & 1 \end{array}. \end{matrix}

(4)

According to the correlation matrix, we generate the graph $G^{[Th]}$ with threshold $Th = 0.9$, as shown in Figure 2(a). The graph contains two cliques, $\{F_{1}, F_{2}, F_{3}, F_{7}\}$ and $\{F_{4}, F_{5}\}$; $F_{6}$ has no correlation above the threshold and therefore does not appear in $G^{[Th]}$.

Figure 2

Graph-based normal behavioral model.

The network normal profile is defined as $\{(F_{1}/F_{2}, F_{1}/F_{3}, F_{1}/F_{7}, F_{2}/F_{3}, F_{2}/F_{7}, F_{3}/F_{7}), (F_{4}/F_{5})\}$. After applying the maximum weighted spanning tree algorithm, the edges $(F_{1}, F_{2})$, $(F_{2}, F_{7})$, and $(F_{3}, F_{7})$ are removed, and the minimum normal profile becomes $\{(F_{1}/F_{3}, F_{2}/F_{3}, F_{1}/F_{7}), (F_{4}/F_{5})\}$.
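The whole example can be reproduced with the short script below, using the matrix of Eq. (4) and a threshold of 0.9. For such a small graph, maximal cliques are found by brute force over vertex subsets; this is an illustrative shortcut, not the clique algorithm discussed above. Any threshold in $(0.87, 0.93]$ yields the same graph.

```python
from itertools import combinations

names = ["F1", "F2", "F3", "F4", "F5", "F6", "F7"]
omega = [  # correlation matrix of Eq. (4)
    [1, 0.93, 0.97, 0.25, 0.73, 0.82, 0.98],
    [0.93, 1, 0.99, 0.81, 0.54, 0.62, 0.94],
    [0.97, 0.99, 1, 0.73, 0.87, 0.43, 0.95],
    [0.25, 0.81, 0.73, 1, 0.98, 0.52, 0.71],
    [0.73, 0.54, 0.87, 0.98, 1, 0.78, 0.60],
    [0.82, 0.62, 0.43, 0.52, 0.78, 1, 0.53],
    [0.98, 0.94, 0.95, 0.71, 0.60, 0.53, 1],
]
TH = 0.9
d = len(names)
strong = {(i, j) for i, j in combinations(range(d), 2) if omega[i][j] >= TH}


def is_clique(vs):
    return all(pair in strong for pair in combinations(vs, 2))


# Brute-force maximal cliques, largest first (only reasonable for tiny d).
cliques = []
for size in range(d, 1, -1):
    for vs in combinations(range(d), size):
        if is_clique(vs) and not any(set(vs) <= set(c) for c in cliques):
            cliques.append(vs)


def max_spanning_tree(vs):
    """Kruskal with edges sorted by decreasing correlation."""
    parent = {v: v for v in vs}

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    tree = []
    for i, j in sorted(combinations(vs, 2), key=lambda e: -omega[e[0]][e[1]]):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree


profile = [[f"{names[i]}/{names[j]}" for i, j in max_spanning_tree(c)] for c in cliques]
print(profile)  # [['F2/F3', 'F1/F7', 'F1/F3'], ['F4/F5']]
```

The result matches the minimum normal profile above, and its four high-level features stay within the bound $d - K = 5$ stated in Proposition 1 below.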

Proposition 1.

For any dataset of d low-level features, the number of high-level features induced by the graph-based generation method is upper-bounded by $d - K$, where K is the number of cliques in $G^{[Th]}$.

Proof.

Since $V^{[Th]} \subseteq V$, at most all d low-level features belong to cliques. In the worst case, each low-level feature belongs to exactly one clique ${CL}_{i}^{[Th]}$ ($i \geq 1$); that is, the cliques are assumed to be vertex-disjoint, as in the example above. As a result, $\sum_{i = 1}^{K} |V_{i}^{[Th]}| \leq d$. The maximum weighted spanning tree of clique ${CL}_{i}^{[Th]}$ has $h_{i} = |V_{i}^{[Th]}| - 1$ edges. As $\sum_{i = 1}^{K} (h_{i} + 1) \leq d$, we obtain $\sum_{i = 1}^{K} h_{i} \leq d - K$. Thus, the number of edges (i.e., high-level features) induced by executing the maximum weighted spanning tree on all the cliques of $G^{[Th]}$ is upper-bounded by $d - K$.

2.3. Detection Process

Each node constructs its local dataset, represented by an $n \times d$ matrix (i.e., n vector instances and d features). It then extracts K cliques from this dataset, as shown above, as well as its minimum normal profile composed of K vectors $ξ_{l}$ of size $m_{l}$, where $l = 1, \dots, K$. The node computes the centroid vector $C_{l}$ of all the n instances of $ξ_{l}$.

To check whether a profile Z is normal or anomalous, we derive from Z its corresponding high-level profile $HZ$ and execute the pseudocode in Algorithm 1. In the algorithm, $Dis$ denotes the Euclidean distance between two vectors, and ${Low}^{l}$ and ${Up}^{l}$ denote the lowest and highest values of $Dis(ξ_{l}, C_{l})$ computed over all the n training instances of $ξ_{l}$.

Algorithm 1: Intrusion detection algorithm.

(1) Let Z be the high-level test profile composed of the vectors $Z_{l}$ ($l = 1, \dots, K$)

(2) for all vectors $C_{l}$, $l = 1, \dots, K$, do

(3) if $Dis(Z_{l}, C_{l}) \notin [{Low}^{l}, {Up}^{l}]$ then

(4) return Z is anomalous

(5) end if

(6) end for

(7) return Z is normal
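A runnable counterpart of Algorithm 1 is sketched below. The `train` helper, which derives each centroid $C_{l}$ and the range $[{Low}^{l}, {Up}^{l}]$ from the training instances, is our assumption about how these quantities are computed from the description above; the data are toy values. As in the pseudocode, a distance below ${Low}^{l}$ is also reported as anomalous.

```python
import math


def dis(u, v):
    """Euclidean distance Dis between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def train(instances_per_clique):
    """instances_per_clique[l] holds the n training instances of xi_l.
    Returns one (C_l, Low^l, Up^l) triple per clique."""
    model = []
    for instances in instances_per_clique:
        size = len(instances[0])
        centroid = [sum(x[k] for x in instances) / len(instances) for k in range(size)]
        dists = [dis(x, centroid) for x in instances]
        model.append((centroid, min(dists), max(dists)))
    return model


def is_anomalous(Z, model):
    """Algorithm 1: Z[l] is the l-th high-level vector Z_l of the test profile."""
    for Z_l, (C_l, low, up) in zip(Z, model):
        if not low <= dis(Z_l, C_l) <= up:
            return True  # step (4): outside [Low^l, Up^l]
    return False  # step (7): every vector lies within its range


# Toy training data: K = 2 cliques with 1-D and 2-D high-level vectors.
model = train([[[1.0], [2.0], [3.0]], [[2.0, 2.0], [2.0, 4.0]]])
print(is_anomalous([[2.5], [2.0, 2.0]], model))  # False
print(is_anomalous([[4.0], [2.0, 2.0]], model))  # True
```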

2.4. Simulation Results

We study the performance of the proposed IDS using the GloMoSim simulator [17]. Each node sends one packet per second toward the sink. A watchdog is implemented at each node to monitor the network activities of all of the node's neighbors. Every 10 seconds (i.e., one time period), a monitoring node i measures the feature vector of its monitored node j. After a training phase of T time periods, the testing phase lasts for 1800 seconds. The role of the IDS implemented at node i is not only to detect whether its neighbor j is malicious but also to detect whether j is malicious during a given time period. We evaluate the performance of the IDS using two metrics: detection rate and false positive rate. We select the following five quantitative features:

(i)

number of generated packets (GEN),

(ii)

number of received packets (RCV),

(iii)

number of forwarded packets (FWD),

(iv)

number of sent packets (SENT),

(v)

number of lost packets (LOSS).

We then generate the correlation matrix Ω, as well as the minimum normal profile obtained after performing the maximal cliques algorithm and the maximum weighted spanning tree algorithm, as shown in Figure 3:

\begin{matrix} Ω = \begin{array}{c|ccccc} & \text{GEN} & \text{RCV} & \text{FWD} & \text{SENT} & \text{LOSS} \\ \hline \text{GEN} & 1 & 0.4205 & 0.4205 & 0.7263 & 0.6032 \\ \text{RCV} & 0.4205 & 1 & 1 & 0.9289 & 0.9727 \\ \text{FWD} & 0.4205 & 1 & 1 & 0.9289 & 0.9727 \\ \text{SENT} & 0.7263 & 0.9289 & 0.9289 & 1 & 0.9828 \\ \text{LOSS} & 0.6032 & 0.9727 & 0.9727 & 0.9828 & 1 \end{array}. \end{matrix}

(5)

Figure 3

Normal profile and minimum normal profile.

Figure 4 shows the detection rate of the proposed IDS as a function of the dropping probability. The first observation that we can draw from the figure is that the detection rate is 100% when the dropping probability is higher than 0.05 and falls below 100% when the dropping probability is $\leq 0.02$. This can be explained as follows: under very low dropping probabilities, the malicious nodes drop packets at low intensity and their activities become unnoticeable. This happens when the dropping probability becomes very close to or less than the normal packet loss, which is at most 2% during each time period. Figure 5 shows the detection rate of the IDS as a function of the training period under the dropping probabilities $P = 1, 0.5, 0.1, 0.05, 0.01$. The results show that the detection rate depends on the dropping probability rather than on the training period. Under high dropping probabilities, the detection rate is 100% for all training periods. Under low dropping probabilities, the detection rate decreases as the malicious behavior becomes very close to the normal one.

Figure 4

Detection rate versus dropping probability.

Figure 5

Detection rate versus training time.

Figure 6 shows the false positive rate of the IDS as a function of the training period under the dropping probabilities $P = 0.8, 0.5, 0.1, 0.05, 0.03, 0.01$. The false positive rate drops to 0 once the training period reaches $T = 30$ for all $P > 0.02$. At $T = 30$, the IDS has learned all the possible instances of the normal profile and can accurately distinguish between normal and anomalous traffic. When $T < 30$, the IDS has not yet learned all the instances of the normal profile; in other words, normal profiles that are not observed during the training phase will be considered anomalous during the testing phase. The false positive rate then depends on the number of times unlearned normal profiles are observed during the testing phase, which itself depends on the number of packets lost due to (1) normal packet loss and (2) dropping activities. As packet loss occurs randomly, the false positive curves also fluctuate randomly when $T < 30$. For $P = 0.01$, the false positive rate becomes 0 only when $T = 40$: since the behavior of the malicious node is very close to that of a legitimate node, the IDS needs more time to learn new instances of the normal profile.

Figure 6

False positive rate.

3. Nature Dynamicity

3.1. Background: Constant Fading Reputation Strategy

Reputation is defined as the general opinion of a society of nodes towards a certain node in a specific domain of interest; it is the global perception of the future behavior of this node. In an IDS based on multiple observations, the IDS collects a series of consecutive observations, each of which occurs during a separate monitoring period.

Since reputation aggregates past experiences and evolves dynamically, it is similar to Bayesian analysis, a statistical procedure that estimates the parameters of an underlying distribution from observations. Starting from a prior distribution, which is the initial state before any observation is made, Bayesian analysis continuously takes new experiences into account and derives a posterior probability [18]. One distribution commonly used in Bayesian analysis is the Beta distribution.

The Beta distribution has been recognized as a useful formal tool to model reputation [18–20]. A reputation value is represented by a tuple $(α, β)$, with $α, β \geq 1$, where α and β represent positive and negative observations, respectively.

The Beta distribution and its probability density function (PDF) are defined as follows:

\begin{array}{l} B(α, β) = \int_{0}^{1} t^{α - 1} (1 - t)^{β - 1} \, dt, \\ f(p \mid α, β) = \frac{1}{B(α, β)} p^{α - 1} (1 - p)^{β - 1}, \\ \text{where } 0 \leq p \leq 1 \text{ and } α, β > 0. \end{array}

(6)

The reputation, denoted by R, is defined as the expectation (denoted by $E$ ) of the Beta distribution, and it takes the following simple form:

\begin{matrix} R = E[\mathrm{Beta}(α, β)] = \frac{α}{α + β}. \end{matrix}

(7)

We model the reputation of a node with a $Beta(α, β)$ distribution. Initially, $α = 1$ and $β = 1$.

The standard Bayesian procedure is as follows. Initially, the prior is $Beta(1,1)$, the uniform distribution on $[0,1]$. Then, when a new observation is made, say with n observed misbehaviors and p observed correct behaviors, the parameters are updated according to $α := α + p$ and $β := β + n$. The reputation relies on the node's direct observations. When the monitoring node makes one individual observation about the monitored node, it updates α and β as follows.

(i)

If the observation is qualified as misbehavior, β is set to $β + 1$ .

(ii)

If the observation is qualified as correct behavior, α is set to $α + 1$ .
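The Beta reputation model and the standard Bayesian update can be sketched as a small Python class; the class and method names are illustrative. Starting from Beta(1, 1), three correct behaviors and one misbehavior give α = 4, β = 2, and R = 4/6 ≈ 0.67.

```python
from dataclasses import dataclass


@dataclass
class Reputation:
    """Beta(alpha, beta) reputation; the default Beta(1, 1) is the uniform prior."""
    alpha: float = 1.0  # accumulated positive observations
    beta: float = 1.0   # accumulated negative observations

    def update(self, p, n):
        """Standard Bayesian update with p correct behaviors and n misbehaviors."""
        self.alpha += p
        self.beta += n

    @property
    def value(self):
        """R = E[Beta(alpha, beta)] = alpha / (alpha + beta), Eq. (7)."""
        return self.alpha / (self.alpha + self.beta)


r = Reputation()
print(r.value)  # 0.5 under the uniform prior
for behaved_correctly in [True, True, True, False]:
    if behaved_correctly:
        r.update(p=1, n=0)
    else:
        r.update(p=0, n=1)
print(r.alpha, r.beta, round(r.value, 3))  # 4.0 2.0 0.667
```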

The standard Bayesian method is modified in [19] to give less weight to the observations received in the past so as to allow reputation fading and prevent any node from capitalizing on its previous good behavior forever. To achieve this aim, a discount factor for past observations is used. When a new observation $(p, n)$ is made, α and β are updated as follows:

\begin{array}{l} α := ωα + p, \\ β := ωβ + n, \\ \text{where } 0 \leq ω \leq 1. \end{array}

(8)

The weight ω is a constant discount factor for past observations, which serves as the fading mechanism. We refer hereafter to the reputation system described above as the constant fading reputation strategy.
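The following sketch applies the constant fading update of Eq. (8) to a hypothetical node that behaves well for 50 periods and then misbehaves in every period, and counts the periods until R falls below 0.5 (an assumed detection threshold). Under these assumptions, ω = 0.8 needs 4 misbehaving periods, while ω = 0.99 needs 34, which quantifies how a high discount factor delays detection.

```python
def constant_fading(alpha, beta, p, n, omega):
    """Eq. (8): discount both histories by the same constant omega in [0, 1]."""
    return omega * alpha + p, omega * beta + n


def periods_to_detect(omega, good_periods=50, threshold=0.5, max_bad=1000):
    """Hypothetical scenario: good_periods correct observations, then one
    misbehavior per period; return how many misbehaving periods pass before
    R = alpha / (alpha + beta) drops below threshold (None if it never does)."""
    alpha, beta = 1.0, 1.0  # Beta(1, 1) prior
    for _ in range(good_periods):
        alpha, beta = constant_fading(alpha, beta, 1, 0, omega)
    for k in range(1, max_bad + 1):
        alpha, beta = constant_fading(alpha, beta, 0, 1, omega)
        if alpha / (alpha + beta) < threshold:
            return k
    return None


print(periods_to_detect(0.8), periods_to_detect(0.99))  # 4 34
```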

3.2. Adaptive Fading Reputation Strategy

The constant fading reputation mechanism uses the same discount factor for all types of observations and at all times. The higher (resp., lower) the value of ω, the more slowly (resp., quickly) past histories are forgotten. Knowing the value of ω, a malicious node can evade IDS detection by misbehaving for a given time and then returning to normal behavior. Under a high discount factor, a change in node behavior (from well-behaved to misbehaving and vice versa) will be detected only after a long time. During this time, well-behaved nodes can count on their good histories and act maliciously. In addition, misbehaving nodes will have to wait a longer time to redeem themselves. On the other hand, a low discount factor permits quicker detection and redemption of nodes. However, it might raise false alarms, especially when network faults and attacks share the same failure symptoms. For instance, a misbehavior is detected if the observed node does not forward a packet. This rule is set to detect black hole and selective forwarding attacks, but it is also triggered when packets are not forwarded because of collisions, which means that a well-behaved observed node might be falsely considered malicious.

To deal with this issue, we propose an adaptive fading reputation mechanism. This mechanism uses the carrot and stick strategy, that is, it rewards well-behaved nodes and punishes misbehaving ones. The adaptive mechanism uses two discount factors: one for past positive observations and one for past negative observations. The values of the discount factors are adjusted as a function of the reputation R, as shown in Figure 7.

Figure 7

Positive and negative discount factors.

In the adaptive fading reputation mechanism, when a new observation $(p, n)$ is made, α and β are updated as follows:

\begin{array}{l} α := ψ (R) α + p, \\ β := φ (R) β + n, \\ \text{where } 0 \leq ψ (R), φ (R) \leq 1. \end{array}

(9)

$ψ (R)$ and $φ (R)$ denote the discount factors for past positive and negative histories, respectively, and take values in $[0,1]$. Depending on the value of R, the reputation system applies one of the following two fading strategies.

(i)

Reward Strategy. It is applied when the reputation $R \geq th$, where $th \in [0,1]$ is a threshold. The IDS forgets the negative history more quickly than the positive one (i.e., $ψ (R) > φ (R)$); this strategy is used when a node is well-behaved.

(ii)

Punishment Strategy. It is applied when the reputation $R < th$. The IDS forgets the positive history more quickly than the negative one (i.e., $ψ (R) < φ (R)$); this strategy is used when a node misbehaves.

Formally, $ψ (R)$ and $φ (R)$ are written as follows:

\begin{matrix} ψ (R) = \begin{cases} \left(\frac{{PR}_{\max} - {PR}_{\min}}{1 - th}\right) R + \frac{{PR}_{\min} - {PR}_{\max} \times th}{1 - th}, & \text{if } R \geq th, \\ \left(\frac{{PP}_{\max} - {PP}_{\min}}{th}\right) R + {PP}_{\min}, & \text{if } R < th, \end{cases} \\ φ (R) = \begin{cases} ({PR}_{\max} + {NR}_{\min}) - ψ (R), & \text{if } R \geq th, \\ ({NP}_{\max} + {PP}_{\min}) - ψ (R), & \text{if } R < th, \end{cases} \end{matrix}

(10)

where ${PR}_{\max}$ and ${PR}_{\min}$ are the upper and the lower bounds of the positive discount factor, respectively, under reward strategy. ${NR}_{\max}$ and ${NR}_{\min}$ are the upper and the lower bounds of the negative discount factor, respectively, under reward strategy. ${PP}_{\max}$ and ${PP}_{\min}$ are the upper and the lower bounds of the positive discount factor, respectively, under punishment strategy. ${NP}_{\max}$ and ${NP}_{\min}$ are the upper and the lower bounds of the negative discount factor, respectively, under punishment strategy.

For new nodes, that is, nodes with fewer observations than a given value called the experience threshold, positive and negative histories are kept in full, with both discount factors equal to 1.
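The update in (9), with the discount factors of (10) and the experience threshold, can be sketched in Python as follows. The sketch assumes R = α/(α + β) and a hypothetical experience threshold of 5 observations, and it writes ψ(R) in the algebraically equivalent two-point form (the line from PR_min at R = th to PR_max at R = 1, and from PP_min at R = 0 to PP_max at R = th). NR_max and NP_min do not appear explicitly; they are the values φ takes at R = th. The numbers in `setting1` are the Setting 1 bounds of Table 2; the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class FadingBounds:
    """Bounds used in (10). NR_max and NP_min are implied: they are the
    values phi takes at R = th."""
    pr_max: float
    pr_min: float
    nr_min: float
    pp_max: float
    pp_min: float
    np_max: float
    th: float


def psi(r, b):
    """Positive discount factor psi(R) of (10), in two-point form."""
    if r >= b.th:
        # Reward: linear from PR_min at R = th to PR_max at R = 1.
        return b.pr_min + (b.pr_max - b.pr_min) * (r - b.th) / (1 - b.th)
    # Punishment: linear from PP_min at R = 0 to PP_max at R = th.
    return b.pp_min + (b.pp_max - b.pp_min) * r / b.th


def phi(r, b):
    """Negative discount factor phi(R) of (10)."""
    if r >= b.th:
        return (b.pr_max + b.nr_min) - psi(r, b)
    return (b.np_max + b.pp_min) - psi(r, b)


def adaptive_update(alpha, beta, p, n, n_obs, b, experience_threshold=5):
    """Update (9). Below the (hypothetical) experience threshold both
    discount factors are 1, so the full history is kept."""
    if n_obs < experience_threshold:
        return alpha + p, beta + n
    total = alpha + beta
    r = alpha / total if total > 0 else 0.5  # assumed R = alpha / (alpha + beta)
    return psi(r, b) * alpha + p, phi(r, b) * beta + n


# Setting 1 of Table 2.
setting1 = FadingBounds(pr_max=1.0, pr_min=0.9, nr_min=0.1,
                        pp_max=0.2, pp_min=0.1, np_max=1.0, th=0.5)
for r in (0.0, 0.25, 0.5, 1.0):
    print(r, round(psi(r, setting1), 3), round(phi(r, setting1), 3))
```

The printed table makes the carrot and stick visible: above th, ψ stays between 0.9 and 1 while φ stays between 0.1 and 0.2, and below th the roles are swapped.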

From the above upper and lower bounds, we define the following two distance metrics.

(i)

Punish-to-Reward (PTR) Distance. It is defined by ${PR}_{\min} - {PP}_{\max}$, and it shows to what extent the node is rewarded by the IDS when it transitions from the misbehaving state to the well-behaved state; that is, the higher the PTR is, the more slowly the positive histories are forgotten.

(ii)

Reward-to-Punish (RTP) Distance. It is defined by ${NP}_{\min} - {NR}_{\max}$, and it shows to what extent the node is punished by the IDS when it transitions from the well-behaved state to the misbehaving state; that is, the higher the RTP is, the more slowly the negative histories are forgotten.

3.3. Performance of Adaptive Discount Factor Strategy

We evaluate the performance of the constant and adaptive discount factor strategies in terms of detection time. To do so, we implement three behavioral models.

(i)

Deterministic redemption model: in this model, a node that starts with reputation $R = 0$ behaves correctly in the network.

(ii)

Deterministic evasion model: in this model, a node that starts with reputation $R = 1$ behaves maliciously in the network.

(iii)

Probabilistic evasion model: the node's behavior is modeled with a two-state Markov chain, as depicted in Figure 8. In state N, the node is well-behaved, and in state M, it misbehaves. Initially, the node's reputation is $R = 1$. The node transitions to state N with probability $P_{N}$ and to state M with probability $P_{M}$, such that $P_{N} + P_{M} = 1$; $P_{M}$ is called the evasion probability. The node stays in state N or state M for one monitoring period.

Figure 8

Probabilistic evasion model.
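The probabilistic evasion model can be simulated as sketched below in Python. The sketch reads the description literally: in every monitoring period the node is in state M with probability P_M and in state N otherwise, whatever its current state, and each period yields one observation. Detection is measured with the constant fading rule of (8), under the assumptions (not stated in this section) that R = α/(α + β), that R = 1 corresponds to α = 1, β = 0, and that the node is flagged once R ≤ 0.1. The function names and the random seed are illustrative.

```python
import random


def evasion_observations(p_m, periods, rng):
    """One (p, n) observation per monitoring period: state M (misbehaving) with
    probability p_m, otherwise state N, independently of the current state."""
    return [(0, 1) if rng.random() < p_m else (1, 0) for _ in range(periods)]


def detection_time(observations, omega, threshold=0.1):
    """Observations until R <= threshold under the constant fading rule (8),
    starting from R = 1 (assumed alpha = 1, beta = 0)."""
    alpha, beta = 1.0, 0.0
    for k, (p, n) in enumerate(observations, start=1):
        alpha, beta = omega * alpha + p, omega * beta + n
        if alpha / (alpha + beta) <= threshold:
            return k
    return None  # not detected within the simulated horizon


rng = random.Random(7)  # fixed seed so runs are repeatable
obs = evasion_observations(p_m=0.5, periods=1000, rng=rng)
for omega in (0.2, 0.5, 0.8):
    print(omega, detection_time(obs, omega))
```

With P_M = 1 the model reduces to deterministic evasion, and with ω = 0.8 the node is then flagged after 5 observations under these assumptions.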

The parameters for the experiment are shown in Table 2. We define three settings for the adaptive fading reputation.

(i)

Setting 1: $PTR$ and $RTP$ are high; both equal 0.7.

(ii)

Setting 2: $PTR$ and $RTP$ are medium; both equal 0.3.

(iii)

Setting 3: $PTR$ and $RTP$ are low; both equal 0.1.

Table 2

Experiment parameters.

Parameter	Setting 1	Setting 2	Setting 3
${NP}_{\max}$, ${PR}_{\max}$	$1$	$1$	$1$
${PR}_{\min}$, ${NP}_{\min}$	$0.9$	$0.9$	$0.9$
${PP}_{\max}$, ${NR}_{\max}$	$0.2$	$0.6$	$0.8$
${NR}_{\min}$, ${PP}_{\min}$	$0.1$	$0.5$	$0.7$
ω (constant strategy)	$0.2, 0.5, 0.8$
$th$ (all settings)	$0.5$
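As a consistency check on Table 2, the Python sketch below recomputes PTR = PR_min - PP_max and RTP = NP_min - NR_max for each setting and evaluates both branches of φ(R) in (10) at R = th. The dictionary layout and function names are illustrative; values are rounded only to hide floating-point noise.

```python
# Bounds from Table 2. PR_max = NP_max = 1 and PR_min = NP_min = 0.9 in every setting.
PR_MAX, PR_MIN, NP_MAX, NP_MIN = 1.0, 0.9, 1.0, 0.9
TABLE2 = {
    "Setting 1": {"pp_max": 0.2, "nr_max": 0.2, "nr_min": 0.1, "pp_min": 0.1},
    "Setting 2": {"pp_max": 0.6, "nr_max": 0.6, "nr_min": 0.5, "pp_min": 0.5},
    "Setting 3": {"pp_max": 0.8, "nr_max": 0.8, "nr_min": 0.7, "pp_min": 0.7},
}


def distances(b):
    """Return (PTR, RTP) for one setting."""
    ptr = PR_MIN - b["pp_max"]  # Punish-to-Reward distance
    rtp = NP_MIN - b["nr_max"]  # Reward-to-Punish distance
    return round(ptr, 10), round(rtp, 10)


def phi_at_threshold(b):
    """phi(th) from each branch of (10): psi(th) = PR_min in the reward branch,
    and psi tends to PP_max as R approaches th from below."""
    reward = (PR_MAX + b["nr_min"]) - PR_MIN
    punish = (NP_MAX + b["pp_min"]) - b["pp_max"]
    return round(reward, 10), round(punish, 10)


for name, b in TABLE2.items():
    print(name, distances(b), phi_at_threshold(b))
```

Every setting yields the PTR and RTP values listed above (0.7, 0.3, and 0.1), and the two branches of φ reach NR_max and NP_min at the threshold because NR_max - NR_min = PR_max - PR_min = 0.1 and PP_max - PP_min = NP_max - NP_min = 0.1 in all three settings.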

For the constant fading reputation strategy, we consider three discount factors: $ω = 0.2$, $0.5$, and $0.8$.

We study the evolution of reputation over time under the constant and adaptive discount factor strategies. In Figure 9(a), the convergence time increases as ω increases. This is because higher (resp., lower) values of ω mean that negative histories are forgotten at a slower (resp., faster) rate, which leads to a longer (resp., shorter) time to converge to $R = 1$. In Figure 9(b), we observe that, under the deterministic redemption model, the adaptive discount factor strategy requires less convergence time than the constant one: it ranges from 3 observations under setting 1 to 9 observations under setting 3. The reason is that a node under setting 1 is rewarded more generously as long as it behaves well; that is, once its reputation reaches th, its negative histories are forgotten faster than under settings 2 and 3 (${NR}_{\max} = 0.2$ versus $0.6$ and $0.8$ in Table 2).

Figure 9

Deterministic redemption model.

In Figure 10, we also notice that a malicious node following the deterministic evasion model is detected more quickly when the adaptive discount factor strategy is applied. The time to converge to $R = 0$ is between 3 and 9 observations under the adaptive discount factor strategy and between 4 and 14 observations under the constant discount factor strategy. For instance, if $R = 0.1$ is taken as the boundary between malicious and normal behavior, the malicious node can evade IDS detection for only 3 observations if the IDS adopts the adaptive discount factor strategy under setting 3. Under the constant discount factor strategy with $ω = 0.8$, the IDS detects the malicious node after 5 observations.
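The deterministic evasion experiment can be approximated with the Python sketch below, which applies (10) with the Setting 1 and Setting 3 bounds of Table 2 and compares it with the constant strategy at ω = 0.8. It assumes R = α/(α + β), a start at α = 1, β = 0 (so R = 1), and detection once R ≤ 0.1. These assumptions are not stated in the paper, so the counts are illustrative rather than a reproduction of Figure 10; all names are hypothetical.

```python
def make_factors(pr_max, pr_min, nr_min, pp_max, pp_min, np_max, th=0.5):
    """Return (psi, phi) following (10), with psi in two-point form."""
    def psi(r):
        if r >= th:
            return pr_min + (pr_max - pr_min) * (r - th) / (1 - th)
        return pp_min + (pp_max - pp_min) * r / th

    def phi(r):
        offset = (pr_max + nr_min) if r >= th else (np_max + pp_min)
        return offset - psi(r)

    return psi, phi


def evasion_detection_time(psi, phi, threshold=0.1, max_obs=1000):
    """Deterministic evasion: R starts at 1 (assumed alpha = 1, beta = 0) and
    every observation is negative, (p, n) = (0, 1)."""
    alpha, beta = 1.0, 0.0
    for k in range(1, max_obs + 1):
        r = alpha / (alpha + beta)
        alpha, beta = psi(r) * alpha, phi(r) * beta + 1
        if alpha / (alpha + beta) <= threshold:
            return k
    return None


settings = {
    "setting 1": make_factors(1.0, 0.9, 0.1, 0.2, 0.1, 1.0),
    "setting 3": make_factors(1.0, 0.9, 0.7, 0.8, 0.7, 1.0),
    "constant, omega = 0.8": (lambda r: 0.8, lambda r: 0.8),
}
for name, (psi, phi) in settings.items():
    print(name, evasion_detection_time(psi, phi))
```

Under these assumptions, the sketch flags the node after 3 negative observations with Setting 1 and after 5 with Setting 3 or with the constant strategy at ω = 0.8.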

Figure 10

Deterministic evasion model.

Knowing the number of observations required to detect it, a malicious node can adopt the probabilistic evasion model, which does intermittent harm to the network to confuse the IDS and hence evade detection. Figures 11, 12, and 13 show that the adaptive discount factor strategy can quickly detect this type of behavior. In the figures, we consider a node malicious when $R \leq 0.1$. When the evasion probability is $P_{M} = 0.5$, the adaptive strategy detects the malicious node within 2 to 37 observations. In contrast, the malicious node can evade an IDS using the constant strategy for 751 observations when $ω = 0.8$; this value decreases to 10 and 2 when $ω = 0.5$ and $ω = 0.2$, respectively. When $P_{M} = 0.6$, the detection time decreases to 40 and 27 observations under $ω = 0.8$ and setting 3, respectively. When $P_{M}$ is between $0.7$ and $0.9$, the adaptive strategy (resp., constant strategy) achieves a detection time between 2 and 4 (resp., between 2 and 5) observations.

Figure 11

Probabilistic evasion model ( $P_{M} = 0.5$ ).

Figure 12

Probabilistic evasion model ( $P_{M} = 0.6$ ).

Figure 13

Probabilistic evasion model ( $P_{M} = 0.7,0.8,0.9$ ).

4. Spatiotemporal Dynamicity

A monitoring node i can make at least one observation about a monitored node j if the wireless link between them lasts longer than the monitoring period Δ. A malicious node j that knows this can move around the network so that its links with its neighbors last less than Δ.

As shown in Figure 14, the nodes start operating at time $t_{0}$. A wireless link between the monitoring node i and the monitored node j is created at time $t_{1}$, when node j comes within the transmission range of node i. Node i loses its link with node j either (1) when node j moves out of the transmission range of node i at time $t_{2}$ or (2) when node j runs out of battery power at time $t_{3}$. Therefore, node i estimates the link-node lifetime as $\min (t_{2} - t_{1}, t_{3} - t_{1})$, where $t_{2} - t_{1}$ is the link lifetime and $t_{3} - t_{1}$ is the residual lifetime of node j, which has already been operating for $t_{1} - t_{0}$ time units.

Figure 14

Link-node lifetime.

In this section, we statistically analyze the link-node lifetime distribution. Based on this analysis, we choose appropriate values for the monitoring period so that a mobile monitored node cannot evade IDS detection. We use the random waypoint mobility model, in which each mobile node randomly selects a destination within a 100 m × 100 m area and moves toward it at a speed drawn uniformly between 0 and a maximum speed $V_{\max}$; it then pauses for 1 second before moving to a new random location. In our analysis, we consider two network sizes ($NN$): 10 and 20 nodes.

4.1. Link Lifetime Distribution

We record the link durations observed in our simulations and plot their frequencies as histograms, as shown in Figures 15 and 16. The EasyFit software [21, 22] is used to measure how well a random sample fits theoretical probability distributions. As shown in the figures, the software fits a Weibull distribution [23] to the simulation data, with parameters $α = 1.031$ and $β = 28.74$ (resp., $α = 1.029$ and $β = 32.85$) when $V_{\max} = 20$ and $NN = 10$ (resp., $NN = 20$). The Weibull distribution, with shape parameter α and scale parameter β, has the PDF given in the following equation:

\begin{matrix} f (x; α, β) = \frac{α}{β} {(\frac{x}{β})}^{α - 1} e^{- {(x / β)}^{α}} . \end{matrix}

(11)

Figure 15

Link lifetime distribution under $NN = 10$ and $V_{\max} = 20$ .

Figure 16

Link lifetime distribution under $NN = 20$ and $V_{\max} = 20$ .

Based on the properties of the Weibull distribution, the mean (expected value) is

\begin{matrix} Mean = β \times Γ (\frac{α + 1}{α}) . \end{matrix}

(12)
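The Python sketch below checks (12) against a direct numerical integration of x f(x) from (11), using the Weibull fit for NN = 10 and V_max = 20 (α = 1.031, β = 28.74). The trapezoidal rule and the cutoff at 40β are choices made for this illustration only.

```python
import math


def weibull_pdf(x, alpha, beta):
    """PDF (11), with shape alpha and scale beta."""
    if x < 0:
        return 0.0
    return (alpha / beta) * (x / beta) ** (alpha - 1) * math.exp(-((x / beta) ** alpha))


def weibull_mean(alpha, beta):
    """Mean (12): beta * Gamma((alpha + 1) / alpha)."""
    return beta * math.gamma((alpha + 1) / alpha)


def numeric_mean(alpha, beta, cutoff=40.0, steps=100_000):
    """Trapezoidal estimate of the integral of x f(x) over [0, cutoff * beta]."""
    upper = cutoff * beta
    h = upper / steps
    # The x = 0 endpoint contributes nothing because x f(x) vanishes there.
    total = 0.5 * upper * weibull_pdf(upper, alpha, beta)
    for i in range(1, steps):
        x = i * h
        total += x * weibull_pdf(x, alpha, beta)
    return total * h


alpha, beta = 1.031, 28.74  # fit for NN = 10, V_max = 20
print(round(weibull_mean(alpha, beta), 2), round(numeric_mean(alpha, beta), 2))
```

Both values come out at about 28.4. When sampling from (11) with Python's standard library, note that `random.weibullvariate(alpha, beta)` uses the opposite naming, with `alpha` as the scale and `beta` as the shape parameter.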

On the other hand, Samar and Wicker [24, 25] express the expected link lifetime as a function of the node velocity $v_{1}$ as follows:

\begin{array}{l} F_{link}^{v_{1}} = \frac{R}{2 (b - a)} \\ \times \Big(\int_{π}^{0} \log \left|\frac{b + \sqrt{b^{2} - v_{1}^{2} \sin^{2} ϕ}}{v_{1} + v_{1} \cos ϕ}\right| d ϕ \\ - \int_{π}^{ϕ_{0}} \log \left|\frac{a + \sqrt{a^{2} - v_{1}^{2} \sin^{2} ϕ}}{a - \sqrt{a^{2} - v_{1}^{2} \sin^{2} ϕ}}\right| d ϕ\Big), \end{array}

(13)

where R is the radius of the circle centered at the node, $v_{1}$ is uniformly distributed between a and b (in meters/second), ϕ is the direction of motion, and $ϕ_{0} = π - \sin^{- 1} (a / v_{1})$.

Since (12) and (13) both describe the expected value of the link lifetime, we can write

\begin{array}{l} β Γ \left(\frac{α + 1}{α}\right) = \frac{R}{2 (b - a)} \\ \times \Big(\int_{π}^{0} \log \left|\frac{b + \sqrt{b^{2} - v_{1}^{2} \sin^{2} ϕ}}{v_{1} + v_{1} \cos ϕ}\right| d ϕ \\ - \int_{π}^{ϕ_{0}} \log \left|\frac{a + \sqrt{a^{2} - v_{1}^{2} \sin^{2} ϕ}}{a - \sqrt{a^{2} - v_{1}^{2} \sin^{2} ϕ}}\right| d ϕ\Big). \end{array}

(14)

Solving (14) for β then gives β as a function of the velocity $v_{1}$:

\begin{array}{l} β = \frac{R}{2 (b - a) Γ ((α + 1) / α)} \\ \times \Big(\int_{π}^{0} \log \left|\frac{b + \sqrt{b^{2} - v_{1}^{2} \sin^{2} ϕ}}{v_{1} + v_{1} \cos ϕ}\right| d ϕ \\ - \int_{π}^{ϕ_{0}} \log \left|\frac{a + \sqrt{a^{2} - v_{1}^{2} \sin^{2} ϕ}}{a - \sqrt{a^{2} - v_{1}^{2} \sin^{2} ϕ}}\right| d ϕ\Big). \end{array}

(15)

Simulations have been conducted to compare the theoretical β obtained from (15) with the β of the Weibull fit obtained from simulation, as shown in Table 3. The two values differ by at most about 7% (56.07 versus 52.29 for NN = 20 at 10 m/s), which indicates that the Weibull distribution fits the simulation data well.

Table 3

Comparison between theoretical and approximative β.

Number of nodes ($NN$)	Node velocity (m/s)	Approximative β	Theoretical β
10	20	28.74	28.36
10	15	35.53	35.83
10	10	53.63	50.17
10	5	88.20	88.55
20	20	34.57	32.85
20	15	40.04	39.44
20	10	56.07	52.29
20	5	84.50	80.386
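The agreement in Table 3 can be quantified with a few lines of Python that compute the relative difference |approximative - theoretical| / theoretical for every row; the dictionary simply transcribes the table.

```python
# (NN, node velocity in m/s) -> (approximative beta, theoretical beta), from Table 3.
TABLE3 = {
    (10, 20): (28.74, 28.36), (10, 15): (35.53, 35.83),
    (10, 10): (53.63, 50.17), (10, 5): (88.20, 88.55),
    (20, 20): (34.57, 32.85), (20, 15): (40.04, 39.44),
    (20, 10): (56.07, 52.29), (20, 5): (84.50, 80.386),
}

rel_diff = {key: abs(a - t) / t for key, (a, t) in TABLE3.items()}
worst = max(rel_diff, key=rel_diff.get)
print(f"largest relative difference: {rel_diff[worst]:.1%} at (NN, v) = {worst}")
```

The largest difference, about 7.2%, occurs for NN = 20 at 10 m/s; every other row agrees within 7%.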

4.2. Residual Node Lifetime Distribution

We assume that the node lifetime follows an exponential distribution with parameter λ, the distribution commonly used to model time to failure in reliability engineering. We interpret λ as the rate at which the node's battery is discharged. The probability density function is then

\begin{matrix} f (t) = \begin{cases} 0, & \text{if } t < 0, \\ λ e^{- λ t}, & \text{if } t \geq 0. \end{cases} \end{matrix}

(16)

The probability density function of the residual node lifetime for a node of age a is given by the following equation [26]:

\begin{matrix} r_{a} (t) = \frac{f (t + a)}{1 - F (a)} = λ e^{- λ t}, \end{matrix}

(17)

where F is the cumulative distribution function (CDF) of the exponential distribution. Thus, the residual node lifetime follows the same exponential distribution regardless of the node's age a (the memoryless property of the exponential distribution). The expected value of a random variable X following an exponential distribution is

\begin{matrix} E (X) = \frac{1}{λ} . \end{matrix}

(18)
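The memoryless property behind (17) is easy to confirm numerically. The Python sketch below evaluates f(t + a)/(1 - F(a)) for several node ages a and compares it with f(t); the rate λ = 1/500 s⁻¹ is a hypothetical value chosen only for illustration.

```python
import math


def exp_pdf(t, lam):
    """PDF (16)."""
    return lam * math.exp(-lam * t) if t >= 0 else 0.0


def exp_cdf(t, lam):
    """F(t) = 1 - exp(-lam * t) for t >= 0."""
    return 1.0 - math.exp(-lam * t) if t >= 0 else 0.0


def residual_pdf(t, age, lam):
    """Density (17) of the remaining lifetime of a node of age `age`."""
    return exp_pdf(t + age, lam) / (1.0 - exp_cdf(age, lam))


lam = 1 / 500.0  # hypothetical discharge rate (mean node lifetime of 500 s)
for age in (0.0, 100.0, 1000.0):
    print(age, f"{residual_pdf(30.0, age, lam):.6e}", f"{exp_pdf(30.0, lam):.6e}")
```

The two columns printed for each age agree to within floating-point rounding, so the residual lifetime does not depend on how long the node has already been operating, and its mean stays 1/λ (500 s here).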

4.3. Link-Node Lifetime Distribution

Consider the random variable $Z = \min (X, Y)$, where X is the link lifetime, which follows a Weibull distribution, and Y is the residual node lifetime, which follows an exponential distribution; $I_{X, Y} (x, y)$ denotes their joint cumulative distribution function, and X and Y are assumed independent. Since the minimum of X and Y exceeds t exactly when both variables exceed t, we have

\begin{matrix} P (Z > t) = P (\min (X, Y) > t) = P (X > t, Y > t) . \end{matrix}

(19)

Therefore, by inclusion-exclusion,

\begin{matrix} P (Z > t) = 1 - P (X \leq t) - P (Y \leq t) + P (X \leq t, Y \leq t) . \end{matrix}

(20)

Consequently, the cumulative distribution function (CDF) of Z is

\begin{array}{l} H_{Z} (t) = 1 - P (Z > t) \\ = P (X \leq t) + P (Y \leq t) - P (X \leq t, Y \leq t) . \end{array}

(21)

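Thus, denoting by $F_{X}$ and $G_{Y}$ the CDFs of X and Y (so that, by independence, $I_{X, Y} (t, t) = F_{X} (t) G_{Y} (t)$),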

\begin{matrix} H_{Z} (t) = F_{X} (t) + G_{Y} (t) - I_{X, Y} (t, t) . \end{matrix}

(22)
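Under the independence assumption, (22) can be evaluated directly. The Python sketch below uses the Weibull fit of Section 4.1 for X (α = 1.031, β = 28.74) and a hypothetical rate λ = 1/500 s⁻¹ for Y; it evaluates the exact CDF of the minimum, not the fitted Phased Bi-Weibull approximation. Function names are illustrative.

```python
import math


def weibull_cdf(t, alpha, beta):
    """F_X: CDF of the Weibull link lifetime (shape alpha, scale beta)."""
    return 1.0 - math.exp(-((t / beta) ** alpha)) if t > 0 else 0.0


def exponential_cdf(t, lam):
    """G_Y: CDF of the exponential residual node lifetime."""
    return 1.0 - math.exp(-lam * t) if t > 0 else 0.0


def link_node_cdf(t, alpha, beta, lam):
    """H_Z(t) from (22), with I_{X,Y}(t, t) = F_X(t) G_Y(t) by independence."""
    f = weibull_cdf(t, alpha, beta)
    g = exponential_cdf(t, lam)
    return f + g - f * g


ALPHA, BETA = 1.031, 28.74  # Weibull fit for NN = 10, V_max = 20 (Section 4.1)
LAM = 1 / 500.0             # hypothetical discharge rate
for t in (5.0, 20.0, 60.0):
    print(t, round(link_node_cdf(t, ALPHA, BETA, LAM), 4))
```

Because X and Y are independent, F_X + G_Y - F_X G_Y equals 1 - P(X > t) P(Y > t) = 1 - exp(-(t/β)^α - λt), a closed form that is convenient for checking.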

The density of Z obtained from simulation is approximated by a Phased Bi-Weibull distribution [27], whose PDF is given by

\begin{matrix} g (t) = \begin{cases} \frac{α_{1}}{β_{1}} \left(\frac{t - γ_{1}}{β_{1}}\right)^{α_{1} - 1} e^{- ((t - γ_{1}) / β_{1})^{α_{1}}}, & \text{if } γ_{1} \leq t \leq γ_{2}, \\ \frac{α_{2}}{β_{2}} \left(\frac{t - γ_{2}}{β_{2}}\right)^{α_{2} - 1} e^{- ((t - γ_{2}) / β_{2})^{α_{2}}}, & \text{if } γ_{2} < t < \infty. \end{cases} \end{matrix}

(23)

The EasyFit software [22] fits a Phased Bi-Weibull distribution to the simulation data, as shown in Figure 17 (resp., Figure 18), with parameters $α_{1} = 0.87118$, $β_{1} = 19.482$, $γ_{1} = 0$, $α_{2} = 0.68969$, $β_{2} = 31.875$, and $γ_{2} = 3$ (resp., $α_{1} = 0.90481$, $β_{1} = 22.976$, $γ_{1} = 0$, $α_{2} = 0.71509$, $β_{2} = 14.819$, and $γ_{2} = 4$).

Figure 17

Link-node lifetime distribution under $NN = 10$ and $V_{\max} = 20$ .

Figure 18

Link-node lifetime distribution under $NN = 20$ and $V_{\max} = 20$ .

Remark 2 (see [28]).

For real values $x, y \in \mathbb{R}$, $\min (x, y) = x + y - \max (x, y)$.

The result of this remark is extended to random variables by the following theorem.

Theorem 3 (see [28]).

Given two real-valued continuous random variables $X, Y: Ω \to \mathbb{R}$, the expected value of their minimum is $E (\min (X, Y)) = E (X) + E (Y) - E (\max (X, Y))$.

Lemma 4 (see [28]).

Given two independent real-valued continuous random variables $X, Y: Ω \to \mathbb{R}$ with PDFs $f_{X}, f_{Y}$ and CDFs $F_{X}, F_{Y}$, the expected value of their maximum is $E (\max (X, Y)) = \int_{- \infty}^{\infty} x f_{X} (x) F_{Y} (x) \, d x + \int_{- \infty}^{\infty} y f_{Y} (y) F_{X} (y) \, d y$.

Based on Theorem 3 and Lemma 4, the expected link-node lifetime is given by

\begin{matrix} E (Z) = E (X) + E (Y) - E (\max (X, Y)), \end{matrix}

(24)

where $E (X)$ is given in (12) and $E (Y)$ in (18). Figure 19 shows the expected link-node lifetime obtained from simulation as a function of node velocity. The expected link-node lifetime decreases as the velocity increases, with the sharpest decrease for $V_{\max} \in [1,5]$. It is also longer under higher network density, because a node then shares links with a larger number of neighbors, and consequently links with longer durations are observed.
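Equation (24) can be evaluated numerically once λ is fixed. The Python sketch below uses the NN = 10, V_max = 20 Weibull fit and a hypothetical λ = 1/500 s⁻¹, computes E(max(X, Y)) with Lemma 4 (which requires X and Y to be independent), and cross-checks the result against a direct integration of the survival function of Z. A plain midpoint rule stands in for a proper numerical integration library.

```python
import math

ALPHA, BETA = 1.031, 28.74  # Weibull link-lifetime fit (NN = 10, V_max = 20)
LAM = 1 / 500.0             # hypothetical rate of the exponential node lifetime


def f_x(x):
    """Weibull PDF of the link lifetime, (11)."""
    return (ALPHA / BETA) * (x / BETA) ** (ALPHA - 1) * math.exp(-((x / BETA) ** ALPHA))


def cdf_x(x):
    return 1.0 - math.exp(-((x / BETA) ** ALPHA))


def f_y(y):
    """Exponential PDF of the residual node lifetime, (16)."""
    return LAM * math.exp(-LAM * y)


def cdf_y(y):
    return 1.0 - math.exp(-LAM * y)


def integrate(fn, upper, steps=100_000):
    """Midpoint rule on [0, upper]; midpoints avoid evaluating at x = 0."""
    h = upper / steps
    return h * sum(fn((i + 0.5) * h) for i in range(steps))


UPPER = 20_000.0  # both tails are negligible beyond this point for these parameters
e_x = BETA * math.gamma((ALPHA + 1) / ALPHA)  # (12)
e_y = 1.0 / LAM                               # (18)
# Lemma 4 (independent X and Y): E(max) = int x f_X F_Y dx + int y f_Y F_X dy.
e_max = integrate(lambda x: x * (f_x(x) * cdf_y(x) + f_y(x) * cdf_x(x)), UPPER)
e_z = e_x + e_y - e_max                       # (24)

# Cross-check: for Z >= 0, E(Z) is the integral of P(Z > t) = P(X > t) P(Y > t).
e_z_direct = integrate(lambda t: (1 - cdf_x(t)) * (1 - cdf_y(t)), UPPER)
print(round(e_z, 2), round(e_z_direct, 2))
```

Both estimates should agree to within the quadrature error, and E(Z) comes out smaller than both E(X) and E(Y), as expected for a minimum.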

Figure 19

Expected link-node lifetime.

4.4. Monitoring Period Estimation

Based on the above statistical analysis, we propose a low-cost method for choosing the monitoring period, suited to resource-constrained networks such as sensor networks. We also propose a second method that incurs some communication cost and can be implemented on nodes with higher capabilities, such as mobile sinks or nodes in mobile ad hoc and vehicular ad hoc networks.

4.4.1. Low-Cost Method

We assume that the monitoring node has no information about the monitored node's velocity, position, or residual battery, and that it wants to ensure that $l\%$ of its links are observable, that is, that they exist for a duration longer than Δ. Since the link-node lifetime follows a Phased Bi-Weibull distribution, the maximum value of Δ that meets this requirement is the t such that $P (Z > t) = l / 100$, that is, $P (Z \leq t) = 1 - l / 100$.
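The low-cost rule reduces to a quantile computation. The Python sketch below finds, by bisection, the largest Δ for which l% of links last longer than Δ, i.e., P(Z > Δ) = l/100. As an assumption for illustration, it uses the exact survival function of Z = min(X, Y) for an independent Weibull link lifetime (α = 1.031, β = 28.74) and an exponential node lifetime with a hypothetical λ = 1/500 s⁻¹, rather than the fitted Phased Bi-Weibull distribution; any monotone fitted survival function could be substituted in `survival_z`.

```python
import math

ALPHA, BETA = 1.031, 28.74  # Weibull link-lifetime fit (NN = 10, V_max = 20)
LAM = 1 / 500.0             # hypothetical battery discharge rate


def survival_z(t):
    """P(Z > t) for Z = min(X, Y), X ~ Weibull(ALPHA, BETA) and Y ~ Exp(LAM) independent."""
    return math.exp(-((t / BETA) ** ALPHA) - LAM * t) if t > 0 else 1.0


def max_monitoring_period(l_percent, tol=1e-9):
    """Largest delta with P(Z > delta) >= l_percent / 100, found by bisection."""
    target = l_percent / 100.0
    lo, hi = 0.0, 1.0
    while survival_z(hi) > target:  # widen the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if survival_z(mid) > target:
            lo = mid
        else:
            hi = mid
    return lo


for l in (50, 80, 90, 95):
    print(l, round(max_monitoring_period(l), 2))
```

Raising l shrinks the admissible monitoring period, since a larger share of links must outlive Δ.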

4.4.2. High-Cost Method

We assume that each node i can estimate its remaining battery energy $E_{i}$ and its rate of energy dissipation ${EDisip}_{i}$ for every time period Δ. An ultraconservative estimate of the residual node lifetime is then derived as shown in the following equation:

\begin{matrix} ϑ_{i} = \frac{E_{i}}{\max ({EDisip}_{i})} (s) . \end{matrix}

(25)

Each node i periodically broadcasts a beacon message containing its residual node lifetime $ϑ_{i}$ and its position obtained from GPS. Upon receiving such a message from node i, node j first calculates $d_{i j}$, the distance separating it from its neighbor i. The relative velocity of node i with respect to node j has magnitude $\sqrt{v_{i}^{2} + v_{j}^{2} - 2 v_{i} v_{j} \cos θ}$, where $v_{i}$ and $v_{j}$ are the speeds of nodes i and j, respectively, and θ is the angle between the vectors $\vec{v_{i}}$ and $\vec{v_{j}}$. The relative velocity is maximum when $v_{i} = v_{j} = V_{\max}$ and $θ = 180°$, in which case it equals $2 V_{\max}$. Node j then calculates a conservative estimate of the residual link lifetime, that is, the minimum time for node i to move out of the transmission range of node j. The residual link lifetime $ξ_{i j}$ is given by the following equation, where $TR$ is the transmission range:

\begin{matrix} ξ_{i j} = \frac{TR - d_{i j}}{2 V_{\max}} (s) . \end{matrix}

(26)

After that, each node j estimates the residual link-node lifetime given by

\begin{matrix} χ_{i j} = \min (ϑ_{i}, ξ_{i j}) . \end{matrix}

(27)

Therefore, the monitoring period required to observe the monitored node i must be less than $χ_{i j}$ .
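The high-cost method can be sketched in Python as follows. The numbers in the example are hypothetical (a 100 m transmission range, V_max = 20 m/s, 50 J of remaining energy, and a worst-case dissipation rate of 0.5 W). The sketch assumes that EDisip is expressed per second, so that ϑ_i comes out in seconds, and it clamps ξ_ij at zero for a node already at the edge of the range.

```python
import math


def residual_node_lifetime(remaining_energy, max_dissipation_rate):
    """(25): remaining energy divided by the worst-case dissipation rate."""
    return remaining_energy / max_dissipation_rate


def relative_speed(v_i, v_j, theta):
    """Magnitude of the relative velocity; theta is the angle between the
    two velocity vectors, in radians."""
    return math.sqrt(v_i ** 2 + v_j ** 2 - 2 * v_i * v_j * math.cos(theta))


def residual_link_lifetime(pos_i, pos_j, tr, v_max):
    """(26): worst case in which the nodes separate at 2 * v_max."""
    d_ij = math.dist(pos_i, pos_j)
    return max(tr - d_ij, 0.0) / (2 * v_max)


# Hypothetical beacon from node i, as processed by node j.
V_MAX, TR = 20.0, 100.0
vartheta_i = residual_node_lifetime(remaining_energy=50.0, max_dissipation_rate=0.5)
xi_ij = residual_link_lifetime((10.0, 20.0), (40.0, 60.0), tr=TR, v_max=V_MAX)
chi_ij = min(vartheta_i, xi_ij)  # (27)

print("worst-case relative speed:", relative_speed(V_MAX, V_MAX, math.pi))
print(f"monitoring period must be shorter than {chi_ij:.2f} s")
```

In this example the link term dominates: ξ_ij = (100 - 50)/40 = 1.25 s is far below ϑ_i = 100 s, so χ_ij = 1.25 s.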

5. Conclusion

In this paper, we have proposed IDS solutions for three aspects of dynamicity in ad hoc and wireless sensor networks. The magnitude dynamicity aspect is addressed by defining a normal profile based on invariants derived from normal node behavior. We have generated a dependency graph of strongly correlated features and derived high-level features from it by applying a divide-and-conquer strategy to the maximal cliques algorithm and the maximum weighted spanning tree algorithm. Simulation results show that the IDS can achieve a detection rate of 100% when the malicious behavior is not similar to the normal one, and a false positive rate of 0% when the training time exceeds a given value. To handle the nature dynamicity aspect, we have adopted the carrot and stick strategy to prevent a malicious node from evading the IDS, proposing an adaptive reputation fading strategy that allows fast redemption and fast capture of malicious nodes. For the spatiotemporal dynamicity aspect, we have analyzed the link-node lifetime distribution and shown that it can be approximated by a Phased Bi-Weibull distribution. Based on this analysis, we have proposed a low-cost method to estimate the monitoring period required to observe the monitored node's behavior. In addition, based on topology information, we have proposed a high-cost method designed for networks whose nodes are less resource-constrained.

Footnotes

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The authors would like to extend their sincere appreciation to the Deanship of Scientific Research at King Saud University for funding this research through Research Group Project (RG no. 1435-051).

References

1. C. E. Perkins, Ad Hoc Networking, Addison-Wesley Professional, Reading, Mass, USA, 2008.

2. I. F. Akyildiz, Sankarasubramaniam, and Cayirci, "Wireless sensor networks: a survey," Computer Networks, vol. 38, no. 4, pp. 393-422, 2002. doi: 10.1016/s1389-1286(01)00302-4.

3. Al-Sultan, M. M. Al-Doori, A. H. Al-Bayatti, and Zedan, "A comprehensive survey on vehicular Ad Hoc network," Journal of Network and Computer Applications, vol. 37, no. 1, pp. 380-392, 2014. doi: 10.1016/j.jnca.2013.02.036.

4. Djenouri, Khelladi, and Badache, "A survey of security issues in mobile ad hoc and sensor networks," IEEE Communications Surveys and Tutorials, vol. 7, no. 4, pp. 2-28, 2005. doi: 10.1109/COMST.2005.1593277.

5. Gillani, Shahzad, Qayyum, and Mehmood, "A survey on security in vehicular ad hoc networks," in Communication Technologies for Vehicles, pp. 59-74, Springer, New York, NY, USA, 2013.

6. García-Teodoro, Díaz-Verdejo, Maciá-Fernández, and Vázquez, "Anomaly-based network intrusion detection: techniques, systems and challenges," Computers & Security, vol. 28, no. 1-2, pp. 18-28, 2009. doi: 10.1016/j.cose.2008.08.003.

7. Xie, Han, Tian, and Parvin, "Anomaly detection in wireless sensor networks: a survey," Journal of Network and Computer Applications, vol. 34, no. 4, pp. 1302-1325, 2011. doi: 10.1016/j.jnca.2011.03.004.

8. Sun, Osborne, Xiao, and Guizani, "Intrusion detection techniques in mobile ad hoc and wireless sensor networks," IEEE Wireless Communications, vol. 14, no. 5, pp. 56-63, 2007. doi: 10.1109/MWC.2007.4396943.

9. "Group-based intrusion detection system in wireless sensor networks," Computer Communications, vol. 31, no. 18, pp. 4324-4332, 2008. doi: 10.1016/j.comcom.2008.06.020.

10. Zhang, Meratnia, and Havinga, "Outlier detection techniques for wireless sensor networks: a survey," IEEE Communications Surveys and Tutorials, vol. 12, no. 2, pp. 159-170, 2010. doi: 10.1109/surv.2010.021510.00088.

11. C. E. Loo, M. Y., Leckie, and Palaniswami, "Intrusion detection for routing attacks in sensor networks," International Journal of Distributed Sensor Networks, vol. 2, no. 4, pp. 313-332, 2006. doi: 10.1080/15501320600692044.

12. Liu, Cheng, and Chen, "Insider attacker detection in wireless sensor networks," in Proceedings of the 26th IEEE International Conference on Computer Communications (INFOCOM '07), pp. 1937-1945, May 2007. doi: 10.1109/infcom.2007.225.

13. Stetsko, Folkman, and Matyáš, "Neighbor-based intrusion detection for wireless sensor networks," in Proceedings of the 6th International Conference on Wireless and Mobile Communications (ICWMC '10), pp. 420-425, IEEE, September 2010. doi: 10.1109/icwmc.2010.61.

14. Dowdy, Wearden, and Chilko, Statistics for Research, 3rd edition, John Wiley & Sons, New York, NY, USA, 2004.

15. Tomita, Tanaka, and Takahashi, "The worst-case time complexity for generating all maximal cliques and computational experiments," Theoretical Computer Science, vol. 363, no. 1, pp. 28-42, 2006. doi: 10.1016/j.tcs.2006.06.015.

16. Sriram and Skiena, "Computational discrete mathematics: combinatorics and graph theory with mathematica," Computing Reviews, vol. 45, no. 12, p. 775, 2004.

17. Zeng, Bagrodia, and Gerla, "GloMoSim: a library for parallel simulation of large-scale wireless networks," in Proceedings of the 12th Workshop on Parallel and Distributed Simulation (PADS '98), pp. 154-161, May 1998.

18. Liu and Issarny, "Enhanced reputation mechanism for mobile ad hoc networks," in Proceedings of the 2nd International Conference on Trust Management, pp. 48-62, Springer, New York, NY, USA, 2004.

19. Buchegger and J.-Y. L. Boudec, "A robust reputation system for peer-to-peer and mobile ad-hoc networks," in Proceedings of the 2nd Workshop on the Economics of Peer-to-Peer Systems (P2PEcon '04), Cambridge, Mass, USA, 2004.

20. Michiardi and Molva, "Core: a collaborative reputation mechanism to enforce node cooperation in mobile ad hoc networks," in Advanced Communications and Multimedia Security, pp. 107-121, Springer, New York, NY, USA, 2002. doi: 10.1007/978-0-387-35612-9_9.

21. Mathwave data analysis & simulation, http://www.mathwave.com/products/easyfit.html

22. Schittkowski, "EASY-FIT: a software system for data fitting in dynamical systems," Structural and Multidisciplinary Optimization, vol. 23, no. 2, pp. 153-169, 2002. doi: 10.1007/s00158-002-0174-6.

23. Forbes, Evans, Hastings, and Peacock, Statistical Distributions, John Wiley & Sons, 2011.

24. Samar and S. B. Wicker, "On the behavior of communication links of a node in a multi-hop mobile environment," in Proceedings of the 5th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc '04), pp. 145-156, ACM, May 2004.

25. Samar and S. B. Wicker, "Link dynamics and protocol design in a multihop mobile environment," IEEE Transactions on Mobile Computing, vol. 5, no. 9, pp. 1156-1172, 2006. doi: 10.1109/TMC.2006.131.

26. Gerharz, de Waal, Frank, and Martini, "Link stability in mobile wireless ad hoc networks," in Proceedings of the 27th Annual IEEE Conference on Local Computer Networks (LCN '02), pp. 30-39, IEEE, 2002.

27. Louzada-Neto and A. C. Davison, "A note on bayesian analysis of the poly-weibull model," 1998.

28. Lewellen, "Expected maximum and minimum of real-valued continuous random variables," 2013, https://antimatroid.wordpress.com/2013/01/

Fortifying Intrusion Detection Systems in Dynamic Ad Hoc and Wireless Sensor Networks
