Detection of selective forwarding attacks based on adaptive learning automata and communication quality in wireless sensor networks

Abstract

Wireless sensor networks face threats of selective forwarding attacks which are simple to implement but difficult to detect. It is difficult to distinguish between malicious packet dropping and the normal packet loss on unstable wireless channels. For this situation, a selective forwarding attack detection method is proposed based on adaptive learning automata and communication quality; the method can eliminate the impact of normal packet loss on selective forwarding attack detection and can detect ordinary selective forwarding attack and special cases of selective forwarding attack. The current and comprehensive communication quality of nodes are employed to reflect the short- and long-term forwarding behaviors of nodes, and the normal packet loss caused by unstable channels and medium-access-control layer collisions is considered. The adaptive reward and penalty parameters of a detection learning automata are determined by the comprehensive communication quality of the node and the voting of its neighbors to reward normal nodes or punish malicious ones. Simulation results indicate the effectiveness of the proposed method in detecting ordinary selective forwarding attacks, black-hole attacks, on-off attacks, and energy exhaustion attacks. In addition, the communication overhead of the method is lower than that of other methods.

Keywords

Selective forwarding attack packet loss rate learning automata wireless sensor network

Introduction

Wireless sensor networks (WSNs) have been widely used in various fields for the purpose of event monitoring and data gathering, including environment monitoring, forest fire monitoring, traffic data collection, and battlefield data gathering.^1–3 However, many WSNs are deployed in harsh or hostile environments that are open and dangerous for the operation of the networks. Due to the openness of wireless communication and lack of physical protection, sensor nodes (SNs) can be easily captured or attacked by malicious attackers; hence, WSNs face various attacks.⁴ One of the most challenging attacks is the selective forwarding attack (SFA). A node experiencing an SFA intentionally discards received packets that should be forwarded, to interfere with data transmission between nodes.^5,6 This seriously affects data collection and data fusion, especially for data-centric networks. According to research,^7,8 some attacks are special cases of SFA, such as black-hole attacks, under which a malicious node discards received packets with a probability of 100%, that is, the node drops all packets and prevents cooperation with other nodes in data transmission. On-off attacks are another type of SFA, under which the node alternatively behaves well and badly, to hide itself and avoid detection during the attack process. In addition, the inherent instability and unreliability of the wireless channel and collisions on the medium access control (MAC) layer inevitably result in normal packet loss, making it difficult to distinguish between normal and malicious packet losses. Therefore, the ability to discover an SFA is useful and necessary for the security, reliability, and energy efficiency of WSNs.

Related works

Some effective methods for SFA detection have been developed.^5–15 The typical mechanisms include reputation-, acknowledgment-, and learning automata (LA)-based schemes.

Reputation-based detection techniques use reputation values to evaluate the forwarding behaviors of nodes during each evaluation period. An adaptive and channel-aware detection of SFAs was proposed in Ren et al.,⁵ which employed long-term reputation values composed of firsthand and secondhand short-term reputations to evaluate the forwarding behaviors of nodes. After a series of evaluation periods, the reputation values of malicious nodes reduce sharply; a node is denoted as malicious when its average reputation value evaluated by neighbors is lower than a threshold.⁵ During the behavior evaluation process, the wireless channel and normal packet loss are taken into consideration to recognize malicious dropping. However, the method cannot efficiently detect on-off SFA attacks, because it does not consider the historical reputation values of a node. An audit-based misbehavior detection (AMD) method for selective packet droppers was put forward in Zhang et al.,⁹ which combined reputation management, trustworthy route discovery, and recognition of malicious nodes on the basis of behavior audits in ad hoc networks. The method also considers firsthand and secondhand reputations; a secondhand reputation is used only if a firsthand reputation is not available. The firsthand reputation evaluation of a node depends only on its last reputation with a multiplicative or additive factor according to the behavior audits, whereas a secondhand reputation evaluation relies on the secondhand information provided by a set of nodes within the last t₀ epochs. Because of the priority of the firsthand reputation, it is not effective for detecting on-off attacks with SFA.

Acknowledgment-based detection techniques depend on acknowledgment packets received from other nodes to measure the number of packets reaching the next hop. An enhanced adaptive acknowledgment (EAACK) for mobile ad hoc networks (MANETs) was designed in Shakshuki et al.,¹⁰ which employed a hop-by-hop acknowledgment and an improved two-hop acknowledgment mechanisms (three consecutive nodes work in a group) to detect malicious nodes to reduce the overhead. The ACK is transmitted through the original data-delivery path. A per-hop acknowledgment (PHACK)-based scheme was proposed in Liu et al.⁷ for WSNs to detect SFA in which each ACK was delivered along a path different from the original data transmission path. However, ACK-based methods cause a high overhead due to more ACKs transmitted.

LA-based detection techniques rely on the rewards and penalties of actions of LA to detect malicious behavior. There are two categories: sample LA-based^11–13 and action LA-based.^8,14,15

The LAID and SLAID mechanisms were proposed in Misra et al.,^11,13 which applied the LA to packet sampling to detect attacks. The approaches examine the samples to discover malicious packets of any intruders rather than detect the behaviors of nodes. Due to resource constraints, it is not possible to examine all the packets passing through a node; thus, the LA adjusts the sampling rate automatically to balance the detection rate and the WSN energy consumptions. These approaches detect malicious packets according to key characteristics, and they are suitable for both external and internal attacks. However, while they are effective for detecting malicious traffic or packets, they are not effective for detecting selective forwarding behaviors.

An anomalous node detection system based on LA for MANETs called CLAIDS was proposed in Fathinavid and Ansari⁸; it consists of four modules: a data collection module (DCM), a data transmission quality (DTQ) module, a cluster aggregation and fusion module (CAFM), and an intrusion response module. The system determines the threshold of a suspect node by considering the stability of the nodal behavior for packet forwarding and energy level. The action probability of an LA is rewarded or penalized depending on the voting results from its neighbors. Another protocol for SFA detection based on LA in WSNs was put forward in Fathinavid and Aghababa,¹⁵ which used the number of ACKs of neighbors and the energy levels of nodes in the network to determine the feedback to the LA from the environment. The reward and penalty parameters are adaptive for different occasions. Both systems can detect black-hole attacks, packet-dropping attacks, and malicious flooding attacks.

However, there are some aspects to these contributions that need improvement: (1) the initialization of action probability considers only the energy level, which may lead to a next hop far from the cluster head (CH); (2) the evaluation of forwarding behavior relies on acknowledgements from the neighbor nodes, which does not consider the losses of acknowledgment packets caused by channel conditions or malicious dropping, and thus may result in statistical error; (3) on-off attacks with SFA cannot be detected effectively because the historical communication quality of nodes is not considered; and (4) the reward and penalty parameters of LA in Fathinavid and colleagues^8,14 are fixed with artificial assignments depending on the voting of neighbors, which could affect the flexibility of the system.

Therefore, it is necessary to design a flexible SFA detection mechanism with adaptive parameters that takes the normal packet loss caused by unstable channels and MAC layer collisions into account to eliminate the impact of normal packet loss on SFA detection. Furthermore, it could detect multiple SFAs, including ordinary SFAs and typical special cases such as black-hole attacks and on-off attacks. Finally, energy exhaustion attacks should also be considered to maintain the network lifetime.

Major contributions

Based on the above analysis and existing problems, we propose an SFA detection method using LA with adaptive reward or penalty parameters depending on nodal comprehensive communication quality. During the detection process, the action probability initialization of LA is improved by introducing a new consideration. While evaluating the communication quality and the forwarding behavior of a node, the wireless channel conditions are taken into account to estimate normal packet loss, thereby allowing to eliminate the impact of normal packet loss on SFA detection. In addition, the method can detect continuous SFA and on-off attack behaviors effectively.

The major contributions and innovations of this work are as follows:

The initial route based on the initial action probability of each LA is more accurate and effective. When initializing the action probability, the hop count from a node to its CH is a new consideration. Both the energy levels of nodes and their hop counts to CHs are taken into account when initializing the action probability of each LA for next-hop routing; this improves the routing accuracy and effectiveness compared to equal probabilities and a probability that considers only energy level.

The evaluation of node forwarding behavior is more objective and practical. Wireless channel conditions and normal packet loss are taken into account to evaluate the forwarding behavior of an intermediate node, which is closer to the actual application. In addition, the evaluation is from the perspective of the sender, rather than that of the receiver, avoiding statistical errors caused by packet loss or by a receiver that drops the acknowledgment packets deliberately.

The proposed method can detect on-off attacks more effectively. The comprehensive communication quality of nodes is introduced to detect malicious behavior; this considers the historical communication quality of nodes by means of sliding time windows with dynamic aging factors. Under these circumstances, bad behavior will be remembered for a longer time. Thus, malicious nodes suffering from on-off attacks can be detected effectively.

The flexibility and applicability of the detection method have been improved. The adaptive reward and penalty parameters of LA are adjusted according to nodal comprehensive communication quality or voting by neighbors rather than fixed by artificial assignment; this increases the learning speed of the LA and improves its flexibility.

The remainder of this article is organized as follows. Section “LA concept” presents the concept of LA. A network model and assumptions are described in section “System model and assumptions.” Next, the method for SFA detection is proposed in section “SFA detection based on adaptive LA and communication quality.” Section “Simulation experiment and performance evaluation” describes a simulation and evaluates its results. Finally, section “Conclusion” presents the study’s conclusions.

LA concept

An LA is a self-learning model that obtains knowledge from the execution of an action and feedback from the environment and then decides the future action that should be taken. The working process of an LA is as follows: (1) the automaton randomly selects an action from its finite set of actions to execute in the environment, (2) the environment evaluates the action of the automaton and returns positive or negative feedback to it, and (3) the feedback rewards or penalizes the automaton, and then the automaton updates its action probability and selects the next action. The three steps above are looped to obtain an optimal action. Therefore, the LA consists of three components: the automaton, the environment, and the reward or penalty principles, as shown in Figure 1. More details of LA can be found in the classic work of Narendra and Thathachar¹⁶ and the book by Oommen and Misra.¹⁷

Figure 1.

Structure of a learning automaton.

Figure 1 describes the relationship of the three components. According to Narendra and Thathachar,¹⁶ an LA can be classified into two categories: a fixed-structure stochastic LA and a variable-structure stochastic LA; the latter is considered in this article. The definition of a variable-structure LA is a quadruple {α, β, p, T}, where α = {α₁, α₂, …, α_r} represents the finite set of actions of the LA, β = {β₁, β₂, …, β_r} is the input set of LA from the environment, p = {p₁,p₂, …, p_r} denotes the set of action probabilities in LA, and T indicates the learning algorithm to update the action probabilities, that is, p(n + 1) = T [α_n, β_n, p_n], where n represents any instant.

The definition of the environment is a triple {α, β, c}, where α = {α₁, α₂, …, α_r} represents a finite input set for environment, β = {β₁, β₂, …, β_r} is the output set of the environment, and c = { c₁, c₂, …, c_r} denotes the penalty probabilities where each c_i corresponds to an action α_i. According to Figure 1 and the description above, we can see that the working process is the set of interactions between an LA and its environment, where the input (output) set of an LA is the output (input) set of the environment. Therefore, there are two identical sets α and β for the LA and the environment.

At instant n, the LA selects an action α_i from the finite set of actions with the probability p_i and performs it in the environment. The environment evaluates the action and returns a feedback β_i; then, the LA updates its action probability set for the next instant n + 1 according to equations (1) or (2) for favorable or unfavorable feedback, respectively

P_{j} (n + 1) = {\begin{matrix} P_{j} (n) + a (1 - P_{j} (n)) j = i \\ P_{j} (n) - a P_{j} (n) \forall j, j \neq i \end{matrix}

(1)

P_{j} (n + 1) = {\begin{matrix} (1 - b) P_{j} (n) j = i \\ \frac{b}{r - 1} + (1 - b) P_{j} (n) \forall j, j \neq i \end{matrix}

(2)

where a and b are the reward and penalty parameters, respectively, and r is the total number of actions in the finite action set. Following Thathachar and colleagues,^16,18 for a = b, the LA is called the reward–penalty algorithm (L_R–P); for a << b, it is called the reward–ε penalty algorithm (L_RεP); and for b = 0, it is called the reward–inaction algorithm (L_R–I).

System model and assumptions

This section describes the network model, and the assumptions are demonstrated in the model.

A clustered WSN for collecting observation data from the environment consists of SNs, CHs, and a base station (BS), as shown in Figure 2. Once the network is randomly deployed in the environment, the positions of the members are stationary. The network is divided into different clusters, and nodes are allocated into corresponding clusters. Each cluster has a CH and several SNs. The selection of the CH employs an LA-based method depending on the residual energy and the number of neighbor nodes; for details, refer to Fathinavid et al.¹⁴ After clustering, each node belongs to a unique cluster. The SNs transmit data to their respective CHs directly or through other SNs in the same cluster, and CHs communicate with the BS directly or indirectly through other CHs in the ranges of their radios. Due to the openness of wireless communication, the radio environment is unstable in the network, and packet loss is common in communications between members. The packet loss rate (PLR) constantly changes because of the unstable channel. Each member in the network possesses a unique ID. We assume that no attack has occurred between network deployment and initial route establishment.

Figure 2.

Model of a clustered WSN.

Figure 2 displays a clustered WSN. An intra-cluster network consists of a CH and all SNs in the same cluster, and a higher level backbone network comprises the BS and all the CHs.

SFA detection based on adaptive LA and communication quality

This section describes in detail the detection of SFA based on adaptive LA and communication quality (DSFLACQ). The proposed method is based on L_R–P LA in each node of the network. The action set of an LA consists of its neighbor nodes; the environment is the comprehensive communication quality of a selected neighbor node and the voting of its neighbors. The details of process of the method include three phases: (1) initialization of the network and action probabilities in each LA, (2) evaluation of the communication quality of the neighbor nodes, and (3) detection and punishment of selective forwarding behaviors of nodes.

The method is conducted separately in the intra-cluster networks and backbone network to discover malicious SNs and malicious CHs. The description below applies to the intra-cluster network; the method is almost identical in the backbone network.

Initialization of the network and LA

The initialization phase includes the network initialization and initializations of the action probabilities of each LA.

Network initialization

After the network deployment is complete, the BS generates a “neighbor discovery” message throughout the network. Each node receiving the message rebroadcasts it to its neighbors and establishes the neighbor list. The message consists of a unique node ID and energy level. Then, the clustering process starts, conducted using an LA method described in Fathinavid et al.¹⁴ According to the energy levels and number of neighbors of nodes, the selection of CHs is executed to ensure the CHs possess more energy and to avoid sparse clusters. When the construction of clusters is finished, the CH broadcasts a message to update the neighbor lists of nodes in each cluster, and each node obtains the hop count to its corresponding CH. The next hop of routing is decided based on the probabilities of neighbor nodes, which will be demonstrated below.

Action probabilities in LA initialization

The action set of the LA in each node is its neighbors, from which the next hop can be selected to establish the local routing table. The set of probabilities {p₁, p₂, …, p_r} in each node is constructed according to the features of neighbors, where r represents the number of neighbors and p_i denotes the probability of selecting neighbor node i as the next hop. In previous work such as Misra et al.,¹¹ the initialization of the probabilities was to assign a common value to ensure equal probabilities, that is, p_i = 1/r. To improve the routing accuracy in this work, we calculate the preference of the next hop according to the energy level and the hop count to the CH as follows

\begin{matrix} p_{i} = w_{1} \times \frac{energyleve l_{i}}{\sum_{j = 1}^{r} energyleve l_{j}} + w_{2} \\ \times \frac{\sum_{j = 1}^{r} ho p_{j} - ho p_{i}}{(r - 1) \times \sum_{j = 1}^{r} ho p_{j}} i, j = 1, 2, 3, \dots \end{matrix}

(3)

where p_i represents the probability of selecting neighbor node i, energylevel_i is the residual energy of neighbor node i, hop_i denotes the hop count from the neighbor node i to the CH, r indicates the number of neighbor nodes, and w₁ and w₂ between 0 and 1 weight the relative importance of energy level and hop count, respectively. The reason for taking the energy level and hop count into consideration is to maximize the probability of nodes with more energy and fewer hops to CH. If we considered only energy level, the next hop might be selected as one “far” from the CH.

Evaluation of the communication quality of nodes

The communication quality of a given neighbor node is the basis for favorable or unfavorable feedback for its LA action probability. The current communication quality and the comprehensive communication quality of a node are considered in this work, where the former is the communication quality of a node at the current instant and the latter is the communication quality of a node that takes its historical quality into account. A sliding time window is introduced to select several recent instants to calculate the comprehensive communication quality.

Evaluation of current communication quality

The current communication quality of a node is evaluated according to its forwarding behaviors and energy deviation rate.

Forwarding behaviors are often measured by the packet forwarding rate of an intermediate node, which is usually defined as the ratio of the number of forwarded packets of a node to the number of its received packets. For example, if a node S delivers packets to node D through node F, the packet forwarding rate of node F is evaluated by node S according to the above definition. The number of forwarded packets of node F is obtained by monitoring of node S, and the number of packets received by F is acquired by the number of received acknowledgment packets from F.^8,14 However, this assumes that the channel and the communications are stable, that is, normal packet loss is not considered. If some acknowledgment packets are lost, the number of received packets obtained by node S will not be accurate, and thus the packet forwarding rate of node F is not reliable. In addition, if node F deliberately discards some acknowledgment packets, the evaluation of its packet forwarding rate by node S will not be credible.

Therefore, considering the communication channel conditions, the forwarding behavior of an intermediate node should be evaluated from the perspective of its last source node. The forwarding rate of a node j denoted as FR_j is calculated by its last source node i as follows

F R_{j} = min (\frac{N F_{j}}{N S_{i} - N S_{i} \times p_{ij}}, 1)

(4)

where NF_j represents the number of forwarded packets of node j monitored by node i, NS_i denotes the number of packets sent from node i to j for forwarding, and p_ij indicates the estimated normal PLR between nodes i and j, which will be detailed below. The min() function avoids a numerator greater than the denominator. In this way, the method can evaluate malicious packet loss behaviors.

In WSNs, the instability of the physical layer wireless channel and collisions in MAC layer are two main reasons for normal packet loss. The instability of the physical channel is expressed as the fluctuation of wireless link quality with time and location. Therefore, link quality is a main factor to estimate packet loss. According to Liu and Cerpa,¹⁹ there is a close correlation between the link quality and the physical (PHY) parameters including received signal strength indicator (RSSI), link quality indicator (LQI), and signal-to-noise ratio (SNR) which are captured from the received packets and environment by radio chips, such as the CC2420. The PLR can be estimated based on the PHY parameters; we adopted LQI and SNR for calculations because RSSI reduces the correlations between PLR and SNR/LQI significantly. Therefore, the PLR caused by channel conditions from node i to j denoted as $p_{ij}^{C}$ can be estimated by LQI and SNR based on Liu and Cerpa.¹⁹

Packet transmission is based on the IEEE 802.11 distributed coordination function (DCF) MAC protocol,²⁰ and packet loss may occur due to collisions in the MAC layer. The number of neighbors of a node (denoted as r) is the number of nodes competing for the wireless channel. Let p_t be the probability that a node transmits data in a time slot. When the channel is stable, the probabilities of an idle, successful, and colliding slot denoted as p_idle, p_suc, and p_col, respectively, are as follows²¹

{\begin{matrix} p_{idle} = {(1 - p_{t})}^{r} \\ p_{suc} = r \times p_{t} \times {(1 - p_{t})}^{r - 1} \\ p_{col} = 1 - p_{idle} - p_{suc} \end{matrix}

(5)

The channel busyness ratio expressed as C_b is defined as the proportion of time that the channel is in successful or collision status, which can be calculated as

C_{b} = 1 - \frac{p_{idle} \times σ}{p_{idle} \times σ + p_{suc} \times T_{suc} + p_{col} \times T_{col}}

(6)

where σ, T_suc, and T_col indicate the duration of an idle slot, a successful transmission slot, and a collision slot, respectively.²⁰

The number of interfering neighbor nodes is r − 1 while competing for the channel, and the normal PLR caused by MAC collisions denoted as $p_{ij}^{M}$ is the probability that there is at least one interfering node transmitting packets, which is expressed as

p_{ij}^{M} = 1 - (1 - p_{t})^{r - 1}

(7)

The channel busyness ratio C_b can be expressed by p_t according to equations (5) and (6), which is easy for a node to acquire by monitoring;²¹ therefore, p_t is available to calculate $p_{ij}^{M}$ based on equation (7).

Considering the channel conditions and the MAC layer collisions, the normal PLR from node i to j is estimated as $p_{ij} = p_{ij}^{C} + p_{ij}^{M} - p_{ij}^{C} p_{ij}^{M} \approx p_{ij}^{C} + p_{ij}^{M}$ .

The energy deviation rate is another important factor to evaluate the quality of communication. A typical energy consumption model introduced in Liu et al.⁷ is employed in this work. An energy deviation rate denoted as ER is defined to measure the quality of node j as follows

E R_{j} = \frac{energyleve l_{j}}{Ienerg y_{j} - E_{js} (l)}

(8)

where energylevel_j represents the actual residual energy of a node j, Ienergy_j denotes the initial energy of node j, and E_js(l) indicates the estimated energy consumption of node j while sending l packets. The ratio of actual residual energy to the estimated energy is used to evaluate the quality of the node in energy.

Finally, the current community quality of a node j denoted as q_j is obtained by combining the evaluation of forwarding behavior and energy deviation rate of j as follows

q_{j} = η_{1} \times F R_{j} + η_{2} \times E R_{j}

(9)

where q_j∈ [0, 1], and η₁ and η₂ weight the importance of forwarding behavior and energy deviation rate, respectively.

Comprehensive communication quality evaluation

The historical quality of a node should be considered to evaluate the comprehensive quality to resist on-off attacks with SFA, because the current communication quality is a short-term quality that focuses only on the current instant. Therefore, a sliding time window with a dynamic aging factor is taken into consideration during the evaluation. A sliding time window ΔT consists of a number of time units, and the current quality of a neighbor node in each time unit is recorded. The time window moves forward 1 unit as a time unit consumption and the quality in the first unit is discarded with the latest quality value joining in. An example of a sliding time window with 5 units is shown in Figure 3. In the first unit of time window Δt₁, the current quality of a node was q₁. As time elapses, q₁ is dropped from the new time window Δt₂, and q₆ is added simultaneously.

Figure 3.

Sliding time window with 5 units.

An aging factor²² is introduced to indicate the importance of each time unit in the sliding time window; the closer the time unit to the latest current time, the more important it is in the evaluation of the comprehensive communication quality. The comprehensive communication quality Q_j of a node j is evaluated as follows

Q_{j} = \frac{\sum_{i = 1}^{m} g_{i} \times (1 - q_{j}^{i}) \times q_{j}^{i}}{\sum_{i = 1}^{m} g_{i} \times (1 - q_{j}^{i})}

(10)

where Q_j∈ [0, 1], m is the length of the window, $q_{j}^{i}$ is the current quality of node j in time unit i, and g_i∈ [0, 1] with g₁ < g₂ < ⋯ < g_m is the aging factor described above. Here, an exponential function is adopted to express the aging factor: g_i = φ^m−i, where φ∈ [0, 1]. The comprehensive communication quality of a node can make bad behaviors be remembered for a longer time than good behaviors, which is effective for detecting the node alternatively behaving badly and well.

Detection and punishment of selective forwarding behaviors

After the LA of node i selects node j as the next hop according to the action probability, the environment provides a response to the LA according to the comprehensive communication quality of node j evaluated by node i or the voting of its neighbors. Node j is regarded as suspicious while its comprehensive communication quality Q_ij evaluated by node i is lower than a threshold Q_T, which can be predefined based on the network requirements. If node j is suspicious, node i will report this to the CH which can initiate a voting process for node j according to the comprehensive communication quality evaluated by each neighbor that interacts with it. Otherwise, this action in LA will be rewarded with a = 0.6 according to the L_R–P model.

The malicious node confirmation and the dynamic reward or penalty parameters are decided by the result of voting process. A normal proportion denoted as R_n is defined as the ratio of the number of nodes that considers the suspicious node to be normal to the number of nodes that participates in the voting. The decision is made as follows:

If R_n < 20%, more than 80% of the nodes participating in the voting regard the suspicious node as malicious; therefore, the suspicious node will be isolated and eliminated from routing. Furthermore, the CH notifies all neighbors of this malicious node to add it to their blacklists. This action probability in each LA is set to 0, and the LA that selected this malicious node as the next hop will reselect the next hop according to the action probability.

If 20% ≤ R_n < 70%, more than 30% and fewer than 80% of the participating nodes consider the suspicious node to be malicious; the environment will give an unfavorable response to the LA, that is, β = 1, and this action in LA will be penalized with a dynamic parameter b = (1–R_n)/2 according to L_R–P model.

If R_n ≥ 70%, fewer than 30% of the participating nodes regard the suspicious node as malicious, and the environment will give favorable feedback to LA, that is, β = 0. This action in the LA will be rewarded with a dynamic parameter a = Q_em×R_n according to the L_R–P model, where Q_em represents the mean value of the comprehensive communication quality of the suspicious node evaluated by the voting nodes.

Algorithm 1 summarizes the procedures of SFA detection and updating the action probabilities in LA.

Algorithm 1. Detection of SFA and updating LA action probabilities
1 Phase 1: Initialization 2 neighbors discovery and clustering; 3 for the LA in each node do 4 for each neighbor of a node do 5 initialize action probabilities according to (3) 6 end 7 end 8 Phase 2: Evaluate the quality 9 for each node ido 10 //node j is the next hop selected by LA in node i; 11 evaluate the forwarding behavior of node j FR_ij with (4); 12 evaluate the energy deviation rate of node j ER_ij with (8); 13 calculate the current quality of node j q_ij with (9); 14 calculate the comprehensive quality of node j Q_ij with (10) 15 end 16 Phase 3: SFA detection and punishment 17 for each node ido 18 ifQ_ij < Q_T 19 initial voting for node j; 20 ifR_n < 20% 21 confirm the malicious node and eliminate it; 22 this action probability is set to 0 in the LA; 23 end 24 if 20% ≤ R_n < 70% 25 β = 1 and penalize this action with b = (1–R_n)/2 according to (2); 26 end 27 ifR_n ≥ 70% 28 β = 0 and reward this action with a = Q_em×R_n according to (1); 29 end 30 else 31 β = 0 and reward this action with a = 0.6; 32 end 33 end

Algorithm 1. Detection of SFA and updating LA action probabilities

1 Phase 1: Initialization
2 neighbors discovery and clustering;
3 for the LA in each node do
4 for each neighbor of a node do
5 initialize action probabilities according to (3)
6 end
7 end
8 Phase 2: Evaluate the quality
9 for each node ido
10 //node j is the next hop selected by LA in node i;
11 evaluate the forwarding behavior of node j FR_ij with (4);
12 evaluate the energy deviation rate of node j ER_ij with (8);
13 calculate the current quality of node j q_ij with (9);
14 calculate the comprehensive quality of node j Q_ij with (10)
15 end
16 Phase 3: SFA detection and punishment
17 for each node ido
18 ifQ_ij < Q_T
19 initial voting for node j;
20 ifR_n < 20%
21 confirm the malicious node and eliminate it;
22 this action probability is set to 0 in the LA;
23 end
24 if 20% ≤ R_n < 70%
25 β = 1 and penalize this action with b = (1–R_n)/2 according to (2);
26 end
27 ifR_n ≥ 70%
28 β = 0 and reward this action with a = Q_em×R_n according to (1);
29 end
30 else
31 β = 0 and reward this action with a = 0.6;
32 end
33 end

Simulation experiment and performance evaluation

This process was simulated experimentally, and the performance of the proposed method is evaluated in this section. We describe the experimental settings and simulated attacks, declare the performance evaluation metrics, and reveal the results of the evaluation; in addition, we compare the proposed method with the LA method CLAIDS proposed in Fathinavid and Ansari.⁸

The evaluation of the proposed method is mainly of two aspects: performance and overhead. The performance evaluation includes the detection rate, false positive rate (FPR), and false negative rate (FNR), which are detailed in sections “Performance evaluation metrics” and “Experimental results and analysis”; the overhead evaluation considers communication overhead and storage overhead illustrated in section “Experimental results and analysis.”

During the evaluation, we first determined the key parameters of the method based on their effects on the detection rate, such as η₁ and η₂ in equation (9), and the detection threshold. Then, CLAIDS⁸ and our method are compared and analyzed in terms of the detection rate, FPR, and FNR. Finally, the communication and storage overheads of the two methods are compared and analyzed.

Experiment description

The evaluation of the proposed method was conducted on OMNeT++. The members of a clustered WSN were divided into three categories: SNs, CHs, and the BS; all members were stationary and uniformly distributed. The area of the network was set according to the number of nodes, that is, there was a node deployed in an area of 50 m × 50 m on average, and the BS was located at the center of the area. The communication range of a node was 80 m. The weights of the initial LA action probabilities in equation (3) was set to w₁ = 0.4 and w₂ = 0.6 to obtain accurate routing. The malicious nodes were simulated according to the attacks described in subsection “Simulated attacks” of section “Simulation experiment and performance evaluation” and the proportion of malicious nodes was set randomly from 10% to 50%. Different attacks could occur on a node simultaneously, and the probability of packets dropping in ordinary SFA was set randomly from 20% to 80%. The unstable wireless channel was modeled by Gilbert.²² The length of the sliding time window for comprehensive communication quality evaluation was set to 5 in this work. The simulation time lasted for 1000 s. The simulation parameters and the range of values used in the experiments are listed in Table 1.

Table 1.

Simulation parameters and the range of values in experiments.

Items	Values or ranges
Area of WSNs deployed (m)	600 × 600
Number of members in WSNs	151
Number of members in a cluster	10–30
Number of clusters	5–15
Initial weights of LA action probabilities	w₁ = 0.4, w₂ = 0.6
Probability of packets dropping in SFA	20%–80%
Proportion of malicious nodes	10%–50%
Transmission range of nodes (m)	80
Initial energy of a node (J)	0.5
Length of sliding time window	5
Simulation time (s)	1000

WSN: wireless sensor network; LA: learning automata; SFA: selective forwarding attack.

Simulated attacks

In this article, we focus on SFAs that are simple to implement but difficult to discover. Furthermore, there are some attacks associated with SFA, for example, black-hole attacks and on-off attacks (the node alternatively behaves well and badly). In addition, energy exhaustion attacks were considered. These four attacks were simulated in the experiments.

Ordinary SFA

The node discards received packets deliberately with a certain probability to interfere with the communications between nodes. For example, the probability of a random packet discard could be set in the range of 20%–80%.

Some predefined malicious nodes were deployed randomly in the network to simulate ordinary SFA. For ease of implementation, we set the probability of malicious discarding to 50%–70%. The selective forwarding behavior was conducted by the preset counter, for example, the node forwards one received packet based on routing table and then discards the next two received packets. According to the value of the counter, the node determines whether to forward or discard the received packet.

Black-hole attacks

This is a special case of SFA in which all the received packets are dropped to prevent the node from transmitting data.

The implementation of black-hole attacks relies on deploying some black-hole attack nodes in the network simulation scenario. The main difference between a normal node and a black-hole attack node lies in the routing process at the network layer. The model of the routing process of a black-hole attack node is established as follows:

The node does not send a routing request;

The node does not transmit packets;

For all routing-request messages received from other nodes, the node replies with the response message “I have the shortest path to the destination node”;

The node discards all received packets without forwarding.

On-off attacks with SFA

A node under this attack behaves alternatively well and badly to conceal its misbehavior. In our simulations, the on-off attacks were simulated as a node behaving badly on packet forwarding for at least 1 time unit in each sliding time window.

On-off attacks were simulated by changing the route executor of the node, adding a timing function. When the node started, it executed its normal routing (normal routing mode) and timing programs; after 4 time units, the node switched to the routing mode of ordinary SFA for 1 time unit. According to the timing program, the route executor switches between the two modes. To give a special example, the node could sleep for 1 time unit every 4 time units to simulate a special on-off attack.

Energy exhaustion attacks

The malicious node prevents data transmission by exhausting the resources and energy of another node to damage the network. A node under this attack cannot transmit packets normally and its energy decreases sharply.

In the simulation, some malicious nodes were deployed in the network to exhaust the resources and energy of other nodes. The energy consumption of an SN is extremely large when it keeps in a communication state. Therefore, the malicious node constantly makes and initiates communication requests to the target node and sends a large number of fake messages, consuming the target’s energy and occupying its communication processing capability.

Performance evaluation metrics

We evaluated the performance of the method in terms of three metrics: detection rate, FPR, and FNR.

The detection rate is defined as the ratio of accurate detections to all detections. The FPR is denoted as the ratio of the number of normal nodes recognized as malicious to the actual number of normal nodes. The FNR is the ratio of the number of malicious nodes recognized as normal to the actual number of malicious nodes.

Experimental results and analysis

Here, we describe the performance and overhead evaluation and analysis, and compare two methods.

Key parameter analysis

The effects of η₁ and η₂ parameters in equation (9) for communication quality evaluation should be analyzed in the simulation, because they affect the detection rates of different attacks. Figures 4 and 5 demonstrate the detection rates of different attacks for different values of η₁ and η₂.

Figure 4.

Detection rate of different attacks with η₁ = 0.7 and η₂ = 0.3, 30% of nodes were malicious, and the packet-dropping rate was 50%.

Figure 5.

Detection rate of different attacks with η₁ = 0.3 and η₂ = 0.7, 30% of nodes were malicious, and the packet-dropping rate was 50%.

The simulated attacks can be classified into two categories: behavior-related attacks and energy-related attacks. The former includes ordinary SFA, black-hole attacks, and on-off attacks; the latter refers to our simulated energy exhaustion attacks. In the simulations, the proportion of malicious nodes was set to 30%, and the SFA packet-dropping rate was set to 50%. In both situations, the detection rate decreased with the increase in the number of nodes. In Figure 4, however, the behavior-related attacks are detected more accurately with the parameters η₁ = 0.7 and η₂ = 0.3, which means that forwarding behaviors are more influential than energy consumption during the process of communication quality evaluation. Therefore, the detection rates of behavior-related attacks are higher than those of the energy-related attacks on the whole, and vice versa. In Figure 5, the detection rates of energy-related attacks are higher than those of others because more emphasis is placed on the energy consumption in communication quality evaluation, that is, η₁ = 0.3 and η₂ = 0.7. The parameters can be adjusted according to the detection requirements; we used η₁ = 0.7 and η₂ = 0.3 in subsequent simulations due to more emphasis on forwarding behaviors.

The detection performance is determined by the malicious detection threshold, and the comprehensive communication quality is taken as the malicious detection threshold in our method. If the comprehensive communication quality of a node is lower than the threshold, it will be recognized as a suspect node that should be reported for a further decision with the voting process. The purpose of the detection threshold is to maintain the FPR and FNR at a lower level. In the simulation, the parameters were set as follows: the number of nodes was set to 150, the proportion of malicious nodes was set to 30%, η₁ = 0.7, and η₂ = 0.3. Figure 6 depicts the trends of FPR and FNR with different detection thresholds, from which we can see that FPR increases, whereas FNR decreases with the increase in the threshold. Moreover, FPR increases significantly in the threshold range of [0.7, 1] and FNR decreases sharply in the threshold range of [0.3, 0.8]. The reason is that the comprehensive communication quality values of most normal nodes are in the range of [0.7, 1], whereas that of the vast majority of malicious nodes are in the range of [0.3, 0.8]. The intersection of two curves is the target of lower FPR and FNR, from which we can infer a reasonable threshold to be approximately 0.72.

Figure 6.

Trends of FPR and FNR with different detection thresholds.

Comparison of performance and analysis

We compared the performance of our method with the CLAIDS intrusion detection mechanism: cellular LA-based approach for anomaly nodes detection,⁸ which is described in section “Introduction” and is similar to our work.

We compared the malicious detection rates of different attacks under the same conditions: 100 nodes, 30% malicious nodes, 60% SFA packet-dropping rate, and weight parameters η₁ = 0.7 and η₂ = 0.3. Figure 7 displays the malicious detection rates of four attacks, from which we can observe that our method is better than CLAIDS in detecting ordinary SFA and black-hole attacks. This improvement is observed because of the improved evaluation of forwarding behavior, and not relying on the number of ACKs. An additional reason is the employment of adaptive reward and penalty parameters in the LA. Our method is slightly weaker than CLAIDS in detecting energy exhaustion attacks, because we consider only the energy consumption of sending packets in evaluating the energy deviation rate. However, our method has an advantage in detecting on-off attacks, the detection rate of which was 92% whereas that of CLAIDS was 83%. This is because of the adoption of comprehensive communication quality to detect malicious behaviors, which considers the historical communication quality of a node.

Figure 7.

Comparison of detection rates of different attacks using DSFLACQ and CLAIDS.

The difference between the comprehensive and current communication quality of a node in the process of detection is demonstrated in Figure 8. Three cases were simulated in our experiments: (1) a normal node with no attack, (2) a node under on-off attacks with SFA (packet-dropping rate was 60%) that behaves badly for 1 unit in a sliding time window, and (3) a node suffering from SFA with a continuous packet-dropping rate of 60%. We discovered that the comprehensive and current communication quality of a node are quite similar in the first and the third cases (refer to the left and right areas of Figure 8), which means that either quality measures can be used for malicious detection. However, in the second case, there are significant differences between current communication quality and comprehensive communication quality. When a node is attacked at an instant, both quality measures reduce sharply. If the node returns to its normal state for the next instant, its current communication quality will recover to a normal level immediately; however, its comprehensive communication quality will remain at a lower level rather than recover instantly (refer to the middle area of Figure 8). Therefore, comprehensive communication quality is more effective than current communication quality for detecting on-off attacks, which is why our method has an advantage in this case.

Figure 8.

Difference between comprehensive and current communication quality.

A comparison of the overall detection rates of the two methods is depicted in Figure 9, which indicates that the detection rates of our method and CLAIDS decrease from 97.1% to 85.9% and from 96% to 85.5%, respectively, as the number of nodes increases. We note that the overall detection rate of our method is higher than that of CLAIDS because of the better performance of our method in detecting behavior-related attacks described in Figure 7, including its improvement of evaluating forwarding behavior and adaptive parameters in the LA.

Figure 9.

Comparison of overall detection rates of DSFLACQ and CLAIDS.

Figures 10 and 11 manifest the false positive and negative rates of the DSFLACQ and CLAIDS methods with the increase in the number of nodes, from which we observe that the FPR of our method varies from 2% to 13.9%, whereas that of CLAIDS varies from 3% to 15%. The FNR varies from 1.2% to 15.5% for DSFLACQ and from 1.3% to 16.75% for CLAIDS. For both FPR and FNR, our method performs better than CLAIDS because we take comprehensive communication quality as the detection threshold, and select the threshold that keeps the FPR and FNR lower.

Figure 10.

Comparison of false positive rates of DSFLACQ and CLAIDS.

Figure 11.

Comparison of false negative rates of DSFLACQ and CLAIDS.

The LA reward and penalty parameters of our method are dynamic and adaptive, depending on the quality of the selected node or the voting of the node’s neighbors. The adaption parameters are effective for learning efficiency. The update of action probability in an LA with adaptive and fixed parameters is indicated in Figure 12. Taking three actions as examples, p₁, p₂, and p₃ are probabilities of three actions, respectively. If action 1 is penalized from instant 1, the LA with fixed parameters will receive the optimal choice at instant 5, whereas the LA with adaptive parameters will receive the optimal choice at instant 3, which improves the efficiency.

Figure 12.

Comparison of action probability in an LA with adaptive and fixed parameters.

Comparison of overhead and analysis

The additional communication and storage overhead needed for malicious detection are conducted in this section. We also compare the overhead of our method with that of CLAIDS.

Suppose there are C clusters in the network, and each cluster has N nodes. A node sends 10 packets in each time interval, and in the worst case, each node has N − 1 neighbors. In CLAIDS, the forwarding behaviors are evaluated by the number of forwarding packets and ACKs from the neighbor nodes.⁸ In our method, the forwarding behaviors are evaluated by the number of sending packets and the normal PLR.

In CLAIDS, a node sends 10 packets to each neighbor, and each neighbor returns 10 ACKs; therefore, in the worst case, the overall communication overhead is 20 ×N× (N − 1) ×C. In our method, a node sends 10 packets to each neighbor, and only one ACK is received for calculating the normal PLR; thus, the worst-case overall communication overhead is C×N× [10 × (N − 1) + (N − 1)].

The comparison of the maximum additional communication overhead with the number of clusters is manifested in Figure 13, in which we note that additional communication overhead decreases with more clusters, because the number of nodes in a cluster decreases as the number of clusters increases. The additional communication overhead of our method is lower than that of CLAIDS due to the sharp decrease in ACKs from neighbors.

Figure 13.

Comparison of the maximum communication overhead of DSFLACQ and CLAIDS.

The additional storage overhead of nodes for both methods is also evaluated. Each node maintains a table for each neighbor, including energy level (4 bytes), the number of forwarded packets (2 bytes), the number of received ACKs (2 bytes), the communication quality (4 bytes), and the threshold (4 bytes). In the worst case, the additional storage overhead for CLAIDS is 16 × (N − 1).

Based on the storage above, our method requires the addition of the normal PLR (4 bytes) and communication quality in the entire sliding time window (4m bytes, for window length m). Therefore, in the worst case, the additional storage overhead for our method is (20 + 4m) × (N − 1).

The comparison of the maximum additional storage overheads of DSFLACQ and CLAIDS is displayed in Figure 14. In this work, m was set to 5. The larger the value m, the better the comprehensive communication quality, and the higher the storage overhead. From Figure 14, we observe that the storage overhead decreases with more clusters, because the number of nodes in a cluster decreases as the number of clusters increases. However, the storage overhead of our method is higher than that of CLAIDS because of normal PLR and comprehensive communication quality. Therefore, the improvement of performance is at the expense of storage overhead.

Figure 14.

Comparison of the maximum storage overhead of DSFLACQ and CLAIDS.

Conclusion

In this article, we proposed a detection method for SFA and its special cases, including black-hole and on-off attacks. The method is based on LA with adaptive parameters using the comprehensive communication quality of nodes and the voting of neighbors. Communication quality was introduced to reflect the forwarding behaviors of nodes. It is divided into current and comprehensive communication quality, which, respectively, reflects short- and long-term forwarding behaviors. When evaluating communication quality, normal packet loss caused by unstable channels and MAC layer collisions is taken into consideration to eliminate its impact on SFA detection. The comprehensive communication quality of a node is obtained from its historical communication quality with a sliding time window with a dynamic aging factor, which is suitable for detecting continuous ordinary SFA and on-off SFA. A comparison of our proposed DSFLACQ method and CLAIDS was conducted in simulations. Experimental results demonstrate that the method we proposed performs better in detecting SFA and its related attacks; the detection rates exceed 90%, and particularly for on-off attacks, the detection rate reached 92%, 9 percentage points higher than other methods. Furthermore, the FPR and FNR of our method are 2 percentage points lower than other methods, and its communication overhead is 40% lower than other work, but it requires more storage overhead to track historical communication quality. Therefore, the performance improvement is at the expense of storage overhead.

Footnotes

Handling Editor: Vinod Kumar Verma

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported in part by the National Key R&D Program of China under grant number 2016QY06X1205, and in part by the National Natural Science Foundation of China under grant numbers U1536119 and U153610079.

ORCID iD

Zhihua Zhang

References

Zhang

Chen

Data gathering optimization by dynamic sensing and routing in rechargeable sensor networks. Sens Mesh Ad Hoc Commun Netw 2013; 24(3): 273–281.

Liu

An efficient heuristic subtraction deployment strategy to guarantee quality of event detection for WSNs. Comput J 2015; 58(8): 1747–1762.

Jiang

Liu

et al . Lifetime maximization through dynamic ring-based routing scheme for correlated data collecting in WSNs. Comput Electr Eng 2015; 41(1): 191–215.

Butun

Morgera

Sankar

A survey of intrusion detection systems in wireless sensor networks. IEEE Commun Surv Tut 2014; 16(1): 266–282.

Ren

Zhang

et al . Adaptive and channel-aware detection of selective forwarding attacks in wireless sensor networks. IEEE T Wirel Commun 2016; 15(5): 3718–3731.

Khan

Yang

Aalsalem

et al . Comprehensive study of selective forwarding attack in wireless sensor networks. Int J Comput Netw Inf Secur 2011; 3(1): 1–10.

Liu

Dong

Ota

et al . PHACK: an efficient scheme for selective forwarding attack detection in WSNs. Sensors 2015; 15(12): 30942–30963.

Fathinavid

Ansari

CLAIDS: cellular learning automata based approach for anomaly nodes detection in clustered mobile ad hoc networks. Ad Hoc Sens Wirel Netw 2015; 29(1): 31–51.

Zhang

Lazos

Kozma

AMD: audit-based misbehavior detection in wireless ad hoc networks. IEEE T Mobile Comput 2016; 15(8): 1893–1907.

10.

Shakshuki

Kang

Sheltami

EAACK: a secure intrusion-detection system for MANETs. IEEE T Ind Electron 2013; 60(3): 1089–1098.

11.

Misra

Abraham

Obaidat

et al . LAID: a learning automata-based approach for intrusion detection in wireless sensor networks. Secur Commun Netw 2009; 2(2): 105–115.

12.

Misra

Abraham

Obaidat

et al . Intrusion detection in wireless sensor networks: the S-model learning automata approach. In: Proceedings of the international conference on wireless and mobile computing, networking and communications, Avignon, 12–14 October 2008, pp.603–607. New York: IEEE.

13.

Misra

Krishna

Abraham

A simple learning automata-based solution for intrusion detection in wireless sensor networks. Wirel Commun Mob Comput 2011; 11(3): 426–441.

14.

Fathinavid

Aghababa

Eraghi

et al . CADLA: an efficient cluster-based anomaly nodes detection for mobile ad-hoc networks: a learning automata approach. J Inform Assur Secur 2013; 8: 250–259.

15.

Fathinavid

Aghababa

. A protocol for intrusion detection based on learning automata in forwarding packets for distributed wireless sensor networks. In: Proceedings of the international conference on cyber-enabled distributed computing and knowledge discovery, Sanya, China, 10–12 October 2012, pp.373–380. New York: IEEE.

16.

Narendra

Thathachar

Learning automata: an introduction. Upper Saddle River, NJ: Prentice-Hall, 1989.

17.

Oommen

Misra

Chapter 12: Cybernetics and learning automata. In: Nof

(ed.) Handbook of automation. New York: Springer, 2009, pp.221–234.

18.

Thathachar

Sastry

Varieties of learning automata: an overview. IEEE T Syst Man Cy B 2002; 32(6): 711–722.

19.

Liu

Cerpa

AE.

Data-driven link quality prediction using link features. ACM T Sensor Netw 2014; 10(2): 37.

20.

IEEE Standard 802.11-2016. IEEE standard for information technology—telecommunications and information exchange between systems local and metropolitan area networks—specific requirements–part 11: wireless LAN medium access control (MAC) and physical layer (PHY) specifications, pp.1–51.

21.

Shila

Cheng

Anjali

Mitigating selective forwarding attacks with a channel-aware approach in WMNs. IEEE T Wirel Commun 2010; 9(5): 1661–1675.

22.

Liu

Yin

Leung

et al . FADE: forwarding assessment based detection of collaborative grey-hole attacks in WMNs. IEEE T Wirel Commun 2013; 12(10): 5124–5137.