Abstract
Wireless sensor networks face threats of selective forwarding attacks which are simple to implement but difficult to detect. It is difficult to distinguish between malicious packet dropping and the normal packet loss on unstable wireless channels. For this situation, a selective forwarding attack detection method is proposed based on adaptive learning automata and communication quality; the method can eliminate the impact of normal packet loss on selective forwarding attack detection and can detect ordinary selective forwarding attack and special cases of selective forwarding attack. The current and comprehensive communication quality of nodes are employed to reflect the short- and long-term forwarding behaviors of nodes, and the normal packet loss caused by unstable channels and medium-access-control layer collisions is considered. The adaptive reward and penalty parameters of a detection learning automata are determined by the comprehensive communication quality of the node and the voting of its neighbors to reward normal nodes or punish malicious ones. Simulation results indicate the effectiveness of the proposed method in detecting ordinary selective forwarding attacks, black-hole attacks, on-off attacks, and energy exhaustion attacks. In addition, the communication overhead of the method is lower than that of other methods.
Introduction
Wireless sensor networks (WSNs) have been widely used in various fields for the purpose of event monitoring and data gathering, including environment monitoring, forest fire monitoring, traffic data collection, and battlefield data gathering.1–3 However, many WSNs are deployed in harsh or hostile environments that are open and dangerous for the operation of the networks. Due to the openness of wireless communication and lack of physical protection, sensor nodes (SNs) can be easily captured or attacked by malicious attackers; hence, WSNs face various attacks. 4 One of the most challenging attacks is the selective forwarding attack (SFA). A node experiencing an SFA intentionally discards received packets that should be forwarded, to interfere with data transmission between nodes.5,6 This seriously affects data collection and data fusion, especially for data-centric networks. According to research,7,8 some attacks are special cases of SFA, such as black-hole attacks, under which a malicious node discards received packets with a probability of 100%, that is, the node drops all packets and prevents cooperation with other nodes in data transmission. On-off attacks are another type of SFA, under which the node alternatively behaves well and badly, to hide itself and avoid detection during the attack process. In addition, the inherent instability and unreliability of the wireless channel and collisions on the medium access control (MAC) layer inevitably result in normal packet loss, making it difficult to distinguish between normal and malicious packet losses. Therefore, the ability to discover an SFA is useful and necessary for the security, reliability, and energy efficiency of WSNs.
Related works
Some effective methods for SFA detection have been developed.5–15 The typical mechanisms include reputation-, acknowledgment-, and learning automata (LA)-based schemes.
Reputation-based detection techniques use reputation values to evaluate the forwarding behaviors of nodes during each evaluation period. An adaptive and channel-aware detection of SFAs was proposed in Ren et al., 5 which employed long-term reputation values composed of firsthand and secondhand short-term reputations to evaluate the forwarding behaviors of nodes. After a series of evaluation periods, the reputation values of malicious nodes reduce sharply; a node is denoted as malicious when its average reputation value evaluated by neighbors is lower than a threshold. 5 During the behavior evaluation process, the wireless channel and normal packet loss are taken into consideration to recognize malicious dropping. However, the method cannot efficiently detect on-off SFA attacks, because it does not consider the historical reputation values of a node. An audit-based misbehavior detection (AMD) method for selective packet droppers was put forward in Zhang et al., 9 which combined reputation management, trustworthy route discovery, and recognition of malicious nodes on the basis of behavior audits in ad hoc networks. The method also considers firsthand and secondhand reputations; a secondhand reputation is used only if a firsthand reputation is not available. The firsthand reputation evaluation of a node depends only on its last reputation with a multiplicative or additive factor according to the behavior audits, whereas a secondhand reputation evaluation relies on the secondhand information provided by a set of nodes within the last t0 epochs. Because of the priority of the firsthand reputation, it is not effective for detecting on-off attacks with SFA.
Acknowledgment-based detection techniques depend on acknowledgment packets received from other nodes to measure the number of packets reaching the next hop. An enhanced adaptive acknowledgment (EAACK) for mobile ad hoc networks (MANETs) was designed in Shakshuki et al., 10 which employed a hop-by-hop acknowledgment and an improved two-hop acknowledgment mechanisms (three consecutive nodes work in a group) to detect malicious nodes to reduce the overhead. The ACK is transmitted through the original data-delivery path. A per-hop acknowledgment (PHACK)-based scheme was proposed in Liu et al. 7 for WSNs to detect SFA in which each ACK was delivered along a path different from the original data transmission path. However, ACK-based methods cause a high overhead due to more ACKs transmitted.
LA-based detection techniques rely on the rewards and penalties of actions of LA to detect malicious behavior. There are two categories: sample LA-based11–13 and action LA-based.8,14,15
The LAID and SLAID mechanisms were proposed in Misra et al.,11,13 which applied the LA to packet sampling to detect attacks. The approaches examine the samples to discover malicious packets of any intruders rather than detect the behaviors of nodes. Due to resource constraints, it is not possible to examine all the packets passing through a node; thus, the LA adjusts the sampling rate automatically to balance the detection rate and the WSN energy consumptions. These approaches detect malicious packets according to key characteristics, and they are suitable for both external and internal attacks. However, while they are effective for detecting malicious traffic or packets, they are not effective for detecting selective forwarding behaviors.
An anomalous node detection system based on LA for MANETs called CLAIDS was proposed in Fathinavid and Ansari 8 ; it consists of four modules: a data collection module (DCM), a data transmission quality (DTQ) module, a cluster aggregation and fusion module (CAFM), and an intrusion response module. The system determines the threshold of a suspect node by considering the stability of the nodal behavior for packet forwarding and energy level. The action probability of an LA is rewarded or penalized depending on the voting results from its neighbors. Another protocol for SFA detection based on LA in WSNs was put forward in Fathinavid and Aghababa, 15 which used the number of ACKs of neighbors and the energy levels of nodes in the network to determine the feedback to the LA from the environment. The reward and penalty parameters are adaptive for different occasions. Both systems can detect black-hole attacks, packet-dropping attacks, and malicious flooding attacks.
However, there are some aspects to these contributions that need improvement: (1) the initialization of action probability considers only the energy level, which may lead to a next hop far from the cluster head (CH); (2) the evaluation of forwarding behavior relies on acknowledgements from the neighbor nodes, which does not consider the losses of acknowledgment packets caused by channel conditions or malicious dropping, and thus may result in statistical error; (3) on-off attacks with SFA cannot be detected effectively because the historical communication quality of nodes is not considered; and (4) the reward and penalty parameters of LA in Fathinavid and colleagues8,14 are fixed with artificial assignments depending on the voting of neighbors, which could affect the flexibility of the system.
Therefore, it is necessary to design a flexible SFA detection mechanism with adaptive parameters that takes the normal packet loss caused by unstable channels and MAC layer collisions into account to eliminate the impact of normal packet loss on SFA detection. Furthermore, it could detect multiple SFAs, including ordinary SFAs and typical special cases such as black-hole attacks and on-off attacks. Finally, energy exhaustion attacks should also be considered to maintain the network lifetime.
Major contributions
Based on the above analysis and existing problems, we propose an SFA detection method using LA with adaptive reward or penalty parameters depending on nodal comprehensive communication quality. During the detection process, the action probability initialization of LA is improved by introducing a new consideration. While evaluating the communication quality and the forwarding behavior of a node, the wireless channel conditions are taken into account to estimate normal packet loss, thereby allowing to eliminate the impact of normal packet loss on SFA detection. In addition, the method can detect continuous SFA and on-off attack behaviors effectively.
The major contributions and innovations of this work are as follows:
The initial route based on the initial action probability of each LA is more accurate and effective. When initializing the action probability, the hop count from a node to its CH is a new consideration. Both the energy levels of nodes and their hop counts to CHs are taken into account when initializing the action probability of each LA for next-hop routing; this improves the routing accuracy and effectiveness compared to equal probabilities and a probability that considers only energy level.
The evaluation of node forwarding behavior is more objective and practical. Wireless channel conditions and normal packet loss are taken into account to evaluate the forwarding behavior of an intermediate node, which is closer to the actual application. In addition, the evaluation is from the perspective of the sender, rather than that of the receiver, avoiding statistical errors caused by packet loss or by a receiver that drops the acknowledgment packets deliberately.
The proposed method can detect on-off attacks more effectively. The comprehensive communication quality of nodes is introduced to detect malicious behavior; this considers the historical communication quality of nodes by means of sliding time windows with dynamic aging factors. Under these circumstances, bad behavior will be remembered for a longer time. Thus, malicious nodes suffering from on-off attacks can be detected effectively.
The flexibility and applicability of the detection method have been improved. The adaptive reward and penalty parameters of LA are adjusted according to nodal comprehensive communication quality or voting by neighbors rather than fixed by artificial assignment; this increases the learning speed of the LA and improves its flexibility.
The remainder of this article is organized as follows. Section “LA concept” presents the concept of LA. A network model and assumptions are described in section “System model and assumptions.” Next, the method for SFA detection is proposed in section “SFA detection based on adaptive LA and communication quality.” Section “Simulation experiment and performance evaluation” describes a simulation and evaluates its results. Finally, section “Conclusion” presents the study’s conclusions.
LA concept
An LA is a self-learning model that obtains knowledge from the execution of an action and feedback from the environment and then decides the future action that should be taken. The working process of an LA is as follows: (1) the automaton randomly selects an action from its finite set of actions to execute in the environment, (2) the environment evaluates the action of the automaton and returns positive or negative feedback to it, and (3) the feedback rewards or penalizes the automaton, and then the automaton updates its action probability and selects the next action. The three steps above are looped to obtain an optimal action. Therefore, the LA consists of three components: the automaton, the environment, and the reward or penalty principles, as shown in Figure 1. More details of LA can be found in the classic work of Narendra and Thathachar 16 and the book by Oommen and Misra. 17

Structure of a learning automaton.
Figure 1 describes the relationship of the three components. According to Narendra and Thathachar, 16 an LA can be classified into two categories: a fixed-structure stochastic LA and a variable-structure stochastic LA; the latter is considered in this article. The definition of a variable-structure LA is a quadruple {α, β, p, T}, where α = {α1, α2, …, αr} represents the finite set of actions of the LA, β = {β1, β2, …, βr} is the input set of LA from the environment, p = {p1,p2, …, pr} denotes the set of action probabilities in LA, and T indicates the learning algorithm to update the action probabilities, that is, p(n + 1) = T [αn, βn, pn], where n represents any instant.
The definition of the environment is a triple {α, β, c}, where α = {α1, α2, …, αr} represents a finite input set for environment, β = {β1, β2, …, βr} is the output set of the environment, and c = { c1, c2, …, cr} denotes the penalty probabilities where each ci corresponds to an action αi. According to Figure 1 and the description above, we can see that the working process is the set of interactions between an LA and its environment, where the input (output) set of an LA is the output (input) set of the environment. Therefore, there are two identical sets α and β for the LA and the environment.
At instant n, the LA selects an action αi from the finite set of actions with the probability pi and performs it in the environment. The environment evaluates the action and returns a feedback βi; then, the LA updates its action probability set for the next instant n + 1 according to equations (1) or (2) for favorable or unfavorable feedback, respectively
where a and b are the reward and penalty parameters, respectively, and r is the total number of actions in the finite action set. Following Thathachar and colleagues,16,18 for a = b, the LA is called the reward–penalty algorithm (LR–P); for a << b, it is called the reward–ε penalty algorithm (LRεP); and for b = 0, it is called the reward–inaction algorithm (LR–I).
System model and assumptions
This section describes the network model, and the assumptions are demonstrated in the model.
A clustered WSN for collecting observation data from the environment consists of SNs, CHs, and a base station (BS), as shown in Figure 2. Once the network is randomly deployed in the environment, the positions of the members are stationary. The network is divided into different clusters, and nodes are allocated into corresponding clusters. Each cluster has a CH and several SNs. The selection of the CH employs an LA-based method depending on the residual energy and the number of neighbor nodes; for details, refer to Fathinavid et al. 14 After clustering, each node belongs to a unique cluster. The SNs transmit data to their respective CHs directly or through other SNs in the same cluster, and CHs communicate with the BS directly or indirectly through other CHs in the ranges of their radios. Due to the openness of wireless communication, the radio environment is unstable in the network, and packet loss is common in communications between members. The packet loss rate (PLR) constantly changes because of the unstable channel. Each member in the network possesses a unique ID. We assume that no attack has occurred between network deployment and initial route establishment.

Model of a clustered WSN.
Figure 2 displays a clustered WSN. An intra-cluster network consists of a CH and all SNs in the same cluster, and a higher level backbone network comprises the BS and all the CHs.
SFA detection based on adaptive LA and communication quality
This section describes in detail the detection of SFA based on adaptive LA and communication quality (DSFLACQ). The proposed method is based on LR–P LA in each node of the network. The action set of an LA consists of its neighbor nodes; the environment is the comprehensive communication quality of a selected neighbor node and the voting of its neighbors. The details of process of the method include three phases: (1) initialization of the network and action probabilities in each LA, (2) evaluation of the communication quality of the neighbor nodes, and (3) detection and punishment of selective forwarding behaviors of nodes.
The method is conducted separately in the intra-cluster networks and backbone network to discover malicious SNs and malicious CHs. The description below applies to the intra-cluster network; the method is almost identical in the backbone network.
Initialization of the network and LA
The initialization phase includes the network initialization and initializations of the action probabilities of each LA.
Network initialization
After the network deployment is complete, the BS generates a “neighbor discovery” message throughout the network. Each node receiving the message rebroadcasts it to its neighbors and establishes the neighbor list. The message consists of a unique node ID and energy level. Then, the clustering process starts, conducted using an LA method described in Fathinavid et al. 14 According to the energy levels and number of neighbors of nodes, the selection of CHs is executed to ensure the CHs possess more energy and to avoid sparse clusters. When the construction of clusters is finished, the CH broadcasts a message to update the neighbor lists of nodes in each cluster, and each node obtains the hop count to its corresponding CH. The next hop of routing is decided based on the probabilities of neighbor nodes, which will be demonstrated below.
Action probabilities in LA initialization
The action set of the LA in each node is its neighbors, from which the next hop can be selected to establish the local routing table. The set of probabilities {p1, p2, …, pr} in each node is constructed according to the features of neighbors, where r represents the number of neighbors and pi denotes the probability of selecting neighbor node i as the next hop. In previous work such as Misra et al., 11 the initialization of the probabilities was to assign a common value to ensure equal probabilities, that is, pi = 1/r. To improve the routing accuracy in this work, we calculate the preference of the next hop according to the energy level and the hop count to the CH as follows
where pi represents the probability of selecting neighbor node i, energyleveli is the residual energy of neighbor node i, hopi denotes the hop count from the neighbor node i to the CH, r indicates the number of neighbor nodes, and w1 and w2 between 0 and 1 weight the relative importance of energy level and hop count, respectively. The reason for taking the energy level and hop count into consideration is to maximize the probability of nodes with more energy and fewer hops to CH. If we considered only energy level, the next hop might be selected as one “far” from the CH.
Evaluation of the communication quality of nodes
The communication quality of a given neighbor node is the basis for favorable or unfavorable feedback for its LA action probability. The current communication quality and the comprehensive communication quality of a node are considered in this work, where the former is the communication quality of a node at the current instant and the latter is the communication quality of a node that takes its historical quality into account. A sliding time window is introduced to select several recent instants to calculate the comprehensive communication quality.
Evaluation of current communication quality
The current communication quality of a node is evaluated according to its forwarding behaviors and energy deviation rate.
Forwarding behaviors are often measured by the packet forwarding rate of an intermediate node, which is usually defined as the ratio of the number of forwarded packets of a node to the number of its received packets. For example, if a node S delivers packets to node D through node F, the packet forwarding rate of node F is evaluated by node S according to the above definition. The number of forwarded packets of node F is obtained by monitoring of node S, and the number of packets received by F is acquired by the number of received acknowledgment packets from F.8,14 However, this assumes that the channel and the communications are stable, that is, normal packet loss is not considered. If some acknowledgment packets are lost, the number of received packets obtained by node S will not be accurate, and thus the packet forwarding rate of node F is not reliable. In addition, if node F deliberately discards some acknowledgment packets, the evaluation of its packet forwarding rate by node S will not be credible.
Therefore, considering the communication channel conditions, the forwarding behavior of an intermediate node should be evaluated from the perspective of its last source node. The forwarding rate of a node j denoted as FRj is calculated by its last source node i as follows
where NFj represents the number of forwarded packets of node j monitored by node i, NSi denotes the number of packets sent from node i to j for forwarding, and pij indicates the estimated normal PLR between nodes i and j, which will be detailed below. The min() function avoids a numerator greater than the denominator. In this way, the method can evaluate malicious packet loss behaviors.
In WSNs, the instability of the physical layer wireless channel and collisions in MAC layer are two main reasons for normal packet loss. The instability of the physical channel is expressed as the fluctuation of wireless link quality with time and location. Therefore, link quality is a main factor to estimate packet loss. According to Liu and Cerpa,
19
there is a close correlation between the link quality and the physical (PHY) parameters including received signal strength indicator (RSSI), link quality indicator (LQI), and signal-to-noise ratio (SNR) which are captured from the received packets and environment by radio chips, such as the CC2420. The PLR can be estimated based on the PHY parameters; we adopted LQI and SNR for calculations because RSSI reduces the correlations between PLR and SNR/LQI significantly. Therefore, the PLR caused by channel conditions from node i to j denoted as
Packet transmission is based on the IEEE 802.11 distributed coordination function (DCF) MAC protocol, 20 and packet loss may occur due to collisions in the MAC layer. The number of neighbors of a node (denoted as r) is the number of nodes competing for the wireless channel. Let pt be the probability that a node transmits data in a time slot. When the channel is stable, the probabilities of an idle, successful, and colliding slot denoted as pidle, psuc, and pcol, respectively, are as follows 21
The channel busyness ratio expressed as Cb is defined as the proportion of time that the channel is in successful or collision status, which can be calculated as
where σ, Tsuc, and Tcol indicate the duration of an idle slot, a successful transmission slot, and a collision slot, respectively. 20
The number of interfering neighbor nodes is r − 1 while competing for the channel, and the normal PLR caused by MAC collisions denoted as
The channel busyness ratio Cb can be expressed by pt according to equations (5) and (6), which is easy for a node to acquire by monitoring;
21
therefore, pt is available to calculate
Considering the channel conditions and the MAC layer collisions, the normal PLR from node i to j is estimated as
The energy deviation rate is another important factor to evaluate the quality of communication. A typical energy consumption model introduced in Liu et al. 7 is employed in this work. An energy deviation rate denoted as ER is defined to measure the quality of node j as follows
where energylevelj represents the actual residual energy of a node j, Ienergyj denotes the initial energy of node j, and Ejs(l) indicates the estimated energy consumption of node j while sending l packets. The ratio of actual residual energy to the estimated energy is used to evaluate the quality of the node in energy.
Finally, the current community quality of a node j denoted as qj is obtained by combining the evaluation of forwarding behavior and energy deviation rate of j as follows
where qj∈ [0, 1], and η1 and η2 weight the importance of forwarding behavior and energy deviation rate, respectively.
Comprehensive communication quality evaluation
The historical quality of a node should be considered to evaluate the comprehensive quality to resist on-off attacks with SFA, because the current communication quality is a short-term quality that focuses only on the current instant. Therefore, a sliding time window with a dynamic aging factor is taken into consideration during the evaluation. A sliding time window ΔT consists of a number of time units, and the current quality of a neighbor node in each time unit is recorded. The time window moves forward 1 unit as a time unit consumption and the quality in the first unit is discarded with the latest quality value joining in. An example of a sliding time window with 5 units is shown in Figure 3. In the first unit of time window Δt1, the current quality of a node was q1. As time elapses, q1 is dropped from the new time window Δt2, and q6 is added simultaneously.

Sliding time window with 5 units.
An aging factor 22 is introduced to indicate the importance of each time unit in the sliding time window; the closer the time unit to the latest current time, the more important it is in the evaluation of the comprehensive communication quality. The comprehensive communication quality Qj of a node j is evaluated as follows
where Qj∈ [0, 1], m is the length of the window,
Detection and punishment of selective forwarding behaviors
After the LA of node i selects node j as the next hop according to the action probability, the environment provides a response to the LA according to the comprehensive communication quality of node j evaluated by node i or the voting of its neighbors. Node j is regarded as suspicious while its comprehensive communication quality Qij evaluated by node i is lower than a threshold QT, which can be predefined based on the network requirements. If node j is suspicious, node i will report this to the CH which can initiate a voting process for node j according to the comprehensive communication quality evaluated by each neighbor that interacts with it. Otherwise, this action in LA will be rewarded with a = 0.6 according to the LR–P model.
The malicious node confirmation and the dynamic reward or penalty parameters are decided by the result of voting process. A normal proportion denoted as Rn is defined as the ratio of the number of nodes that considers the suspicious node to be normal to the number of nodes that participates in the voting. The decision is made as follows:
If Rn < 20%, more than 80% of the nodes participating in the voting regard the suspicious node as malicious; therefore, the suspicious node will be isolated and eliminated from routing. Furthermore, the CH notifies all neighbors of this malicious node to add it to their blacklists. This action probability in each LA is set to 0, and the LA that selected this malicious node as the next hop will reselect the next hop according to the action probability.
If 20% ≤ Rn < 70%, more than 30% and fewer than 80% of the participating nodes consider the suspicious node to be malicious; the environment will give an unfavorable response to the LA, that is, β = 1, and this action in LA will be penalized with a dynamic parameter b = (1–Rn)/2 according to LR–P model.
If Rn ≥ 70%, fewer than 30% of the participating nodes regard the suspicious node as malicious, and the environment will give favorable feedback to LA, that is, β = 0. This action in the LA will be rewarded with a dynamic parameter a = Qem×Rn according to the LR–P model, where Qem represents the mean value of the comprehensive communication quality of the suspicious node evaluated by the voting nodes.
Algorithm 1 summarizes the procedures of SFA detection and updating the action probabilities in LA.
Simulation experiment and performance evaluation
This process was simulated experimentally, and the performance of the proposed method is evaluated in this section. We describe the experimental settings and simulated attacks, declare the performance evaluation metrics, and reveal the results of the evaluation; in addition, we compare the proposed method with the LA method CLAIDS proposed in Fathinavid and Ansari. 8
The evaluation of the proposed method is mainly of two aspects: performance and overhead. The performance evaluation includes the detection rate, false positive rate (FPR), and false negative rate (FNR), which are detailed in sections “Performance evaluation metrics” and “Experimental results and analysis”; the overhead evaluation considers communication overhead and storage overhead illustrated in section “Experimental results and analysis.”
During the evaluation, we first determined the key parameters of the method based on their effects on the detection rate, such as η1 and η2 in equation (9), and the detection threshold. Then, CLAIDS 8 and our method are compared and analyzed in terms of the detection rate, FPR, and FNR. Finally, the communication and storage overheads of the two methods are compared and analyzed.
Experiment description
The evaluation of the proposed method was conducted on OMNeT++. The members of a clustered WSN were divided into three categories: SNs, CHs, and the BS; all members were stationary and uniformly distributed. The area of the network was set according to the number of nodes, that is, there was a node deployed in an area of 50 m × 50 m on average, and the BS was located at the center of the area. The communication range of a node was 80 m. The weights of the initial LA action probabilities in equation (3) was set to w1 = 0.4 and w2 = 0.6 to obtain accurate routing. The malicious nodes were simulated according to the attacks described in subsection “Simulated attacks” of section “Simulation experiment and performance evaluation” and the proportion of malicious nodes was set randomly from 10% to 50%. Different attacks could occur on a node simultaneously, and the probability of packets dropping in ordinary SFA was set randomly from 20% to 80%. The unstable wireless channel was modeled by Gilbert. 22 The length of the sliding time window for comprehensive communication quality evaluation was set to 5 in this work. The simulation time lasted for 1000 s. The simulation parameters and the range of values used in the experiments are listed in Table 1.
Simulation parameters and the range of values in experiments.
WSN: wireless sensor network; LA: learning automata; SFA: selective forwarding attack.
Simulated attacks
In this article, we focus on SFAs that are simple to implement but difficult to discover. Furthermore, there are some attacks associated with SFA, for example, black-hole attacks and on-off attacks (the node alternatively behaves well and badly). In addition, energy exhaustion attacks were considered. These four attacks were simulated in the experiments.
Ordinary SFA
The node discards received packets deliberately with a certain probability to interfere with the communications between nodes. For example, the probability of a random packet discard could be set in the range of 20%–80%.
Some predefined malicious nodes were deployed randomly in the network to simulate ordinary SFA. For ease of implementation, we set the probability of malicious discarding to 50%–70%. The selective forwarding behavior was conducted by the preset counter, for example, the node forwards one received packet based on routing table and then discards the next two received packets. According to the value of the counter, the node determines whether to forward or discard the received packet.
Black-hole attacks
This is a special case of SFA in which all the received packets are dropped to prevent the node from transmitting data.
The implementation of black-hole attacks relies on deploying some black-hole attack nodes in the network simulation scenario. The main difference between a normal node and a black-hole attack node lies in the routing process at the network layer. The model of the routing process of a black-hole attack node is established as follows:
The node does not send a routing request;
The node does not transmit packets;
For all routing-request messages received from other nodes, the node replies with the response message “I have the shortest path to the destination node”;
The node discards all received packets without forwarding.
On-off attacks with SFA
A node under this attack behaves alternatively well and badly to conceal its misbehavior. In our simulations, the on-off attacks were simulated as a node behaving badly on packet forwarding for at least 1 time unit in each sliding time window.
On-off attacks were simulated by changing the route executor of the node, adding a timing function. When the node started, it executed its normal routing (normal routing mode) and timing programs; after 4 time units, the node switched to the routing mode of ordinary SFA for 1 time unit. According to the timing program, the route executor switches between the two modes. To give a special example, the node could sleep for 1 time unit every 4 time units to simulate a special on-off attack.
Energy exhaustion attacks
The malicious node prevents data transmission by exhausting the resources and energy of another node to damage the network. A node under this attack cannot transmit packets normally and its energy decreases sharply.
In the simulation, some malicious nodes were deployed in the network to exhaust the resources and energy of other nodes. The energy consumption of an SN is extremely large when it keeps in a communication state. Therefore, the malicious node constantly makes and initiates communication requests to the target node and sends a large number of fake messages, consuming the target’s energy and occupying its communication processing capability.
Performance evaluation metrics
We evaluated the performance of the method in terms of three metrics: detection rate, FPR, and FNR.
The detection rate is defined as the ratio of accurate detections to all detections. The FPR is denoted as the ratio of the number of normal nodes recognized as malicious to the actual number of normal nodes. The FNR is the ratio of the number of malicious nodes recognized as normal to the actual number of malicious nodes.
Experimental results and analysis
Here, we describe the performance and overhead evaluation and analysis, and compare two methods.
Key parameter analysis
The effects of η1 and η2 parameters in equation (9) for communication quality evaluation should be analyzed in the simulation, because they affect the detection rates of different attacks. Figures 4 and 5 demonstrate the detection rates of different attacks for different values of η1 and η2.

Detection rate of different attacks with η1 = 0.7 and η2 = 0.3, 30% of nodes were malicious, and the packet-dropping rate was 50%.

Detection rate of different attacks with η1 = 0.3 and η2 = 0.7, 30% of nodes were malicious, and the packet-dropping rate was 50%.
The simulated attacks can be classified into two categories: behavior-related attacks and energy-related attacks. The former includes ordinary SFA, black-hole attacks, and on-off attacks; the latter refers to our simulated energy exhaustion attacks. In the simulations, the proportion of malicious nodes was set to 30%, and the SFA packet-dropping rate was set to 50%. In both situations, the detection rate decreased with the increase in the number of nodes. In Figure 4, however, the behavior-related attacks are detected more accurately with the parameters η1 = 0.7 and η2 = 0.3, which means that forwarding behaviors are more influential than energy consumption during the process of communication quality evaluation. Therefore, the detection rates of behavior-related attacks are higher than those of the energy-related attacks on the whole, and vice versa. In Figure 5, the detection rates of energy-related attacks are higher than those of others because more emphasis is placed on the energy consumption in communication quality evaluation, that is, η1 = 0.3 and η2 = 0.7. The parameters can be adjusted according to the detection requirements; we used η1 = 0.7 and η2 = 0.3 in subsequent simulations due to more emphasis on forwarding behaviors.
The detection performance is determined by the malicious detection threshold, and the comprehensive communication quality is taken as the malicious detection threshold in our method. If the comprehensive communication quality of a node is lower than the threshold, it will be recognized as a suspect node that should be reported for a further decision with the voting process. The purpose of the detection threshold is to maintain the FPR and FNR at a lower level. In the simulation, the parameters were set as follows: the number of nodes was set to 150, the proportion of malicious nodes was set to 30%, η1 = 0.7, and η2 = 0.3. Figure 6 depicts the trends of FPR and FNR with different detection thresholds, from which we can see that FPR increases, whereas FNR decreases with the increase in the threshold. Moreover, FPR increases significantly in the threshold range of [0.7, 1] and FNR decreases sharply in the threshold range of [0.3, 0.8]. The reason is that the comprehensive communication quality values of most normal nodes are in the range of [0.7, 1], whereas that of the vast majority of malicious nodes are in the range of [0.3, 0.8]. The intersection of two curves is the target of lower FPR and FNR, from which we can infer a reasonable threshold to be approximately 0.72.

Trends of FPR and FNR with different detection thresholds.
Comparison of performance and analysis
We compared the performance of our method with the CLAIDS intrusion detection mechanism: cellular LA-based approach for anomaly nodes detection, 8 which is described in section “Introduction” and is similar to our work.
We compared the malicious detection rates of different attacks under the same conditions: 100 nodes, 30% malicious nodes, 60% SFA packet-dropping rate, and weight parameters η1 = 0.7 and η2 = 0.3. Figure 7 displays the malicious detection rates of four attacks, from which we can observe that our method is better than CLAIDS in detecting ordinary SFA and black-hole attacks. This improvement is observed because of the improved evaluation of forwarding behavior, and not relying on the number of ACKs. An additional reason is the employment of adaptive reward and penalty parameters in the LA. Our method is slightly weaker than CLAIDS in detecting energy exhaustion attacks, because we consider only the energy consumption of sending packets in evaluating the energy deviation rate. However, our method has an advantage in detecting on-off attacks, the detection rate of which was 92% whereas that of CLAIDS was 83%. This is because of the adoption of comprehensive communication quality to detect malicious behaviors, which considers the historical communication quality of a node.

Comparison of detection rates of different attacks using DSFLACQ and CLAIDS.
The difference between the comprehensive and current communication quality of a node in the process of detection is demonstrated in Figure 8. Three cases were simulated in our experiments: (1) a normal node with no attack, (2) a node under on-off attacks with SFA (packet-dropping rate was 60%) that behaves badly for 1 unit in a sliding time window, and (3) a node suffering from SFA with a continuous packet-dropping rate of 60%. We discovered that the comprehensive and current communication quality of a node are quite similar in the first and the third cases (refer to the left and right areas of Figure 8), which means that either quality measures can be used for malicious detection. However, in the second case, there are significant differences between current communication quality and comprehensive communication quality. When a node is attacked at an instant, both quality measures reduce sharply. If the node returns to its normal state for the next instant, its current communication quality will recover to a normal level immediately; however, its comprehensive communication quality will remain at a lower level rather than recover instantly (refer to the middle area of Figure 8). Therefore, comprehensive communication quality is more effective than current communication quality for detecting on-off attacks, which is why our method has an advantage in this case.

Difference between comprehensive and current communication quality.
A comparison of the overall detection rates of the two methods is depicted in Figure 9, which indicates that the detection rates of our method and CLAIDS decrease from 97.1% to 85.9% and from 96% to 85.5%, respectively, as the number of nodes increases. We note that the overall detection rate of our method is higher than that of CLAIDS because of the better performance of our method in detecting behavior-related attacks described in Figure 7, including its improvement of evaluating forwarding behavior and adaptive parameters in the LA.

Comparison of overall detection rates of DSFLACQ and CLAIDS.
Figures 10 and 11 manifest the false positive and negative rates of the DSFLACQ and CLAIDS methods with the increase in the number of nodes, from which we observe that the FPR of our method varies from 2% to 13.9%, whereas that of CLAIDS varies from 3% to 15%. The FNR varies from 1.2% to 15.5% for DSFLACQ and from 1.3% to 16.75% for CLAIDS. For both FPR and FNR, our method performs better than CLAIDS because we take comprehensive communication quality as the detection threshold, and select the threshold that keeps the FPR and FNR lower.

Comparison of false positive rates of DSFLACQ and CLAIDS.

Comparison of false negative rates of DSFLACQ and CLAIDS.
The LA reward and penalty parameters of our method are dynamic and adaptive, depending on the quality of the selected node or the voting of the node’s neighbors. The adaption parameters are effective for learning efficiency. The update of action probability in an LA with adaptive and fixed parameters is indicated in Figure 12. Taking three actions as examples, p1, p2, and p3 are probabilities of three actions, respectively. If action 1 is penalized from instant 1, the LA with fixed parameters will receive the optimal choice at instant 5, whereas the LA with adaptive parameters will receive the optimal choice at instant 3, which improves the efficiency.

Comparison of action probability in an LA with adaptive and fixed parameters.
Comparison of overhead and analysis
The additional communication and storage overhead needed for malicious detection are conducted in this section. We also compare the overhead of our method with that of CLAIDS.
Suppose there are C clusters in the network, and each cluster has N nodes. A node sends 10 packets in each time interval, and in the worst case, each node has N − 1 neighbors. In CLAIDS, the forwarding behaviors are evaluated by the number of forwarding packets and ACKs from the neighbor nodes. 8 In our method, the forwarding behaviors are evaluated by the number of sending packets and the normal PLR.
In CLAIDS, a node sends 10 packets to each neighbor, and each neighbor returns 10 ACKs; therefore, in the worst case, the overall communication overhead is 20 ×N× (N − 1) ×C. In our method, a node sends 10 packets to each neighbor, and only one ACK is received for calculating the normal PLR; thus, the worst-case overall communication overhead is C×N× [10 × (N − 1) + (N − 1)].
The comparison of the maximum additional communication overhead with the number of clusters is manifested in Figure 13, in which we note that additional communication overhead decreases with more clusters, because the number of nodes in a cluster decreases as the number of clusters increases. The additional communication overhead of our method is lower than that of CLAIDS due to the sharp decrease in ACKs from neighbors.

Comparison of the maximum communication overhead of DSFLACQ and CLAIDS.
The additional storage overhead of nodes for both methods is also evaluated. Each node maintains a table for each neighbor, including energy level (4 bytes), the number of forwarded packets (2 bytes), the number of received ACKs (2 bytes), the communication quality (4 bytes), and the threshold (4 bytes). In the worst case, the additional storage overhead for CLAIDS is 16 × (N − 1).
Based on the storage above, our method requires the addition of the normal PLR (4 bytes) and communication quality in the entire sliding time window (4m bytes, for window length m). Therefore, in the worst case, the additional storage overhead for our method is (20 + 4m) × (N − 1).
The comparison of the maximum additional storage overheads of DSFLACQ and CLAIDS is displayed in Figure 14. In this work, m was set to 5. The larger the value m, the better the comprehensive communication quality, and the higher the storage overhead. From Figure 14, we observe that the storage overhead decreases with more clusters, because the number of nodes in a cluster decreases as the number of clusters increases. However, the storage overhead of our method is higher than that of CLAIDS because of normal PLR and comprehensive communication quality. Therefore, the improvement of performance is at the expense of storage overhead.

Comparison of the maximum storage overhead of DSFLACQ and CLAIDS.
Conclusion
In this article, we proposed a detection method for SFA and its special cases, including black-hole and on-off attacks. The method is based on LA with adaptive parameters using the comprehensive communication quality of nodes and the voting of neighbors. Communication quality was introduced to reflect the forwarding behaviors of nodes. It is divided into current and comprehensive communication quality, which, respectively, reflects short- and long-term forwarding behaviors. When evaluating communication quality, normal packet loss caused by unstable channels and MAC layer collisions is taken into consideration to eliminate its impact on SFA detection. The comprehensive communication quality of a node is obtained from its historical communication quality with a sliding time window with a dynamic aging factor, which is suitable for detecting continuous ordinary SFA and on-off SFA. A comparison of our proposed DSFLACQ method and CLAIDS was conducted in simulations. Experimental results demonstrate that the method we proposed performs better in detecting SFA and its related attacks; the detection rates exceed 90%, and particularly for on-off attacks, the detection rate reached 92%, 9 percentage points higher than other methods. Furthermore, the FPR and FNR of our method are 2 percentage points lower than other methods, and its communication overhead is 40% lower than other work, but it requires more storage overhead to track historical communication quality. Therefore, the performance improvement is at the expense of storage overhead.
Footnotes
Handling Editor: Vinod Kumar Verma
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported in part by the National Key R&D Program of China under grant number 2016QY06X1205, and in part by the National Natural Science Foundation of China under grant numbers U1536119 and U153610079.
