Abstract
Due to the limit spectrum resource in the underwater acoustic networks, underwater cognitive acoustic communication is a promising technique. The channel sharing mechanism in cognitive networks can improve the communication capacity efficiently. Jamming attack is a common deny of service attack in cognitive networks. In the underwater cognitive acoustic networks, the anti-jamming problem is quite different from cognitive radio networks. It calls for an effective anti-jamming strategy in the cognitive acoustic channel access. In this article, we propose an online learning anti-jamming algorithm called multi-armed bandit–based acoustic channel access algorithm to achieve the jamming-resilient cognitive acoustic communication. The imperfect channel sensing and the constraints of underwater acoustic communication are considered in the anti-jamming game. Under different kinds of jamming attacks, the channel utilization can be improved with our jamming-resilient approach.
Keywords
Introduction
During the past decade, underwater wireless sensor networks (UWSNs) have attracted significant interests due to a variety of applications such as ocean environment monitoring, underwater target tracking, oceanography data collection. Underwater sensor nodes need to transmit the collected data in the underwater environment. Because the electronic wave decays quickly in the water, acoustic communication is the mainstream communication method in the UWSNs at current time. It calls for reliable and high data rate data transmission over an underwater acoustic channel. The attenuation of underwater acoustic signal is related to the frequency. In the underwater acoustic networks, the available communication frequencies are limited in a small range, which usually range from tens of hertz to hundreds of kilohertz. 1 Today, most underwater acoustic systems utilize the frequency band from 1 to 100 kHz, making the acoustic channel crowded.2–4 In the scenario that multiple underwater systems such as autonomous underwater vehicles (AUVs), underwater fleet, and UWSNs are deployed in the same area, the scarce acoustic spectrum should be shared among them. The limited underwater acoustic spectrum resource points out the requirement for efficient spectrum utilization schemes. Underwater cognitive acoustic network (UCAN) is an emerging technique for the underwater acoustic networks. 5 The idea of cognitive acoustic is brought from the cognitive radio networks (CRNs).6,7 In cognitive networks, there are primary users (PUs) and secondary users (SUs). SUs are permitted to sense which portion of the PUs’ spectrum is available and select the best available channel to access. Since there exists some states when the PUs are idle, some frequency bands might be vacant for considerable periods. SUs can utilize the resource of PUs at their vacant periods. By this way, the communication capacity can be increased. In the works,8,9 it is verified that UCAN can improve acoustic communication capacity. The UCAN will be a promising technique in underwater acoustic communications.
Although some proposed approaches8–10 have been shown to be able to bring gains for the users, the attackers are not considered in the channel allocation process. If the SUs are deployed in a hostile environment, there exist malicious attackers who try to prevent the efficient spectrum utilization of legitimate SUs. An attacker can interfere or jam the acoustic signal at the receiver by simply emitting a random signal in the water. 11 The interfered or jammed signal becomes unidentifiable at the receiver. Therefore, the channel allocation schemes which aim at maximizing the spectrum utilization may not perform well due to the jamming caused by attackers. Hence, it calls for an anti-jamming channel allocation scheme for UCAN to efficiently utilize the spectrum. Jamming attack is a denial-of-service (DoS) attack. There have been some traditional anti-jamming techniques such as direct-sequence spread spectrum (DSSS) and frequency-hopping spread spectrum (FHSS). 12 But these solutions are not suitable for cognitive networks. SUs can only access the free spectrum of PUs, and the availability of the spectrum is time varying. So the traditional spread spectrum communication techniques cannot be applied in cognitive networks.
Today, most of jamming-resilient methods for CRNs employ opportunistic channel sensing and utilization techniques.13–20 In the works,14–19 the opportunistic channel access strategy is designed through online learning algorithms. These algorithms need to know the all channels’ status and perfect channel sensing is assumed. The channel access strategy is made after the channel sensing. However, this requirement is difficult to be satisfied in the underwater acoustic channel sensing. Because of the long propagation delay, it takes a long time to sense all the channels. It is not efficient to sense all the channels in UCAN. The perfect channel sensing is also not practical in underwater acoustic communication. In Zhao et al. 21 and Ahmad et al., 22 SUs determine a sensing strategy and access the channels that are sensed idle. But the jamming attacks are not considered in Zhao et al. 21 and Ahmad et al. 22 The authors in Wang et al. 13 modeled the anti-jamming problem in cognitive networks as a multi-armed bandit (MAB) problem. The opportunistic channel sensing strategy is made through solving the MAB problem. But the false channel state detection, which is quite common in UCANs, is not considered seriously. The existing anti-jamming approaches for CRNs are not suitable in the underwater environment. It calls for a novel jamming-resilient approach for UCANs.
A multi-armed bandit–based acoustic channel access (MAB-ACA) algorithm is proposed in this article. The algorithm consists of two phases. In the first phase, we build the hidden Markov model (HMM) model of each channel, and the potential reward of each channel can be estimated based on the HMM model. In the second phase, SUs make the channel sensing and access strategy through online learning. Each underwater acoustic channel will be sensed with a given probability. For the channels prone to be jammed, small probability will be assigned. The probability is determined through online learning based on the current reward. Although this approach shows some similarities with the anti-jamming algorithms in the CRNs, the properties of UCANs are considered in our approach. The major contributions of this article can be concluded as follows:
We build the model of cognitive acoustic channel. Considering the channel state transition and imperfect channel sensing in the underwater environment, an HMM is employed to describe the underwater cognitive acoustic channel.
We combine the Baum–Welch algorithm and genetic algorithm in the HMM estimation process, which solves the sensitive dependence on initial values of the traditional Baum–Welch algorithm.
The posterior probability of a cognitive channel is deduced based on the HMM model.
We formulate the anti-jamming problem in UCANs as an MAB problem. A heuristic algorithm to solve such MAB problem is proposed. The algorithm can improve the channel utilization while considering the latent underwater jammers.
The article is organized as follows: in section “Related works,” we briefly describe the related works; in section “System and adversary model,” we introduce the system and adversary model; in section “Jamming-resilient opportunistic acoustic channel access,” a jamming-resilient opportunistic acoustic channel access is proposed; in section “MAB-based acoustic channel access algorithm,” a heuristic approach to solve the jamming-resilient acoustic channel access problem is proposed; the simulation results are shown in section “Performance evaluation and discussions”; in section “Conclusion,” we conclude the article.
Related works
The UCAN provides a solution to improve the spectrum utilization for the underwater acoustic networks. The authors in Baldo et al. 8 and Bicen et al. 9 proposed cognitive spectrum access approaches for underwater acoustic communications, and the communication efficiency can be improved with the opportunistic spectrum access protocol. However, the works8,9 do not consider the possible jammers in the underwater environment.
To defense the jamming attack, anti-jamming mechanisms in wireless communication have been extensively studied.13–20,23 In Wood and Stankovic, 23 FHSS and DSSS techniques are used in the underwater acoustic communications. These techniques are resistant to interference from jammers, but they are not directly suitable for UCANs. First, such schemes do not consider the influence of PUs, who are important participants in cognitive networks. Second, such schemes require a wide range of frequencies, which are not available in UCANs. In the works,13,16,17,20 the anti-jamming problem in cognitive networks is described as an MAB problem, and heuristic algorithms are proposed to solve the MAB problem. In Wang et al., 13 authors formulated the anti-jamming channel access problem in cognitive networks as a non-stochastic MAB problem. Authors also proposed a heuristic algorithm to solve the MAB problem. In Wang et al., 20 the same group of authors improved the theoretical performance analysis and experiments for the MAB-based channel access algorithm. Su and colleagues16,17 applied the MAB-based anti-jamming approach in the smart gird scenario. In the works,14,15,18,19 authors modeled the anti-jamming problem as a stochastic game. In Wang and colleagues,14,15 authors used the stochastic zero-sum game to describe the anti-jamming scenario in cognitive networks. Minimax-Q learning is employed to deal with anti-jamming game. In Singh and Trivedi, 18 authors replaced the minimax-Q learning with QV learning and state-action-reward-state-action learning for the anti-jamming stochastic game. In Yang et al., 19 authors studied the power control problem in wireless networks. They modeled the jamming defense problem as a Stackelberg game and derived the optimal transmission power.
Although the existing works can deal with the jamming attacks in cognitive networks, we cannot directly use these methods in the UCANs. The MAB-ACA algorithm, which is proposed in this article, is designed for the UCANs. The characteristics of underwater acoustic channel are considered in our approach. We use HMMs, instead of Markov chains, to describe the channel state transition. The channel state can be estimated through HMM. We formulate MAB problem for the jamming-resilient acoustic channel access game, and we solve the MAB problem based on the Exp3 algorithm. 24
System and adversary model
In this section, we introduce the model of UCAN and the possible adversaries existing in the underwater system. Some licensed channels of PUs may be available for SUs, and SUs utilize these channels to increase communication capacity. The jammers, on the other hand, jam the idle channels of PUs to block the communication of SUs.
System model
In UCANs, PUs may share their spectrum with SUs. The available acoustic spectrum of PUs will be partitioned into channels with equal bandwidth, and an underwater SU will be assigned to an idle channel. Although SUs may have their own spectrum, we focus on the utilization of PUs’ spectrum in UCANs. SUs are permitted to access the channels only when not interfering with PUs, and their spectrum utilization is not legally protected. Assume PUs own a spectrum consisting of

Markov model of channel state: (a) channel state transition and (b) HMM of cognitive acoustic channel.
The SUs seek spectrum opportunities in PUs’
The spectrum sensing can be performed in time, frequency, and code domains.10,25–28 The spectrum sensing is out of range of this article, but we should note that the false alarm and miss detection are common in UCANs. It is mainly caused by the imperfect acoustic spectrum sensing on SUs. Moreover, the long propagation delay may also lead to false sensing results. Since the signal of PUs suffers long propagation delay, the sensing results at SU side are always out-of-date. The delayed channel usage information increases the probability of false sensing. Although the channel sensing results are not reliable, they can still reflect the actual channel status. Such channel state transition and channel sensing process can be modeled as an HMM as shown in Figure 1(b). The channel state transition is a hidden Markov chain, whose states can be observed by SUs. The observed state of the
Different to CRNs, underwater nodes are usually equipped with single modem. An underwater acoustic SU can sense and access one channel each slot. The job of an SU is to choose which channel to sense and access the available channel based on the sensing result.
Adversary model
Jamming is a DoS attack which targets at disrupting communications at the physical and link layers. The jammers will sense the activity of the PUs and jam the idle channels. By injecting packets to the licensed channels, a jammer can prevent the legitimate users from accessing the available spectrum of PUs. In UCANs, PUs are protected by law and usually well physically protected. Due to the heavy penalty of being located and captured, the jammers will not jam the PUs’ communications.7,13,17 Instead, the SUs’ access to the PUs’ spectrum is not legally protected. A jammer can attack the SUs and prevent them from using the spectrum for communication.
Similar to legitimate users in UCANs, jammers are also equipped with single modem. So one jammer can only jam one channel each slot. In a UCAN, multiple jammers may be deployed to disrupt the communications. Due to the limitation of cost, we assume
The jammers will adopt an attacking strategy to prevent SUs from accessing the idle channels. In this article, a strategy means a channel sensing and access decision. Both SUs and jammers should choose their strategies in UCANs. A jammer or an SU will choose a channel to sense and access the idle channel. Such channel sensing and access action is called a strategy. The jammers can be classified into four types: 13
Under the deterministic channel sensing and access policy, an SU senses the channels in a fixed order. Once a channel is sensed idle, the SU will access the channel. Such policy is vulnerable to jamming attack, especially for adaptive jamming. The SUs need to choose an accessing strategy to alleviate the potential damage. By hopping among idle channels, the access pattern of SUs is not easy to be predicted, and the UCAN will be more resistant to the jamming attack.
Channel model
The attenuation of underwater acoustic channel can be modeled as a function of distance and frequency 1
where
In the underwater environment, there exist multiple kinds of noise. The ambient noise major consists of four components: turbulence, shipping, waves, and thermal noise. The power spectrum density (p.s.d.) of these noise components can be expressed as 1
where
We denote the p.s.d. of transmitted signal for the distance
where
Jamming-resilient opportunistic acoustic channel access
Channel state transition
At the beginning of a time slot, some channels are reserved by PUs for their communication in the current time slot. Then SUs can temporarily access the idle channels that belong to PUs. A certain gain can be achieved by an SU through utilizing the channels of PUs. The gain is a proper quality of service (QoS) measure in underwater acoustic networks. In this article, we use channel capacity to evaluate the gain achieved. The gain of channel
The availability of a licensed channel is only determined by the PUs. In cognitive acoustic networks, PUs are licensed users and their rights are protected by law. Only when the channel is idle can SUs or jammers access the channel. The state of the channels at time
In each slot, SUs and jammers will select some channels to sense at first. After observing the state of the licensed channels, both SUs and jammers will access the available channels. SUs try to utilize the channel to send data, and jammers try to block the communication of SUs. We consider a vector space
When
Opportunistic channel access
In each slot, three parties make their own actions. Since an underwater SUs want to utilize the PUs’ channels to transmit data, it needs to sense these channels and access them if available. In the channel sensing process, SUs cannot distinguish between PUs and jammers. As long as a channel is occupied, the SUs will skip it and choose another channel to sense. Current researches about anti-jamming CRN assume the perfect channel sensing,7,15–18 but it cannot be achieved in the underwater acoustic communication. Due to the long propagation delay, false alarm and miss detection cannot be avoided in CANs. We denote the false alarm probability by
where
where
However, due to the existence of the intelligent jammers, the actions of an SU at
where
The problem equation (14) is a stochastic game, and it is known that such a problem has a nonempty set of optimal policies, and at least one of them is stationary. 30 The problem (14) may be solved using its respective dynamic programming (DP) representations, 22 which can be described as
Here
Channel state estimation
While solving the DP problem (15), we should get the mean reward in each iteration. According to equation (11), three parameters
The Baum–Welch algorithm
31
is a common method to estimate
where
With the channel sensing sequence
Theorem 1
Building the HMM of a channel has time complexity
Proof
In the forward-backward algorithm,32,33 the time complexity to calculate
In the HMM build process, we needs to collect
However, the Baum–Welch algorithm cannot guarantee the global convergence. In the two-state HMM estimation process, there are a lot of local extremums. The final result of the Baum–Welch will always converge near the initial value. Without any priori knowledge, it is impossible to determine a proper initial value. To improve the convergence of HMM estimation, genetic algorithm can be employed for the global search.34,35 In this article, we combine the Baum–Welch algorithm with genetic algorithm for the two-state HMM estimation. Genetic algorithm implements the global model matching at first, and the Baum–Welch algorithm will use the result of genetic algorithm as the initial value.
Before applying genetic algorithm to solve the optimization problem, we should find a proper encoding method. For the HMM of cognitive acoustic channel
where
where
where
Once the parameters of the HMM are estimated, the probability that the
The posterior probability with channel sensing result is
MAB-based acoustic channel access algorithm
In this section, we will give the channel accessing strategy while considering the possible jammers. Recall the jamming-resilient acoustic channel access problem (14), finding the optimal solution needs to know the channel state and jammer’s actions at first. The channel state can be estimated through the Baum–Welch algorithm, but the jammer’s action remains unknown. For the static jammers and random jammers, we can further express the problem (14) as
because these jammers are independent to legitimate users’ actions. The myopic jammers and adaptive jammers, on the other hand, will change their strategies based on legitimate users’ actions. It is difficult for us to estimate the mean reward, even though the channel state is known. Therefore, the DP-based method (15) cannot be solved directly in jamming-resilient acoustic channel access problem.
An anti-jamming strategy for UCANs is to access idle channels to maximize the reward. In this article, we use an online learning approach to approximate the solution of (15). The algorithm solves an MAB problem to achieve jamming-resilient acoustic channel access. Each channel can be considered as an arm. Communication through each channel is associated with a sequence of rewards. The PUs’ actions and jammers’ actions will affect the reward. During the slot
where
where
In MAB problem, each strategy will be assigned a probability. The SU will choose a strategy based on the probability. In an UCAN with
In the existing researches, the reward will be counted only when the Ack is received through the channel. SUs can learn the properties of PUs and jammers through the reward. With the knowledge of the jammers’ actions, jamming-resilient approach can be designed for the jamming avoidance and channel capacity optimization. For the channel sensed, the conditional probability that the channel is idle can be described as
However, the state of channels that are not sensed is not evaluated due to lack of information. As a result, even though some channels are quite good for the SUs, they may not be covered by the algorithm during several iterations. Moreover, because of the imperfect spectrum sensing in the underwater acoustic communication, the sensing results are not so accurate. Therefore, direct using the Ack information to formulate the reward may not correctly reflect the channels’ availability. The idea of virtual channel reward can be employed to ensure that each channel is sensed sufficiently. We formulate the virtual channel reward as
The channels with low probability to be sensed will get higher reward. Such operation can improve the sampling sufficiency in the channel sensing process.
We denote the weight of the
The probability of strategy
where
The detail steps of our MAB-based acoustic channel access (MAB-ACA) algorithm are shown in Algorithm 1. Since MAB-ACA algorithm just gives a heuristic solution to the jamming-resilient acoustic channel access problem, it is expected that our strategy can track the optimal strategy. Theorem 2 shows the effectiveness of our MAB-ACA algorithm.
Lemma 1
For any
holds for any
Theorem 2
The normalized regret
Proof
We denote the upper bound of channel gain by
within
based on Lemma 1. Thus, the normalized regret satisfies
It converges to 0 at rate
Theorem 3
Algorithm 1 has time complexity
Proof
In Algorithm 1, we should build the HMM of each channel at first. As shown in Theorem 1, building the HMM of a channel has time complexity
For the
Performance evaluation and discussions
The channel utilization transition of PU can be described in Figure 1. The false alarm probability and miss detection probability in the channel sensing process are set to random value from 0.05 to 0.25. Four types of jammers are introduced in section “System and adversary model.” The static jammer is not considered in the simulations because it is too easy to be avoided by the SUs. Myopic jammer employs the myopic algorithm in Ahmad et al., 22 and adaptive jammer utilizes the algorithm in Wu et al. 15
The performance of jamming-resilient underwater cognitive communication is evaluated in terms of the normalized average throughput, cumulative distribution function (CDF) of expected time to achieve message delivery and jamming probability. The normalized average throughput is used to evaluate the spectrum utilization. Normalized throughput is the ratio of actual throughput to saturated throughput. We repeat the simulations for 1000 times and obtain the normalized average throughput. The CDF of expected time to achieve message delivery describes the probability when a message is successfully received.
Jamming probability is one metric to evaluate the anti-jamming performance in UCANs. Large jamming probability means the SUs are vulnerable to jamming attack. The jamming probability under three kinds of jammers is shown in Figure 2. Compared to random access, the jamming probability of MAB-ACA algorithm under myopic jammer is much smaller. Compared to myopic access, the jamming probability of MAB-ACA algorithm under myopic jammer and adaptive jammer is much smaller. It shows our approach has better anti-jamming performance in different scenarios. From Figure 2, we can also find the jamming probability will be smaller if there are more channels. For the random jammer, the jamming probability drops quickly when the channel number increases. For the myopic jammer and adaptive jammer, the influence of channel number is not so obvious.

Jamming probability under different approaches: (a) jamming probability under MAB-ACA algorithm, (b) jamming probability under random access, and (c) jamming probability under myopic algorithm.
Since the goal of cognitive acoustic network is to improve utilization of underwater channels, we use the normalized average throughput to evaluate the channel utilization in this article. High throughput means more data are transmitted through these channels. A good jamming-resilient algorithm in UCANs needs to achieve higher throughput. The normalized average throughput of the MAB-ACA algorithm is shown in Figure 3(a). As comparisons, the throughput of random channel access and myopic channel access is also plotted in Figure 3(b) and (c). For the random channel access strategy, the throughput of three kinds of jammers is similar. Such simulation result is obvious because random access actions will not be influenced by the jammers’ strategies. Under random jamming attack, myopic channel access gives a better performance than random jammer in terms of throughput. However, the throughput drops fast under myopic jamming and adaptive jamming. Compared to random channel access, the MAB-ACA algorithm can achieve higher throughput among all three kinds of jammers. The enhancement is quite obvious under myopic jammer. Compared to myopic channel access, the MAB-ACA algorithm achieves similar throughput under random jammer, but the throughput under myopic jammer and adaptive jammer is much higher. For the myopic jammer, the throughput of MAB-ACA algorithm is 7.83 times higher than myopic channel access. For the adaptive jammer, the throughput of MAB-ACA algorithm is 7.75 times higher than myopic channel access.

Normalized average throughput under different approaches: (a) the normalized average throughput under MAB-ACA algorithm, (b) the normalized average throughput under random access, and (c) the normalized average throughput under myopic algorithm.
Among three kinds of jammers, the throughput under myopic jammer is the highest because the myopic jammer always tries to maximize current reward. Such attacking strategy is easy to learn through our MAB-ACA algorithms. The throughput under random jammer and adaptive jammer is similar. It shows the anti-jamming game between SUs and adaptive jammers ends in a draw. The effect of adaptive learning strategy is just the same as the simple random access.
In UCANs, it takes a long time for an SU receiver to receive a complete message. The message delay performance is evaluated by analyzing the CDF of expected time to achieve message delivery. Assume a message is divided into 10 packets, Figure 4 shows the CDF of expected time to achieve message delivery. For our MAB-ACA algorithm, the whole message can be delivered with high probabilities within 40 time slots. When random or myopic jamming adopted, the CDF performance of random access is similar to MAB-ACA algorithm. Our MAB-ACA algorithm gives better CDF performance under adaptive jamming. When random jamming adopted, the CDF performance of myopic algorithm is similar to MAB-ACA algorithm. Our MAB-ACA algorithm outperforms myopic algorithm under myopic and adaptive jamming.

CDF of expected time to achieve message delivery under different approaches: (a) CDF of expected time to achieve message delivery under MAB-ACA algorithm, (b) CDF of expected time to achieve message delivery under random access, and (c) CDF of expected time to achieve message delivery under myopic algorithm.
In the anti-jamming game, it is possible that more than one jammer will participate to jamming the idle acoustic channels. Compared to one jammer scenario, the throughput of our MAB-ACA algorithm is lower. Such performance degradation will be more serious as the number of jammers increases. Figure 5 shows the performance degradation in the multi-jammer scenario. The throughput drops a lot when multiple random jammers exist. For the myopic jammers and adaptive jammers, such performance degradation is not so obvious. The reason lies in the similar actions in the myopic jamming and adaptive jamming strategies. Since this kind of jammers hope to maximize their jamming effect, the actions of different jammers may be identical. As a result, our MAB-ACA can learn the actions of different jammers in a same way, which impairs the effect of multiple jammers.

Normalized average throughput of multiple jammers.
In UCANs, the number of channels that the SUs can access also affects the throughput performance. Figure 6 shows that the normalized average throughput will increase when there are more channels. In the anti-jamming game, the SUs hop between different channels to avoid jamming attack. More channels can provide more hopping choices, which increases the attack difficulty.

Normalized average throughput with respect to number of channels.
Conclusion
Cognitive acoustic communication is a promising technique for the underwater acoustic networks. In such technique, the SUs utilize the idle channels of PUs to implement their data transmission. The acoustic communication capacity can be significantly improved. However, the jamming attack can block the communication of legitimate users. The existing anti-jamming approaches for CRNs are not suitable for UCANs due to the constraints in the underwater acoustic networks. In this article, we introduced the security issues in the cognitive acoustic networks and proposed jamming-resilient channel access algorithm called MAB-ACA. At first, the channel state is estimated through Baum–Welch genetic algorithm. Then the anti-jamming channel access strategy is made based on MAB algorithm. Under different types of jamming attack, the jamming-resilient approach can achieve higher throughput.
Footnotes
Academic Editor: Fei Yu
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by the National Natural Science Foundation of China under grants U1609204, 61374021, and 61531015, and the ASFC under grant 2015ZC76006.
