Abstract
This paper presents a selective spectrum sensing and access strategy for a cognitive radio sensor network (CRSN) that maximizes the throughput of the secondary user (SU) system. An SU senses multiple channels simultaneously via wideband spectrum sensing. To maximize the throughput and reduce the sensing energy consumption, not all of the channels are sensed. The SU selects some channels for spectrum sensing and accesses these channels based on the sensing results. The unselected channels are accessed directly with low transmission power. A selection making algorithm based on partially observable Markov decision process (POMDP) theory is proposed, by which the SU determines which channels are selected for sensing, how long the sensing time is, and the transmission powers of the channels. An optimal policy and a myopic policy are proposed to solve the POMDP problem. Moreover, an optimization problem is formulated to solve the synchronism problem among the selected channels. Numerical results show that the proposed selective spectrum sensing and access strategy effectively improves the system performance.
1. Introduction
Wireless sensor networks (WSNs) play a critical role in many research areas, such as machine-to-machine (M2M) networks, emergency communication, and smart homes [1–4]. WSNs have two defining characteristics: limited energy and scarce spectrum. Because the nodes of WSNs are generally battery powered, energy efficiency is one of the key design factors. In this paper, we focus on the spectrum scarcity problem of WSNs. With the dramatic increase in the number of wireless devices, the scarcity of available spectrum has become more serious. Cognitive radio (CR), which was introduced as a way to improve the efficiency of spectrum utilization, has become a research focus in recent years [1–27]. Combining the advantages of CR and WSNs, the cognitive radio sensor network (CRSN) has been studied [1–4, 6–8, 15]. In [1, 2], CR was used in smart grid communication networks, and a spectrum access strategy was proposed to find more available spectrum resources for data collection and transmission. In [3], CRSN was studied in order to improve the energy efficiency of vehicular ad hoc networks. In a CRSN, the wireless nodes are defined as secondary (unlicensed) users (SUs), while the owners of the licensed spectrum are defined as primary (licensed) users (PUs). The key challenge is for the SUs to find available channels and adjust their transmission parameters (transmission power, transmission time, carrier frequency, etc.) to access these channels, while avoiding harmful interference to PUs.
Normally, the available spectrum opportunities are found in the time [4, 7, 8, 10–12, 15], space [14], and frequency [18] domains. In [10–12], SUs perform spectrum sensing to find access opportunities in the time domain; the SUs access the licensed channel when it is not used by PUs. In [4], channel sensing and switching were used for an SU to access a set of PU channels. In [7, 8], two spectrum sensing methods were studied in CRSN. In [15], both single-user and collaborative spectrum sensing schemes were proposed for cognitive sensor networks, in which quantized sensing observations are used. In [13], the SUs accessed the licensed channel directly at any time, and the transmission power was limited to avoid unacceptable interference to the PU system; that is, the spectrum access opportunity in the space domain was used. In order to find available spectrum in different channels, parallel sensing [19], cooperative sensing [17], and wideband spectrum sensing (WSS) were studied. In [16, 17, 20], cooperative sensing was studied. To find available spectrum in a wideband spectrum, the SUs either sense other channels if the current channel is occupied or detect a set of channels simultaneously [21–25].
Unlike the existing research on spectrum sensing and access, which focuses on utilizing the time domain or the space domain only, this paper proposes a spectrum sensing and access strategy for CRSN in which an SU system uses the available spectrum in both the time and space domains over multiple channels. WSS is used by the SU to identify the presence of PU signals. Unlike the conventional spectrum sensing scheme, in which an SU senses the channels one by one, in the WSS scheme multiple channels are detected simultaneously by an SU, and the sensing time durations of the channels are the same. After spectrum sensing, the SU accesses all of the channels with the mixed access strategy (MAS) [26, 27]. Under MAS, the SU accesses the channels with different powers based on the sensing results. When a channel is sensed as idle, the SU accesses it with a higher transmission power, so that the available spectrum in the time domain is used. Otherwise, the transmission power is kept low enough to avoid unacceptable interference to PUs, so that the available spectrum in the space domain is used. Thus, compared with other spectrum access strategies, the SU under MAS obtains greater throughput.
However, selecting all of the channels for WSS is not a suitable choice. There are three reasons why the sensing channels need to be selected. First, sensing all of the channels is a huge challenge that requires a large amount of energy [9], which is not a good choice for WSNs. Second, if the PU signals in some channels are weak, much more sensing time is needed to guarantee the protection of PUs. Because WSS uses the same sensing time for all of the sensed channels, the sensing overhead of the whole system is prolonged. Finally, when the idle probabilities of some channels are low, the SU obtains a larger average throughput when it accesses these channels directly, compared with the average throughput obtained when the SU accesses the channels after sensing. Thus, in this paper, the SU selects some channels for spectrum sensing. For a channel that has been selected, the SU accesses it and adjusts its transmission power based on the sensing result. Otherwise, the SU accesses the channel directly via the underlay access strategy. Therefore, in our proposed system model, a tradeoff exists between achieving larger throughput and selecting appropriate sensing channels.
In a dynamic spectrum environment, the SU cannot obtain the accurate states of the PU channels, because spectrum sensing is imperfect and not all of the channels are selected for sensing. In this paper, we propose a selection making algorithm by using the partially observable Markov decision process (POMDP) theory. Under the selection making algorithm, the SU determines which channels are selected for spectrum sensing, how long the sensing time is, and the transmission powers of the accessed channels. The objective of the selection making algorithm is to maximize the throughput of the SU system, while avoiding unacceptable interference to PUs. An optimal policy and a myopic policy are derived to solve the formulated POMDP problem. Moreover, we present an optimization algorithm to solve the synchronism problem among the selected channels. Extensive numerical examples are presented to demonstrate the merit of the proposed algorithms.
The contributions of this paper can be described as follows.
(1) A new selective sensing and access strategy for CRSN based on POMDP theory is proposed, in which, at the beginning of each slot, an SU selects some channels for wideband spectrum sensing and accesses all of the channels via the mixed access strategy. (2) An optimal policy and a myopic policy are proposed to solve the POMDP problem. (3) We consider an optimization problem in which the decision probability thresholds of the selected channels are jointly optimized, in order to ensure the synchronism among the selected channels.
The rest of this paper is organized as follows. In Section 2, we review the related work. The system model is presented and analyzed in Section 3. In Section 4, we discuss a selection making algorithm via the POMDP framework; an optimal policy and a myopic policy are proposed to solve the POMDP problem. The advantage of the proposed algorithms is illustrated by numerical results in Section 5, and conclusions are drawn in Section 6.
2. Related Work
Wideband spectrum sensing has been discussed in [21–25]. In [21], the detection thresholds of the energy detectors in the channels were jointly optimized. In order to maximize the throughput of the SU, the sensing time and detection thresholds were jointly optimized in [22]. In [23, 24], both the sensing time and the transmission powers of the channels were jointly optimized. However, in these works, the SU selects all of the channels for spectrum sensing and accesses a channel only when its sensing result is idle. The selection of sensing channels has not been considered. Unlike the previous works, in this paper not all of the channels are selected for sensing. After spectrum sensing, the SU accesses the channels via the mixed access strategy. Regardless of whether the sensing result is idle or occupied, the SU accesses the channel, using a different transmission power in each case, and greater throughput is obtained. Moreover, unlike the previous works, in which WSS was considered in a single slot, we consider the selection of sensing channels and spectrum access over multiple slots, which makes the problem more complicated.
Because of the time-varying nature of the dynamic spectrum environment, POMDP is used to formulate the selective spectrum sensing and access problem. In [28, 29], two optimal opportunistic spectrum access MAC protocols were proposed. In [30], a well-known separation principle was proposed to reduce the solution of the POMDP problem from an optimal policy to a myopic policy. However, these works are based on the condition that the SU can sense and access only one channel in a slot. In our proposed scheme, the SU selects several channels for spectrum sensing and accesses all of the channels after sensing, so the calculation of the sensing time becomes complicated. In [31], an adaptive sensing scheduling scheme was proposed, based on POMDP theory. The study [32] is probably the most relevant paper, in which an optimal policy for the selection of sensing channels is proposed. However, in [32] the SU accesses a channel only when the sensing result is idle. In our work, the SU accesses all of the channels with different transmission powers. Although the previous works use the same mathematical method (POMDP), the problem in this paper is considerably more complicated, and some efficient methods are proposed to solve it.
3. System Model
In this section, we present the system model of this paper and the structure of wideband spectrum sensing. Then, a selective sensing and access strategy in multiple channels is proposed.
3.1. System Model
An SU system shares a licensed wideband spectrum assigned to PUs, which can be divided into N nonoverlapping narrowband channels. The channels operate in a time-slotted manner. The traffic of the PU system is modeled as a two-state ON-OFF process. Figure 1 shows the structure of the SU system. We assume that the SU knows the slot duration and can keep synchronized with the PUs [10, 11]. Denote

Figure 1: Structure of SU system.
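As an illustration of the two-state ON-OFF traffic model, the following minimal sketch simulates the slot-by-slot occupancy of PU channels as independent two-state Markov chains. The transition probabilities, the function names, and the random seed are assumptions made here for illustration; they are not values taken from this paper.

```python
import random


def stationary_idle_probability(p_idle_to_busy, p_busy_to_idle):
    """Long-run fraction of slots in which a two-state ON-OFF channel is idle."""
    return p_busy_to_idle / (p_idle_to_busy + p_busy_to_idle)


def simulate_pu_channel(num_slots, p_idle_to_busy, p_busy_to_idle, rng):
    """Simulate one PU channel; returns 0 (OFF, idle) or 1 (ON, busy) per slot."""
    # Start from the stationary distribution so that early slots are not biased.
    idle_prob = stationary_idle_probability(p_idle_to_busy, p_busy_to_idle)
    state = 0 if rng.random() < idle_prob else 1
    states = []
    for _ in range(num_slots):
        states.append(state)
        if state == 0:
            state = 1 if rng.random() < p_idle_to_busy else 0
        else:
            state = 0 if rng.random() < p_busy_to_idle else 1
    return states


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
    # Hypothetical (idle->busy, busy->idle) probabilities for N = 3 channels.
    params = [(0.2, 0.3), (0.1, 0.4), (0.5, 0.1)]
    for n, (p_ib, p_bi) in enumerate(params):
        states = simulate_pu_channel(10000, p_ib, p_bi, rng)
        empirical_idle = states.count(0) / len(states)
        print(n, stationary_idle_probability(p_ib, p_bi), empirical_idle)
```

Channels with different transition probabilities have different long-run idle probabilities, which is the property the channel selection exploits.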
At the beginning of each slot, the SU selects some channels for WSS and accesses these channels with MAS. On the other channels, the SU waits for the same duration, in order to keep synchronism with the sensed channels. The unselected channels are then accessed directly. The SU cannot transmit in one channel while performing spectrum sensing in another one. Denote

Figure 2: Structure of SU in one slot.
3.2. Wideband Spectrum Sensing
In the WSS scheme, we assume that a wideband spectrum is occupied by some PUs. The PUs operate in different spectrum bands, and the idle probabilities of the channels are not the same. WSS is used to identify the presence of PUs in specific channels. In WSS, the wideband spectrum is divided into N nonoverlapping channels. The SU receives data through a wideband RF antenna. Then, the received data is passed through a high-speed A/D converter [21–25]. The structure of WSS is shown in Figure 3. It can be seen that the sensing time durations of the channels are the same.

Figure 3: Structure of wideband spectrum sensing.
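The per-channel decision step of WSS can be sketched as follows, assuming an energy detector on each channel (the detector type used in the wideband sensing works cited above). The sketch assumes that the wideband samples have already been separated into per-channel sample lists; the function names, the sample values, and the thresholds are hypothetical.

```python
def energy_statistic(samples):
    """Average energy of the samples collected on one channel."""
    return sum(x * x for x in samples) / len(samples)


def wideband_energy_detect(channel_samples, thresholds):
    """Decide occupancy for every sensed channel.

    channel_samples: dict mapping channel index -> list of samples.
    thresholds: dict mapping channel index -> detection threshold.
    Returns a dict mapping channel index -> True if sensed as occupied.
    """
    # In WSS all sensed channels share one sensing duration, so every
    # channel must contribute the same number of samples.
    lengths = {len(s) for s in channel_samples.values()}
    if len(lengths) > 1:
        raise ValueError("all sensed channels must have the same sensing time")
    return {
        n: energy_statistic(samples) > thresholds[n]
        for n, samples in channel_samples.items()
    }


if __name__ == "__main__":
    samples = {0: [0.1, -0.1, 0.1, -0.1], 1: [2.0, -2.0, 2.0, -2.0]}
    print(wideband_energy_detect(samples, {0: 1.0, 1: 1.0}))
```

Each channel keeps its own threshold while the number of samples is common to all channels.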
3.3. Selective Sensing and Access Strategy
At the beginning of each slot, the SU determines (1) which channels are selected for spectrum sensing, (2) how long the sensing time is, (3) which channels to be accessed with power
Constraint (4) is a sensing time constraint that protects the PUs. Constraint (5) guarantees that WSS is used for spectrum sensing. This problem is not easy to solve directly. We therefore present a selection making algorithm based on POMDP theory to find the optimal and suboptimal solutions.
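The link between the sensing time and PU protection can be illustrated with a rough sketch. It assumes an energy detector and a standard Gaussian approximation for complex-valued signal and noise, under which the number of samples needed to meet a target detection probability and false-alarm probability grows as the received SNR falls. This approximation, the function names, and the numerical values are assumptions made for illustration and are not the constraint formulated in this paper.

```python
from statistics import NormalDist


def q_inverse(p):
    """Inverse of the Gaussian tail function Q(x) = P(Z > x)."""
    return NormalDist().inv_cdf(1.0 - p)


def required_samples(snr, target_pd, target_pf):
    """Samples an energy detector needs at a given linear SNR (assumed model)."""
    numerator = q_inverse(target_pf) - (2.0 * snr + 1.0) ** 0.5 * q_inverse(target_pd)
    return (numerator / snr) ** 2


def common_sensing_time(snrs, target_pd, target_pf, sampling_rate_hz):
    """WSS uses one sensing time for all sensed channels, so the channel
    with the weakest PU signal dictates the duration."""
    worst = max(required_samples(s, target_pd, target_pf) for s in snrs)
    return worst / sampling_rate_hz


if __name__ == "__main__":
    # Hypothetical linear SNRs of the PU signal on three sensed channels.
    print(common_sensing_time([0.2, 0.1, 0.05], 0.9, 0.1, 1e6))
```

Under these assumptions, dropping the weakest channel from the sensed set shortens the common sensing time for all remaining channels.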
4. Selection Making Algorithm
In the proposed strategy, the SU cannot obtain the accurate state of each channel, because not all of the channels are selected for sensing and sensing errors are present. At the beginning of each slot, decisions are made based on the previous actions and observations. This setting matches well with the POMDP framework [28–35]. Therefore, the optimal selection making problem under the proposed strategy is formulated as a POMDP, which determines an optimal policy for the selection of sensing channels, the length of the sensing time, and the access decisions. Next, we describe the POMDP framework. An optimal policy and a myopic policy are proposed to solve the POMDP problem.
4.1. POMDP Framework
4.1.1. Actions
At the beginning of slot m, the action of the SU has three stages: determining which channels to sense, the length of the sensing time, and the transmission power. Let
4.1.2. Observations
Let If channel n has been selected for sensing and the sensing result is idle, the SU accesses it with power However, if the SU transmitter receives NAK, it indicates the sensing result is wrong; the transmission of SU causes unacceptable interference to PU. We denote this as observation 1, If the sensing result of channel n is occupied, the SU accesses it with power If the SU accesses channel n directly with power
The difference between observations 2 and 3 is that the SU has not sensed the channel in observation 3. Although the announcing signals are the same, the transmitter of SU can distinguish different observations.
4.1.3. Belief Vector
In the POMDP formulation, the belief vector is used to infer the channel states at the beginning of each slot. It is the conditional probability of the channel state given the past history, including the past decisions and observations. At the end of each slot, the belief vector is updated based on the actions taken and the corresponding observations, in order to obtain accurate information about the dynamic environment. The belief vector of channel n in slot m is denoted as
Finally, when
4.1.4. Reward Function
Denote
Based on the above discussion and analysis, the procedure in the POMDP framework is shown in Figure 4.

Figure 4: The procedure in the POMDP framework.
4.2. Solution to POMDP
In the proposed scheme, the design objective is to develop an optimal selective sensing and access policy for each slot, in order to maximize the expected total reward obtained over a finite horizon of M slots. The complete problem formulation based on POMDP is given by
This is a constrained POMDP problem, which requires an intractable randomized policy to achieve optimality. However, the objective function can be separated from the constraint if the SU trusts the current sensing result and accesses the channel based on the sensing result in the current slot [28, 30]. The sensing time is obtained from
After the sensing time has been obtained, the problem reduces to a simpler one. Two questions remain: which channels are selected for spectrum sensing and which transmission powers are selected for transmission. The narrowband channels are independent of each other, so the action of the SU system can be decomposed into a combination of per-channel actions. We can therefore calculate the optimal actions of each channel independently.
4.2.1. Optimal Policy
In order to calculate the optimal policy of channel n effectively, a value function
It represents the updated knowledge of channel state based on the actions and observations of SU in slot m.
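A simplified version of such a belief update can be sketched as follows. The sketch tracks only the idle probability of one channel, applies a Bayes update when the channel has been sensed, and then propagates the belief to the next slot with the two-state Markov model. It deliberately ignores the ACK/NAK feedback described in Section 4.1.2, and the function names, detection and false-alarm probabilities, and transition probabilities are assumptions made for illustration.

```python
def sensing_posterior(idle_belief, sensed_idle, p_detect, p_false_alarm):
    """Bayes update of P(channel idle) after one sensing outcome."""
    if sensed_idle:
        like_idle = 1.0 - p_false_alarm  # idle channel correctly reported idle
        like_busy = 1.0 - p_detect       # busy channel missed by the detector
    else:
        like_idle = p_false_alarm        # idle channel reported occupied
        like_busy = p_detect             # busy channel correctly detected
    numerator = idle_belief * like_idle
    denominator = numerator + (1.0 - idle_belief) * like_busy
    if denominator == 0.0:
        return idle_belief  # outcome had zero probability; keep the prior
    return numerator / denominator


def predict_next_slot(idle_belief, p_idle_to_idle, p_busy_to_idle):
    """Propagate the belief one slot ahead through the ON-OFF Markov chain."""
    return idle_belief * p_idle_to_idle + (1.0 - idle_belief) * p_busy_to_idle


def update_belief(idle_belief, sensed_idle, p_detect, p_false_alarm,
                  p_idle_to_idle, p_busy_to_idle):
    """sensed_idle is True or False for a sensed channel, None if unselected."""
    if sensed_idle is not None:
        idle_belief = sensing_posterior(
            idle_belief, sensed_idle, p_detect, p_false_alarm)
    return predict_next_slot(idle_belief, p_idle_to_idle, p_busy_to_idle)


if __name__ == "__main__":
    for outcome in (True, False, None):
        print(outcome, update_belief(0.5, outcome, 0.9, 0.1, 0.8, 0.4))
```

For an unselected channel the belief evolves through the Markov prediction alone, which is why its uncertainty is not reduced by the current slot in this simplified version.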
4.2.2. Myopic Policy
Solving for the optimal policy incurs high computational complexity, especially when the number of channels is large. In order to address this problem, a myopic policy is proposed, in which the SU maximizes the immediate expected reward in the current slot m. The myopic policy solution is given by
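To illustrate the myopic idea of maximizing only the immediate expected reward, the following sketch decides per channel whether to sense it or to access it directly. It uses a simplified reward that is an assumption made here and is not the reward function of Section 4.1.4: a correctly sensed idle channel earns `rate_high`, any low-power transmission earns `rate_low` regardless of the PU state, and a missed detection earns nothing. Because unselected channels wait during the sensing time, both options are assumed to transmit for the same duration, so the common time factor is omitted.

```python
def expected_rewards(idle_belief, p_detect, p_false_alarm, rate_high, rate_low):
    """Return (reward if sensed, reward if accessed directly) for one channel."""
    # Sensed idle and truly idle: high-power transmission succeeds.
    high_power_success = idle_belief * (1.0 - p_false_alarm)
    # Sensed occupied (false alarm or correct detection): low-power access.
    sensed_occupied = (idle_belief * p_false_alarm
                       + (1.0 - idle_belief) * p_detect)
    # Missed detection: high power collides with the PU, assumed reward 0.
    reward_sensed = high_power_success * rate_high + sensed_occupied * rate_low
    reward_direct = rate_low
    return reward_sensed, reward_direct


def myopic_selection(idle_beliefs, p_detect, p_false_alarm, rate_high, rate_low):
    """Return one boolean per channel: True means select it for sensing."""
    decisions = []
    for belief in idle_beliefs:
        sensed, direct = expected_rewards(
            belief, p_detect, p_false_alarm, rate_high, rate_low)
        decisions.append(sensed > direct)
    return decisions


if __name__ == "__main__":
    # Hypothetical beliefs: one channel likely idle, one likely occupied.
    print(myopic_selection([0.9, 0.05], 0.9, 0.1, 2.0, 1.0))
```

Under these assumptions a channel with a low idle belief is accessed directly, since the risk of a missed detection outweighs the occasional high-power gain.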
4.2.3. Synchronism among Channels
In the proposed solutions, we calculate an optimal policy and a myopic policy of each channel independently, instead of calculating all of the channels at the same time. The access point is
Denote
The detection probability thresholds of each channel are adjusted, based on the sensing time duration. The
5. Numerical Results
In this section, the proposed optimal selective sensing and access policy is compared with the myopic policy and a random policy under different simulation conditions. In the random policy, the SU selects the channels for spectrum sensing randomly, and the access actions are also selected randomly. The sensing time durations of the three policies are the same and are obtained from the proposed optimization algorithm (20). The slot size of PUs is fixed, the same as the sensing period of SU system,
Figure 5 shows the aggregate throughput of the SU for different total numbers of channels N. We consider two cases:

Figure 5: SU's throughput performance comparison under different total channel number N.
In Figure 6, we study the aggregate throughput of the SU under different idle probabilities of two adjacent PU channels. The idle probability of channel n is

Figure 6: SU's throughput performance comparison under different idle probability λ between two adjacent channels.
Figure 7 illustrates the performance of aggregate throughput under optimal sensing time and fixed sensing time. Both are under the myopic policy. In the fixed sensing time case, the SU senses the channels in each slot with

Figure 7: SU's throughput performance comparison under different sensing time duration.
Figure 8 illustrates the performance of aggregate throughput under different access strategies. All of them are under the optimal policy. The number of channels is four, and the probability of channel n is

Figure 8: SU's throughput performance comparison under different access strategy.
6. Conclusion
In this paper, we propose a selective spectrum sensing and access strategy for cognitive radio sensor networks. In order to maximize the aggregate throughput of the SU system and reduce the spectrum sensing energy consumption, the SU selects some channels for spectrum sensing, accesses these channels based on the sensing results, and accesses the other channels directly. To cope with the dynamic spectrum environment, a selection making algorithm based on POMDP theory is proposed. An optimal policy and a myopic policy are proposed to solve the POMDP problem. Theoretical analysis and numerical results show that the proposed selection making algorithm achieves a better balance between maximizing the throughput of the SU system and avoiding unacceptable interference to PUs.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
The work in this paper is partly supported by programs of Natural Science Foundation of China under Grant nos. 60903170, U0835003, and U1035001.
