Abstract
We study channel assignment strategies in multichannel wireless sensor networks (WSNs) overlaid with macrocells. The WSNs dynamically access the licensed spectrum owned by the macrocells to provide pervasive sensing services. We formulate the channel assignment problem as a potential game, which has at least one pure strategy Nash equilibrium (NE). To reach the NE, we propose a stochastic learning-based algorithm that requires no information about other players' actions or the time-varying channel. Cluster heads, as players in the game, act as self-organized learning automata and adjust their assignment strategies based on their own action-reward history. The convergence of the proposed algorithm to pure strategy NE points is proven theoretically and verified numerically. Simulation results demonstrate that the learning algorithm yields a 26% sensor node capacity improvement over random selection and incurs less than 10% capacity loss compared to exhaustive search.
1. Introduction
In wireless sensor networks [1], spatially distributed, low-power, and low-cost sensor nodes are deployed in a geographical area to monitor the environment. The sensor nodes usually form clusters, and in each cluster there is an energy-rich sensor node acting as the cluster head, while other sensor nodes are referred to as cluster members. A cluster head is a special sensor node with better cognitive radio (CR) [2, 3] functionality and is responsible for the spectrum sensing and the channel assignment among its cluster members.
To enable the various kinds of services [4–6] provided by a pervasive sensing system, proper radio resource management [7] is important. Due to spectrum scarcity and the ad hoc nature of sensor network deployment, it can be hard to assign dedicated licensed bands to sensor networks. Therefore, CR technology has been considered a promising solution to the channel assignment problem of sensor networks. CR enables dynamic spectrum access (DSA) by unlicensed users in distributed networks; its key idea is to actively sense the dynamic radio environment so as to improve spectrum utilization. Akan et al. [8] provided a survey on cognitive radio sensor networks. By utilizing CR technology, sensor networks can attain high data rates by exploiting available spectrum holes. In addition, dynamic spectrum access helps mitigate the interference incurred by dense deployment of sensor nodes.
Despite the promising features of cognitive sensor networks, deploying such heterogeneous networks, with sensor clusters sharing the same spectrum and geographical area as macrocells, brings new technical challenges. In particular, we are interested in densely populated sensor networks where, due to extensive frequency reuse, the cochannel interference (CCI) among sensor nodes and the cross-tier interference (between the macrocell and the sensor networks) affect system performance.
In the absence of a central controller, channel assignment in cognitive sensor networks is implemented in a distributed manner. In this paper, we consider the distributed channel assignment for self-organized cognitive sensor networks from a game-theoretic perspective. The main contributions of this paper are summarized as follows.
We model the sensor cluster channel assignment problem as an ordinal potential game (OPG). The game treats the time-varying channel availability as its external state. We propose a fully decentralized channel assignment algorithm in which each cluster head selects its channel independently based on its own action-reward history. The convergence of the algorithm to a pure strategy NE point is proven theoretically and verified numerically.
This paper is organized as follows. We review the related work and compare it with our study in Section 2. In Section 3, the system model for a cognitive sensor network is presented. Section 4 describes the game-theoretic model of the distributed channel assignment problem. Section 5 presents the stochastic learning procedure carried out by the cluster heads. Finally, numerical results are given in Section 6, with the conclusion drawn in Section 7.
Notations. Normal letters represent scalar quantities; uppercase and lowercase boldface letters denote matrices and vectors, respectively. Given a finite set 𝒜,
2. Related Works
Distributed channel assignment has been extensively investigated for different networking applications where concurrent transmissions among neighboring wireless links exist.
In an interference avoidance scenario, different channels must be assigned to neighboring links. In femtocell networks [9], different methods have been proposed to assign different spectrum to adjacent femtocells. Examples include distributed random access [9], dynamic frequency planning [10], and clustering [11]. For sensor networks with multiple channels, graph theory-based methods have also been considered [12]. These methods can be viewed as variations of frequency planning and usually rely on negotiations among neighboring links. Graph theory is also a popular approach, as the interference condition can be represented by the nodes and edges of a graph. In sensor networks, Chowdhury et al. [13] proposed dynamic channel allocation (DCA) and studied the related protocol design. Yu et al. [14] considered a game theory-based approach which takes into account both the network topology and transmission routing information. Channel selection for multicell orthogonal frequency division multiple access- (OFDMA-) based networks using a graph framework was considered in [15].
On the other hand, in an interference mitigation scenario, mutual interference is tolerated. Channel assignment for cognitive sensor networks has been studied in [16]. Recently, self-organization of distributed agents based upon reinforcement learning (RL) mechanisms [17, 18] has been shown to be effective in the literature. Multiagent Q-learning (MAQL) was applied to femtocell networks in [19, 20]. MAQL involves the actions of other agents as the external state and thus requires the sharing of the knowledge of all agents’ actions. The stochastic learning (SL), in contrast, updates the actions of users based on their individual action-reward history. Nie and Comaniciu [21] considered the channel selection in cognitive radio using interference mitigation game formulation. SL was also applied to the opportunistic spectrum access in cognitive radio networks [22] to achieve the Nash equilibrium (NE) strategy. However, fully distributed resource allocation in cognitive sensor networks has not been extensively investigated.
3. System Model
We consider a cognitive radio sensor network consisting of one MBS and N sensor clusters under the coverage of the MBS. The methods of sensor node clustering and cluster head selection [23] are interesting topics in their own right but are beyond the scope of this paper. We also consider only single-hop transmission and omit the multihop routing issues of ad hoc networks [24]. The sensors are deployed in an apartment block with a dual-stripe room layout, as shown in Figure 1.

Dual-stripe deployment of sensor clusters.
In our considered system, the medium access control (MAC) function in a cluster resembles that of cellular systems. The time domain is divided into frames, and a frame is further divided into time slots. In each frame, a cluster head allocates its cluster members (i.e., sensor nodes) in different time slots following a time division multiple access (TDMA) rule. For simplicity, we assume that in each slot each cluster head allocates one sensor node over one of the available channels. We emphasize that the proposed method can be easily generalized to the cases with multiple sensor nodes per slot. A sensor node is in idle mode unless the current time slot is allocated for it. A cluster head is idle if none of its sensor nodes is transmitting data and active otherwise. We therefore introduce an active ratio, which is defined as the percentage of active clusters in a time slot. An exemplary time slot allocation is depicted in Figure 2.

Exemplary time slot allocation in a frame. In the first slot, cluster heads A and C assign channels for sensor nodes A1 and C1, respectively.
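The TDMA slot allocation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `Cluster` class, the round-robin member order, and all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    """A sensor cluster: one head schedules its members in TDMA slots."""
    head_id: int
    members: list  # sensor-node IDs belonging to this cluster

def build_frame_schedule(clusters, slots_per_frame):
    """Assign at most one sensor node per cluster to each time slot.

    Returns schedule[slot] = {head_id: member_id or None}.
    A cluster head is idle in a slot when it has no member to serve.
    """
    schedule = []
    for slot in range(slots_per_frame):
        allocation = {}
        for c in clusters:
            # One node per slot per cluster (round-robin over members).
            allocation[c.head_id] = (
                c.members[slot % len(c.members)] if c.members else None
            )
        schedule.append(allocation)
    return schedule

def active_ratio(allocation):
    """Fraction of cluster heads serving a node in this slot."""
    active = sum(1 for m in allocation.values() if m is not None)
    return active / len(allocation)
```

With three clusters of which one is empty, `active_ratio` reports 2/3 for every slot, matching the definition of the active ratio as the percentage of active clusters in a time slot.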
The spectrum is divided into C channels, and the channels may be licensed to different macrocells (i.e., the spectrum owners). By utilizing CR, the sensor nodes access the same frequency bands as the macrocells. Since the sensor nodes are energy-constrained and operate with ultralow power, we assume that the transmission power of a macrocell user equipment (MUE) is much higher than that of the sensors. Thus, the uplink transmission of an MUE will block nearby sensor nodes using the same spectrum. For cross-tier interference mitigation, we define an avoiding region around each MUE. A channel is available to a cluster only if it is not assigned to an MUE whose avoiding region covers the cluster head. The channel availability for sensor clusters is expressed as a binary matrix
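The avoiding-region rule and the resulting binary availability matrix can be sketched as below. Circular avoiding regions with a common radius, and all function and variable names, are illustrative assumptions.

```python
import math

def channel_availability(cluster_pos, mue_pos, mue_channel, avoid_radius):
    """Binary availability matrix B (N clusters x C channels).

    B[n][c] = 1 iff channel c is usable by cluster n, i.e. no MUE assigned
    channel c has an avoiding region covering cluster head n.
    Positions are (x, y) tuples; avoid_radius is the avoiding-region radius.
    """
    n_channels = max(mue_channel) + 1 if mue_channel else 1
    B = [[1] * n_channels for _ in cluster_pos]
    for n, (xn, yn) in enumerate(cluster_pos):
        for (xm, ym), ch in zip(mue_pos, mue_channel):
            if math.hypot(xn - xm, yn - ym) <= avoid_radius:
                B[n][ch] = 0  # head lies inside this MUE's avoiding region
    return B
```

For example, a cluster head 10 m from an MUE on channel 1 (with a 50 m avoiding region) loses channel 1, while a head 90 m away keeps all channels.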
Assuming perfect synchronization in time and frequency, let
To reflect a practical cognitive sensor network, our system model incorporates the following considerations.
(i) The uplink resource allocation for the MUEs is time-varying during the learning period, and the channel availability statistics are unknown. (ii) There is no centralized controller, and the channel selection is performed independently by each cluster head. (iii) The number of sensor clusters in the system, N, is unknown.
With these considerations, solving (4) is a challenging task, since the only available information for decision making at each individual player is its own action-reward history. Thus, a fully distributed channel selection scheme is proposed.
4. Game-Theoretic Model and Channel Selection
In this section, we present the game-theoretic formulation of the self-organized cognitive sensor network channel selection problem. Our objective is to devise for each cluster head a distributed channel assignment strategy that takes into account the effect of both the sensor-tier and cross-tier interference. We summarize our notations related to the game formulation in Table 1.
Summary of notations in game-theoretic formulation.
4.1. Problem Formulation and Game Model
The channel selection problem described in the previous section can be modeled by a normal-form game with external state, expressed as a 4-tuple:
Inspired by [21], the reward function is designed to consider the interference received (inward) and generated (outward) by each link. In this way, the cluster heads implicitly cooperate to reduce the interference generated toward other sensor nodes. We define the generalized SINR (gSINR) for player i as
By the definition in (7), when the channel is available, the reward is given by Shannon's capacity formula, in which both inward and outward interference are accounted for; when the channel is not available, the reward is zero. Notice that the calculation of the reward function in (7) relies on knowledge of the other players' actions, which incurs signaling overhead for exchanging the required information. Such an implementation is feasible, and a discussion of the corresponding protocol design can be found in [21]. The self-organization claimed in this paper rests on the fact that, at each time instant, each player selects its action independently and simultaneously.
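As a rough illustration of this reward structure (not the paper's exact equation (7)), the sketch below lumps both the inward and the outward cochannel interference into the denominator of a generalized SINR and applies Shannon's formula when the chosen channel is available. All names and the precise gSINR form are assumptions.

```python
import math

def gsinr(i, actions, power, gain, noise):
    """Generalized SINR for player i: the denominator counts both the
    interference i receives (inward) and the interference i generates
    toward cochannel neighbors (outward). This exact form is an
    illustrative assumption; the paper's (7) may differ in detail."""
    ch = actions[i]
    cochannel = [j for j in range(len(actions)) if j != i and actions[j] == ch]
    inward = sum(power[j] * gain[j][i] for j in cochannel)
    outward = sum(power[i] * gain[i][j] for j in cochannel)
    return power[i] * gain[i][i] / (noise + inward + outward)

def reward(i, actions, power, gain, noise, available):
    """Shannon-capacity reward when the chosen channel is available, else 0."""
    if not available[i][actions[i]]:
        return 0.0
    return math.log2(1.0 + gsinr(i, actions, power, gain, noise))
```

Because the outward term appears in player i's own denominator, a player is implicitly penalized for the interference it inflicts on others, which is what drives the implicit cooperation mentioned above.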
For systems with the channel availability as the external state, the utility function is defined as the expected reward of player i over the external state (i.e., channel availability
4.2. Analysis of Nash Equilibrium
With the utility function defined in (8), we show the existence of an NE point for the proposed game in the following proposition.
Proposition 1.
The game 𝒢 is an ordinal potential game (OPG) which possesses at least one pure strategy NE.
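For completeness, the standard defining property of an OPG is that some potential function $\Phi$ tracks the sign of every unilateral utility change:

```latex
\operatorname{sgn}\!\bigl( u_i(a_i', a_{-i}) - u_i(a_i, a_{-i}) \bigr)
  = \operatorname{sgn}\!\bigl( \Phi(a_i', a_{-i}) - \Phi(a_i, a_{-i}) \bigr),
  \qquad \forall i,\ \forall a_i' \neq a_i .
```

Any action profile maximizing $\Phi$ over the finite joint action space is then a pure strategy NE, which is why a finite OPG always possesses at least one.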
Proof.
Consider the function
Now consider an improvement step made by cluster head i that changes its action unilaterally from
Therefore, 𝒢 is an OPG with potential function
Notice that the term
5. Stochastic Learning Procedure
Here, we discuss how to reach the NE via stochastic learning. Since the channel state is time-varying and each player selects its action simultaneously and independently in each play, previous algorithms that require complete information (e.g., better response dynamics [26]) may not be applicable here. Thus, we propose a decentralized stochastic learning- (SL-) based algorithm by which the cluster heads learn the equilibrium strategy profile from their individual action-reward histories.
To facilitate the development of the SL-based channel selection algorithm, we extend the channel selection game into a mixed strategy form. Let
The proposed distributed channel assignment (DCA) algorithm for cognitive sensor networks is described in Algorithm 1.
(1) Initially, set the mixed strategy of each player to the uniform distribution over the available channels. (2) At the beginning of the nth slot, each player selects an action according to its current mixed strategy. (3) In each slot, each cluster head transmits data; at the end of the slot, it receives the instantaneous reward resulting from the joint channel selection. (4) All players update their channel assignment probability vector and utility estimation according to the rules: where
In each play, the channel selection is based on a probability distribution over the set of channels. After each play, cluster head i obtains the instantaneous reward and updates the mixed strategy (i.e., channel selection vector)
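A common form of such an update is the linear reward-inaction (LR-I) rule, sketched below under the assumption that the instantaneous reward is normalized to [0, 1]; the paper's exact update (including its utility estimation) may differ.

```python
import random

def lri_update(p, action, norm_reward, b):
    """Linear reward-inaction (LR-I) mixed-strategy update.

    p           -- current mixed strategy (probabilities over channels)
    action      -- index of the channel played this slot
    norm_reward -- instantaneous reward, assumed normalized to [0, 1]
    b           -- learning rate (small b: accurate but slow convergence)
    """
    return [pk + b * norm_reward * ((1.0 if k == action else 0.0) - pk)
            for k, pk in enumerate(p)]

def select_action(p, rng=random):
    """Sample a channel index according to the mixed strategy p."""
    u, acc = rng.random(), 0.0
    for k, pk in enumerate(p):
        acc += pk
        if u < acc:
            return k
    return len(p) - 1
```

Note that the update moves probability mass toward the played action in proportion to the received reward, so successful actions are reinforced while the vector stays a valid distribution.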
Proposition 2.
If the learning rates are sufficiently small, the sequence
Proof.
Please refer to [28, Section 4].
The ODE in (14) is actually the ODE of the replicator dynamics [29]. An intuitive interpretation is that the probability of taking an action increases if the utility is higher than the average utility over all possible actions and decreases otherwise.
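In its textbook form, the replicator dynamics can be written as follows, writing $q_{i,c}$ for the probability that player $i$ selects channel $c$ and $u_i$ for its expected utility (this standard expression may differ cosmetically from (14)):

```latex
\frac{\mathrm{d} q_{i,c}}{\mathrm{d} t}
  = q_{i,c}\!\left[\, u_i\bigl(c, \mathbf{q}_{-i}\bigr)
      - \sum_{c' \in \mathcal{C}} q_{i,c'}\, u_i\bigl(c', \mathbf{q}_{-i}\bigr) \right].
```

The bracketed term is exactly the excess of action $c$'s utility over the current average utility, matching the interpretation above.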
The convergence property of the proposed algorithm is discussed in the following proposition.
Proposition 3.
The proposed DCA algorithm converges to a pure strategy NE for OPGs if the learning rates are sufficiently small.
Proof.
First, we rewrite the ODE in (14) as follows:
For OPGs,
Thus,
While the SL-based learning algorithm converges to an NE point when the learning rates approach zero, smaller learning rates lead to a slower convergence. Therefore, the choice of the learning rates strikes a trade-off between accuracy and speed and may be determined by training in practice.
6. Numerical Results
For system-level simulations, we consider a cognitive sensor network deployed within the coverage of a cellular network. As in Figure 1, the simulation environment includes one macrocell covering one dual-stripe apartment block. The apartment block contains 40 single-floor apartments. There is one sensor cluster in each apartment. When a sensor cluster is active, its cluster head assigns one channel to cluster members randomly located in the same apartment. Without loss of generality, we consider the channel assignment in the first slot of each frame, in which for each active cluster there is one cluster member. The simulation parameters are listed in Table 2.
Simulation parameters.
6.1. Convergence of the Proposed SL-Based Learning Algorithm
We first study the time-evolving behaviors of the proposed stochastic learning method.
6.1.1. Evolution of Mixed Strategies
Figure 3 shows the evolutions of the channel assignment probabilities (i.e., mixed strategy) using the proposed SL-based algorithm. We consider different learning rates and study the convergence behaviors. It is observed that, with equal initial probability, the channel assignment probability converges to a pure strategy (i.e., the probability of choosing one strategy approaches one) in around 80 and 20 iterations for

Evolution of the mixed strategies (probability of taking different actions) of all players. Each pair of
6.1.2. Verification of NE
As shown in Figure 3, the convergence toward pure strategy is observed for both

Test of unilateral deviation from the resulting strategy profile of each of the 10 players.
6.1.3. Evolution of Actions
During the learning procedure, the channel assignment is based on probabilistic experiments. When the channel assignment changes in the next frame, switching between different channels brings overhead, since the sensor node needs to be reconfigured. The evolution of actions for selected players is shown in Figure 5. While Figure 3 (Left) reveals that it takes around 80 iterations for all players to converge to pure strategies, the actions seldom change after about 60 iterations. This suggests that channel switching, if it happens at all, usually occurs only at the beginning of the learning procedure. Indeed, our proposed learning algorithm aims to learn the equilibrium strategy in the long run, so the channel switching and the incurred sensor node reconfiguration are manageable overheads compared to the long operation time.

Evolution of the actions
6.1.4. Different Active Ratios
We further consider different active ratios and investigate the convergence behaviors under different levels of mutual interference. The results for active ratios of 50% and 75% are shown in Figure 6. We observe that the convergence toward pure strategy takes around 100 and 150 iterations for active ratios of 50% and 75%, respectively. Comparing with the case of 25% active ratio in Figure 3 (Left), we see that densely active networks take more iterations to converge than sparsely active sensor networks.

Evolution of the mixed strategies (probability of taking different actions) of all players with active ratios of 50% and 75%. Each pair of
6.2. Capacity Performance
6.2.1. Capacity under Unilateral Deviation
In Figure 4 we have shown that unilateral deviation leads to decreased utility. While the altruistic utility function design reduces the mutual interference, we are also interested in the performance of Nash equilibrium strategy in terms of the throughput of each cluster as well as the whole system. Therefore, in Figure 7, we test the change in capacity under unilateral deviation from the NE strategy for all players. As depicted in Figure 7(a), there is no significant change in the average capacity per sensor link when only one player unilaterally deviates from its NE strategy. From Figure 7(b) we observe that, for all players, deviation from NE strategy decreases their own capacity.

Change in capacity under unilateral deviation from the resulting strategy profile of each of the 10 players.
6.2.2. Comparison with Other Methods
We further compare the performance of the proposed channel selection scheme with two other approaches, namely, random allocation and exhaustive search, described as follows.
In the random allocation scheme, each cluster head randomly selects a channel for its sensor node in each frame; neither a learning algorithm nor a centralized controller is involved. In the exhaustive search scheme, a centralized controller is assumed to know all system information, including the channel gains, the channel availability statistics, and the number of clusters; the channel assignment profile is determined by maximizing the expected sum capacity (i.e., solving (4)).
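The two baselines can be sketched as follows. The sum-capacity oracle and all names are placeholders, and the exhaustive search is exponential in the number of clusters, so it serves only as an upper-bound benchmark.

```python
import itertools
import random

def random_allocation(n_clusters, n_channels, rng=random):
    """Each cluster head picks a channel uniformly at random (no learning)."""
    return [rng.randrange(n_channels) for _ in range(n_clusters)]

def exhaustive_search(n_clusters, n_channels, sum_capacity):
    """Centralized baseline: enumerate all C^N assignment profiles and keep
    the one maximizing expected sum capacity. `sum_capacity` is a
    caller-supplied oracle mapping a profile to its expected sum capacity."""
    best, best_val = None, float("-inf")
    for profile in itertools.product(range(n_channels), repeat=n_clusters):
        val = sum_capacity(profile)
        if val > best_val:
            best, best_val = profile, val
    return list(best), best_val
```

With a toy oracle that rewards channel diversity, the exhaustive search naturally separates interfering clusters onto different channels, which is the behavior the distributed algorithm must learn without central knowledge.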
The performance of different channel selection schemes is evaluated by the average capacity per sensor node,
The simulation results of average capacity and JFI for different active ratios are summarized in Table 3. We observe that the exhaustive search method yields the best average capacity but the worst fairness. The random selection scheme, in contrast, has the lowest average capacity but good fairness due to its inherent randomness. The proposed method shows well-balanced performance in terms of both average capacity and fairness. These results highlight the advantage of the proposed method: through the learning procedure toward equilibrium, the capacity of each player is taken into account and fewer players are sacrificed. Examining the final channel selection profile, we observe that, as it converges toward the NE point, the proposed learning algorithm allocates mutually interfering users to different channels.
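The fairness metric reported in Table 3, Jain's fairness index (JFI), follows the standard definition $(\sum_i x_i)^2 / (n \sum_i x_i^2)$ over per-node capacities; the function name below is ours.

```python
def jain_fairness_index(rates):
    """Jain's fairness index over per-node capacities.

    Returns a value in (0, 1]; equals 1 when all capacities are equal and
    approaches 1/n when a single node captures all of the capacity.
    """
    n = len(rates)
    s = sum(rates)
    s2 = sum(r * r for r in rates)
    return (s * s) / (n * s2) if s2 > 0 else 0.0
```

For instance, four equal-rate nodes give a JFI of 1.0, while one active node out of four gives 0.25, which is why a scheme that sacrifices a few players scores poorly on this metric.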
Comparison of the capacity and fairness for different channel assignment schemes.
7. Conclusion
In this paper, we have studied the problem of distributed channel assignment in self-organized cognitive sensor networks with an unknown channel and an unknown number of clusters. We have presented a game-theoretic approach to distributively manage interference and enable the coexistence of sensor and macrocell operations in a scenario where sensor nodes operate in the same spectrum as a cellular system. We have modeled the channel assignment problem as an ordinal potential game and proposed a decentralized stochastic learning algorithm. Simulation results have demonstrated the convergence of the algorithm toward a pure strategy Nash equilibrium with sufficiently small learning rates. The proposed method outperforms the random selection scheme in terms of average capacity, while its performance loss compared to the exhaustive search is limited. In addition, its fairness level is comparable to that of the random selection scheme and surpasses that of the exhaustive search scheme.
Footnotes
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
This work was supported in part by the National Science Council, Taiwan, under Grant NSC 102-2218-E-001-001.
