Abstract: Dynamic spectrum access (DSA) is regarded as an effective and efficient technology to share radio spectrum among different networks. As a secondary user (SU), a DSA device faces two critical problems: avoiding harmful interference to primary users (PUs), and coordinating interference effectively with other secondary users. These two problems become even more challenging in a distributed DSA network, where there is no centralized controller for SUs. In this paper, we investigate communication strategies of a distributive DSA network in the presence of spectrum sensing errors. To be specific, we apply the powerful machine learning tool, deep reinforcement learning (DRL), for SUs to learn "appropriate" spectrum access strategies in a distributed fashion assuming NO knowledge of the underlying system statistics. Furthermore, a special type of recurrent neural network (RNN), called reservoir computing (RC), is utilized to realize DRL by taking advantage of the underlying temporal correlation of the DSA network. Using the introduced machine learning-based strategy, SUs can make spectrum access decisions in a distributed manner, relying only on their own current and past spectrum sensing outcomes. Through extensive experiments, our results suggest that the RC-based spectrum access strategy can help the SU to significantly reduce the chances of collision with PUs and other SUs. We also show that our scheme outperforms the myopic method, which assumes knowledge of the system statistics, and converges faster than the Q-learning method when the number of channels is large.

What problem does this paper attempt to address?

This paper attempts to solve two key problems in Dynamic Spectrum Access (DSA) networks, especially in a distributed environment without a centralized controller:

1. **Avoid causing harmful interference to Primary Users (PUs)**: In DSA networks, as Secondary Users (SUs), devices must ensure that they do not interfere with the communication of PUs.
2. **Effectively coordinate interference with other Secondary Users**: SUs need to avoid interfering with each other to improve spectrum utilization and communication quality.

Specifically, the paper applies Deep Reinforcement Learning (DRL) and Reservoir Computing (RC) to solve these problems. Through these methods, SUs can learn "appropriate" spectrum access strategies without knowing the system statistics. RC exploits the temporal correlation in DSA networks, enabling SUs to make distributed spectrum access decisions based on their current and past spectrum sensing results.

### Main contributions of the paper include:

1. **Propose a distributed dynamic spectrum access strategy based on DRL and RC**, which takes into account the imperfect spectrum sensing results of SUs and enables SUs to perform spectrum access in a fully distributed environment.
2. **Conduct extensive performance evaluations**, and the results show that the proposed machine-learning-based spectrum access strategy can quickly learn the activities of PUs and significantly reduce the chances of collisions with PUs and other SUs.
3. **Compared with the myopic scheme assuming known system statistics**, the proposed scheme performs better in both single-SU and multiple-SU cases. Compared with Q-learning, this scheme shows a faster convergence speed and better performance. In addition, compared with the DRL+MLP strategy, the DRL+RC strategy can utilize the temporal correlation of sensing results, thus bringing significant performance improvements.
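To make the DRL+RC idea above more concrete, here is a minimal sketch of a generic echo-state-style reservoir in plain Python. It is not the paper's architecture: the class name `TinyReservoir`, the layer sizes, the weight scaling, and the 0/1 encoding of sensing results are all assumptions made for illustration. It only shows how a fixed random recurrent layer can carry past sensing outcomes forward, with a linear readout producing one value per access action.

```python
import math
import random


class TinyReservoir:
    """Generic echo-state-style reservoir; sizes and scaling are illustrative."""

    def __init__(self, n_inputs, n_units, n_actions, scale=0.5, seed=0):
        rng = random.Random(seed)
        # Fixed random input and recurrent weights (never trained).
        self.w_in = [[rng.uniform(-scale, scale) for _ in range(n_inputs)]
                     for _ in range(n_units)]
        # Dividing by n_units keeps the recurrent map contractive, so the
        # influence of old inputs fades instead of growing without bound.
        self.w_rec = [[rng.uniform(-scale, scale) / n_units
                       for _ in range(n_units)]
                      for _ in range(n_units)]
        # Only this linear readout would be trained in practice.
        self.w_out = [[0.0] * n_units for _ in range(n_actions)]
        self.state = [0.0] * n_units

    def step(self, sensing):
        """Fold one sensing vector (assumed 0 = idle, 1 = busy) into the state."""
        new_state = []
        for i in range(len(self.state)):
            drive = sum(w * u for w, u in zip(self.w_in[i], sensing))
            memory = sum(w * x for w, x in zip(self.w_rec[i], self.state))
            new_state.append(math.tanh(drive + memory))
        self.state = new_state
        return new_state

    def q_values(self):
        """Linear readout: one estimated value per channel-access action."""
        return [sum(w * x for w, x in zip(row, self.state))
                for row in self.w_out]
```

Two reservoirs built with the same seed but fed different sensing histories end up in different states even when the latest observation is identical. That dependence on the past is the property a memoryless MLP lacks.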
### Formula summary

- **Cumulative discounted reward**:
\[ R=\sum_{t = 1}^{\infty}\gamma^{t - 1}r_{t+1} \]
where \(\gamma\in[0,1]\) is the discount rate, and \(r_{t+1}\) is the immediate reward obtained at time step \(t + 1\).
- **Q-value update rule**:
\[ Q(s_t,a_t)\leftarrow Q(s_t,a_t)+\alpha\left[r_{t+1}+\gamma\max_{a_{t+1}}Q(s_{t+1},a_{t+1})-Q(s_t,a_t)\right] \]
where \(\alpha\in(0,1)\) is the learning rate, and \(\gamma\in[0,1]\) is the discount rate.
- **Signal-to-Interference-plus-Noise Ratio (SINR) of the received signal**:
\[ \text{SINR}_i=\frac{p_{ij}\cdot|h_{ii}|^2}{p_{jj}\cdot|h_{ji}|^2+\sum_{k = 1,k\neq i}^{L}p_{kj}\cdot|h_{ki}|^2+B\cdot N_0} \]
where \(p_{ij}\), \(p_{jj}\), and \(p_{kj}\) are the transmit powers of SU \(i\), PU \(j\), and other SU \(k\) on the \(j\)-th radio channel, respectively; \(|h_{ii}|^2\), \(|h_{ji}|^2\), and \(|h_{ki}|^2\) are the corresponding link gains; the sum runs over the other SUs \(k\neq i\); \(B\) is the channel bandwidth, and \(N_0\) is the noise spectral density.

Through these methods and formulas, the paper shows how to achieve efficient dynamic spectrum access in a complex wireless environment.
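The three formulas above can be checked numerically with a small sketch in plain Python. The function names, argument names, and the toy numbers are hypothetical and chosen only for illustration; the tabular `q_update` mirrors the Q-learning baseline rule, not the paper's DRL+RC training procedure.

```python
from collections import defaultdict


def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k]; rewards[0] is the first reward received."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


def q_update(q, state, action, reward, next_state, actions, alpha, gamma):
    """One tabular Q-learning step; q maps (state, action) -> value."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


def sinr(p_su, g_su, p_pu, g_pu, other_sus, bandwidth_hz, n0):
    """SINR at SU i; other_sus is a list of (power, link_gain) pairs."""
    interference = p_pu * g_pu + sum(p * g for p, g in other_sus)
    return (p_su * g_su) / (interference + bandwidth_hz * n0)


# Toy usage with made-up numbers.
q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75
print(q_update(q_table, "s0", 1, 1.0, "s1", [0, 1], alpha=0.5, gamma=0.9))  # 0.5
print(sinr(2.0, 0.5, 1.0, 0.25, [(1.0, 0.25)], bandwidth_hz=1.0, n0=0.5))  # 1.0
```

In the SINR call, the desired signal power is 2.0 × 0.5 = 1.0, while PU interference (0.25), interference from one other SU (0.25), and noise (0.5) sum to 1.0, giving an SINR of 1.0.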

Distributive Dynamic Spectrum Access through Deep Reinforcement Learning: A Reservoir Computing Based Approach

Dynamic Spectrum Sharing Based on Deep Reinforcement Learning in Mobile Communication Systems

Multi-agent Reinforcement Learning Based Distributed Dynamic Spectrum Access

Dynamic Multichannel Sensing in Cognitive Radio: Hierarchical Reinforcement Learning

Traffic Priority-Aware Multi-User Distributed Dynamic Spectrum Access: A Multi-Agent Deep RL Approach

Deep Reinforcement Learning Based Massive Access Management for Ultra-Reliable Low-Latency Communications

DRL-based Underlay Dynamic Spectrum Access for Cognitive Satellite Networks under Spectrum Sensing Errors

A deep reinforcement learning-based D2D spectrum allocation underlaying a cellular network

RDRL: A Recurrent Deep Reinforcement Learning Scheme for Dynamic Spectrum Access in Reconfigurable Wireless Networks

Cooperative Multi-Agent Reinforcement Learning Based Distributed Dynamic Spectrum Access in Cognitive Radio Networks

Deep Reinforcement Learning for Spectrum Sharing in Future Mobile Communication System.

Transfer Reinforcement Learning for Dynamic Spectrum Environment

DRL meets DSA Networks: Convergence Analysis and Its Application to System Design

Dynamic Spectrum Access in Cognitive Radio Networks Using Deep Reinforcement Learning and Evolutionary Game

Dynamic Spectrum Access for D2D-Enabled Internet-of-Things: A Deep Reinforcement Learning Approach

Deep Reinforcement Learning-Based Dynamic Spectrum Access for D2D Communication Underlay Cellular Networks

Multi-Agent Reinforcement Learning for Dynamic Spectrum Access.

Dynamic Spectrum Access for Ambient Backscatter Communication-assisted D2D Systems with Quantum Reinforcement Learning

Dynamic multiple access based on deep reinforcement learning for Internet of Things

Federated Dynamic Spectrum Access

Primary-User-Friendly Dynamic Spectrum Anti-Jamming Access: A GAN-Enhanced Deep Reinforcement Learning Approach