Abstract: With the development of wireless communication and the Internet of Things (IoT), a massive number of wireless devices must share limited spectrum resources. Dynamic spectrum access (DSA) is a promising paradigm for remedying the inefficient spectrum utilization brought about by the historical command-and-control approach to spectrum allocation. In this article, we investigate the distributed DSA problem for multiple users in a typical multichannel cognitive radio network. The problem is formulated as a decentralized partially observable Markov decision process (Dec-POMDP), and we propose a centralized offline training and distributed online execution framework based on cooperative multi-agent reinforcement learning (MARL). We employ the deep recurrent Q-network (DRQN) to address the partial observability of the state for each cognitive user. The ultimate goal is to learn a cooperative strategy that maximizes the sum throughput of the cognitive radio network in a distributed fashion, without information exchange between cognitive users. Finally, we validate the proposed algorithm in various settings through extensive experiments. The experimental results show that the proposed CoMARL-DSA algorithm outperforms the state-of-the-art deep Q-learning for spectrum access (DQSA) in terms of successful access rate and collision rate by at least 14% and 12%, respectively.
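The two metrics named in the abstract, successful access rate and collision rate, can be illustrated with a small slot-level sketch. This is not the paper's evaluation code. The function names, the action encoding (0 for idle, k for transmitting on channel k), the optional set of primary-user-occupied channels, and the normalization by the number of user-slots are all assumptions made here for illustration.

```python
from collections import Counter


def slot_outcome(actions, busy_channels=frozenset()):
    """Classify one time slot (hypothetical helper, not from the paper).

    actions: one int per cognitive user; 0 = stay idle, k >= 1 = transmit
             on channel k (assumed encoding).
    busy_channels: channels assumed to be occupied by a primary user.
    Returns (successes, collisions), both counted per transmitting user.
    """
    # Number of cognitive users that picked each channel in this slot.
    load = Counter(a for a in actions if a != 0)
    successes = 0
    collisions = 0
    for a in actions:
        if a == 0:
            continue  # idle users neither succeed nor collide
        if load[a] == 1 and a not in busy_channels:
            successes += 1  # sole transmitter on a free channel
        else:
            collisions += 1  # shared channel or channel already in use
    return successes, collisions


def access_rates(history):
    """Average success and collision rates over a list of slots.

    history: list of per-slot action lists. Rates are normalized by the
    total number of user-slots, which is an assumption of this sketch.
    """
    total = sum(len(actions) for actions in history)
    if total == 0:
        return 0.0, 0.0
    succ = 0
    coll = 0
    for actions in history:
        s, c = slot_outcome(actions)
        succ += s
        coll += c
    return succ / total, coll / total
```

Under these assumptions, a cooperative policy is one that drives the first rate up and the second down without the users exchanging their chosen actions. The paper's own metric definitions may differ in normalization.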

Joint Spectrum and Power Allocation in Wireless Network: A Two-Stage Multi-Agent Reinforcement Learning Method

Multi-Agent Reinforcement Learning for Multi-Cell Spectrum and Power Allocation

Traffic-driven Spectrum and Power Allocation Via Scalable Multi-Agent Reinforcement Learning

Deep Reinforcement Learning for Joint Spectrum and Power Allocation in Cellular Networks

Joint Optimization of Handover Control and Power Allocation Based on Multi-Agent Deep Reinforcement Learning

Distributed Noncoherent Joint Transmission Based on Multi-Agent Reinforcement Learning for Dense Small Cell Networks

Multi-Agent Deep Reinforcement Learning for Dynamic Power Allocation in Wireless Networks

Scalable Joint Learning of Wireless Multiple-Access Policies and their Signaling

Multi-Agent Reinforcement Learning for Dynamic Spectrum Access

Multi-Agent Reinforcement Learning-Based Decentralized Spectrum Access in Vehicular Networks with Emergent Communication

Semi-Distributed Joint Power and Spectrum Allocation for LAA Based Small Cell Networks

Against Jamming Attack in Wireless Communication Networks: A Reinforcement Learning Approach

Distributed Deep Reinforcement Learning-Based Spectrum and Power Allocation for Heterogeneous Networks

Joint AMC and Resource Allocation for Mobile Wireless Networks Based on Distributed MARL

Online Multi-Agent Reinforcement Learning for Multiple Access in Wireless Networks

Multi-Agent Reinforcement Learning for Joint Cooperative Spectrum Sensing and Channel Access in Cognitive UAV Networks

Reinforcement Learning Enhanced Iterative Power Allocation in Stochastic Cognitive Wireless Mesh Networks

Cooperative Multi-Agent Reinforcement Learning Based Distributed Dynamic Spectrum Access in Cognitive Radio Networks

Joint Sub-Band and Transmission Rate Selection for Anti-Jamming Non-Contiguous Orthogonal Frequency Division Multiplexing System: An Upper Confidence Bound Based Reinforcement Learning Approach

Generalization of Deep Reinforcement Learning for Jammer-Resilient Frequency and Power Allocation

Distributed Two-tier DRL Framework for Cell-Free Network: Association, Beamforming and Power Allocation