Abstract: Because ground traffic requests are non-uniformly distributed geographically and vary over time, making full use of limited beam resources to serve users flexibly and efficiently is a new challenge for beam hopping satellite systems. Conventional greedy beam hopping methods do not consider the long-term reward, which makes it difficult for them to cope with time-varying traffic demand. Meanwhile, heuristic algorithms such as the genetic algorithm converge slowly and therefore cannot achieve real-time scheduling. Furthermore, existing methods based on deep reinforcement learning (DRL) make decisions only on beam patterns and lack the bandwidth degree of freedom. This paper proposes a dynamic beam pattern and bandwidth allocation scheme based on DRL that flexibly uses three degrees of freedom: time, space, and frequency. Because the joint allocation of bandwidth and beam pattern leads to an explosion of the action space, a cooperative multi-agent deep reinforcement learning (MADRL) framework is presented, in which each agent is responsible only for the illumination allocation or the bandwidth allocation of one beam. The agents learn to collaborate by sharing the same reward to achieve a common goal: maximizing the throughput and minimizing the delay fairness between cells. Simulation results demonstrate that the offline-trained MADRL model can achieve real-time beam pattern and bandwidth allocation that matches the non-uniform and time-varying traffic requests. Furthermore, the model generalizes well when the traffic demand increases.
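The action-space explosion mentioned in the abstract can be illustrated with a short counting sketch. This is not the paper's formulation: it assumes, purely for illustration, that a centralized agent would pick an unordered set of distinct cells to illuminate plus one discrete bandwidth option per beam, and that the multi-agent split uses one illumination agent and one bandwidth agent per beam. The function names and the example sizes (37 cells, 10 beams, 4 bandwidth options) are hypothetical.

```python
from math import comb


def joint_action_count(num_cells: int, num_beams: int, num_bw_options: int) -> int:
    """Actions of one centralized agent (assumed model, not from the paper).

    The agent chooses which `num_beams` distinct cells to illuminate
    (an unordered set) and one bandwidth option for every beam.
    """
    return comb(num_cells, num_beams) * num_bw_options ** num_beams


def per_agent_action_counts(num_cells: int, num_beams: int, num_bw_options: int) -> list[int]:
    """Action-space size of each agent under the assumed per-beam split.

    Each beam gets one illumination agent (picks a cell) and one
    bandwidth agent (picks a bandwidth option).
    """
    return [num_cells] * num_beams + [num_bw_options] * num_beams


if __name__ == "__main__":
    # Hypothetical system size, chosen only to show the difference in scale.
    cells, beams, bw_options = 37, 10, 4
    print("joint actions:", joint_action_count(cells, beams, bw_options))
    print("largest per-agent action space:",
          max(per_agent_action_counts(cells, beams, bw_options)))
```

Under these assumptions the joint count grows combinatorially with the number of beams, while each agent's own action space stays no larger than the number of cells or bandwidth options.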

Dynamic Beam Pattern and Bandwidth Allocation Based on Multi-Agent Deep Reinforcement Learning for Beam Hopping Satellite Systems
