Abstract: Because ground traffic requests are non-uniformly distributed geographically and vary over time, making full use of limited beam resources to serve users flexibly and efficiently is a new challenge for beam hopping satellite systems. Conventional greedy beam hopping methods do not consider long-term reward, which makes it difficult for them to cope with time-varying traffic demand. Meanwhile, heuristic algorithms such as the genetic algorithm converge slowly and cannot achieve real-time scheduling. Furthermore, existing methods based on deep reinforcement learning (DRL) make decisions only on beam patterns and lack the bandwidth degree of freedom. This paper proposes a dynamic beam pattern and bandwidth allocation scheme based on DRL that flexibly uses three degrees of freedom: time, space, and frequency. Because jointly allocating bandwidth and beam pattern leads to an explosion of the action space, a cooperative multi-agent deep reinforcement learning (MADRL) framework is presented, in which each agent is responsible only for the illumination allocation or the bandwidth allocation of one beam. The agents learn to collaborate by sharing the same reward to achieve a common goal: maximizing throughput and minimizing the delay fairness between cells. Simulation results demonstrate that the offline-trained MADRL model achieves real-time beam pattern and bandwidth allocation that matches non-uniform and time-varying traffic requests. Furthermore, the model generalizes well when traffic demand increases.
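The action-space argument in the abstract can be illustrated with a rough count. The sketch below is not the paper's model: it assumes a centralized agent must pick which cells to illuminate and one bandwidth option per beam, and compares that with one illumination agent plus one bandwidth agent per beam. The function names and the values 37 cells, 10 beams, and 4 bandwidth options are hypothetical, chosen only for illustration.

```python
from math import comb


def joint_action_count(num_cells: int, num_beams: int, num_bw_options: int) -> int:
    """Joint actions for one centralized agent (illustrative counting model).

    Assumption: the agent chooses which `num_beams` of `num_cells` cells to
    illuminate, then one of `num_bw_options` bandwidth options for each beam.
    """
    return comb(num_cells, num_beams) * num_bw_options ** num_beams


def per_agent_action_counts(num_cells: int, num_beams: int, num_bw_options: int) -> list[int]:
    """Action-space size of each agent when every beam gets two agents.

    Assumption: one illumination agent per beam picks a cell, and one
    bandwidth agent per beam picks a bandwidth option.
    """
    return [num_cells] * num_beams + [num_bw_options] * num_beams


# Hypothetical system size, not taken from the paper.
N_CELLS, N_BEAMS, N_BW = 37, 10, 4

joint = joint_action_count(N_CELLS, N_BEAMS, N_BW)
per_agent = per_agent_action_counts(N_CELLS, N_BEAMS, N_BW)

print(f"centralized joint actions : {joint:,}")
print(f"number of agents          : {len(per_agent)}")
print(f"largest per-agent space   : {max(per_agent)}")
print(f"sum of per-agent spaces   : {sum(per_agent)}")
```

Under these assumptions the centralized count grows combinatorially with the number of beams, while the per-agent spaces grow only linearly. The sketch ignores how conflicts between agents (for example, two illumination agents choosing the same cell) would be resolved; in the scheme described above, coordination is left to the shared reward.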

Multi-objective deep reinforcement learning based time-frequency resource allocation for multi-beam satellite communications

Multi-Agent DRL for Two-Timescale Bandwidth Allocation in Multi-Beam Satellite Networks

DRL-Based Dynamic Resource Allocation for Multi-Beam Satellite Systems

Improved Satellite Resource Allocation Algorithm Based on DRL and MOP

User-level Scheduling and Resource Allocation for Multi-Beam Satellite Systems with Full Frequency Reuse

Sequential Dynamic Resource Allocation in Multi-Beam Satellite Systems: A Learning-Based Optimization Method

Resource Allocation Using Deep Reinforcement Learning in GEO Multibeam Satellite System

A DRL Resource Allocation for Downlink NOMA Multi-beam Satellite Communications

Dynamic Beam Pattern and Bandwidth Allocation Based on Multi-Agent Deep Reinforcement Learning for Beam Hopping Satellite Systems

Resource Allocation Algorithm for Multi-Beam LEO Satellite Based on Decision Performance Evaluation

Deep Reinforcement Learning Based Resource Allocation for RSMA in LEO Satellite-Terrestrial Networks

Deep Reinforcement Learning Based Dynamic Channel Allocation Algorithm in Multibeam Satellite Systems

Dynamic Resource Allocation With Deep Reinforcement Learning in Multibeam Satellite Communication

A Deep Reinforcement Learning-Based Framework for Dynamic Resource Allocation in Multibeam Satellite Systems

Research on Joint Resource Allocation for Multibeam Satellite Based on Metaheuristic Algorithms

Dynamic Power Allocation in High Throughput Satellite Communications: A Two-Stage Advanced Heuristic Learning Approach

Joint Beam Direction Control and Radio Resource Allocation in Dynamic Multi-beam LEO Satellite Networks

BeiDou Short-Message Satellite Resource Allocation Algorithm Based on Deep Reinforcement Learning

A Novel Deep Reinforcement Learning Architecture for Dynamic Power and Bandwidth Allocation in Multibeam Satellites