Abstract: Because ground traffic requests are non-uniformly distributed geographically and vary over time, making full use of limited beam resources to serve users flexibly and efficiently is a new challenge for beam hopping satellite systems. Conventional greedy beam hopping methods do not consider the long-term reward and therefore struggle with time-varying traffic demand. Heuristic algorithms such as the genetic algorithm converge slowly and cannot achieve real-time scheduling. Furthermore, existing methods based on deep reinforcement learning (DRL) make decisions only on beam patterns and lack the bandwidth degree of freedom. This paper proposes a dynamic beam pattern and bandwidth allocation scheme based on DRL that flexibly uses three degrees of freedom: time, space, and frequency. Because jointly allocating bandwidth and beam pattern causes the action space to explode, a cooperative multi-agent deep reinforcement learning (MADRL) framework is presented, in which each agent is responsible only for the illumination allocation or the bandwidth allocation of one beam. The agents learn to collaborate by sharing the same reward to achieve a common goal: maximizing throughput and minimizing the delay fairness between cells. Simulation results demonstrate that the offline-trained MADRL model achieves real-time beam pattern and bandwidth allocation that matches the non-uniform and time-varying traffic requests. Furthermore, the model generalizes well when the traffic demand increases.
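
The action-space explosion mentioned in the abstract can be illustrated with a simple count. The sketch below is not the paper's formulation: it assumes a single agent that picks which cells to illuminate and one bandwidth option per beam, and compares that with the per-agent action sizes when each agent handles only one beam's illumination or bandwidth. The function names and the values 37 cells, 10 beams, and 3 bandwidth options are hypothetical and chosen only for illustration.

```python
from math import comb


def joint_action_count(num_cells: int, num_beams: int, num_bw_options: int) -> int:
    """Joint action count for one agent (assumed model, not from the paper):
    choose which cells the beams illuminate, then one bandwidth option per beam."""
    return comb(num_cells, num_beams) * num_bw_options ** num_beams


def per_agent_action_counts(num_cells: int, num_beams: int, num_bw_options: int) -> list[int]:
    """Action-space size of each agent when the decision is factored:
    one illumination agent and one bandwidth agent per beam."""
    illumination_agents = [num_cells] * num_beams
    bandwidth_agents = [num_bw_options] * num_beams
    return illumination_agents + bandwidth_agents


# Hypothetical system size, for illustration only.
joint = joint_action_count(num_cells=37, num_beams=10, num_bw_options=3)
per_agent = per_agent_action_counts(num_cells=37, num_beams=10, num_bw_options=3)

print(f"single-agent joint actions: {joint}")
print(f"agents: {len(per_agent)}, largest per-agent action space: {max(per_agent)}")
```

Under these assumptions the joint count grows combinatorially with the number of beams, while each factored agent only ever chooses among the cells or among the bandwidth options. Coordination between agents is then left to the shared reward rather than to an enumerated joint action.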

Dynamic Resource Allocation With Deep Reinforcement Learning in Multibeam Satellite Communication

BeiDou Short-Message Satellite Resource Allocation Algorithm Based on Deep Reinforcement Learning

DRL-Based Dynamic Resource Allocation for Multi-Beam Satellite Systems

Deep Reinforcement Learning-Based Autonomous Mission Planning Method for High and Low Orbit Multiple Agile Earth Observing Satellites

Collaborative Deep Reinforcement Learning for Resource Optimization in Non-Terrestrial Networks

Dynamic Channel Allocation for Satellite Internet of Things via Deep Reinforcement Learning

Deep Reinforcement Learning Architecture for Continuous Power Allocation in High Throughput Satellites

A Satellite Adaptive Modulation Coding Method Based on Deep Reinforcement Learning

Resource Scheduling Based on Deep Reinforcement Learning in UAV Assisted Emergency Communication Networks

Multi-objective deep reinforcement learning based time-frequency resource allocation for multi-beam satellite communications

Deep Reinforcement Learning-Based Power Allocation for Rate-Splitting Multiple Access in 6G LEO Satellite Communication System

DDPG with Transfer Learning and Meta Learning Framework for Resource Allocation in Underlay Cognitive Radio Network

Dynamic Beam Pattern and Bandwidth Allocation Based on Multi-Agent Deep Reinforcement Learning for Beam Hopping Satellite Systems

Dynamic resource allocation in IRS-assisted UAV wideband cognitive radio networks: A DDQN-TD3 approach

Penalized Reinforcement Learning-Based Energy-Efficient UAV-RIS Assisted Maritime Uplink Communications Against Jamming

Dynamic Spectrum Sharing Based on Deep Reinforcement Learning in Mobile Communication Systems

Collaborative Computing in Non-Terrestrial Networks: A Multi-Time-Scale Deep Reinforcement Learning Approach

Event-Triggered Deep Reinforcement Learning for Dynamic Task Scheduling in Multi-Satellite Resource Allocation