Dynamic Beam Pattern Based on Cooperation Multi-Agent VDN-D3QN for LEO Satellite Communication System

Meng, Bo Hu, Shanzhi Chen, Shaoli Kang
DOI: https://doi.org/10.1109/tgcn.2024.3457242
2024-01-01
IEEE Transactions on Green Communications and Networking
Abstract: Because LEO satellites provide cooperative coverage and traffic demand across beam positions is non-uniform, allocating limited beam and power resources flexibly and effectively to a massive number of beam positions is a challenge in beam hopping LEO satellite communication systems. In existing beam hopping schemes based on deep reinforcement learning, an agent can acquire state information only within the coverage area of its own LEO satellite. To address this, we propose a cooperative multi-agent Value-Decomposition Networks with Dueling Double Deep Q-Learning Network (VDN-D3QN) framework. It generates dynamic beam hopping patterns that ensure delay fairness and throughput among beam positions in LEO satellite communication systems. The proposed VDN-D3QN dynamic beam hopping method is divided into a training phase and a test phase, and each agent is responsible only for the beam hopping pattern of one LEO satellite. During the training phase, the agents use the Dueling Double Deep Q-Learning Network to learn to cooperate with other agents, maximizing system throughput while improving delay fairness among beam positions. The Value-Decomposition Networks (VDN) method is then employed to learn the optimal policy in a centralized manner through interaction with the environment. In the test phase, the trained agents are deployed in a distributed manner, one agent per LEO satellite, to address the challenging problem of inter-satellite communication. Each trained agent decides the dynamic beam hopping pattern of its satellite based on the local state information available to it. The evaluation results demonstrate that the proposed multi-agent VDN-D3QN algorithm can effectively handle the non-uniform traffic demand of multiple satellites simultaneously.
In addition, the simulation results indicate that the proposed VDN-D3QN algorithm can allocate resources intelligently to adapt to the requirements of beam positions, and that it achieves better performance than the baselines.
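The abstract names three learning components but does not give their update rules. The sketch below shows how they are commonly combined, assuming the standard textbook formulations rather than the authors' exact design. These are the dueling aggregation Q(s, a) = V(s) + A(s, a) - mean A(s, ·), the Double DQN target (the online network selects the next action and the target network evaluates it), and the VDN joint value Q_tot = Σ_i Q_i with a single shared team reward. The function names, the interpretation of an action as "choose one beam position", and all numbers are hypothetical. The paper's actual state, reward, and network definitions are not given in the abstract.

```python
from typing import List, Sequence


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Dueling aggregation (mean-subtracted form): Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def argmax(xs: Sequence[float]) -> int:
    """Index of the largest entry (the first one on ties)."""
    return max(range(len(xs)), key=lambda i: xs[i])


def double_dqn_next_value(q_online_next: Sequence[float],
                          q_target_next: Sequence[float]) -> float:
    """Double DQN: the online network picks a*, the target network scores it."""
    return q_target_next[argmax(q_online_next)]


def vdn_td_error(chosen_q: Sequence[float],
                 next_values: Sequence[float],
                 team_reward: float,
                 gamma: float,
                 done: bool) -> float:
    """VDN: Q_tot is the sum of per-agent Q values, trained against one shared TD target."""
    q_tot = sum(chosen_q)
    bootstrap = 0.0 if done else gamma * sum(next_values)
    y_tot = team_reward + bootstrap
    return y_tot - q_tot


if __name__ == "__main__":
    # Hypothetical example: two satellites (agents), each choosing among 3 beam positions.
    q_per_agent = [dueling_q(1.0, [0.5, -0.5, 0.0]),
                   dueling_q(2.0, [1.0, 2.0, 3.0])]
    actions = [argmax(q) for q in q_per_agent]             # greedy choices: [0, 2]
    chosen = [q[a] for q, a in zip(q_per_agent, actions)]  # [1.5, 3.0]
    # Per-agent Double DQN bootstrap values from made-up next-state Q estimates.
    next_vals = [double_dqn_next_value([0.2, 0.9, 0.4], [0.3, 0.8, 1.2]),
                 double_dqn_next_value([1.0, 0.5, 0.7], [1.1, 2.0, 0.0])]
    td = vdn_td_error(chosen, next_vals, team_reward=3.0, gamma=0.9, done=False)
    print(actions, round(td, 6))  # -> [0, 2] 0.21
```

In centralized training, the squared TD error on Q_tot is backpropagated through the sum into every agent's network. At test time, each agent only needs the argmax of its own Q values, which matches the distributed, one-agent-per-satellite execution described in the abstract.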