Deep Reinforcement Learning based Routing for Non-cooperative Multi-flow Games in Dynamic AANETs
Huasen He, Kaixuan Sun, Shuangwu Chen, Xiaofeng Jiang, Rangang Zhu, Jian Yang
DOI: https://doi.org/10.1109/TVT.2024.3440949
IF: 6.8
2024-01-01
IEEE Transactions on Vehicular Technology
Abstract: Aeronautical Ad hoc Networks (AANETs) have been identified as pivotal constituents of the Next Generation Wireless Communication Network (NGWCN) because of their ability to provide global coverage and low-latency network services. However, in contrast with terrestrial networks, large-scale AANETs are highly dynamic, which makes it considerably harder to synchronize the global network state needed for computing routing paths. To suppress synchronization overhead, we make routing decisions based on partially observable network states. Specifically, we formulate the multi-flow routing problem as a non-cooperative Multi-Player Partially Observable Markov Decision Process (MP-POMDP) game, in which each flow acts as a player that aims to maximize its own transmission bandwidth while deliberately avoiding conflicts with bandwidth already occupied by other flows. To tackle the high-dimensional state space of the proposed MP-POMDP game, we employ Deep Reinforcement Learning (DRL) to develop a novel Distributed Game based Multi-flow Routing (DGMR) algorithm that uses a parallel multi-agent scheme. In DGMR, each flow is equipped with an agent for route selection; the agent moves along the routing path and uses the most recently observed states to make the next-hop routing decision. Moreover, to provide fixed-size inputs for the neural networks, a Pareto-based Optimal Neighbor Selection (PONS) algorithm, built on Pareto optimality theory, is proposed to select a fixed number of neighbors from the variable-size neighbor set of each aircraft. The selected neighbors are close to the destination and have sufficient available bandwidth, which supports high-quality routing decisions. The experimental results show that DGMR is highly scalable and achieves up to ten times the bandwidth utilization of the benchmark algorithms.
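The abstract does not give the details of PONS, so the following is only a minimal sketch of one way a Pareto-based filter could reduce a variable-size neighbor set to a fixed-size one. Everything in it is an assumption made for illustration and is not taken from the paper: the two objectives (distance to the destination, lower is better; available bandwidth, higher is better), the names `Neighbor`, `dominates`, and `select_neighbors`, the tie-breaking order, and the use of `None` as padding when fewer than `k` neighbors exist.

```python
from typing import List, NamedTuple, Optional


class Neighbor(NamedTuple):
    """Hypothetical per-neighbor observation (field names are assumptions)."""
    node_id: str
    dist_to_dest: float  # distance to the destination, lower is better
    avail_bw: float      # available bandwidth, higher is better


def dominates(a: Neighbor, b: Neighbor) -> bool:
    """True if a is at least as good as b on both objectives and strictly better on one."""
    no_worse = a.dist_to_dest <= b.dist_to_dest and a.avail_bw >= b.avail_bw
    better = a.dist_to_dest < b.dist_to_dest or a.avail_bw > b.avail_bw
    return no_worse and better


def select_neighbors(neighbors: List[Neighbor], k: int) -> List[Optional[Neighbor]]:
    """Pick exactly k entries by peeling successive Pareto fronts.

    Within a front, neighbors closer to the destination are taken first
    (an assumed tie-break). The result is padded with None if fewer than
    k neighbors are available, so its length is always k.
    """
    remaining = list(neighbors)
    selected: List[Optional[Neighbor]] = []
    while remaining and len(selected) < k:
        # Current Pareto front: neighbors not dominated by any other remaining one.
        front = [n for n in remaining
                 if not any(dominates(m, n) for m in remaining)]
        front.sort(key=lambda n: (n.dist_to_dest, -n.avail_bw))
        selected.extend(front[:k - len(selected)])
        remaining = [n for n in remaining if n not in front]
    # Pad to a fixed size so the output can feed a fixed-width model input.
    selected.extend([None] * (k - len(selected)))
    return selected
```

Calling `select_neighbors(candidates, k)` always returns a list of length `k`, which is the property a fixed-size neural network input needs. Peeling fronts one at a time is a simple design choice for this sketch; the actual PONS procedure may rank or truncate candidates differently.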