Abstract: In this study, we examine the problem of downlink wireless routing in integrated access backhaul (IAB) networks comprising fiber-connected base stations, wireless base stations, and multiple users. Physical constraints prevent the use of a central controller, leaving base stations with limited access to real-time network conditions. These networks operate in a time-slotted regime, in which base stations monitor network conditions and forward packets accordingly. Our objective is to maximize the packet arrival ratio while simultaneously minimizing packet latency. To this end, we formulate the problem as a multi-agent partially observable Markov decision process (POMDP). We then develop an algorithm that combines Multi-Agent Reinforcement Learning (MARL) with Advantage Actor Critic (A2C) to derive a joint routing policy in a distributed manner. Because packet destinations are central to successful routing decisions, we use information about similar destinations as a basis for selecting destination-specific routing decisions. To capture the similarity between destinations, we rely on their relational base-station associations, i.e., the base station to which each destination is currently connected. We therefore refer to the algorithm as Relational Advantage Actor Critic (Relational A2C). To the best of our knowledge, this is the first work that optimizes the routing strategy for IAB networks. Further, we present three training paradigms for this algorithm in order to provide flexibility in terms of its performance and throughput. Numerical experiments across different network scenarios show that Relational A2C achieves near-centralized performance even though it operates in a decentralized manner. We also compare Relational A2C with other reinforcement learning algorithms, such as Q-Routing and Hybrid Routing. This comparison illustrates that solving the joint optimization problem increases network efficiency and reduces selfish agent behavior.

Multiagent Meta-Reinforcement Learning for Adaptive Multipath Routing Optimization

Toward Packet Routing With Fully Distributed Multiagent Deep Reinforcement Learning

Multi-agent reinforcement learning for network routing in integrated access backhaul networks

Packet Routing with Graph Attention Multi-agent Reinforcement Learning

Routing optimization with Monte Carlo Tree Search-based multi-agent reinforcement learning

Multi-Agent Path Finding Method Based on Evolutionary Reinforcement Learning

Multi-agent Deep Reinforcement Learning for Resilience-Driven Routing and Scheduling of Mobile Energy Storage Systems

A Multi-Policy Deep Reinforcement Learning Approach for Multi-Objective Joint Routing and Scheduling in Deterministic Networks

Packet Routing Against Network Congestion: A Deep Multi-agent Reinforcement Learning Approach

Multi-Vehicle Routing Problems with Soft Time Windows: A Multi-Agent Reinforcement Learning Approach

Scalable Deep Reinforcement Learning for Routing and Spectrum Access in Physical Layer

Routing Optimization With Deep Reinforcement Learning in Knowledge Defined Networking

Routing Protocol Design for Underwater Optical Wireless Sensor Networks: A Multiagent Reinforcement Learning Approach

Digital Twin Enhanced Multi-Agent Reinforcement Learning for Large-Scale Mobile Network Coverage Optimization

Scalable Model-based Policy Optimization for Decentralized Networked Systems

An adaptive intelligent routing algorithm based on deep reinforcement learning

Meta-Learning-Based Deep Reinforcement Learning for Multiobjective Optimization Problems

Combining multi-agent deep deterministic policy gradient and rerouting technique to improve traffic network performance under mixed traffic conditions

Optimising Stochastic Routing for Taxi Fleets with Model Enhanced Reinforcement Learning

Distributed and Adaptive Traffic Engineering with Deep Reinforcement Learning