Abstract: Future network services must adapt to highly dynamic uplink and downlink traffic. To meet this requirement, the 3rd Generation Partnership Project (3GPP) proposed dynamic time division duplex (D-TDD) technology in Long Term Evolution (LTE) Release 11. Afterward, the 3GPP RAN#86 meeting clarified that 5G NR needs to support dynamic adjustment of the duplex pattern (transmission direction) in the time domain. Although 5G NR provides a more flexible duplex pattern, how to configure an effective duplex pattern according to service traffic is still an open research question. In this research, we propose a decentralized D-TDD configuration method based on distributed multi-agent deep reinforcement learning (MARL). First, we model the D-TDD configuration problem as a dynamic programming problem. Given the buffer lengths of all user equipment (UEs), we model the D-TDD configuration policy as a conditional probability distribution. Our goal is to find a D-TDD configuration policy that maximizes the expected discounted return of the sum rate over all UEs. Second, to reduce signaling overhead, we design a fully decentralized solution using distributed MARL, in which each agent makes decisions based only on local observations. We regard each base station (BS) as an agent, and each agent configures its uplink and downlink time slot ratio according to the queue buffer lengths of the UEs it serves. Third, to counter the loss in overall system revenue caused by the lack of global information in MARL, we apply leniency control and a binary LSTM (BLSTM) based auto-encoder. The leniency controller regulates the Q-value estimation process in MARL according to the Q-values and current network conditions, and the auto-encoder compensates for the inability of leniency control to handle complex environments and high-dimensional data. The global D-TDD policy is obtained through parallel distributed training. The method deploys the MARL algorithm on the Mobile Edge Computing (MEC) server of each BS and uses the storage and computing capabilities of that server for distributed training. Simulation results show that the proposed distributed MARL converges stably in various environments and outperforms a distributed deep reinforcement learning algorithm.
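The role of leniency control in Q-value estimation can be illustrated with a minimal tabular sketch. It follows a common formulation of lenient learning in which a per-pair temperature decays on each visit and negative temporal-difference updates are skipped with a probability equal to the current leniency. This is an assumption-laden illustration and not the implementation described in the abstract, which uses deep Q-networks and a BLSTM auto-encoder. The function name `lenient_update`, the parameters `k` and `decay`, and the reading of states as discretized local buffer lengths and actions as uplink/downlink slot-ratio indices are all hypothetical.

```python
import math
import random


def lenient_update(q, temp, s, a, r, s_next,
                   alpha=0.1, gamma=0.9, k=2.0, decay=0.9,
                   rng=random.random):
    """One lenient Q-learning step for a single agent (illustrative only).

    q    : dict state -> list of Q-values, one per action
    temp : dict state -> list of temperatures, one per action
    s, a : hypothetical local state (e.g. a discretized buffer length)
           and action (e.g. an index into candidate UL/DL slot ratios)
    """
    # TD error computed from the agent's own local estimate only.
    target = r + gamma * max(q[s_next])
    delta = target - q[s][a]

    # Leniency is near 1 while the pair is "hot" (rarely visited)
    # and shrinks toward 0 as the temperature decays.
    leniency = 1.0 - math.exp(-k * temp[s][a])

    # Positive updates always pass; negative ones pass only with
    # probability (1 - leniency), so early penalties caused by other
    # agents' exploration are mostly forgiven.
    if delta > 0 or rng() > leniency:
        q[s][a] += alpha * delta

    # Cool the visited pair so the agent becomes less forgiving over time.
    temp[s][a] *= decay
    return leniency
```

In a decentralized setting each BS agent would keep its own `q` and `temp` tables and call `lenient_update` after every local transition; no Q-values are exchanged between agents in this sketch.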

Coverage Optimization for Large-Scale Mobile Networks with Digital Twin and Multi-Agent Reinforcement Learning

Digital Twin Enhanced Multi-Agent Reinforcement Learning for Large-Scale Mobile Network Coverage Optimization

Mobile Cell-Free Massive MIMO with Multi-Agent Reinforcement Learning: A Scalable Framework

Coordinated Reinforcement Learning for Optimizing Mobile Networks

Multi-Agent Reinforcement Learning Based Unlicensed Resource Sharing for LTE-U Networks

Coverage and Capacity Optimization in STAR-RISs Assisted Networks: A Machine Learning Approach

Deep Reinforcement Learning Based Massive Access Management for Ultra-Reliable Low-Latency Communications

Multi-Agent Reinforcement Learning for Multi-Cell Spectrum and Power Allocation

Parallel Digital Twin-driven Deep Reinforcement Learning for User Association and Load Balancing in Dynamic Wireless Networks

A novel handover scheme for millimeter wave network: An approach of integrating reinforcement learning and optimization

Multi-agent Reinforcement Learning for Energy Saving in Multi-Cell Massive MIMO Systems

Multi-Agent Deep Reinforcement Learning for Resilience Optimization in 5G RAN

Joint Optimization of Handover Control and Power Allocation Based on Multi-Agent Deep Reinforcement Learning

Multi-Agent Reinforcement Learning based Uplink OFDMA for IEEE 802.11ax Networks

Coverage-aware and Reinforcement Learning Using Multi-agent Approach for HD Map QoS in a Realistic Environment

Deep Learning–Based Coverage and Capacity Optimization

Sim-to-Real Optimization of Complex Real World Mobile Network with Imperfect Information via Deep Reinforcement Learning from Self-play

Deployment Optimization for Shared e-Mobility Systems With Multi-Agent Deep Neural Search

A collaborative optimization strategy for computing offloading and resource allocation based on multi-agent deep reinforcement learning

Scalable Model-based Policy Optimization for Decentralized Networked Systems

Multi-Agent Reinforcement Learning Based Fully Decentralized Dynamic Time Division Configuration for 5G and B5G Network