Cooperative Multi-Agent Deep Reinforcement Learning Methods for UAV-aided Mobile Edge Computing Networks

Mintae Kim, Hoon Lee, Sangwon Hwang, Merouane Debbah, Inkyu Lee
2024-07-04
Abstract: This paper presents a cooperative multi-agent deep reinforcement learning (MADRL) approach for unmanned aerial vehicle (UAV)-aided mobile edge computing (MEC) networks. A UAV with computing capability can provide task offloading services to ground internet-of-things devices (IDs). With partial observation of the entire network state, the UAV and the IDs individually determine their MEC strategies, i.e., UAV trajectory, resource allocation, and task offloading policy. This requires joint optimization of the decision-making process and the coordination strategies among the UAV and the IDs. To address this difficulty, the proposed cooperative MADRL approach computes two types of action variables, namely message action and solution action, each of which is generated by dedicated actor neural networks (NNs). As a result, each agent can automatically encapsulate its coordination messages to enhance the MEC performance in a decentralized manner. The proposed actor structure is designed based on graph attention networks such that operations are possible regardless of the number of IDs. A scalable training algorithm is also proposed to train a group of NNs for arbitrary network configurations. Numerical results demonstrate the superiority of the proposed cooperative MADRL approach over conventional methods.
Information Theory
What problem does this paper attempt to address?
The problem that this paper attempts to solve is how to achieve efficient task offloading, resource allocation and UAV trajectory optimization through the multi-agent deep reinforcement learning (MADRL) method in the UAV-aided mobile edge computing (UAV-aided MEC) network. Specifically, the research aims to minimize the total energy consumption of ground Internet of Things devices (IDs) while ensuring that tasks are completed within the specified time. This requires jointly optimizing the flight trajectory of UAVs, computing resource allocation and task offloading strategies, and these optimization variables need to be independently determined by UAVs and ground devices in a partially observable environment.

### Problem Background

In the UAV-aided mobile edge computing system, UAVs can provide computing services for ground Internet of Things devices, thereby reducing the local computing burden and latency. However, due to the mobility of UAVs and the dynamic changes of the network environment, how to effectively perform task offloading, resource allocation and trajectory planning is a complex challenge. Traditional centralized deep reinforcement learning methods are difficult to handle large-scale heterogeneous networks, so a distributed method is needed to deal with this problem.

### Research Objectives

The goal of the paper is to propose a new cooperative multi-agent deep reinforcement learning framework (C-MADDPG) to achieve autonomous coordination between UAVs and ground devices. This framework enables each agent to effectively cooperate under partial observation by introducing message actions and solution actions, thereby optimizing the performance of the entire system.

### Key Challenges

1. **Partial Observability**: Each agent can only obtain part of the state information of the network, so an effective coordination mechanism needs to be designed.
2. **Scalability**: The number of devices in the network may change dynamically, so an algorithm that can adapt to networks of different scales needs to be designed.
3. **Continuous Action Space**: The trajectory and resource allocation of UAVs involve a continuous action space, which places higher requirements on reinforcement learning algorithms.

### Solutions

The C-MADDPG framework proposed in the paper solves the above problems in the following ways:

1. **Introducing Message Actions**: Pass necessary statistical information through message actions to help each agent better understand the global state.
2. **Graph Attention Network (GAT)**: Use GAT to handle different numbers of ground devices, making the model more scalable.
3. **Parameter Sharing Strategy**: All ground devices use the same neural network structure, reducing the number of training parameters and improving the model's generalization ability.
4. **Joint Training Strategy**: Adopt the centralized training and decentralized execution (CTDE) method to ensure that each agent can work in coordination.

### Summary

In general, this paper aims to solve the problems of task offloading, resource allocation and trajectory optimization in the UAV-aided mobile edge computing network by introducing the cooperative multi-agent deep reinforcement learning framework, thereby improving the overall performance and energy efficiency of the system.
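
As a rough illustration of the message-action, attention and parameter-sharing ideas listed under Solutions, the sketch below shows how one shared message actor could serve every ID and how a UAV-side solution actor could pool the resulting messages for any number of IDs. It is not the paper's architecture: plain scaled dot-product attention stands in for the graph attention layers, the weights are random and untrained, and every name and dimension (`OBS_DIM`, `MSG_DIM`, `id_messages`, `uav_solution`, and so on) is a hypothetical choice made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
OBS_DIM, MSG_DIM, ACT_DIM, HID = 4, 3, 2, 8


def init_linear(in_dim, out_dim):
    """Random (untrained) weights and zero bias for one linear layer."""
    return rng.normal(scale=0.1, size=(in_dim, out_dim)), np.zeros(out_dim)


def linear(x, params):
    w, b = params
    return x @ w + b


def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()


# Parameter sharing: a single message actor is reused by every ID.
id_message_actor = init_linear(OBS_DIM, MSG_DIM)

# UAV-side attention and solution actor (assumed structure).
uav_query = init_linear(OBS_DIM, HID)
uav_key = init_linear(MSG_DIM, HID)
uav_value = init_linear(MSG_DIM, HID)
uav_solution_actor = init_linear(OBS_DIM + HID, ACT_DIM)


def id_messages(id_obs):
    """Message action: map each ID's local observation (K, OBS_DIM)
    to a bounded message vector (K, MSG_DIM)."""
    return np.tanh(linear(id_obs, id_message_actor))


def uav_solution(uav_obs, messages):
    """Solution action: attend over K incoming messages, then act.
    Works for any K because attention pools over the ID axis."""
    q = linear(uav_obs, uav_query)            # (HID,)
    k = linear(messages, uav_key)             # (K, HID)
    v = linear(messages, uav_value)           # (K, HID)
    weights = softmax(k @ q / np.sqrt(HID))   # (K,), sums to 1
    context = weights @ v                     # (HID,)
    features = np.concatenate([uav_obs, context])
    action = np.tanh(linear(features, uav_solution_actor))  # (ACT_DIM,)
    return action, weights


if __name__ == "__main__":
    uav_obs = rng.normal(size=OBS_DIM)
    for num_ids in (3, 7):  # same parameters, different network sizes
        obs = rng.normal(size=(num_ids, OBS_DIM))
        action, weights = uav_solution(uav_obs, id_messages(obs))
        print(num_ids, action.shape, weights.shape)
```

Because the attention weights are normalized over the ID axis, the same parameters accept 3 or 7 IDs without any change, and reordering the IDs leaves the UAV's action unchanged. In a CTDE setup, these actors would be trained with a centralized critic and then executed using only local observations and received messages; that training loop is omitted here.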