Collaborative Task Offloading and Resource Allocation in Small-Cell MEC: A Multi-Agent PPO-Based Scheme

Han Li, Ke Xiong, Yuping Lu, Wei Chen, Pingyi Fan, Khaled Ben Letaief
DOI: https://doi.org/10.1109/tmc.2024.3496536
IF: 6.075
2024-01-01
IEEE Transactions on Mobile Computing
Abstract: Small-cell mobile edge computing (SE-MEC) networks combine the advantages of MEC and small-cell networks, providing user devices (UDs) with lower-latency services and enhanced data processing capabilities. Nevertheless, time-varying wireless channels, dynamic UD requirements, and severe interference among UDs make it difficult to fully exploit the limited network resources and to provide stable computing services for UDs. Therefore, efficient task offloading and resource allocation (TORA) is essential to SE-MEC networks. Moreover, since multiple small cells are deployed, decentralized TORA schemes are preferred in practice. Thus, this paper aims to design distributed adaptive TORA schemes with low communication overhead for SE-MEC networks. In pursuit of an eco-friendly design, an optimization problem is formulated with the goal of minimizing the total energy consumption (TEC) of UDs subject to delay constraints. To deal effectively with the network's dynamic characteristics, the reinforcement learning framework is applied: the TEC minimization problem is first modeled as a partially observable Markov decision process (POMDP), and then an efficient multi-agent proximal policy optimization (MAPPO)-based scheme is presented to solve it. In the presented MAPPO-based scheme, each small-cell base station (SBS) serves as an agent and is capable of making TORA decisions using only its own local information. To promote collaboration among multiple agents, a global reward function is designed that takes into account both the TEC and the delay constraint satisfaction probability (DCSP) of the UDs. A state normalization mechanism is also introduced into the presented MAPPO-based scheme to enhance learning performance. Simulation results show that although the proposed MAPPO-based scheme works in a distributed manner, it achieves performance very similar to that of the centralized one, and that there is a trade-off between DCSP and TEC. In addition, it is demonstrated that the state normalization mechanism has a significant effect on reducing TEC.
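The abstract names two mechanisms without giving their form: a state normalization step and a global reward that accounts for both TEC and DCSP. The Python sketch below is only one plausible reading, not the authors' formulation. The running mean and variance standardization, the weighted-sum reward, and every name in it (`RunningNormalizer`, `global_reward`, `w_energy`, `w_dcsp`) are assumptions introduced here for illustration.

```python
import math


class RunningNormalizer:
    """Hypothetical state normalizer: standardizes each feature of a local
    observation using running statistics (Welford's online algorithm)."""

    def __init__(self, dim, eps=1e-8):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations
        self.eps = eps

    def update(self, obs):
        """Fold one observation into the running mean and variance."""
        self.n += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, obs):
        """Return the observation centered and scaled by the running std."""
        if self.n < 2:
            # Not enough samples for a variance estimate: only center.
            return [x - m for x, m in zip(obs, self.mean)]
        return [
            (x - m) / (math.sqrt(s / self.n) + self.eps)
            for x, m, s in zip(obs, self.mean, self.m2)
        ]


def global_reward(energies, delays, deadlines, w_energy=1.0, w_dcsp=1.0):
    """Assumed reward shape: penalize total energy consumption (TEC) and
    reward the fraction of UDs whose delay meets its deadline (DCSP)."""
    tec = sum(energies)
    dcsp = sum(d <= dl for d, dl in zip(delays, deadlines)) / len(delays)
    return -w_energy * tec + w_dcsp * dcsp


# Usage: every agent (SBS) would receive the same scalar reward.
norm = RunningNormalizer(dim=1)
norm.update([1.0])
norm.update([3.0])
scaled = norm.normalize([3.0])
reward = global_reward([1.0, 2.0], [0.5, 2.0], [1.0, 1.0])
```

Under this assumed form, raising `w_dcsp` relative to `w_energy` pushes the agents toward meeting deadlines at the cost of energy, which is one way the DCSP and TEC trade-off reported in the simulations could arise.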