Telemetry-aided cooperative multi-agent online reinforcement learning for DAG task scheduling in computing power networks
Yunfeng Duan, Jingchun Li, Hao Sun, Fanqin Zhou, Jiaxing Chen, Tiandong Wu, Wenjing Li, Yuxing Fan
DOI: https://doi.org/10.1016/j.simpat.2023.102885
IF: 4.199
2024-01-07
Simulation Modelling Practice and Theory
Abstract: As demand for computing power and low latency in intelligent applications grows, efficient management and coordination of resources in computing power networks becomes crucial. This paper presents a telemetry-aided multi-agent cooperation framework for DAG task scheduling in computing power networks. Using distributed agents equipped with network telemetry, the framework accurately assesses local network state information, formulates scheduling policies, and assigns tasks to edge servers. An online learning algorithm for DAG task scheduling is also introduced to improve the agents' cooperation strategy, enabling rapid task scheduling and resource allocation decisions. Simulation results demonstrate at least a 13.5% reduction in total task execution time compared with sub-optimal methods, along with improved node and link load balancing.
computer science, interdisciplinary applications, software engineering
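
To make the scheduling problem in the abstract concrete, the following is a minimal sketch of assigning DAG tasks to edge servers. It is not the paper's multi-agent online learning algorithm: it swaps in a simple greedy earliest-finish-time heuristic, ignores link delays and telemetry, and every name in it (schedule_dag, durations, deps, speeds, the example tasks and servers) is a hypothetical chosen for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def schedule_dag(durations, deps, speeds):
    """Greedy earliest-finish-time placement of DAG tasks on servers.

    Illustrative assumptions (not taken from the paper):
      durations: task -> amount of work
      deps:      task -> set of predecessor tasks
      speeds:    server -> work processed per time unit
    Transfer delays between servers are ignored.
    """
    server_free = {s: 0.0 for s in speeds}  # time at which each server is idle
    finish = {}      # task -> finish time
    placement = {}   # task -> chosen server

    # Visit tasks so that every predecessor is scheduled before its successors.
    graph = {t: deps.get(t, set()) for t in durations}
    for task in TopologicalSorter(graph).static_order():
        # A task can start only after all of its predecessors have finished.
        ready = max((finish[p] for p in deps.get(task, ())), default=0.0)
        best = None
        for server, speed in speeds.items():
            start = max(ready, server_free[server])
            end = start + durations[task] / speed
            if best is None or end < best[0]:
                best = (end, server)
        end, server = best
        finish[task] = end
        placement[task] = server
        server_free[server] = end

    return placement, max(finish.values())


# Hypothetical diamond-shaped DAG: a -> {b, c} -> d, on two identical servers.
durations = {"a": 2, "b": 4, "c": 4, "d": 2}
deps = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
speeds = {"s1": 2.0, "s2": 2.0}

placement, makespan = schedule_dag(durations, deps, speeds)
print(placement, makespan)
```

In this toy instance the two parallel tasks b and c end up on different servers, which is the kind of load balancing the abstract refers to. A learning-based scheduler such as the one the paper describes would replace the fixed greedy rule with a policy informed by observed network state.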