Large Language Models (llms) Inference Offloading and Resource Allocation in Cloud-Edge Networks: an Active Inference Approach

Jingcheng Fang,Ying He,F. Richard Yu,Jianqiang Li,Victor C. Leung
DOI: https://doi.org/10.1109/vtc2023-fall60731.2023.10333824
IF: 6.075
2024-01-01
IEEE Transactions on Mobile Computing
Abstract:As the research and applications of large language model (LLM) become increasingly sophisticated, it is difficult for resource-limited mobile terminals to run large-model inference tasks efficiently. Traditional deep reinforcement learning (DRL) based approaches have been used to offload LLM inference tasks to servers. However, existing solutions suffer from data inefficiency, insensitivity to latency requirements, and non-adaptability to task load variations. In this paper, we propose an active inference with rewardless guidance algorithm using expected future free energy for offloading decisions and allocating resources for the LLM inference task offloading and resource allocation problem of cloud-edge networks systems. Experimental results show that our proposed method has superior performance over mainstream DRLs, improves in data utilization efficiency, and is more adaptable to changing task load scenarios.
What problem does this paper attempt to address?