When Large Language Model Agents Meet 6G Networks: Perception, Grounding, and Alignment

Minrui Xu, Dusit Niyato, Jiawen Kang, Zehui Xiong, Shiwen Mao, Zhu Han, Dong In Kim, Khaled B. Letaief
2024-02-17
Abstract: AI agents based on multimodal large language models (LLMs) are expected to revolutionize human-computer interaction and offer more personalized assistant services across various domains like healthcare, education, manufacturing, and entertainment. Deploying LLM agents in 6G networks enables users to access previously expensive AI assistant services via mobile devices democratically, thereby reducing interaction latency and better preserving user privacy. Nevertheless, the limited capacity of mobile devices constrains the effectiveness of deploying and executing local LLMs, which necessitates offloading complex tasks to global LLMs running on edge servers during long-horizon interactions. In this article, we propose a split learning system for LLM agents in 6G networks leveraging the collaboration between mobile devices and edge servers, where multiple LLMs with different roles are distributed across mobile devices and edge servers to perform user-agent interactive tasks collaboratively. In the proposed system, LLM agents are split into perception, grounding, and alignment modules, facilitating inter-module communications to meet extended user requirements on 6G network functions, including integrated sensing and communication, digital twins, and task-oriented communications. Furthermore, we introduce a novel model caching algorithm for LLMs within the proposed system to improve model utilization in context, thus reducing network costs of the collaborative mobile and edge LLM agents.
Artificial Intelligence, Networking and Internet Architecture
What problem does this paper attempt to address?
The paper explores how to deliver efficient, flexible, and long-term AI assistant services by deploying large language model (LLM) agents across mobile devices and edge servers in a 6G network. Specifically, it addresses the following key issues:

1. **Environmental Awareness**: The paper discusses how multimodal sensing lets mobile LLM agents process visual, auditory, and other sensor data, and interpret the surrounding environment together with human instructions.
2. **Decision Grounding**: It proposes maintaining a digital replica of each physical entity on edge servers, using digital twin technology, so that mobile LLM agents can make decisions and plan from a global perspective.
3. **Task-Oriented Communication**: It investigates how to optimize task-oriented communication in resource-constrained 6G networks so that mobile and edge LLM agents exchange information efficiently and complete complex tasks more reliably.
4. **Distributed Computing Architecture**: It designs a split learning system based on collaborative end-edge-cloud computing, in which computation-intensive tasks are offloaded to global LLMs on edge servers while local LLMs handle real-time sensing and respond to user needs.
5. **Model Caching Algorithm**: It introduces a new model caching algorithm, Least Age-of-Thought (LAoT), to improve model utilization and reduce network costs for distributed mobile and edge LLM agents.

In summary, the paper aims to overcome the main obstacles to running LLM agents on current mobile devices, namely limited computing power and limited context windows, and thereby make personalized AI assistant services more widely available.
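
The summary names the Least Age-of-Thought (LAoT) caching algorithm but does not define how Age-of-Thought is computed. The following is a minimal sketch of what a score-based model cache of this kind might look like, not the authors' algorithm. It assumes, purely for illustration, that each cached model accumulates timestamps of recent "thoughts" (reasoning steps served in context), that older thoughts decay exponentially, and that the model with the lowest resulting score is evicted first. The names `CachedModel`, `ScoreBasedModelCache`, `aot_score`, and the `decay` parameter are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CachedModel:
    """A model held in edge-server memory (hypothetical structure)."""
    name: str
    size_gb: float
    thought_times: list = field(default_factory=list)

    def aot_score(self, now: float, decay: float = 0.5) -> float:
        # ASSUMPTION: the score is a decayed count of past thoughts.
        # The paper's actual Age-of-Thought metric may differ.
        return sum(decay ** (now - t) for t in self.thought_times)


class ScoreBasedModelCache:
    """Evicts the lowest-scoring model when capacity is exceeded."""

    def __init__(self, capacity_gb: float):
        self.capacity_gb = capacity_gb
        self.models = {}

    def used_gb(self) -> float:
        return sum(m.size_gb for m in self.models.values())

    def record_thought(self, name: str, now: float) -> None:
        # Note that a cached model produced a reasoning step at time `now`.
        self.models[name].thought_times.append(now)

    def load(self, name: str, size_gb: float, now: float) -> list:
        """Load a model; return the names of any models evicted to fit it."""
        if size_gb > self.capacity_gb:
            raise ValueError("model is larger than the whole cache")
        evicted = []
        if name in self.models:
            return evicted  # cache hit, nothing to do
        while self.used_gb() + size_gb > self.capacity_gb:
            victim = min(self.models.values(), key=lambda m: m.aot_score(now))
            del self.models[victim.name]
            evicted.append(victim.name)
        self.models[name] = CachedModel(name, size_gb)
        return evicted


# Usage: two 4 GB models fill most of a 10 GB cache; a third forces eviction.
cache = ScoreBasedModelCache(capacity_gb=10)
cache.load("planner", 4, now=0)
cache.load("vision", 4, now=0)
cache.record_thought("planner", now=1)  # older thought, decays more
cache.record_thought("vision", now=3)   # more recent thought
evicted = cache.load("speech", 4, now=4)
print(evicted)  # → ['planner']
```

In this sketch the score plays the role that recency plays in an LRU cache: a model whose thoughts are old or few contributes little to ongoing interactions, so it is the cheapest to drop. Any real deployment would substitute the metric defined in the paper itself.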