Abstract: On-device large language models (LLMs), i.e., LLMs that run directly on edge devices, have attracted considerable interest owing to their superior privacy, reduced latency, and bandwidth savings. Nonetheless, the capabilities of on-device LLMs are intrinsically constrained by the limited capacity of edge devices compared with far more powerful cloud centers. To bridge the gap between cloud-based and on-device AI, mobile edge intelligence (MEI) offers a viable solution by provisioning AI capabilities at the edge of mobile networks, with better privacy and lower latency than cloud computing. MEI sits between on-device AI and cloud-based AI, featuring wireless communications and more powerful computing resources than end devices. This article provides a contemporary survey on harnessing MEI for LLMs. We first cover the preliminaries, introducing LLMs and MEI, followed by resource-efficient LLM techniques. We then illustrate several killer applications that demonstrate the need to deploy LLMs at the network edge and present an architectural overview of MEI for LLMs (MEI4LLM). Subsequently, we delve into various aspects of MEI4LLM, extensively covering edge LLM caching and delivery, edge LLM training, and edge LLM inference. Finally, we identify future research opportunities. We aim to inspire researchers in the field to leverage mobile edge computing to facilitate LLM deployment in close proximity to users, thereby unleashing the potential of LLMs across various privacy- and delay-sensitive applications.
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve
This paper explores how Mobile Edge Intelligence (MEI) can support the deployment of Large Language Models (LLMs) at the network edge, close to users. Specifically, it addresses the following key issues:
1. **Gap between Cloud Centers and Edge Devices**:
- Most LLMs currently run in cloud data centers, which raises inherent issues such as the risk of data privacy breaches, high bandwidth costs, and long service delays.
- Edge devices have limited computing, storage, and memory resources, which makes it difficult for them to support large-scale LLMs.
2. **Deployment of LLMs on Edge Devices**:
- Although deploying LLMs on edge devices can strengthen privacy protection and reduce latency, existing industrial efforts mainly focus on smaller LLMs with fewer than 1 billion parameters, whose capabilities are relatively limited.
- As demand for more capable models grows, there is a need to deploy larger LLMs on edge devices, but doing so significantly increases computational and storage overhead.
3. **Model Training on Edge Devices**:
- Fine-tuning on edge devices can enable personalized, context-aware AI applications, but because of high training costs, existing edge LLM products usually do not support it.
4. **Resource-Efficient Deployment**:
- Determining how to efficiently cache, deliver, train, and run inference with LLMs in mobile edge networks, so as to improve storage, communication, and computing efficiency.
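To make the resource gap described in these items concrete, the following Python sketch estimates the memory needed just to hold an LLM's weights at different numeric precisions. It is a rough sketch under a stated assumption: footprint is approximated as parameter count × bytes per parameter, ignoring the KV cache, activations, and runtime overhead. The 8 GB device budget and the model sizes are hypothetical illustration values, not figures from the paper, and the helper names are made up for this example.

```python
# Rough estimate of how much memory an LLM's weights alone occupy.
# Assumption: footprint ~= parameter count x bytes per parameter. This ignores
# the KV cache, activations, and runtime overhead, so real usage is higher.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return the approximate weight footprint in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def fits_on_device(num_params: float, precision: str, device_memory_gb: float) -> bool:
    """Check whether the weights alone fit within a given memory budget."""
    return weight_memory_gb(num_params, precision) <= device_memory_gb


if __name__ == "__main__":
    device_memory_gb = 8.0  # hypothetical phone-class memory budget, not from the paper
    for params in (1e9, 7e9, 70e9):
        for precision in ("fp16", "int4"):
            gb = weight_memory_gb(params, precision)
            ok = fits_on_device(params, precision, device_memory_gb)
            print(f"{params / 1e9:>4.0f}B params @ {precision}: {gb:6.1f} GB, fits={ok}")
```

Under these assumptions, a 7B-parameter model needs about 14 GB in fp16 but about 3.5 GB with 4-bit weights, while a 70B model still needs about 35 GB even at 4 bits. Since a real device also shares its memory with the OS and other apps, the margin is tighter than this estimate suggests.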
### Solutions
To address the above challenges, the paper proposes the following solutions:
1. **Mobile Edge Intelligence (MEI)**:
- MEI provisions computing resources at the network edge, between on-device AI and cloud AI, and leverages wireless communications to deliver low-latency, privacy-preserving AI services.
- The development of 6G mobile networks is expected to enable MEI to provide low-latency inference and training services for large-scale LLMs.
2. **Resource-Efficient LLM Technologies**:
- These include parameter-efficient fine-tuning, split inference/learning, and efficient LLM caching and delivery techniques that improve storage, computing, and communication efficiency.
3. **Architectural Design**:
- The paper proposes an AI-native architecture comprising parameter-sharing LLM caching and delivery, distributed LLM training/fine-tuning, and distributed LLM inference to support LLM deployment at the network edge.
4. **Application Scenarios**:
- The paper details several key application scenarios, such as mobile health, humanoid robots, autonomous driving, and virtual assistants, to demonstrate the necessity and advantages of deploying LLMs at the network edge.
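The benefit of combining parameter-efficient fine-tuning with parameter-sharing caching can be illustrated with a back-of-the-envelope storage comparison. The Python sketch below assumes, for illustration only, that an edge server caches several variants of one base model, each fine-tuned with LoRA adapters (a common parameter-efficient fine-tuning method) on square d_model × d_model projection matrices. This is one plausible way parameter sharing could pay off, not the paper's specific scheme; the function names and all numbers are hypothetical.

```python
# Back-of-the-envelope storage comparison for caching fine-tuned LLM variants.
# Illustrative assumption: every variant is the same base model fine-tuned with
# LoRA on square d_model x d_model projection matrices, so variants differ only
# in their small low-rank adapters. All numbers below are hypothetical.


def lora_params_per_matrix(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds B (d_out x rank) and A (rank x d_in); return their parameter count."""
    return rank * (d_in + d_out)


def cache_sizes(base_params: int, num_variants: int, num_adapted: int,
                d_model: int, rank: int) -> tuple:
    """Return (params stored without sharing, params stored with sharing).

    Without sharing: each variant is cached as a full fine-tuned copy.
    With sharing: one copy of the base weights plus one adapter set per variant.
    """
    adapter_params = num_adapted * lora_params_per_matrix(d_model, d_model, rank)
    without_sharing = num_variants * base_params
    with_sharing = base_params + num_variants * adapter_params
    return without_sharing, with_sharing


if __name__ == "__main__":
    # Hypothetical 7B model: 32 layers x 4 attention projections = 128 matrices.
    full, shared = cache_sizes(base_params=7_000_000_000, num_variants=10,
                               num_adapted=128, d_model=4096, rank=8)
    print(f"without sharing: {full:,} params")
    print(f"with sharing:    {shared:,} params")
    print(f"storage ratio:   {full / shared:.1f}x")
```

With these hypothetical values, each rank-8 adapter on a 4096 × 4096 matrix adds 65,536 parameters, so ten variants sharing one base require about 7.08 billion stored parameters instead of 70 billion. The same reasoning applies to delivery: once the base model is cached near users, only the small adapters need to be transmitted per variant.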
Through these approaches, the paper aims to promote the migration of LLMs from cloud data centers to the network edge, unleashing their potential in a variety of privacy- and latency-sensitive applications.