Abstract: Large models such as transformers cannot be directly deployed and trained at the network edge, because edge devices are heterogeneous and often have limited resources. To collaboratively train a large model at the network edge, existing works assign appropriately sized sub-models of the large model to each edge device and adopt techniques such as knowledge distillation to aggregate the local models and update the large model. However, these methods assign sub-models in a coarse-grained manner and require a certain number of edge devices to train the large model locally, and thus cannot achieve efficient edge learning for large models in practice. In this paper, to make the most of edge devices so that they collaboratively train a large model faster and better, we propose a novel synchronous edge learning framework for efficient large model training across heterogeneous, resource-limited edge devices. Specifically, to reduce waiting time during training while ensuring that every edge device can afford local training, we design a capability-aware local model customization mechanism that tailors a fine-grained, personalized model structure to each edge device based on its memory and computing capabilities, so that all devices have similar local training times. Then, to efficiently integrate the local training results of heterogeneous edge devices, we propose a layer augmentation-based heterogeneous model aggregation mechanism that aligns local models and quickly and effectively obtains the update for the global large model. In addition, we propose a monument distillation-based model deployment mechanism that deploys the updated global large model to edge devices without loss of learned knowledge. In this way, each edge device can fully contribute to training the global large model, achieving efficient edge learning for the large model. Experimental results demonstrate that our framework achieves better accuracy and efficiency than state-of-the-art edge learning frameworks in heterogeneous environments.
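
The idea behind capability-aware customization, giving each device a model small enough for its memory and sized so that all devices finish a synchronous round at about the same time, can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract gives no formulas, so the linear cost model (memory and time both proportional to the number of layers), the `Device` fields, and the function `customize_depths` are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical device profile (fields are assumptions, not from the paper)."""
    name: str
    memory_mb: float       # memory available for local training
    layers_per_sec: float  # assumed throughput: layers trained per second


def customize_depths(devices, max_layers, mem_per_layer_mb):
    """Pick a layer count per device so local training times are similar.

    Assumes memory use and training time both grow linearly with depth.
    """
    # Largest depth each device can hold in memory, capped by the full model.
    caps = {}
    for d in devices:
        cap = min(max_layers, int(d.memory_mb // mem_per_layer_mb))
        if cap < 1:
            raise ValueError(f"{d.name} cannot fit a single layer")
        caps[d.name] = cap

    # Round budget: the shortest time in which some device exhausts its cap.
    # No device can usefully train longer than this without others waiting.
    budget = min(caps[d.name] / d.layers_per_sec for d in devices)

    # Each device trains as many layers as fit into the shared budget.
    depths = {}
    for d in devices:
        fit = int(budget * d.layers_per_sec + 1e-9)  # guard against float error
        depths[d.name] = max(1, min(caps[d.name], fit))
    return depths


devices = [
    Device("A", memory_mb=4000, layers_per_sec=8),
    Device("B", memory_mb=1000, layers_per_sec=2),
    Device("C", memory_mb=2000, layers_per_sec=4),
]
depths = customize_depths(devices, max_layers=12, mem_per_layer_mb=250)
print(depths)
```

Under these assumed numbers, the fast device trains the full 12-layer model while the slower devices receive shallower models, and every device needs the same 1.5 seconds per round, so none waits on a straggler.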

Efficiently Distilling LLMs for Edge Applications

MCMC: Multi-Constrained Model Compression Via One-Stage Envelope Reinforcement Learning

EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting

Edge-LLM: A Collaborative Framework for Large Language Model Serving in Edge Computing

Energy-Efficient Split Learning for Fine-Tuning Large Language Models in Edge Networks

Adaptive Layer Splitting for Wireless LLM Inference in Edge Computing: A Model-Based Reinforcement Learning Approach

Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers

EdgeShard: Efficient LLM Inference via Collaborative Edge Computing

Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches

Towards Efficient Edge Learning for Large Models in Heterogeneous Resource-limited Environments

Lillama: Large Language Models Compression via Low-Rank Feature Distillation

Activation Sparsity Opportunities for Compressing General Large Language Models

CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification

Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly

Efficient and Economic Large Language Model Inference with Attention Offloading

Search for Efficient Large Language Models

Efficient federated learning on resource-constrained edge devices based on model pruning

Training Latency Minimization for Model-Splitting Allowed Federated Edge Learning

Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models

Toward Communication-Efficient Federated Learning in the Internet of Things with Edge Computing