Abstract: Large Language Models (LLMs) and foundation models are popular as they offer new opportunities for individuals and businesses to improve natural language processing, interact with data, and retrieve information faster. However, training or fine-tuning LLMs requires a vast amount of data, which can be challenging to access due to legal or technical restrictions and may require private computing resources. Federated Learning (FL) is a solution designed to overcome these challenges and expand data access for deep learning applications. This paper takes a hardware-centric approach to explore how LLMs can be brought to modern edge computing systems. Our study fine-tunes the FLAN-T5 model family, ranging from 80M to 3B parameters, using FL for a text summarization task. We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions. Our contribution is twofold: First, we evaluate the current capabilities of edge computing systems and their potential for LLM FL workloads. Second, by comparing these systems with a data-center GPU, we demonstrate the potential for improvement and the next steps toward achieving greater computational efficiency at the edge.

What problem does this paper attempt to address?

The paper addresses how to efficiently perform federated fine-tuning of large language models (LLMs) and foundation models (FMs) on resource-constrained edge devices while meeting new regulatory requirements, such as the energy-efficiency requirements in the EU AI Act. Specifically, it focuses on the following aspects:

1. **Limited computing resources**: Edge devices (such as embedded devices) have low memory bandwidth, which limits the computing potential of federated learning applications. Their memory bandwidth is significantly lower than that of data-center hardware, which slows key operations during training and leads to severe training-time delays.
2. **Energy efficiency first**: With the introduction of the EU AI Act, service providers need to focus on energy-efficient operation. In a federated learning system, however, client hardware configurations vary and clients usually participate in training only temporarily, so detailed hardware information is hard to obtain. This makes it difficult to use traditional performance metrics such as Model FLOP Utilization (MFU) to measure energy efficiency.
3. **Training challenges of large-scale models**: Foundation models (FMs) have a large number of parameters and are more difficult to train and fine-tune. During federated learning, the gradients of these models are more likely to explode or vanish, which increases the complexity of training.
4. **High communication costs**: Edge devices are usually geographically distributed and have limited network bandwidth, so communication costs are much higher than in data-center environments. Especially for deep learning models with millions of parameters, communication overhead may become the main bottleneck.
Based on the above challenges, the research questions of the paper can be summarized as: How can efficient foundation-model training and fine-tuning be achieved on resource-constrained edge devices? Which factors have the greatest impact on improving the efficiency of a federated learning system? By studying these questions, the authors hope to provide theoretical and technical support for applying federated learning on edge devices, promote sustainable computing, and ensure compliance with relevant laws and regulations.
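
Two of the quantities mentioned above, Model FLOP Utilization and per-round communication cost, can be made concrete with a back-of-the-envelope sketch. The code below is illustrative only and is not taken from the paper: it assumes the commonly used estimate of roughly 6 FLOPs per parameter per token for a forward plus backward pass of a dense transformer (a rough fit for an encoder-decoder model such as FLAN-T5), and it assumes each client downloads and uploads the full set of weights once per round. The function names and all input numbers except the 80M parameter count are hypothetical.

```python
def model_flop_utilization(num_params, tokens_per_second, peak_flops_per_second):
    """Return MFU: achieved training FLOP/s divided by peak hardware FLOP/s.

    Assumption: about 6 FLOPs per parameter per token for one forward
    plus backward pass of a dense transformer.
    """
    if peak_flops_per_second <= 0:
        raise ValueError("peak_flops_per_second must be positive")
    achieved_flops_per_second = 6 * num_params * tokens_per_second
    return achieved_flops_per_second / peak_flops_per_second


def round_transfer_seconds(num_params, bytes_per_param, bandwidth_bits_per_second):
    """Return the time to download and then upload one full set of weights.

    Assumption: the whole model is exchanged in both directions each round,
    with no compression and no protocol overhead.
    """
    if bandwidth_bits_per_second <= 0:
        raise ValueError("bandwidth_bits_per_second must be positive")
    payload_bits = num_params * bytes_per_param * 8
    return 2 * payload_bits / bandwidth_bits_per_second


if __name__ == "__main__":
    params = 80_000_000  # smallest FLAN-T5 size named in the abstract
    # Hypothetical throughput and peak compute, chosen only for illustration.
    mfu = model_flop_utilization(params, tokens_per_second=1_000,
                                 peak_flops_per_second=1e12)
    # Hypothetical link: 100 Mbit/s, 4-byte (fp32) weights.
    seconds = round_transfer_seconds(params, bytes_per_param=4,
                                     bandwidth_bits_per_second=100e6)
    print(f"MFU: {mfu:.2f}, transfer time per round: {seconds:.1f} s")
```

Both helpers need only figures that a client can report or that can be measured at the server (throughput, parameter count, link speed), apart from the peak FLOP/s value, which is exactly the hardware detail the summary notes is hard to obtain in federated settings.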

Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly

Federated Large Language Model: Solutions, Challenges and Future Directions

FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems

EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting

Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches

Automated Federated Pipeline for Parameter-Efficient Fine-Tuning of Large Language Models

FedPT: Federated Proxy-Tuning of Large Language Models on Resource-Constrained Edge Devices

Towards Efficient Model-Heterogeneity Federated Learning for Large Models

Optimizing Federated Learning with Heterogeneous Edge Devices

OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning

Experimental Evaluation and Analysis of Federated Learning in Edge Computing Environments

Towards Federated Large Language Models: Motivations, Methods, and Future Directions

eFedLLM: Efficient LLM Inference Based on Federated Learning

Empowering Federated Learning for Massive Models with NVIDIA FLARE

Efficient Federated Finetuning of Tiny Transformers with Resource-Constrained Devices

AnycostFL: Efficient On-Demand Federated Learning over Heterogeneous Edge Devices

FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning

An Empirical Analysis and Resource Footprint Study of Deploying Large Language Models on Edge Devices

Efficient Deployment of Large Language Model Across Cloud-Device Systems

EdgeLLM: A Highly Efficient CPU-FPGA Heterogeneous Edge Accelerator for Large Language Models

Enabling Efficient On-Device Fine-Tuning of LLMs Using Only Inference Engines