Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly

Herbert Woisetschläger, Alexander Isenko, Shiqiang Wang, Ruben Mayer, Hans-Arno Jacobsen
DOI: https://doi.org/10.1145/3650203.3663331
2024-05-02
Abstract: Large Language Models (LLMs) and foundation models are popular as they offer new opportunities for individuals and businesses to improve natural language processing, interact with data, and retrieve information faster. However, training or fine-tuning LLMs requires a vast amount of data, which can be challenging to access due to legal or technical restrictions and may require private computing resources. Federated Learning (FL) is a solution designed to overcome these challenges and expand data access for deep learning applications. This paper takes a hardware-centric approach to explore how LLMs can be brought to modern edge computing systems. Our study fine-tunes the FLAN-T5 model family, ranging from 80M to 3B parameters, using FL for a text summarization task. We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions. Our contribution is twofold: First, we evaluate the current capabilities of edge computing systems and their potential for LLM FL workloads. Second, by comparing these systems with a data-center GPU, we demonstrate the potential for improvement and the next steps toward achieving greater computational efficiency at the edge.
Machine Learning; Distributed, Parallel, and Cluster Computing; Performance
What problem does this paper attempt to address?
The paper asks how to federatedly fine-tune large language models (LLMs) and foundation models (FMs) efficiently on resource-constrained devices at the network edge while meeting new regulatory requirements, such as the energy-efficiency requirements in the EU AI Act. Specifically, it focuses on the following aspects:

1. **Limited computing resources**: Edge devices (such as embedded devices) have significantly lower memory bandwidth than data-center hardware. This limits the computing potential of federated learning applications, slows key operations during training, and leads to substantially longer training times.
2. **Energy efficiency first**: With the introduction of the EU AI Act, service providers need to focus on energy-efficient operations. However, in a federated learning system, client hardware configurations vary and clients typically participate in training only temporarily. Detailed hardware information is therefore hard to obtain, which makes it difficult to use traditional performance metrics such as Model FLOP Utilization (MFU) to measure energy efficiency.
3. **Training challenges of large-scale models**: Foundation models have a large number of parameters and are harder to train and fine-tune. During federated learning, their gradients are more likely to explode or vanish, which increases the complexity of training.
4. **High communication costs**: Edge devices are usually geographically widely distributed and have limited network bandwidth, so communication costs are much higher than in data-center environments. For deep learning models with millions of parameters, communication overhead may become the main bottleneck.

Based on these challenges, the research questions of the paper can be summarized as: How can efficient foundation model training and fine-tuning be achieved on resource-constrained network edge devices? Which factors have the greatest impact on the efficiency of a federated learning system? By studying these questions, the authors aim to provide theoretical and technical support for applying federated learning on edge devices, promote sustainable computing, and support compliance with relevant laws and regulations.
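To make the MFU metric and the communication bottleneck named in the challenges above more concrete, the following is a minimal Python sketch, not the paper's measurement code. It assumes the common rule of thumb of roughly 6 FLOPs per parameter per token for a combined forward and backward pass, which is only a coarse approximation for an encoder-decoder model such as FLAN-T5. All function names and the example numbers (throughput, peak FLOP/s, link bandwidth) are hypothetical and chosen purely for illustration.

```python
def training_flops_per_token(num_params: int) -> float:
    """Approximate training FLOPs per token with the ~6 * N rule of thumb (assumption)."""
    return 6.0 * num_params


def model_flop_utilization(num_params: int,
                           tokens_per_second: float,
                           peak_flops_per_second: float) -> float:
    """MFU = achieved FLOP/s divided by the hardware's theoretical peak FLOP/s."""
    achieved = training_flops_per_token(num_params) * tokens_per_second
    return achieved / peak_flops_per_second


def transfer_time_seconds(num_params: int,
                          bytes_per_param: int,
                          bandwidth_bits_per_second: float) -> float:
    """Time to send one full copy of the model weights over a link, ignoring protocol overhead."""
    payload_bits = num_params * bytes_per_param * 8
    return payload_bits / bandwidth_bits_per_second


if __name__ == "__main__":
    # Hypothetical example: an 80M-parameter model on a device with 1 TFLOP/s peak.
    mfu = model_flop_utilization(80_000_000, tokens_per_second=1000,
                                 peak_flops_per_second=1e12)
    # Hypothetical example: 32-bit weights sent over a 100 Mbit/s link.
    seconds = transfer_time_seconds(80_000_000, bytes_per_param=4,
                                    bandwidth_bits_per_second=100e6)
    print(f"MFU: {mfu:.2f}, one-way model transfer: {seconds:.1f} s")
```

Note that computing MFU this way requires knowing the peak FLOP/s of each client, which is exactly the hardware information that the summary says is hard to obtain in a federated setting.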