FedsLLM: Federated Split Learning for Large Language Models over Communication Networks

Kai Zhao, Zhaohui Yang, Chongwen Huang, Xiaoming Chen, Zhaoyang Zhang
2024-07-12
Abstract: Addressing the challenges of deploying large language models in wireless communication networks, this paper combines low-rank adaptation technology (LoRA) with the SplitFed learning framework to propose the federated split learning for large language models (FedsLLM) framework. The method introduced in this paper utilizes LoRA technology to reduce processing loads by dividing the network into client subnetworks and server subnetworks. It leverages a federated server to aggregate and update client models. As the training data are transmitted through a wireless network between clients and both main and federated servers, the training delay is determined by the learning accuracy and the allocation of communication bandwidth. This paper models the minimization of the training delay by integrating computation and communication optimization, simplifying the optimization problem into a convex problem to find the optimal solution. Additionally, it presents a lemma that describes the precise solutions to this problem. Simulation results demonstrate that the proposed optimization algorithm reduces delays by an average of 47.63% compared to unoptimized scenarios.
Networking and Internet Architecture, Machine Learning
What problem does this paper attempt to address?
The main problem this paper addresses is how to minimize training latency by optimizing the allocation of computing and communication resources when large language models are deployed in wireless communication networks. To this end, the paper proposes FedsLLM (Federated Split Learning for Large Language Models), a framework that combines low-rank adaptation (LoRA) with federated split learning (SplitFed Learning). Splitting the model between clients and a main server reduces the processing load on each client, and a federated server aggregates and updates the client models. The paper then models the joint computation and communication optimization problem and simplifies it into a convex optimization problem whose optimal solution reduces the training latency.

### Main contributions of the paper:

1. **Proposing the FedsLLM framework**: The framework uses LoRA to reduce the number of trainable parameters, splits the network into client subnetworks and server subnetworks to relieve the computational burden on clients, and trains the model collaboratively across multiple clients through a federated learning mechanism.
2. **Optimizing the allocation of computing and communication resources**: The paper models the joint computation and communication optimization problem, transforms it into a convex optimization problem, and gives a lemma describing its exact solution, which reduces the training latency.
3. **Experimental verification**: Simulations evaluate the proposed optimization algorithm under different transmission powers. Compared with the unoptimized scheme, the algorithm reduces latency by an average of 47.63%.

### Problems solved:

- **Limited computing resources**: Mobile and Internet of Things devices that host large language models usually have limited computing resources.
FedsLLM reduces this burden by splitting the model between clients and the server and by training only low-rank LoRA parameters.
- **Limited communication bandwidth**: Bandwidth is scarce in wireless communication networks, so FedsLLM reduces transmission latency by optimizing the bandwidth allocation.
- **Privacy protection**: Through the federated learning mechanism, FedsLLM trains the model in a distributed manner without clients sharing their raw data.

### Formula analysis:

- **Low-rank adaptation (LoRA)**:
  \[ \omega_0+\Delta\omega = \omega_0+BA, \quad \text{s.t.} \quad B\in\mathbb{R}^{d\times r}, \quad A\in\mathbb{R}^{r\times k}, \quad r\ll\min(d,k) \]
  The pre-trained weights \(\omega_0\) stay frozen, and the update \(\Delta\omega\) is constrained to the product of two low-rank matrices \(B\) and \(A\). Only \(r(d+k)\) parameters are trained instead of the \(dk\) entries of a full update, which makes fine-tuning parameter-efficient. (Here \(k\) is a matrix dimension, not the user index used below.)
- **Loss function**:
  \[ F_k(\omega_0,\Delta\omega)=\frac{1}{D_k}\sum_{l=1}^{D_k}f(\omega_0,\omega_s,\omega_c,x_{kl},y_{kl}) \]
  This is the local loss of user \(k\): the average of the per-sample loss \(f\) over the user's \(D_k\) data samples \((x_{kl},y_{kl})\).
- **Optimization problem**:
  \[ \min_{\Delta\omega}F(\omega_0,\Delta\omega)=\frac{1}{D}\sum_{k=1}^K\sum_{l=1}^{D_k}f(\omega_0,\omega_s,\omega_c,x_{kl},y_{kl}) \]
  This is the global objective of FedsLLM: over the trainable update \(\Delta\omega\), minimize the average loss across all \(D=\sum_{k=1}^{K}D_k\) samples of the \(K\) users. Equivalently, \(F=\sum_{k=1}^{K}\frac{D_k}{D}F_k\), so the global loss is a data-size-weighted average of the users' local losses.
- **Training latency**:
  \[ T_k=I_0\left(\tau+t_{c,k}+v\log_2\left(\frac{1}{\eta}\right)t_{s,k}\right) \]
  This formula gives the training latency \(T_k\) of each user \(k\).
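The LoRA update \(\omega_0+BA\) in the formula analysis above can be illustrated with a small NumPy sketch. It shows that the update has rank at most \(r\) and compares trainable-parameter counts. The helper names (`init_lora`, `adapted_weight`, `lora_param_counts`) are hypothetical, and the zero initialization of \(B\) is an assumption borrowed from common LoRA practice, not something this summary states about FedsLLM.

```python
import numpy as np

def init_lora(d: int, k: int, r: int, rng: np.random.Generator):
    """Create LoRA factors B (d x r) and A (r x k) for a frozen d x k weight matrix."""
    # Assumed initialization: small random A and zero B, so that BA = 0 and the
    # adapted model starts out identical to the pre-trained one.
    A = rng.normal(0.0, 0.01, size=(r, k))
    B = np.zeros((d, r))
    return B, A

def adapted_weight(w0: np.ndarray, B: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Return omega_0 + Delta omega, with the update restricted to Delta omega = B @ A."""
    return w0 + B @ A

def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Trainable parameters of a full d x k update vs. a rank-r LoRA update."""
    return d * k, r * (d + k)

rng = np.random.default_rng(0)
d, k, r = 64, 32, 4
w0 = rng.normal(size=(d, k))   # frozen pre-trained weights (toy values)
B, A = init_lora(d, k, r, rng)

print(np.allclose(adapted_weight(w0, B, A), w0))   # BA = 0 at initialization
B_trained = rng.normal(size=(d, r))                # stand-in for B after some training
print(np.linalg.matrix_rank(B_trained @ A) <= r)  # the update never exceeds rank r
print(lora_param_counts(4096, 4096, 8))            # (16777216, 65536)
```

For a 4096 x 4096 layer with \(r=8\), only 65,536 of the 16,777,216 entries are trained, which is the kind of saving that makes fine-tuning feasible on resource-limited clients.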
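The relation between the local losses \(F_k\) and the global objective \(F\) follows directly from the two formulas above: pooling all \(D=\sum_k D_k\) samples gives the same value as weighting each \(F_k\) by \(D_k/D\). The sketch below checks this numerically and adds a data-size-weighted averaging rule for client-side parameters. That rule (`aggregate`) is a hypothetical FedAvg-style choice; the summary only says that the federated server aggregates and updates client models, not how.

```python
import numpy as np

def local_loss(sample_losses: np.ndarray) -> float:
    """F_k: mean of the per-sample losses f over user k's D_k samples."""
    return float(np.mean(sample_losses))

def global_loss(per_user: list[np.ndarray]) -> float:
    """F: mean of f over all D = sum_k D_k samples, pooled across users."""
    return float(np.mean(np.concatenate(per_user)))

def weighted_global_loss(per_user: list[np.ndarray]) -> float:
    """The same F, computed as sum_k (D_k / D) * F_k from the local losses only."""
    sizes = np.array([len(x) for x in per_user], dtype=float)
    local = np.array([local_loss(x) for x in per_user])
    return float(np.sum(sizes / sizes.sum() * local))

def aggregate(client_params: list[np.ndarray], sizes: list[int]) -> np.ndarray:
    """Hypothetical FedAvg-style rule: data-size-weighted average of client parameters."""
    weights = np.asarray(sizes, dtype=float) / float(sum(sizes))
    return np.tensordot(weights, np.stack(client_params), axes=1)

rng = np.random.default_rng(1)
sizes = [10, 30, 60]
losses = [rng.random(n) for n in sizes]   # toy per-sample losses for K = 3 users
print(np.isclose(global_loss(losses), weighted_global_loss(losses)))

client_params = [np.full((2, 2), v) for v in (1.0, 2.0, 4.0)]
print(aggregate(client_params, sizes))    # every entry ~ 0.1*1 + 0.3*2 + 0.6*4 = 3.1
```

The weighting matters only when users hold different amounts of data; with equal \(D_k\) it reduces to a plain mean of the local losses.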
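The latency expression \(T_k\) can be evaluated directly. This summary does not define \(I_0\), \(\tau\), \(t_{c,k}\), \(t_{s,k}\), or \(v\), so the sketch below treats them as plain numeric inputs. Reading \(\eta\) as an accuracy parameter in \((0,1)\) is an assumption, motivated by the abstract's statement that the delay depends on learning accuracy; the function name `user_latency` and all input values are hypothetical.

```python
import math

def user_latency(I0: float, tau: float, t_c: float, t_s: float, v: float, eta: float) -> float:
    """Evaluate T_k = I0 * (tau + t_{c,k} + v * log2(1/eta) * t_{s,k}) for one user k.

    Assumption: eta lies in (0, 1), so log2(1/eta) > 0 and a smaller eta
    (a tighter accuracy target, under the assumed reading) multiplies t_s more times.
    """
    if not 0.0 < eta < 1.0:
        raise ValueError("eta is assumed to lie in (0, 1)")
    return I0 * (tau + t_c + v * math.log2(1.0 / eta) * t_s)

# Toy inputs (hypothetical, in seconds where applicable).
loose = user_latency(I0=100, tau=0.5, t_c=0.2, t_s=0.1, v=2, eta=0.25)
tight = user_latency(I0=100, tau=0.5, t_c=0.2, t_s=0.1, v=2, eta=0.01)
print(loose, tight)
```

With \(\eta=0.25\), \(\log_2(1/\eta)=2\), so the bracket is \(0.5+0.2+2\cdot2\cdot0.1=1.1\) and \(T_k\approx110\). Tightening \(\eta\) to 0.01 grows only the \(t_{s,k}\) term, which is why the accuracy setting and the time spent per iteration trade off against each other.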
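The paper's bandwidth optimization is a convex problem with a closed-form lemma that this summary does not reproduce. As a rough illustration of why allocating bandwidth well shortens the delay, the sketch below makes three assumptions that do not come from the paper: a Shannon-rate uplink \(b\log_2(1+ph/(N_0 b))\), a fixed compute time per user, and a min-max objective in which every user must finish by a common time \(T\). It finds the allocation by nested bisection. All names (`upload_time`, `min_bandwidth`, `allocate`) and numbers are hypothetical.

```python
import math

def upload_time(bits: float, bw: float, power: float, gain: float, n0: float) -> float:
    """Assumed uplink model: rate = bw * log2(1 + power*gain / (n0*bw)) bits/s."""
    rate = bw * math.log2(1.0 + power * gain / (n0 * bw))
    return bits / rate

def min_bandwidth(bits, deadline, power, gain, n0, bw_max=1e9, iters=100):
    """Smallest bandwidth (Hz) with upload_time <= deadline, or None if unreachable."""
    # The rate increases with bandwidth but saturates at power*gain / (n0 * ln 2),
    # so short deadlines can be infeasible no matter how much bandwidth is given.
    if deadline <= 0 or bits / deadline >= power * gain / (n0 * math.log(2)):
        return None
    if upload_time(bits, bw_max, power, gain, n0) > deadline:
        return None
    lo, hi = 0.0, bw_max          # hi is always feasible, lo never is
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if upload_time(bits, mid, power, gain, n0) <= deadline:
            hi = mid
        else:
            lo = mid
    return hi

def allocate(users, total_bw, t_max=1e4, iters=100):
    """Bisect on a common finish time T so that compute_k + upload_k <= T for all k
    while the required bandwidths fit in total_bw. Returns (T, bandwidths)."""
    def required(T):
        bws = []
        for u in users:
            b = min_bandwidth(u["bits"], T - u["compute"], u["power"], u["gain"], u["n0"])
            if b is None:
                return None
            bws.append(b)
        return bws

    best = required(t_max)
    if best is None or sum(best) > total_bw:
        raise ValueError("infeasible even with finish time t_max")
    lo, hi = 0.0, t_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        bws = required(mid)
        if bws is not None and sum(bws) <= total_bw:
            hi, best = mid, bws
        else:
            lo = mid
    return hi, best

users = [  # toy values: bits to upload, fixed compute time (s), power (W), gain, noise PSD (W/Hz)
    {"bits": 2e6, "compute": 0.05, "power": 0.1, "gain": 1e-8, "n0": 4e-21},
    {"bits": 2e6, "compute": 0.10, "power": 0.1, "gain": 1e-9, "n0": 4e-21},
    {"bits": 2e6, "compute": 0.02, "power": 0.1, "gain": 1e-10, "n0": 4e-21},
]
T, bws = allocate(users, total_bw=1e7)
print(round(T, 4), [round(b / 1e6, 3) for b in bws])  # finish time (s), bandwidths (MHz)
```

Bisection works here because the required bandwidth of each user shrinks monotonically as \(T\) grows. At the optimum every user finishes at the same \(T\) and the bandwidth budget is fully used, so users with weak channels or long compute times receive more bandwidth.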