Abstract: Federated fine-tuning of pre-trained Large Language Models (LLMs) enables task-specific adaptation across diverse datasets while preserving data privacy. However, the large model size and heterogeneity in client resources pose significant computational and communication challenges. To address these issues, we propose a novel Heterogeneous Adaptive Federated Low-Rank Adaptation (LoRA) fine-tuned LLM framework (HAFL). To accommodate client resource heterogeneity, we first introduce an importance-based parameter truncation scheme, which allows clients to have different LoRA ranks and uses smoothed sensitivity scores as importance indicators. Despite its flexibility, the truncation process may cause performance degradation. To tackle this problem, we develop an importance-based parameter freezing scheme. In this approach, both the cloud server and clients maintain the same LoRA rank, while clients selectively update only the most important decomposed LoRA rank-1 matrices, keeping the rest frozen. To mitigate the information dilution caused by the zero-padding aggregation method, we propose an adaptive aggregation approach that operates at the decomposed rank-1 matrix level. Experiments on the 20 News Group classification task show that our method converges quickly with a small communication size and avoids performance degradation when distributing models to clients, compared with the truncation-based heterogeneous LoRA rank scheme. Additionally, our adaptive aggregation method converges faster than the zero-padding approach.
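The difference between the truncation and freezing schemes described in the abstract can be illustrated with a small NumPy sketch. The abstract does not give the scoring formula, so everything here is an assumption: the sensitivity of a rank-1 component is taken as the sum of |parameter × gradient| over its row of A and column of B, the smoothing is a plain exponential moving average, and all function names and the `beta`/`lr` values are hypothetical.

```python
import numpy as np

# Assumed layout: A has shape (r, d_in), B has shape (d_out, r), so the LoRA
# update is B @ A = sum_i outer(B[:, i], A[i, :]), i.e. r rank-1 matrices.

def rank1_sensitivity(A, B, grad_A, grad_B):
    """Assumed score: sum of |param * grad| over each rank-1 component."""
    s_A = np.abs(A * grad_A).sum(axis=1)  # one value per row of A
    s_B = np.abs(B * grad_B).sum(axis=0)  # one value per column of B
    return s_A + s_B

def smooth(prev_scores, new_scores, beta=0.85):
    """Exponential moving average of the scores (beta is a hypothetical value)."""
    return beta * prev_scores + (1.0 - beta) * new_scores

def top_k_mask(scores, k):
    """Boolean mask marking the k highest-scoring rank-1 components."""
    mask = np.zeros(scores.shape[0], dtype=bool)
    mask[np.argsort(-scores, kind="stable")[:k]] = True
    return mask

def truncate(A, B, mask):
    """Truncation: the client keeps only the selected components (lower rank)."""
    return A[mask, :], B[:, mask]

def frozen_update(A, B, grad_A, grad_B, mask, lr=0.1):
    """Freezing: full rank is kept, but only selected components take a step."""
    A_new = A - lr * grad_A * mask[:, None]
    B_new = B - lr * grad_B * mask[None, :]
    return A_new, B_new
```

Under these assumptions, `truncate` hands the client a smaller model, whereas `frozen_update` leaves the unselected rank-1 components in place and untouched, which is the property the abstract credits with avoiding the degradation seen under truncation.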

What problem does this paper attempt to address?

The problem this paper attempts to solve is: when fine-tuning large language models (LLMs) in a federated learning environment, how to deal with computational resource heterogeneity and low communication efficiency while maintaining model performance and data privacy. Specifically, the paper proposes solutions to the following problems:

1. **Computational Resource Heterogeneity**: Different clients have different computational resources, making it impossible to set a uniform LoRA (Low-Rank Adaptation) rank. A lower-rank LoRA reduces the computational burden but may lead to performance degradation.
2. **Low Communication Efficiency**: Large language models have a large number of parameters, and frequent transmission of model updates occupies a large amount of bandwidth, affecting training efficiency. In federated learning in particular, communication overhead is a key bottleneck.
3. **Performance Degradation**: Using truncation to adapt to LoRA modules of different ranks may reduce model performance, especially when distributing a high-rank global model to low-rank clients.

To solve these problems, the paper proposes a new framework named HAFL (Heterogeneous Adaptive Federated LoRA fine-tuned LLM framework), and its main contributions include:

- **Importance-Aware Parameter Truncation Scheme**: Allows different clients to select different LoRA ranks according to their computing capabilities and selects the most important rank-1 matrices for training through importance scores.
- **Importance-Aware Parameter Freezing Scheme**: Ensures that all clients and the cloud server use the same high-rank LoRA, but the clients update only the most important rank-1 matrices while the rest remain frozen, thereby avoiding performance loss.
- **Adaptive Global Aggregation Method**: Aggregates at the level of decomposed rank-1 matrices instead of at the level of the entire LoRA matrix, to prevent information dilution and improve aggregation efficiency.

Through these innovations, the HAFL framework reduces communication overhead and computational resource requirements while maintaining model performance, and is suitable for federated learning environments with heterogeneous resources.
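The information dilution mentioned for zero-padding aggregation, and how aggregating per rank-1 component avoids it, can be shown with a toy NumPy sketch. This is an illustration under stated assumptions, not the paper's aggregation rule: clients are weighted equally, each client reports a boolean mask of the components it updated, only the A factor is shown (B would be handled the same way along its columns), and both function names are hypothetical.

```python
import numpy as np

def zero_padding_aggregate(client_As, r_max):
    """Pad each client's A (r_k, d) with zero rows up to r_max, then average
    over all clients. Rows that only some clients own are divided by the
    total client count, which shrinks (dilutes) them."""
    d = client_As[0].shape[1]
    padded = []
    for A in client_As:
        P = np.zeros((r_max, d))
        P[:A.shape[0], :] = A
        padded.append(P)
    return np.mean(padded, axis=0)

def rank1_level_aggregate(client_As, client_masks, global_A):
    """Average row i only over the clients whose mask marks row i as updated.
    Rows that no client updated keep their previous global value."""
    stacked = np.stack(client_As)                       # (K, r, d)
    masks = np.stack(client_masks).astype(float)        # (K, r)
    counts = masks.sum(axis=0)                          # (r,)
    summed = (stacked * masks[:, :, None]).sum(axis=0)  # (r, d)
    out = global_A.copy()
    updated = counts > 0
    out[updated] = summed[updated] / counts[updated][:, None]
    return out

# Toy example: client 1 trains both components, client 2 only the first.
c1 = np.array([[2.0, 2.0], [4.0, 4.0]])
c2 = np.array([[2.0, 2.0], [0.0, 0.0]])  # second row untouched (frozen)
diluted = zero_padding_aggregate([c1, c2[:1]], r_max=2)
adaptive = rank1_level_aggregate(
    [c1, c2],
    [np.array([True, True]), np.array([True, False])],
    global_A=np.zeros((2, 2)),
)
```

In this toy case the second component is halved under zero-padding because the client that never trained it still counts in the denominator, while the per-component average keeps client 1's value unchanged.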

Federated LLMs Fine-tuned with Adaptive Importance-Aware LoRA

FLoRA: Federated Fine-Tuning Large Language Models with Heterogeneous Low-Rank Adaptations

Towards Robust and Efficient Federated Low-Rank Adaptation with Heterogeneous Clients

Federated Fine-tuning of Large Language Models under Heterogeneous Tasks and Client Resources

Towards Federated Low-Rank Adaptation with Rank-Heterogeneous Communication

Heterogeneous LoRA for Federated Fine-tuning of On-Device Foundation Models

FDLoRA: Personalized Federated Learning of Large Language Model via Dual LoRA Tuning

Improving LoRA in Privacy-preserving Federated Learning

Differentially Private Low-Rank Adaptation of Large Language Model Using Federated Learning

Federated Low-Rank Adaptation for Large Models Fine-Tuning over Wireless Networks

RBLA: Rank-Based-LoRA-Aggregation for Fine-tuning Heterogeneous Models in FLaaS

Selective Aggregation for Low-Rank Adaptation in Federated Learning

LoRA-FAIR: Federated LoRA Fine-Tuning with Aggregation and Initialization Refinement

Personalized Federated Fine-Tuning for LLMs via Data-Driven Heterogeneous Model Architectures

SA-FedLora: Adaptive Parameter Allocation for Efficient Federated Learning with LoRA Tuning

Federated LoRA with Sparse Communication

FedLoRA: Model-Heterogeneous Personalized Federated Learning with LoRA Tuning

Exact Aggregation for Federated and Efficient Fine-Tuning of Foundation Models

CELLM: An Efficient Communication in Large Language Models Training for Federated Learning

LoRA ensembles for large language model fine-tuning

LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning