Abstract: On-device LLMs have gained increasing attention for their ability to enhance privacy and provide a personalized user experience. To facilitate learning with private and scarce local data, federated learning has become a standard approach, though it introduces challenges related to system and data heterogeneity among end users. As a solution, we propose a novel $\textbf{Co}$llaborative learning approach with a $\textbf{Mi}$xture of $\textbf{G}$eneralists and $\textbf{S}$pecialists (CoMiGS), being the first to effectively address both. Our approach distinguishes generalists and specialists by aggregating certain experts across end users while keeping others localized to specialize in user-specific datasets. A key innovation of our method is the bi-level optimization formulation of the Mixture-of-Experts learning objective, where the router is updated using a separate validation set that represents the target distribution. CoMiGS effectively balances collaboration and personalization, as demonstrated by its superior performance in scenarios with high data heterogeneity across multiple datasets. By design, our approach accommodates users' varying computational resources through different numbers of specialists. By decoupling resource abundance from data quantity, CoMiGS remains robust against overfitting, due to the generalists' regularizing effect, while adapting to local data through specialist expertise.

What problem does this paper attempt to address?

### What problems does this paper attempt to solve?

This paper addresses two challenges in personalized collaborative fine-tuning of large language models (LLMs) on-device: **system heterogeneity** and **data heterogeneity**.

1. **System heterogeneity**:
   - Users' devices differ in computing resources, so the model architectures and parameter counts they can support differ as well.
   - The method must run effectively across devices with different computing resources while maintaining model performance.
2. **Data heterogeneity**:
   - Users' local data distributions differ substantially, for example in topics and language habits.
   - Fine-tuning must work with limited local data while protecting users' privacy.

To address these challenges, the paper proposes a collaborative learning method, **CoMiGS (Collaborative learning approach with a Mixture of Generalists and Specialists)**, built from three components:

- **Generalists**: experts whose parameters are aggregated across users, so that users learn collaboratively and the model generalizes better.
- **Specialists**: experts whose parameters stay local to each user, so that they adapt to the characteristics of that user's data and provide personalization.
- **Bi-level optimization**: the router parameters are updated on a separate validation set that represents the target distribution, while the expert parameters are updated on the training set. This helps the model adapt to the target distribution, especially when the training and target distributions differ.

In this way, CoMiGS balances collaboration and personalization while handling both system and data heterogeneity. The method also accommodates users' computing resources by letting the number of specialists vary from user to user, which adds flexibility and robustness.

### Summary

The main contributions of the paper are:

- It proposes CoMiGS, which the authors describe as the first method to effectively address both system heterogeneity and data heterogeneity.
- It introduces a bi-level optimization formulation of the Mixture-of-Experts learning objective, which improves model performance under distribution shift.
- It decouples resource abundance from data quantity, so that users with more local data can benefit from larger models, while users with ample computing resources but little data are less likely to overfit.
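The generalist/specialist split and the two-split update described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: experts are linear regressors rather than LLM blocks, the bi-level problem is approximated by alternating first-order steps, generalists are aggregated by plain averaging, and all names (`User`, `local_round`, `aggregate_generalists`) and the random data are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

class User:
    """Hypothetical on-device model: one generalist plus local specialists."""

    def __init__(self, dim, n_specialists, rng):
        n_experts = 1 + n_specialists
        # Row 0 is the generalist (shared); rows 1.. are specialists (local).
        self.experts = rng.normal(scale=0.1, size=(n_experts, dim))
        self.router = np.zeros((n_experts, dim))

    def forward(self, X):
        gates = softmax(X @ self.router.T)   # (n, E) routing weights
        outs = X @ self.experts.T            # (n, E) per-expert predictions
        return gates, outs, (gates * outs).sum(axis=1)

    def expert_step(self, X, y, lr):
        """Squared-error gradient step on expert weights only."""
        gates, _, pred = self.forward(X)
        err = pred - y
        self.experts -= lr * (gates * err[:, None]).T @ X / len(y)

    def router_step(self, X, y, lr):
        """Squared-error gradient step on router weights only."""
        gates, outs, pred = self.forward(X)
        err = pred - y
        dlogits = gates * (outs - pred[:, None]) * err[:, None]
        self.router -= lr * dlogits.T @ X / len(y)

def local_round(user, train, val, lr=0.1, steps=5):
    # Assumption: alternate the two levels instead of solving them exactly.
    for _ in range(steps):
        user.expert_step(*train, lr)  # experts see the training split
        user.router_step(*val, lr)    # router sees the validation split

def aggregate_generalists(users):
    # Assumption: plain averaging; specialists are never communicated.
    mean = np.mean([u.experts[0] for u in users], axis=0)
    for u in users:
        u.experts[0] = mean.copy()

# Two users with different compute budgets: 1 vs. 3 specialists.
rng = np.random.default_rng(0)
users = [User(dim=4, n_specialists=k, rng=rng) for k in (1, 3)]
for _ in range(3):  # communication rounds on placeholder random data
    for u in users:
        train = (rng.normal(size=(32, 4)), rng.normal(size=32))
        val = (rng.normal(size=(16, 4)), rng.normal(size=16))
        local_round(u, train, val)
    aggregate_generalists(users)
```

Because only row 0 of each user's expert matrix is averaged, users can hold different numbers of specialists without affecting aggregation, which is how the sketch mirrors the system-heterogeneity claim.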

On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists

Collaborative Learning Between Cloud and End Devices

Learning to Decode Collaboratively with Multiple Language Models

Personalized Collaborative Fine-Tuning for On-Device Large Language Models

FedMoE: Personalized Federated Learning via Heterogeneous Mixture of Experts

How to Collaborate: Towards Maximizing the Generalization Performance in Cross-Silo Federated Learning

MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models

Cloud-Device Collaborative Learning for Multimodal Large Language Models

Federated Multi-Task Learning under a Mixture of Distributions

Learning to Collaborate in Decentralized Learning of Personalized Models

Federated Mutual Learning: a Collaborative Machine Learning Method for Heterogeneous Data, Models, and Objectives

Mutual Enhancement of Large and Small Language Models with Cross-Silo Knowledge Transfer

Collaborative Machine Learning Model Building with Families Using Co-ML

CCoE: A Compact LLM with Collaboration of Experts

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts

FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models

Co-ML: Collaborative Machine Learning Model Building for Developing Dataset Design Practices

Supervised Knowledge Makes Large Language Models Better In-context Learners

CoLLM: Integrating Collaborative Embeddings into Large Language Models for Recommendation