CCoE: A Compact LLM with Collaboration of Experts

Shaomang Huang, Jianfeng Pan, Hanzhong Zheng
2024-07-25
Abstract: Large language models (LLMs) demonstrate significant capabilities in natural language understanding and generation. With the growing need to apply LLMs in various domains, an open research question is how to efficiently train and build a model that has expertise in different domains at a low training cost. We propose the CCoE architecture, a framework that easily couples multiple strong domain experts into one big LLM, providing a collective way of utilizing the different domain-expert LLMs. Moreover, training a large collaboration of multiple expert LLMs places high demands on training resources. CCoE bypasses this problem by isolating the other experts and training each expert separately. The design of CCoE assembles multiple expert LLMs through the CoE (Collaboration of Experts) layer. Each CoE layer can have one or more expert LLMs. The expert LLMs have different numbers of layers and have been well trained for different domain tasks. Each expert is fine-tuned to achieve results comparable to SOTA domain LLMs. We start with 5 experts in the domains of Code, Math, Law, text-to-SQL, and Medical. The results indicate that our CCoE framework can easily and efficiently boost performance by nearly 10%-20% over the original base model in different domains, while using fewer resources for both training and inference.
Computation and Language, Artificial Intelligence
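The abstract says experts are attached through CoE layers, that one CoE layer can hold several experts, and that experts differ in depth. The toy sketch below illustrates one way such a layout could work; it is not the authors' code. It assumes (hypothetically) that an expert branches off the shared backbone at its CoE layer and replaces the backbone layers above that point. `CoEModel`, `Expert`, `attach`, `forward`, and `scale` are illustrative names, and each "layer" is just a scaling function standing in for a transformer block.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A "layer" maps a hidden state to a hidden state (stand-in for a transformer block).
Layer = Callable[[List[float]], List[float]]


def scale(factor: float) -> Layer:
    """Toy layer: multiplies every element of the hidden state by `factor`."""
    return lambda h: [factor * v for v in h]


@dataclass
class Expert:
    name: str
    layers: List[Layer]  # experts may have different depths


@dataclass
class CoEModel:
    backbone: List[Layer]
    # Assumed layout: CoE layer index -> experts that branch off the backbone there.
    coe_layers: Dict[int, List[Expert]] = field(default_factory=dict)

    def attach(self, at_layer: int, expert: Expert) -> None:
        if not 0 <= at_layer <= len(self.backbone):
            raise ValueError("CoE layer index is outside the backbone")
        self.coe_layers.setdefault(at_layer, []).append(expert)

    def _find(self, name: str) -> Tuple[int, Expert]:
        for idx, experts in self.coe_layers.items():
            for expert in experts:
                if expert.name == name:
                    return idx, expert
        raise KeyError(name)

    def forward(self, h: List[float], expert_name: Optional[str] = None) -> List[float]:
        if expert_name is None:
            layers = self.backbone  # plain base model, no expert
        else:
            idx, expert = self._find(expert_name)
            # Shared lower layers, then only the selected expert's layers run.
            layers = self.backbone[:idx] + expert.layers
        for layer in layers:
            h = layer(h)
        return h


model = CoEModel(backbone=[scale(2.0)] * 4)  # 4-layer toy base model
model.attach(2, Expert("math", [scale(3.0), scale(3.0)]))
model.attach(3, Expert("code", [scale(10.0)]))
print(model.forward([1.0]))          # [16.0]: all four backbone layers
print(model.forward([1.0], "math"))  # [36.0]: 2 backbone layers, then 2 math layers
print(model.forward([1.0], "code"))  # [80.0]: 3 backbone layers, then 1 code layer
```

Under this assumed layout, the lower layers are shared, so adding an expert only adds its own branch, and a request routed to one domain never executes the other experts' layers.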
What problem does this paper attempt to address?
The paper addresses how to efficiently train and build a large language model (LLM) with expertise in multiple domains at a lower training cost. Specifically, it proposes the CCoE architecture, which integrates expert models from multiple domains into a unified LLM, improving performance across domains while significantly reducing the resources required for training and inference.

### Main Issues

1. **Integration of Multi-Domain Expertise**:
   - Existing LLMs perform well in general natural language understanding and generation but still fall short on domain-specific tasks (such as code, mathematics, law, and medicine).
   - How can expertise from different domains be integrated into one model so that it performs well in all of them?
2. **Efficient Training and Resource Utilization**:
   - Training an LLM that performs well in multiple domains usually requires large amounts of compute and data.
   - How can the cost of training and inference be reduced while maintaining high performance?
3. **Avoiding Catastrophic Forgetting**:
   - Fine-tuning an existing model for a specific domain can easily cause it to forget previously learned knowledge (catastrophic forgetting).
   - How can the model's capabilities in specific domains be enhanced while retaining its general capabilities?

### Solution

The CCoE (Collaboration of Experts) architecture proposed in the paper addresses these issues as follows:

1. **Integration of Expert Models**:
   - The CCoE framework integrates expert models from multiple domains, each focused on a specific domain, into a unified LLM.
   - Each expert model is trained independently and coupled with the main model through a CoE (Collaboration of Experts) layer.
2. **Flexible Resource Management**:
   - Because the training of each expert is isolated, CCoE can train and update each expert independently without updating the main model's parameters.
   - When handling a task, only the relevant expert models are activated, which significantly reduces resource consumption.
3. **Continuous Learning and Expansion**:
   - CCoE supports dynamically adding new expert models as domain requirements change.
   - "Push" and "pop" operations add or remove experts, enabling continual learning and model expansion.

### Experimental Results

- Experiments show that CCoE improves performance by roughly 10%-20% over the original base model across multiple domains (code, mathematics, law, medicine, and text-to-SQL) while significantly reducing the resources required for training and inference.
- By fine-tuning on high-quality domain datasets, the expert models achieve performance comparable to domain-specific models in their respective fields.

### Conclusion

The CCoE architecture provides an effective way to integrate expert models from multiple domains, improving performance across them while significantly reducing the cost of training and inference. This approach offers a new perspective on deploying LLMs in practical multi-domain applications.
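The isolated training and the "push"/"pop" operations described above can be illustrated with a small bookkeeping sketch. It assumes that each expert is a separate parameter group on top of a frozen shared backbone; `ExpertPool`, `Param`, `push`, `pop`, `trainable`, and `sgd_step` are hypothetical names, gradients are supplied by hand rather than computed, and nothing here reflects the paper's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Param:
    name: str
    value: float
    requires_grad: bool = True


@dataclass
class ExpertPool:
    backbone: List[Param]
    experts: Dict[str, List[Param]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Freeze the shared base model so expert training never modifies it.
        for p in self.backbone:
            p.requires_grad = False

    def push(self, name: str, params: List[Param]) -> None:
        """Add a new domain expert without touching existing parameters."""
        if name in self.experts:
            raise ValueError(f"expert {name!r} is already present")
        self.experts[name] = params

    def pop(self, name: str) -> List[Param]:
        """Remove an expert; the backbone and the other experts are unchanged."""
        return self.experts.pop(name)

    def trainable(self, name: str) -> List[Param]:
        """Only the selected expert's parameters are eligible for updates."""
        return [p for p in self.experts[name] if p.requires_grad]

    def sgd_step(self, name: str, grads: Dict[str, float], lr: float) -> None:
        # Gradients for any parameter outside the selected expert are ignored.
        for p in self.trainable(name):
            p.value -= lr * grads.get(p.name, 0.0)


pool = ExpertPool(backbone=[Param("base.w0", 1.0), Param("base.w1", 1.0)])
pool.push("law", [Param("law.w0", 1.0)])
pool.push("sql", [Param("sql.w0", 1.0)])
pool.sgd_step("law", {"law.w0": 2.0, "base.w0": 2.0, "sql.w0": 2.0}, lr=0.25)
print([p.value for p in pool.backbone])  # [1.0, 1.0]: frozen backbone
print(pool.experts["law"][0].value)      # 0.5: only the law expert moved
print(pool.experts["sql"][0].value)      # 1.0: other expert untouched
pool.pop("sql")
print(sorted(pool.experts))              # ['law']
```

In a real deep-learning framework the same split would roughly correspond to freezing the base model's weights (for example, setting `requires_grad=False` in PyTorch) and passing only the selected expert's parameters to the optimizer, which is why training one expert cannot overwrite what the backbone or the other experts have learned.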