Abstract: While training large language models (LLMs) from scratch can indeed lead to models with distinct capabilities and strengths, it incurs substantial costs and may lead to redundancy in competencies. Knowledge fusion aims to integrate existing LLMs of diverse architectures and capabilities into a more potent LLM through lightweight continual training, thereby reducing the need for costly LLM development. In this work, we propose a new framework for the knowledge fusion of chat LLMs through two main stages, resulting in FuseChat. First, we conduct pairwise knowledge fusion on source chat LLMs of varying structures and scales to create multiple target LLMs with identical structure and size via lightweight fine-tuning. During this process, a statistics-based token alignment approach is introduced as the cornerstone for fusing LLMs with different structures. Second, we merge these target LLMs within the parameter space, where we propose a novel method for determining the merging coefficients based on the magnitude of parameter updates before and after fine-tuning. We implement and validate FuseChat using six prominent chat LLMs with diverse architectures and scales: OpenChat-3.5-7B, Starling-LM-7B-alpha, NH2-SOLAR-10.7B, InternLM2-Chat-20B, Mixtral-8x7B-Instruct, and Qwen-1.5-Chat-72B. Experimental results on two instruction-following benchmarks, AlpacaEval 2.0 and MT-Bench, demonstrate the superiority of FuseChat-7B over baselines of various sizes. Our model is even comparable to the larger Mixtral-8x7B-Instruct and approaches GPT-3.5-Turbo-1106 on MT-Bench. Our code, model weights, and data are public at https://github.com/fanqiwan/FuseAI.
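
The statistics-based token alignment mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the same texts have already been tokenized by a source model and by the pivot model and aligned token by token (that upstream alignment step is not shown). It counts which pivot token each source token is most often aligned with, then uses that mapping to re-express a source model's next-token distribution over pivot tokens. The function names `build_token_mapping` and `project_distribution` and the most-frequent-mapping rule are hypothetical illustrations, not the paper's exact procedure.

```python
from collections import Counter, defaultdict


def build_token_mapping(aligned_pairs):
    """Map each source token to the pivot token it is most often aligned with.

    `aligned_pairs` is an iterable of (source_token, pivot_token) tuples,
    assumed (hypothetically) to come from an upstream sequence alignment
    over a shared corpus.
    """
    counts = defaultdict(Counter)
    for src_tok, piv_tok in aligned_pairs:
        counts[src_tok][piv_tok] += 1
    # most_common(1) returns [(token, count)] for the most frequent pivot token.
    return {src: c.most_common(1)[0][0] for src, c in counts.items()}


def project_distribution(src_dist, mapping):
    """Re-express a source-model token distribution over pivot tokens.

    Mass of source tokens that map to the same pivot token is summed;
    source tokens without a mapping are dropped and the rest renormalized.
    """
    projected = Counter()
    for tok, p in src_dist.items():
        if tok in mapping:
            projected[mapping[tok]] += p
    total = sum(projected.values())
    return {tok: p / total for tok, p in projected.items()} if total else {}


if __name__ == "__main__":
    # Toy aligned pairs; "York" is aligned to "York" twice and to "Yo" once.
    pairs = [("New", "New"), ("York", "York"), ("York", "York"),
             ("York", "Yo"), ("city", "city")]
    mapping = build_token_mapping(pairs)
    print(project_distribution({"York": 0.6, "city": 0.3, "town": 0.1}, mapping))
```

In this toy example the unmapped source token `"town"` is dropped and the remaining mass is renormalized to roughly 2/3 for `"York"` and 1/3 for `"city"`.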

What problem does this paper attempt to address?

The paper aims to address two main issues in the development of large language models (LLMs):

1. **High cost**: Training a new LLM from scratch requires substantial computational resources and time, which is prohibitively expensive for individuals and many institutions.
2. **Capability redundancy**: Although existing LLMs differ in architecture and scale, they often exhibit similar capabilities across natural language processing tasks, so training yet another model from scratch may largely duplicate competencies that existing models already have.

To tackle these problems, the paper proposes FuseChat, a framework that integrates chat-oriented LLMs (chat LLMs) of different architectures and scales through a "fuse-and-merge" approach, producing a single model that combines their strengths at a much lower cost than training from scratch. The FuseChat workflow has two stages:

- **Fusion stage**: Select one source model as the "pivot LLM", then perform pairwise knowledge fusion between the pivot and each of the other source LLMs, yielding multiple target LLMs that share the pivot's structure and scale. A statistics-based token alignment method handles the tokenization differences between models so that knowledge transfers effectively.
- **Merge stage**: Merge these target LLMs in the parameter space to obtain the final fused model. To determine the merging coefficients, the paper proposes SCE (Select, Calculate, Erase), which automatically assigns a coefficient to each parameter matrix based on the magnitude of the parameter updates produced by fine-tuning.

In the experiments, the authors use six open-source chat LLMs as source models, with OpenChat-3.5-7B as the pivot. On two representative instruction-following benchmarks, AlpacaEval 2.0 and MT-Bench, FuseChat outperforms baseline chat LLMs of various scales.
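
The three SCE steps described above can be sketched for a single parameter matrix, flattened to a list of floats. This is one plausible reading under stated assumptions, not the paper's implementation: "Select" keeps the fraction `keep_ratio` (a hypothetical parameter) of positions where the targets' updates vary most, "Calculate" sets each target's coefficient in proportion to the squared magnitude of its selected updates, and "Erase" discards updates whose sign disagrees with the summed update at that position. Thresholds, tie-breaking, and the exact merge formula in the paper may differ.

```python
from statistics import pvariance


def sce_merge(base, targets, keep_ratio=0.5):
    """Hypothetical SCE-style merge of one flattened parameter matrix.

    base:    pivot weights before fine-tuning (list of floats)
    targets: fine-tuned target weights, each the same length as base
    Returns (merged_weights, per-target coefficients).
    """
    n = len(base)
    # Task vectors: each target's update relative to the shared base.
    deltas = [[t[i] - base[i] for i in range(n)] for t in targets]

    # Select: keep the positions where targets disagree most (highest variance).
    variances = [pvariance([d[i] for d in deltas]) for i in range(n)]
    k = max(1, int(round(keep_ratio * n)))
    keep = set(sorted(range(n), key=lambda i: variances[i], reverse=True)[:k])
    deltas = [[d[i] if i in keep else 0.0 for i in range(n)] for d in deltas]

    # Calculate: matrix-level coefficient proportional to squared update magnitude.
    energy = [sum(x * x for x in d) for d in deltas]
    total = sum(energy)
    coeffs = [e / total if total else 1.0 / len(deltas) for e in energy]

    # Erase: add only updates whose sign agrees with the summed update direction.
    merged = list(base)
    for i in range(n):
        direction = sum(d[i] for d in deltas)
        for c, d in zip(coeffs, deltas):
            if d[i] * direction > 0:
                merged[i] += c * d[i]
    return merged, coeffs


if __name__ == "__main__":
    base = [0.0, 0.0, 0.0, 0.0]
    targets = [[1.0, 0.1, -0.5, 0.0], [0.2, 0.1, 0.5, 0.0]]
    merged, coeffs = sce_merge(base, targets, keep_ratio=0.5)
    print(coeffs, merged)
```

In this toy run the first target receives the larger coefficient because its selected updates are larger. Position 1 is dropped during Select because both targets agree there (zero variance), and position 2 is erased because the two updates cancel exactly, leaving no majority sign. Applied to each parameter matrix separately, this yields matrix-level coefficients; a practical version would use a tensor library instead of Python lists.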
In summary, the FuseChat framework lowers the cost of building a stronger chat LLM by reusing existing models instead of training a new one from scratch, and it integrates their complementary strengths to improve overall performance.

FuseChat: Knowledge Fusion of Chat Models