Knowledge Fusion of Chat LLMs: A Preliminary Technical Report

Fanqi Wan, Ziyi Yang, Longguang Zhong, Xiaojun Quan, Xinting Huang, Wei Bi

2024-05-28

Abstract: Recently, FuseLLM introduced the concept of knowledge fusion to transfer the collective knowledge of multiple structurally varied LLMs into a target LLM through lightweight continual training. In this report, we extend the scalability and flexibility of the FuseLLM framework to realize the fusion of chat LLMs, resulting in FusionChat. FusionChat comprises two main stages. Firstly, we undertake knowledge fusion for structurally and scale-varied source LLMs to derive multiple target LLMs of identical structure and size via lightweight fine-tuning. Then, these target LLMs are merged within the parameter space, wherein we propose a novel method for determining the merging weights based on the variation ratio of parameter matrices before and after fine-tuning. We validate our approach using three prominent chat LLMs with diverse architectures and scales, namely NH2-Mixtral-8x7B, NH2-Solar-10.7B, and OpenChat-3.5-7B. Experimental results spanning various chat domains demonstrate the superiority of FusionChat-7B across a broad spectrum of chat LLMs at 7B and 34B scales, even surpassing GPT-3.5 (March) and approaching Mixtral-8x7B-Instruct.

Computation and Language

What problem does this paper attempt to address?

The paper addresses how to integrate the knowledge of multiple chat large language models (LLMs) with different structures and sizes into a single target model, yielding a stronger unified model called FusionChat. Existing approaches such as model fusion and knowledge distillation usually combine model outputs or compress model knowledge. FusionChat instead works in two stages: it first transfers the knowledge of the source LLMs into multiple target LLMs of identical structure and size through lightweight continual training, and then merges these target LLMs in the parameter space. For the merging step, the paper proposes Variation Ratio Merge (VARM), which sets the merging weights from the variation ratio of the parameter matrices before and after fine-tuning, giving more fine-grained weight allocation without additional training. Experiments with three open-source chat LLMs (NH2-Mixtral-8x7B, NH2-Solar-10.7B, and OpenChat-3.5-7B) show that FusionChat outperforms the source LLMs and fine-tuning baselines on multi-domain chat tasks, surpassing GPT-3.5 (March) and approaching Mixtral-8x7B-Instruct. Compared with FuseLLM, FusionChat is more scalable and flexible: it supports source LLMs of different sizes and makes it easier to integrate new source LLMs, which is particularly useful given how frequently chat LLMs are updated in the open-source community.
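To make the merging idea concrete, here is a minimal sketch of variation-ratio-style merging in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the summary only says that weights are based on how much each parameter matrix changes during fine-tuning, so the specific choice below (mean squared change, normalized across target models, applied per named matrix) is an assumption, and every function name is hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]


def mean_squared_change(before: Matrix, after: Matrix) -> float:
    """Average squared element-wise change between two same-shaped matrices."""
    diffs = [
        (a - b) ** 2
        for row_b, row_a in zip(before, after)
        for b, a in zip(row_b, row_a)
    ]
    return sum(diffs) / len(diffs)


def variation_ratio_weights(base: Matrix, tuned: List[Matrix]) -> List[float]:
    """One weight per fine-tuned matrix, proportional to its change from base.

    ASSUMPTION: "variation ratio" is modelled as each model's mean squared
    change divided by the sum over all models.
    """
    changes = [mean_squared_change(base, t) for t in tuned]
    total = sum(changes)
    if total == 0.0:
        # No model moved away from the base: fall back to uniform weights.
        return [1.0 / len(tuned)] * len(tuned)
    return [c / total for c in changes]


def merge_matrix(base: Matrix, tuned: List[Matrix]) -> Matrix:
    """Weighted average of the fine-tuned matrices using the weights above."""
    weights = variation_ratio_weights(base, tuned)
    rows, cols = len(base), len(base[0])
    return [
        [sum(w * t[i][j] for w, t in zip(weights, tuned)) for j in range(cols)]
        for i in range(rows)
    ]


def merge_models(
    base: Dict[str, Matrix], tuned: List[Dict[str, Matrix]]
) -> Dict[str, Matrix]:
    """Merge same-architecture models matrix by matrix (keys must match)."""
    return {
        name: merge_matrix(base[name], [model[name] for model in tuned])
        for name in base
    }


# Toy example: one 2x2 matrix, two fine-tuned variants of the same base.
base = {"w": [[0.0, 0.0], [0.0, 0.0]]}
target_a = {"w": [[2.0, 0.0], [0.0, 0.0]]}  # mean squared change = 1.0
target_b = {"w": [[2.0, 2.0], [2.0, 0.0]]}  # mean squared change = 3.0

weights = variation_ratio_weights(base["w"], [target_a["w"], target_b["w"]])
merged = merge_models(base, [target_a, target_b])
# weights == [0.25, 0.75]
# merged["w"] == [[2.0, 1.5], [1.5, 0.0]]
```

Because the weights are computed separately for each named matrix, a target model that changed a given matrix more during fine-tuning contributes more to that matrix in the merged model, and no extra training is needed to choose the weights.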
