Knowledge Fusion of Chat LLMs: A Preliminary Technical Report

Fanqi Wan, Ziyi Yang, Longguang Zhong, Xiaojun Quan, Xinting Huang, Wei Bi
2024-05-28
Abstract: Recently, FuseLLM introduced the concept of knowledge fusion to transfer the collective knowledge of multiple structurally varied LLMs into a target LLM through lightweight continual training. In this report, we extend the scalability and flexibility of the FuseLLM framework to realize the fusion of chat LLMs, resulting in FusionChat. FusionChat comprises two main stages. Firstly, we undertake knowledge fusion for structurally and scale-varied source LLMs to derive multiple target LLMs of identical structure and size via lightweight fine-tuning. Then, these target LLMs are merged within the parameter space, wherein we propose a novel method for determining the merging weights based on the variation ratio of parameter matrices before and after fine-tuning. We validate our approach using three prominent chat LLMs with diverse architectures and scales, namely NH2-Mixtral-8x7B, NH2-Solar-10.7B, and OpenChat-3.5-7B. Experimental results spanning various chat domains demonstrate the superiority of FusionChat-7B across a broad spectrum of chat LLMs at 7B and 34B scales, even surpassing GPT-3.5 (March) and approaching Mixtral-8x7B-Instruct.
Computation and Language
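The first stage trains each target LLM to absorb the predictions of the source LLMs. This summary does not spell out the training objective, so the following is only a minimal sketch of a distillation-style loss under stated assumptions: all next-token distributions are already aligned to one shared vocabulary, the fusion rule keeps whichever source distribution assigns the highest probability to the gold token (lowest cross-entropy), and `lam` balances ordinary next-token loss against matching the fused distribution. The names `fuse_min_ce`, `knowledge_fusion_loss`, and the default `lam = 0.9` are hypothetical.

```python
import math
from typing import List, Sequence

Dist = List[float]  # next-token probabilities over a shared vocabulary (assumed pre-aligned)

EPS = 1e-12  # guards against log(0)


def cross_entropy(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """H(reference, predicted) = -sum_v reference[v] * log(predicted[v])."""
    return -sum(r * math.log(p + EPS) for r, p in zip(reference, predicted))


def fuse_min_ce(candidates: List[Dist], gold_token: int) -> Dist:
    """Hypothetical fusion rule: keep the candidate with the lowest cross-entropy
    against the one-hot gold token, i.e. the highest probability on that token."""
    return min(candidates, key=lambda dist: -math.log(dist[gold_token] + EPS))


def knowledge_fusion_loss(
    target: List[Dist],         # target LLM's distribution at each position
    sources: List[List[Dist]],  # sources[s][t]: source model s at position t
    gold: List[int],            # gold next-token ids, one per position
    lam: float = 0.9,           # hypothetical weight between the two terms
) -> float:
    total = 0.0
    for t, gold_token in enumerate(gold):
        fused = fuse_min_ce([src[t] for src in sources], gold_token)
        lm_term = -math.log(target[t][gold_token] + EPS)  # ordinary next-token loss
        fusion_term = cross_entropy(fused, target[t])      # pull toward the fused source prediction
        total += lam * lm_term + (1.0 - lam) * fusion_term
    return total / len(gold)


# One position, two-token vocabulary, two source models; gold token is 0.
loss = knowledge_fusion_loss(
    target=[[0.5, 0.5]],
    sources=[[[0.9, 0.1]], [[0.2, 0.8]]],
    gold=[0],
)  # ≈ 0.693 (ln 2), since the target is uniform
```

Only the source distributions are consulted during training, so sources of different architectures and sizes can contribute as long as their outputs can be mapped onto the target's vocabulary.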
What problem does this paper attempt to address?
The paper addresses how to integrate the knowledge of multiple chat large language models (LLMs) with different architectures and sizes into a single, more capable model called FusionChat. Existing methods such as model fusion and knowledge distillation usually either combine the outputs of several models or compress one model's knowledge into another. FusionChat instead transfers the knowledge of multiple source LLMs into multiple target LLMs of identical structure and size through lightweight fine-tuning, and then merges these target LLMs in the parameter space.

For the merging step, the paper proposes Variation Ratio Merge (VARM), which sets the merging weights according to the variation ratio of each parameter matrix before and after fine-tuning. This gives a fine-grained weight allocation without any additional training.

The approach is validated with three open-source chat LLMs of different architectures and scales (NH2-Mixtral-8x7B, NH2-Solar-10.7B, and OpenChat-3.5-7B). On chat tasks spanning multiple domains, FusionChat outperforms the source LLMs and fine-tuning baselines, surpasses GPT-3.5 (March), and approaches Mixtral-8x7B-Instruct.

Compared with FuseLLM, FusionChat is more scalable and flexible: it supports source LLMs of different sizes and makes it easier to integrate new source LLMs, which is particularly valuable given how frequently the open-source community releases new chat LLMs.
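The summary does not give the exact VARM formula, so the following is only one plausible reading, sketched under stated assumptions: all target LLMs start from a shared checkpoint `base`; the variation ratio of a matrix is its squared change divided by its squared original magnitude (an assumed definition); and, for each matrix separately, those ratios are normalized across target models to serve as merging weights. The names `variation_ratio` and `varm_merge` are hypothetical, and matrices are flattened into plain lists for brevity.

```python
from typing import Dict, List

Matrix = List[float]       # one parameter matrix, flattened for simplicity
Model = Dict[str, Matrix]  # parameter name -> flattened matrix

EPS = 1e-12  # avoids division by zero for all-zero matrices


def variation_ratio(before: Matrix, after: Matrix) -> float:
    """Assumed definition: squared change relative to the squared original magnitude."""
    change = sum((a - b) ** 2 for a, b in zip(after, before))
    magnitude = sum(b ** 2 for b in before)
    return change / (magnitude + EPS)


def varm_merge(base: Model, targets: List[Model]) -> Model:
    """Merge fine-tuned targets that share the starting checkpoint `base`,
    choosing a separate set of weights for every parameter matrix."""
    merged: Model = {}
    for name, base_matrix in base.items():
        ratios = [variation_ratio(base_matrix, t[name]) for t in targets]
        total = sum(ratios)
        if total == 0.0:
            # No target changed this matrix: fall back to a plain average.
            weights = [1.0 / len(targets)] * len(targets)
        else:
            # Matrices that moved more during fine-tuning get larger weights.
            weights = [r / total for r in ratios]
        merged[name] = [
            sum(w * t[name][k] for w, t in zip(weights, targets))
            for k in range(len(base_matrix))
        ]
    return merged


# Target 1 only changed "attn", target 2 only changed "ffn".
base = {"attn": [1.0], "ffn": [1.0]}
merged = varm_merge(base, [{"attn": [3.0], "ffn": [1.0]}, {"attn": [1.0], "ffn": [3.0]}])
# merged keeps each target's update where it changed: {"attn": [3.0], "ffn": [3.0]}
```

Because the weights are recomputed per parameter matrix, one target can dominate some matrices while another dominates others; a single weight per model would instead average every matrix the same way.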