Fusing Models with Complementary Expertise

Hongyi Wang, Felipe Maia Polo, Yuekai Sun, Souvik Kundu, Eric Xing, Mikhail Yurochkin
2024-05-10
Abstract: Training AI models that generalize across tasks and domains has long been among the open problems driving AI research. The emergence of Foundation Models made it easier to obtain expert models for a given task, but the heterogeneity of data that may be encountered at test time often means that any single expert is insufficient. We consider the Fusion of Experts (FoE) problem of fusing outputs of expert models with complementary knowledge of the data distribution and formulate it as an instance of supervised learning. Our method is applicable to both discriminative and generative tasks and leads to significant performance improvements in image and text classification, text summarization, multiple-choice QA, and automatic evaluation of generated text. We also extend our method to the "frugal" setting where it is desired to reduce the number of expert model evaluations at test time. Our implementation is publicly available at https://github.com/hwang595/FoE-ICLR2024.
Machine Learning
What problem does this paper attempt to address?
This paper addresses how to fuse the outputs of multiple expert models with complementary knowledge, each trained on different tasks or data distributions, so that the combined system generalizes better at test time. It focuses on fusing these outputs to improve performance in image classification, text classification, text summarization, multiple-choice question answering, and automatic evaluation of generated text.

### Main Issues

1. **Generalization Problem of Single Expert Models**:
   - A single expert model performs well on its own task and data distribution, but its performance may degrade when the test data distribution differs from its training data.
   - Although foundation models can be fine-tuned to obtain high-quality expert models, these expert models still face generalization issues.
2. **Fusion Problem of Multiple Expert Models**:
   - How to effectively combine multiple expert models with complementary expertise to improve overall generalization and performance.
   - Traditional approaches such as ensemble learning and mixture of experts usually assume that expert models are trained and tested on the same distribution, whereas this paper considers pre-trained expert models that exhibit different expertise on different data distributions.

### Solutions

1. **Fusion of Experts (FoE)**:
   - Proposes a supervised learning method to fuse the outputs of models with complementary expertise.
   - Applies to both discriminative and generative tasks: a fusion model (the Fuser) is trained to select or combine the expert outputs to produce the final prediction or generation.
2. **Frugal Fusion of Experts (FrugalFoE)**:
   - Proposes a "frugal" fusion method for resource-constrained settings that reduces the number of expert models evaluated at test time.
   - Selects the most suitable expert models to query by casting the choice as a shortest path problem on a graph.

### Experimental Validation

- The FoE method is validated through extensive experiments on image classification, text summarization, evaluation of generated text, sentiment analysis, and Massive Multitask Language Understanding (MMLU).
- Results show that FoE performs close to, or even exceeds, the "oracle model" (i.e., always selecting the most suitable expert model for a given task) and significantly outperforms single expert models and other baseline methods.
- In the CIFAR-100 superclass classification task in particular, FrugalFoE matches the accuracy of querying all experts while querying only 37.5% of them.

### Summary

This paper addresses how to effectively fuse multiple expert models trained on different data distributions by proposing the Fusion of Experts (FoE) and Frugal Fusion of Experts (FrugalFoE) methods, improving generalization and performance across a variety of tasks.
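
To make the "fusion as supervised learning" idea concrete, here is a minimal sketch for the classification case. It is not the authors' implementation: the fuser architecture (plain softmax regression over concatenated expert probability vectors), the synthetic two-domain data, and every function name (`expert_outputs`, `make_split`, `train_fuser`, `run_demo`) are assumptions made for illustration. Each hypothetical expert is reliable only on its own domain, and the fuser is trained on labeled examples to combine them.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def expert_outputs(labels, domains, expert_domain, n_classes, rng):
    """Hypothetical expert: confident and correct on its own domain,
    an uninformative random guess everywhere else."""
    probs = rng.dirichlet(np.ones(n_classes), size=len(labels))
    own = domains == expert_domain
    onehot = np.eye(n_classes)[labels]
    probs[own] = 0.9 * onehot[own] + 0.1 * probs[own]
    return probs

def make_split(n, n_classes, n_experts, rng):
    """Synthetic data: each sample belongs to one domain; the fuser input
    is the concatenation of all experts' probability vectors."""
    labels = rng.integers(0, n_classes, size=n)
    domains = rng.integers(0, n_experts, size=n)
    outputs = [expert_outputs(labels, domains, k, n_classes, rng)
               for k in range(n_experts)]
    return np.hstack(outputs), labels

def train_fuser(X, y, n_classes, lr=0.5, epochs=300):
    """Assumed fuser: softmax regression trained by full-batch gradient
    descent on the cross-entropy loss."""
    W = np.zeros((X.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]
    for _ in range(epochs):
        grad = (softmax(X @ W + b) - Y) / len(y)
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

def run_demo(seed=0, n_classes=3, n_experts=2):
    rng = np.random.default_rng(seed)
    X_tr, y_tr = make_split(2000, n_classes, n_experts, rng)
    X_te, y_te = make_split(1000, n_classes, n_experts, rng)
    W, b = train_fuser(X_tr, y_tr, n_classes)
    fused_acc = float(((X_te @ W + b).argmax(axis=1) == y_te).mean())
    # Accuracy of each expert used alone (argmax of its own output slice).
    single_accs = [
        float((X_te[:, k * n_classes:(k + 1) * n_classes].argmax(axis=1) == y_te).mean())
        for k in range(n_experts)
    ]
    return fused_acc, single_accs

if __name__ == "__main__":
    fused, singles = run_demo()
    print("fused accuracy:", fused)
    print("single-expert accuracies:", singles)
```

In this toy setup each expert alone is accurate only on roughly the half of the data drawn from its own domain, while the trained fuser learns to rely on whichever expert is confident. A real fuser would be trained on actual expert outputs, and the frugal variant described above would additionally decide which experts to query.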