Abstract: Deep model fusion is an emerging technique that unifies the predictions or parameters of several deep neural networks into a single model in a cost-effective and data-efficient manner. This enables the unified model to take advantage of the original models' strengths, potentially exceeding their performance. Although a variety of deep model fusion techniques have been introduced, their evaluations tend to be inconsistent and often inadequate to validate their effectiveness and robustness against distribution shifts. To address this issue, we introduce FusionBench, which is the first comprehensive benchmark dedicated to deep model fusion. FusionBench covers a wide range of tasks, including open-vocabulary image classification, text classification, and text-to-text generation. Each category includes up to eight tasks with corresponding task-specific models, featuring both full fine-tuning and LoRA fine-tuning, as well as models of different sizes, to ensure fair and balanced comparisons of various multi-task model fusion techniques across different tasks, model scales, and fine-tuning strategies. We implement and evaluate a broad spectrum of deep model fusion techniques. These techniques range from model ensemble methods, which combine the predictions to improve the overall performance, to model merging, which integrates different models into a single one, and model mixing methods, which upscale or recombine the components of the original models. FusionBench now contains 26 distinct tasks, 74 fine-tuned models, and 16 fusion techniques, and we are committed to consistently expanding the benchmark with more tasks, models, and fusion techniques. In addition, we offer a well-documented set of resources and guidelines to aid researchers in understanding and replicating the benchmark results. Homepage: https://github.com/tanganke/fusion_bench

What problem does this paper attempt to address?

The paper addresses the inconsistent and insufficient evaluation of deep model fusion techniques. Although many such techniques have been proposed, their evaluation often lacks standardization, which makes it difficult to verify their effectiveness and their robustness to distribution shift. Specifically:

1. **Inconsistent evaluation**: Different researchers use different tasks, models, and settings when evaluating deep model fusion techniques, making the results difficult to compare.
2. **Lack of standardized evaluation**: There is no unified benchmark for systematically evaluating the effectiveness and robustness of these techniques, especially under data distribution shifts.
3. **Difficulties in implementation and reproducibility**: Existing evaluation methods and experimental settings are complex and difficult to reproduce, which further exacerbates the inconsistency.

To address these problems, the authors introduce **FusionBench**, the first comprehensive benchmark dedicated to deep model fusion. Its main goal is to provide a modular and extensible platform that covers a wide range of tasks, models, and fusion techniques, so that multi-task model fusion techniques can be evaluated fairly and in a balanced way. FusionBench includes:

- **A wide range of tasks**: Open-vocabulary image classification, text classification, and text-to-text generation.
- **Diverse models**: Each task category contains up to eight tasks with corresponding task-specific models, covering both full fine-tuning and LoRA fine-tuning, as well as models of different sizes.
- **Rich fusion techniques**: It implements multiple deep model fusion techniques, ranging from model ensembles and model merging to model mixing.
- **User-friendly resources**: It provides detailed documentation, code examples, and tutorials to help researchers understand and reproduce the benchmark results.

With FusionBench, researchers can evaluate and compare deep model fusion techniques more systematically, which supports further progress in the field.
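
To make the distinction between two of the technique families concrete, here is a minimal sketch that contrasts a model ensemble (averaging predictions) with model merging (averaging parameters). It is not FusionBench's API: the toy linear model, the `Params` type, and the function names `predict`, `ensemble_predict`, and `merge_by_averaging` are all hypothetical, and plain parameter averaging is used only as an assumed, simplest-possible stand-in for a merging method.

```python
from typing import Dict, List, Sequence

# Hypothetical parameter container: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def predict(params: Params, x: Sequence[float]) -> float:
    """Toy linear model (an assumption for illustration): y = w . x + b."""
    return sum(w * xi for w, xi in zip(params["w"], x)) + params["b"][0]


def ensemble_predict(models: Sequence[Params], x: Sequence[float]) -> float:
    """Model ensemble: keep every model and average their predictions."""
    return sum(predict(m, x) for m in models) / len(models)


def merge_by_averaging(models: Sequence[Params]) -> Params:
    """Model merging: average parameters element-wise into a single model."""
    n = len(models)
    return {
        name: [sum(vals) / n for vals in zip(*(m[name] for m in models))]
        for name in models[0]
    }


# Two made-up "task-specific" models with identical parameter shapes.
model_a = {"w": [1.0, 2.0], "b": [0.0]}
model_b = {"w": [3.0, 0.0], "b": [2.0]}
x = [1.0, 1.0]

merged = merge_by_averaging([model_a, model_b])
print(merged)
print(ensemble_predict([model_a, model_b], x))
print(predict(merged, x))
```

In this linear toy case the ensemble and the merged model give the same prediction. For nonlinear networks the two generally differ. The ensemble also needs one forward pass per model at inference time, whereas the merged model needs only one.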

FusionBench: A Comprehensive Benchmark of Deep Model Fusion
