Abstract: Multi-task learning (MTL) aims to enable a single model to tackle multiple tasks simultaneously. A recent development known as task arithmetic has shown that several models, each fine-tuned for a distinct task, can be merged directly into a single model that performs MTL without retraining on the original training data. However, this direct addition of models often causes a significant drop in the overall performance of the merged model, owing to potential conflicts and intricate correlations among the tasks. The challenge, then, is how to merge pre-trained models more effectively without using their original training data. This paper introduces Adaptive Model Merging (AdaMerging), a technique that autonomously learns the coefficients for model merging, in either a task-wise or a layer-wise manner, without relying on the original training data. Specifically, AdaMerging operates as an automatic, unsupervised task arithmetic scheme: it uses entropy minimization on unlabeled test samples from the multi-task setup as a surrogate objective function to iteratively refine the merging coefficients of the multiple models. Experiments across eight tasks demonstrate the efficacy of the proposed scheme. Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging achieves an 11% improvement in performance. AdaMerging also generalizes better to unseen downstream tasks and is significantly more robust to data distribution shifts that may occur during the testing phase.
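The idea of learning task-wise merging coefficients by entropy minimization can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the model is a toy linear classifier with flattened weights, the function names (`merge`, `mean_entropy`, `adapt_lambdas`) and toy data are hypothetical, and gradients are approximated by finite differences with a simple accept-or-halve step rule instead of the gradient-based training a real system would use.

```python
import math

def merge(theta_pre, task_vectors, lambdas):
    """Task arithmetic: theta_pre + sum_k lambda_k * tau_k (one coefficient per task)."""
    merged = list(theta_pre)
    for lam, tau in zip(lambdas, task_vectors):
        for i, t in enumerate(tau):
            merged[i] += lam * t
    return merged

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_entropy(theta, samples, n_classes):
    """Average prediction entropy of a toy linear classifier on unlabeled samples.
    theta is a flattened (n_classes x n_features) weight matrix (an assumption)."""
    n_features = len(samples[0])
    total = 0.0
    for x in samples:
        logits = [sum(theta[c * n_features + j] * x[j] for j in range(n_features))
                  for c in range(n_classes)]
        total += entropy(softmax(logits))
    return total / len(samples)

def adapt_lambdas(theta_pre, task_vectors, samples, n_classes, lambdas,
                  steps=50, lr=0.5, eps=1e-4):
    """Refine the coefficients by descending the entropy objective.
    Finite-difference gradients stand in for backpropagation here."""
    def objective(lams):
        return mean_entropy(merge(theta_pre, task_vectors, lams), samples, n_classes)

    lambdas = list(lambdas)
    current = objective(lambdas)
    for _ in range(steps):
        grad = []
        for k in range(len(lambdas)):
            up, down = lambdas[:], lambdas[:]
            up[k] += eps
            down[k] -= eps
            grad.append((objective(up) - objective(down)) / (2 * eps))
        candidate = [l - lr * g for l, g in zip(lambdas, grad)]
        value = objective(candidate)
        if value <= current:   # keep the step only if entropy did not increase
            lambdas, current = candidate, value
        else:                  # otherwise shrink the step size and retry
            lr *= 0.5
    return lambdas, current

# Hypothetical toy setup: 2 classes x 2 features, two task vectors.
theta_pre = [0.0, 0.0, 0.0, 0.0]
task_vectors = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
samples = [[1.0, 0.2], [-0.8, 0.1], [0.3, 1.0], [0.2, -0.9]]  # unlabeled inputs
init = [0.3, 0.3]  # illustrative starting coefficients

initial_entropy = mean_entropy(merge(theta_pre, task_vectors, init), samples, 2)
learned, final_entropy = adapt_lambdas(theta_pre, task_vectors, samples, 2, init)
print(learned, initial_entropy, final_entropy)
```

A layer-wise variant would, under the same assumptions, keep one coefficient per task and per layer instead of one per task; the objective and update loop would otherwise stay the same.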

An Adaptive Node Network Based on Multitask Deep Learning

Traffic Flow and Speed Forecasting Through a Bayesian Deep Multi-Linear Relationship Network

DEPHN: Different Expression Parallel Heterogeneous Network using virtual gradient optimization for Multi-task Learning

Distributed Learning of Predictive Structures from Multiple Tasks over Networks

AdaTask: A Task-aware Adaptive Learning Rate Approach to Multi-task Learning

On Better Exploring and Exploiting Task Relationships in Multitask Learning: Joint Model and Feature Learning

A Model-Agnostic Approach to Mitigate Gradient Interference for Multi-Task Learning

Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data

AdaTT: Adaptive Task-to-Task Fusion Network for Multitask Learning in Recommendations

Distributed Jointly Sparse Multitask Learning over Networks

AdapMTL: Adaptive Pruning Framework for Multitask Learning Model

Multi-task Model and Feature Joint Learning

Adaptive and Dynamic Knowledge Transfer in Multi-task Learning with Attention Networks

Learning Boost by Exploiting the Auxiliary Task in Multi-task Domain

Improving Multi-task Learning via Seeking Task-based Flat Regions

Multiple Task Learning Using Iteratively Reweighted Least Square

Modeling Output-Level Task Relatedness in Multi-Task Learning with Feedback Mechanism

Mitigating Task Interference in Multi-Task Learning via Explicit Task Routing with Non-Learnable Primitives

AdaMerging: Adaptive Model Merging for Multi-Task Learning

Robust Analysis of Multi-Task Learning Efficiency: New Benchmarks on Light-Weighed Backbones and Effective Measurement of Multi-Task Learning Challenges by Feature Disentanglement

Towards Impartial Multi-task Learning