Less is More: Efficient Model Merging with Binary Task Switch

Biqing Qi, Fangyuan Li, Zhen Wang, Junqi Gao, Dong Li, Peng Ye, Bowen Zhou
2024-11-24
Abstract: As an effective approach to equip models with multi-task capabilities without additional training, model merging has garnered significant attention. However, existing methods face challenges of redundant parameter conflicts and the excessive storage burden of parameters. In this work, through controlled experiments, we reveal that for task vectors, only those parameters with magnitudes above a certain threshold contribute positively to the task, exhibiting a pulse-like characteristic. We then attempt leveraging this characteristic to binarize the task vectors and reduce storage overhead. Further controlled experiments show that the binarized task vectors incur almost no decrease in fine-tuning and merging performance, and even exhibit stronger performance improvements as the proportion of redundant parameters increases. Based on these insights, we propose Task Switch (T-Switch), which decomposes task vectors into three components: 1) an activation switch instantiated by a binarized mask vector, 2) a polarity switch instantiated by a binarized sign vector, and 3) a scaling knob instantiated by a scalar coefficient. By storing task vectors in a binarized form, T-Switch alleviates parameter conflicts while ensuring efficient task parameter storage. Furthermore, to enable automated switch combination in T-Switch, we further introduce Auto-Switch, which enables training-free switch combination via retrieval from a small query set. Experiments indicate that our methods achieve significant performance improvements over existing baselines, requiring only 1-3% of the storage space of full-precision parameters.
Machine Learning
What problem does this paper attempt to address?
This paper addresses two main challenges in model merging: parameter conflicts among task vectors and the excessive storage burden of keeping those vectors. Specifically:

1. **Parameter Conflicts**: In a multi-task scenario, the task vectors of different tasks can conflict with one another, which limits model performance. In dynamic merging methods in particular, even when an automatic combination strategy is adopted, the inherent conflicts among task vectors still limit the performance gain after further pruning.
2. **Storage Overhead**: Storing every task vector requires a large amount of space, which is an obstacle to practical use in resource-constrained scenarios. For example, if each task vector is stored in full precision and combined automatically by a router, the storage requirement far exceeds that of the pre-trained weights.

To solve these problems, the authors propose Task Switch (T-Switch), which reduces parameter conflicts and improves storage efficiency through the following elements:

- **Pulse Activation Feature**: Through a series of controlled experiments, the authors find that a parameter in a task vector contributes positively to the task only when its magnitude exceeds a certain threshold. Small-magnitude parameters may be redundant and can even hurt task performance. Based on this finding, the authors retain only the parameters that contribute significantly to the task and discard the rest.
- **Binarized Task Vectors**: To further reduce storage overhead, the authors binarize the task vectors: the remaining non-zero parameters are converted to binary form, and a scaling factor restores a vector whose length is close to that of the original task vector. Experimental results show that this binarized approximation significantly reduces the storage burden while barely degrading fine-tuning and merging performance, and in some cases it even improves performance.
- **Dynamic Merging Mechanism**: Building on the binarized task vectors, the authors propose T-Switch, which consists of three components: an activation switch, a polarity switch, and a scaling knob. Together these components allow binarized task vectors to be flexibly recombined without additional training cost, which enables efficient dynamic merging. To automate the combination of task vectors, the authors also introduce Auto-Switch, which selects switch combinations without training by retrieving from a small query set.

In conclusion, this paper aims to relieve parameter conflicts and significantly improve storage efficiency by removing redundant parameters from task vectors and binarizing what remains, providing an efficient and practical solution for model merging in multi-task scenarios.
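
The three components described above (activation switch, polarity switch, scaling knob) can be illustrated with a small sketch. This is not the authors' implementation: the function names `binarize_task_vector`, `rebuild`, and `merge` are hypothetical, the threshold is passed in by hand, the scale is chosen so that the rebuilt vector has the same L2 norm as the original (one reading of "a length close to the original task vector"), and merging is shown as plain addition onto the pre-trained weights.

```python
import math


def binarize_task_vector(tau, threshold):
    """Split a task vector into (mask, sign, scale).

    mask  : 1 where |tau[i]| > threshold, else 0  (activation switch)
    sign  : +1 or -1 per entry                    (polarity switch)
    scale : one scalar for the whole vector       (scaling knob)

    Assumption of this sketch: scale is set so that the rebuilt
    vector has the same L2 norm as the original task vector.
    """
    mask = [1 if abs(t) > threshold else 0 for t in tau]
    sign = [1 if t >= 0 else -1 for t in tau]
    kept = sum(mask)
    norm = math.sqrt(sum(t * t for t in tau))
    scale = norm / math.sqrt(kept) if kept else 0.0
    return mask, sign, scale


def rebuild(mask, sign, scale):
    """Approximate the task vector from its binarized parts."""
    return [scale * m * s for m, s in zip(mask, sign)]


def merge(pretrained, switches):
    """Add each rebuilt task vector to the pre-trained weights.

    Plain addition is an assumption here; the paper's combination
    rule may differ.
    """
    merged = list(pretrained)
    for mask, sign, scale in switches:
        for i, delta in enumerate(rebuild(mask, sign, scale)):
            merged[i] += delta
    return merged


if __name__ == "__main__":
    tau = [0.9, -0.05, -1.2, 0.02]  # toy task vector
    mask, sign, scale = binarize_task_vector(tau, threshold=0.5)
    print(mask, sign, scale)
    print(merge([0.0, 0.0, 0.0, 0.0], [(mask, sign, scale)]))
```

In this form each parameter needs only a mask bit and a sign bit, plus one scalar per task vector, which is where the storage saving comes from; the small entries below the threshold are dropped entirely.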