Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models

Guillermo Ortiz-Jimenez, Alessandro Favero, Pascal Frossard
2023-11-22
Abstract: Task arithmetic has recently emerged as a cost-effective and scalable approach to edit pre-trained models directly in weight space: By adding the fine-tuned weights of different tasks, the model's performance can be improved on these tasks, while negating them leads to task forgetting. Yet, our understanding of the effectiveness of task arithmetic and its underlying principles remains limited. We present a comprehensive study of task arithmetic in vision-language models and show that weight disentanglement is the crucial factor that makes it effective. This property arises during pre-training and manifests when distinct directions in weight space govern separate, localized regions in function space associated with the tasks. Notably, we show that fine-tuning models in their tangent space by linearizing them amplifies weight disentanglement. This leads to substantial performance improvements across multiple task arithmetic benchmarks and diverse models. Building on these findings, we provide theoretical and empirical analyses of the neural tangent kernel (NTK) of these models and establish a compelling link between task arithmetic and the spatial localization of the NTK eigenfunctions. Overall, our work uncovers novel insights into the fundamental mechanisms of task arithmetic and offers a more reliable and effective approach to edit pre-trained models through the NTK linearization.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper sets out to explain why task arithmetic works in deep neural networks and what principles underlie it. Specifically, it asks why adding the fine-tuned weights of different tasks improves a model's performance on those tasks, while subtracting them makes the model forget specific tasks. The mechanisms behind these effects have so far been poorly understood. To address this, the authors conduct a comprehensive study, focused on vision-language models, and show that weight disentanglement is the key factor that makes task arithmetic effective. The paper also shows that linearizing the model during fine-tuning strengthens task arithmetic, which yields a more reliable and effective method for editing pre-trained models.

### Main research contents and contributions

1. **Formalization of task arithmetic**:
   - The paper formalizes the concept of task arithmetic introduced by Ilharco et al. and defines it as Property 1, which allows task arithmetic to be analyzed quantitatively.
2. **Relationship between task arithmetic and the neural tangent kernel (NTK)**:
   - The paper shows that task arithmetic in non-linear models cannot be explained by their NTK alone; weight disentanglement is a necessary condition.
3. **Linearizing the model to enhance task arithmetic**:
   - The paper proposes fine-tuning linearized models to enhance weight disentanglement, which significantly improves performance on multiple task arithmetic benchmarks. For example, accuracy increases by 5.8 percentage points on task addition and by 13.1 percentage points on task negation.
4. **Weight disentanglement and spatial localization of the NTK eigenfunctions**:
   - In linearized models, weight disentanglement is linked to the spatial localization of the eigenfunctions of the kernel, and this prediction is verified numerically.
5. **Weight disentanglement is an emergent property of pre-training**:
   - The paper shows that weight disentanglement emerges during pre-training and is not a result of random initialization.

### Experimental results

- **Task addition**: Adding multiple task vectors to the pre-trained model produces a multi-task model. The linearized models outperform the non-linear models on task addition.
- **Task negation**: Subtracting a task vector from the pre-trained model makes the model forget that task while maintaining performance on a control task. The linearized models also outperform the non-linear models on task negation.

### Conclusion

The paper provides new insights into the basic mechanisms of task arithmetic, in particular the key role of weight disentanglement. Linearizing the model during fine-tuning strengthens the effect of task arithmetic and offers a more reliable and effective way to edit pre-trained models. These findings can help in developing more efficient and accurate model editing techniques that let researchers adapt pre-trained models to various tasks more flexibly.
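
The task addition and negation operations summarized above, together with the idea of evaluating a model in its tangent space, can be illustrated with a minimal sketch. This is not the authors' implementation: the two-parameter toy model `f`, the helper names (`task_vector`, `apply_task_vectors`, `f_lin`), the parameter values, and the scaling coefficients are all assumptions made for illustration, and plain Python lists stand in for real model weights.

```python
import math

def task_vector(theta_pre, theta_ft):
    """Task vector: fine-tuned weights minus pre-trained weights."""
    return [ft - pre for pre, ft in zip(theta_pre, theta_ft)]

def apply_task_vectors(theta_pre, taus, alphas):
    """Return theta_pre + sum_t alpha_t * tau_t.

    A positive alpha adds a task; a negative alpha negates (forgets) it.
    """
    theta = list(theta_pre)
    for tau, alpha in zip(taus, alphas):
        theta = [p + alpha * d for p, d in zip(theta, tau)]
    return theta

def f(x, theta):
    """Hypothetical toy non-linear model: w2 * tanh(w1 * x)."""
    w1, w2 = theta
    return w2 * math.tanh(w1 * x)

def grad_f(x, theta):
    """Analytical gradient of f with respect to (w1, w2)."""
    w1, w2 = theta
    h = math.tanh(w1 * x)
    return [w2 * (1.0 - h * h) * x, h]

def f_lin(x, theta, theta0):
    """First-order Taylor expansion of f around theta0 (tangent-space model)."""
    g = grad_f(x, theta0)
    return f(x, theta0) + sum(gi * (p - p0) for gi, p, p0 in zip(g, theta, theta0))

# Illustrative (made-up) pre-trained and fine-tuned weights.
theta_pre = [0.5, 1.0]
theta_a = [0.75, 1.0]   # fine-tuned on hypothetical task A
theta_b = [0.5, 1.25]   # fine-tuned on hypothetical task B

tau_a = task_vector(theta_pre, theta_a)
tau_b = task_vector(theta_pre, theta_b)

# Task addition: combine both task vectors into one multi-task model.
theta_multi = apply_task_vectors(theta_pre, [tau_a, tau_b], [1.0, 1.0])
# Task negation: subtract task A's vector from the pre-trained weights.
theta_forget_a = apply_task_vectors(theta_pre, [tau_a], [-1.0])

print(theta_multi)      # [0.75, 1.25]
print(theta_forget_a)   # [0.25, 1.0]

# In the linearized model, each task vector's effect on the output adds up.
x = 2.0
delta_multi = f_lin(x, theta_multi, theta_pre) - f(x, theta_pre)
delta_a = f_lin(x, theta_a, theta_pre) - f(x, theta_pre)
delta_b = f_lin(x, theta_b, theta_pre) - f(x, theta_pre)
print(math.isclose(delta_multi, delta_a + delta_b))  # True
```

Because `f_lin` is affine in the weights, the change in output caused by a sum of task vectors equals the sum of the changes caused by each one. The sketch only shows this additivity; whether each task vector influences the output only on its own task's inputs is the weight disentanglement question the paper studies, and the toy model does not address it.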