Layer-wise Model Merging for Unsupervised Domain Adaptation in Segmentation Tasks

Roberto Alcover-Couso, Juan C. SanMiguel, Marcos Escudero-Viñolo, Jose M Martínez
2024-09-24
Abstract: Merging parameters of multiple models has resurfaced as an effective strategy to enhance task performance and robustness, but prior work is limited by the high costs of ensemble creation and inference. In this paper, we leverage the abundance of freely accessible trained models to introduce a cost-free approach to model merging. It focuses on a layer-wise integration of merged models, aiming to maintain the distinctiveness of the task-specific final layers while unifying the initial layers, which are primarily associated with feature extraction. This approach ensures parameter consistency across all layers, essential for boosting performance. Moreover, it facilitates seamless integration of knowledge, enabling effective merging of models from different datasets and tasks. Specifically, we investigate its applicability in Unsupervised Domain Adaptation (UDA), an unexplored area for model merging, for Semantic and Panoptic Segmentation. Experimental results demonstrate substantial UDA improvements without additional costs for merging same-architecture models from distinct datasets ($\uparrow 2.6\%$ mIoU) and different-architecture models with a shared backbone ($\uparrow 6.8\%$ mIoU). Furthermore, merging Semantic and Panoptic Segmentation models increases mPQ ($\uparrow 7\%$). These findings are validated across a wide variety of UDA strategies, architectures, and datasets.
Computer Vision and Pattern Recognition, Artificial Intelligence, Multimedia
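The layer-wise idea in the abstract unifies the early, feature-extraction layers and keeps the task-specific final layers distinct. A minimal sketch helps make it concrete. The sketch assumes same-architecture models whose parameters are plain Python dicts of flat float lists, which stand in for a framework state dict. It also assumes uniform averaging of the shared layers and a hypothetical cutoff `num_shared` that marks where the task-specific layers begin. `layerwise_merge` and the layer names are illustrative, not the authors' implementation, and the paper's actual layer selection and weighting may differ.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a framework state dict: parameter name -> flat list of floats.
StateDict = Dict[str, List[float]]


def layerwise_merge(
    models: Sequence[StateDict],
    layer_order: Sequence[str],
    num_shared: int,
    base_index: int = 0,
) -> StateDict:
    """Average the first `num_shared` layers over all models; copy the rest from the base model.

    `layer_order` lists parameter names from input to output. `num_shared` is a
    hypothetical cutoff between feature-extraction layers and task-specific layers.
    """
    if not models:
        raise ValueError("need at least one model")
    if not 0 <= num_shared <= len(layer_order):
        raise ValueError("num_shared out of range")
    shared = set(layer_order[:num_shared])
    base = models[base_index]
    merged: StateDict = {}
    for name in layer_order:
        if name in shared:
            # All models must store this layer with the same number of parameters.
            if len({len(m[name]) for m in models}) != 1:
                raise ValueError(f"shape mismatch in layer {name!r}")
            # Element-wise mean of this layer's parameters across all models.
            columns = zip(*(m[name] for m in models))
            merged[name] = [sum(vals) / len(models) for vals in columns]
        else:
            # Task-specific layer: keep the base model's parameters unchanged.
            merged[name] = list(base[name])
    return merged


# Toy example with two models of the same (hypothetical) three-layer architecture.
LAYERS = ["stem", "block1", "head"]
model_a = {"stem": [1.0, 2.0], "block1": [0.0, 4.0], "head": [10.0]}
model_b = {"stem": [3.0, 4.0], "block1": [2.0, 0.0], "head": [-10.0]}

merged = layerwise_merge([model_a, model_b], LAYERS, num_shared=2)
print(merged)  # {'stem': [2.0, 3.0], 'block1': [1.0, 2.0], 'head': [10.0]}
```

Setting `num_shared` to 0 returns the base model unchanged. Setting it to the number of layers reduces the sketch to plain parameter averaging over every layer. Intermediate values give the layer-wise behaviour: shared early layers with a distinct final layer.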
### What problems does this paper attempt to solve?

This paper addresses several key challenges in Unsupervised Domain Adaptation (UDA) for segmentation tasks such as semantic segmentation and panoptic segmentation:

1. **Unstable UDA training**: UDA training for segmentation is often unstable. As a result, the adapted models typically underperform their supervised counterparts.
2. **High computational cost**: Techniques that stabilize UDA training, such as teacher-student distillation, add computational overhead, because the teacher must run inference while the student is being trained.
3. **Limitations of existing model merging methods**: Existing merging methods mainly target different checkpoints from the same training run or models with similar initializations. They cannot effectively combine models trained on different datasets or for different tasks. They also usually treat all layers alike and ignore the differences between layers, which degrades the performance of the merged model.

To address these problems, the authors propose a layer-wise model merging method. Its main goals are:

- **No extra training cost**: Merging the parameters of multiple already-trained models directly avoids additional training overhead.
- **Cross-task model merging**: Models designed for different tasks can be merged, widening the range of settings in which model merging applies.
- **Extensive benchmarking**: The method is evaluated across multiple UDA strategies, architectures, and datasets to verify its effectiveness.

Specifically, the method works as follows:

- **Unifying the initial layers**: The shallow layers, which are mainly responsible for feature extraction, are merged into one shared set of parameters to make feature extraction more robust.
- **Retaining task-specific knowledge in the final layers**: The deep layers that produce task-specific outputs keep the parameters of the original model, which preserves its task-specific knowledge.
- **Applicability across architectures and datasets**: The method handles models with the same architecture. It also handles models with different architectures that share a backbone, including models trained on different datasets.

The experiments show that the method substantially improves performance in the UDA setting without extra training or inference cost. In synthetic-to-real UDA, for example, merging same-architecture models trained on distinct datasets improves mIoU by 2.6%. Merging different-architecture models that share a backbone improves mIoU by 6.8%. Merging semantic and panoptic segmentation models improves mPQ by 7%.

### Summary

This paper proposes a layer-wise model merging method. The method addresses the training instability and high computational cost of UDA for segmentation, broadens the applicability of model merging, and delivers substantial performance gains.
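Some models differ in architecture but share a backbone, for example a semantic and a panoptic segmentation model. For these, the same principle can be sketched as follows: average only the backbone parameters that both models have in common, and let each model keep its own decoder or head. This is a minimal illustration under stated assumptions. The `backbone.` name prefix, the parameter names, the dict-of-lists representation and the uniform averaging are all hypothetical choices, not the paper's exact procedure.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a framework state dict: parameter name -> flat list of floats.
StateDict = Dict[str, List[float]]


def merge_shared_backbone(
    target: StateDict,
    donors: Sequence[StateDict],
    backbone_prefix: str = "backbone.",
) -> StateDict:
    """Return a copy of `target` whose backbone parameters are averaged with the donors'.

    Only parameters whose name starts with `backbone_prefix` (an assumed naming
    convention) and that a donor stores with the same length are averaged. All
    other parameters, including the target's own head, are copied unchanged.
    """
    merged: StateDict = {}
    for name, values in target.items():
        if name.startswith(backbone_prefix):
            # Pool the target's copy with every compatible donor copy of this parameter.
            pool = [values] + [
                d[name] for d in donors if name in d and len(d[name]) == len(values)
            ]
            merged[name] = [sum(col) / len(pool) for col in zip(*pool)]
        else:
            # Architecture-specific parameter (e.g. a decoder head): keep as-is.
            merged[name] = list(values)
    return merged


# Toy example: two models with different (hypothetical) heads and one shared backbone layer.
semantic = {"backbone.conv1": [1.0, 1.0], "decode_head.cls": [0.5]}
panoptic = {"backbone.conv1": [3.0, 5.0], "panoptic_head.mask": [9.0, 9.0, 9.0]}

merged_semantic = merge_shared_backbone(semantic, [panoptic])
merged_panoptic = merge_shared_backbone(panoptic, [semantic])
print(merged_semantic)  # {'backbone.conv1': [2.0, 3.0], 'decode_head.cls': [0.5]}
print(merged_panoptic)  # {'backbone.conv1': [2.0, 3.0], 'panoptic_head.mask': [9.0, 9.0, 9.0]}
```

Each model keeps its own head, so the merge yields one merged model per architecture. Each merged model has exactly as many parameters as the original it was derived from, so inference cost is unchanged.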