Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-Training

Wenyu Zhang, Li Shen, Chuan-Sheng Foo
2024-10-03
Abstract: Source-free domain adaptation (SFDA) aims to adapt a source model trained on a fully-labeled source domain to a related but unlabeled target domain. While the source model is a key avenue for acquiring target pseudolabels, the generated pseudolabels may exhibit source bias. In the conventional SFDA pipeline, a large data (e.g. ImageNet) pre-trained feature extractor is used to initialize the source model at the start of source training, and subsequently discarded. Despite having diverse features important for generalization, the pre-trained feature extractor can overfit to the source data distribution during source training and forget relevant target domain knowledge. Rather than discarding this valuable knowledge, we introduce an integrated framework to incorporate pre-trained networks into the target adaptation process. The proposed framework is flexible and allows us to plug modern pre-trained networks into the adaptation process to leverage their stronger representation learning capabilities. For adaptation, we propose the Co-learn algorithm to improve target pseudolabel quality collaboratively through the source model and a pre-trained feature extractor. Building on the recent success of the vision-language model CLIP in zero-shot image recognition, we present an extension Co-learn++ to further incorporate CLIP's zero-shot classification decisions. We evaluate on 4 benchmark datasets and include more challenging scenarios such as open-set, partial-set and open-partial SFDA. Experimental results demonstrate that our proposed strategy improves adaptation performance and can be successfully integrated with existing SFDA methods. Project code is available at https://github.com/zwenyu/colearn-plus.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is how to adapt a model trained on a fully labeled source domain to a related but unlabeled target domain when the source-domain data is no longer available, i.e. source-free domain adaptation (SFDA). In the conventional SFDA pipeline, the pre-trained network is used only to initialize the source model and is discarded once source training is complete. This can hurt performance on the target domain, because during source training the pre-trained feature extractor may overfit to the source data distribution and forget knowledge that is relevant to the target domain. To address this, the paper proposes an integrated framework that incorporates the pre-trained network into the target adaptation process, so that its useful knowledge is preserved and used. The specific contributions are:

1. **Observation and Problem Definition**:
   - Fine-tuning the pre-trained network on the source domain can cause it to overfit to the source data distribution and lose its ability to generalize to the target domain.
   - The paper therefore proposes reusing the pre-trained network during target adaptation to recover this target-domain knowledge and feed it back into the adapted model.
2. **Proposed Framework**:
   - A two-branch co-learning strategy (Co-learn) is proposed: one branch is the adaptation model derived from the source model, and the other is a model built on the pre-trained network.
   - The two branches are updated iteratively and jointly produce more accurate target pseudolabels, which improves adaptation performance on the target domain.
3. **Extended Algorithm**:
   - The Co-learn++ algorithm further integrates the pre-trained vision-language model CLIP and uses its zero-shot classification decisions to improve the estimate of the task-specific classifier.
   - CLIP's text encoder is used to build a zero-shot classifier for the target-domain label space, which improves pseudolabel quality.
4. **Experimental Verification**:
   - The proposed method is evaluated on 4 benchmark datasets, including the more challenging open-set, partial-set, and open-partial SFDA scenarios.
   - The experimental results show that the proposed framework improves adaptation performance on the evaluated datasets and can be combined with existing SFDA methods.

In conclusion, the paper aims to improve source-free domain adaptation by reusing the knowledge in pre-trained networks, especially when the target-domain data distribution differs substantially from the source-domain distribution.
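The two-branch idea summarized above can be illustrated with a small sketch. It is a simplified illustration under stated assumptions, not the paper's exact procedure: the pre-trained branch is given a classifier by computing class centroids in its feature space, weighted by the adaptation branch's soft predictions, and a pseudolabel is kept when the branches agree or when exactly one of them is confident. The function names, the `temperature` and `threshold` values, and the toy 2-D features are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def weighted_centroids(features, probs, num_classes):
    """Class centroids in the pre-trained feature space, weighted by the
    adaptation branch's soft predictions (assumed scheme, for illustration)."""
    dim = len(features[0])
    centroids = []
    for k in range(num_classes):
        weight_sum = sum(p[k] for p in probs)
        centroids.append([
            sum(p[k] * f[d] for f, p in zip(features, probs)) / weight_sum
            for d in range(dim)
        ])
    return centroids

def prototype_probs(feature, prototypes, temperature=0.1):
    """Class probabilities from cosine similarity to per-class prototypes."""
    return softmax([cosine(feature, c) / temperature for c in prototypes])

def combine(p_adapt, p_pre, threshold=0.6):
    """Keep a pseudolabel if the branches agree, or if exactly one branch is
    confident; otherwise return None (hypothetical rule and threshold)."""
    y_a = max(range(len(p_adapt)), key=p_adapt.__getitem__)
    y_p = max(range(len(p_pre)), key=p_pre.__getitem__)
    if y_a == y_p:
        return y_a
    if p_adapt[y_a] >= threshold > p_pre[y_p]:
        return y_a
    if p_pre[y_p] >= threshold > p_adapt[y_a]:
        return y_p
    return None

# Toy target data: 2-D "pre-trained" features and adaptation-branch predictions.
# The second sample is mislabeled by the adaptation branch with low confidence.
features = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
probs_adapt = [[0.9, 0.1], [0.45, 0.55], [0.2, 0.8], [0.1, 0.9]]

centroids = weighted_centroids(features, probs_adapt, num_classes=2)
probs_pre = [prototype_probs(f, centroids) for f in features]
pseudolabels = [combine(a, p) for a, p in zip(probs_adapt, probs_pre)]
print(pseudolabels)  # [0, 0, 1, 1]
```

In this toy run the pre-trained branch confidently assigns the second sample to class 0, overriding the adaptation branch's low-confidence vote for class 1. A CLIP-style zero-shot branch could plausibly reuse `prototype_probs` by passing text embeddings of the class names as the prototypes in place of the centroids; the embeddings themselves would come from an actual CLIP model, which is not shown here.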