Universal Domain Adaptation from Foundation Models: A Baseline Study

Bin Deng, Kui Jia
2023-11-03
Abstract: Foundation models (e.g., CLIP or DINOv2) have shown impressive learning and transfer capabilities across a wide range of visual tasks, by training on a large corpus of data and adapting to specific downstream tasks. It is, however, interesting that foundation models have not been fully explored for universal domain adaptation (UniDA), which is to learn models using labeled data in a source domain and unlabeled data in a target one, such that the learned models can successfully adapt to the target data. In this paper, we make comprehensive empirical studies of state-of-the-art UniDA methods using foundation models. We first observe that, unlike fine-tuning from ImageNet pre-trained models, as previous methods do, fine-tuning from foundation models yields significantly poorer results, sometimes even worse than training from scratch. With the backbones frozen, we demonstrate that although the foundation models greatly improve the performance of the baseline method that trains the models on the source data alone, existing UniDA methods generally fail to improve over the baseline. This suggests that new research efforts are needed for UniDA using foundation models. Based on these findings, we introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models. The core of CLIP distillation lies in a self-calibration technique for automatic temperature scaling, a feature that significantly enhances the baseline's out-class detection capability. Although simple, our method outperforms previous approaches in most benchmark tasks, excelling in evaluation metrics including H-score/H$^3$-score and the newly proposed universal classification rate (UCR) metric. We hope that our investigation and the proposed simple framework can serve as a strong baseline to facilitate future studies in this field.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the challenges that arise when foundation models (such as CLIP or DINOv2) are used for Universal Domain Adaptation (UniDA). Specifically, it focuses on the following aspects:
1. **Performance of foundation models in UniDA**: Unlike fine-tuning from ImageNet pre-trained models, as previous methods do, fine-tuning directly from foundation models yields significantly worse results, sometimes even worse than training from scratch.
2. **Effect of freezing the backbone**: When the backbone of the foundation model is frozen and only the classifier head is updated, the baseline trained on source data alone improves greatly, especially with the CLIP model. Existing UniDA methods, however, generally fail to improve over this baseline.
3. **Proposing a new method**: Based on these observations, the paper proposes CLIP distillation, a parameter-free method for distilling target-domain knowledge from CLIP models. Its core is a self-calibration technique for automatic temperature scaling, which enhances the baseline's ability to detect out-class (unknown) samples.
4. **Evaluation metrics**: The paper also introduces a new metric, the Universal Classification Rate (UCR), a threshold-agnostic measure that allows methods to be compared without depending on a particular choice of threshold.
Overall, the paper aims to fill the current research gap in applying foundation models to UniDA and to provide a strong baseline for future research.
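To make the role of temperature in out-class detection more concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes a common confidence-thresholding setup: class logits are passed through a temperature-scaled softmax, and a sample is labelled unknown when its top probability falls below a threshold. The names `predict_with_rejection`, `UNKNOWN`, `temperature`, and `threshold` are hypothetical, and both the temperature and the threshold are set by hand here, whereas the paper's self-calibration chooses the temperature automatically. The `h_score` helper assumes the usual definition of H-score as the harmonic mean of accuracy on known-class samples and accuracy on unknown samples.

```python
import math

UNKNOWN = -1  # hypothetical label used here for out-class (unknown) samples


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_rejection(logits, temperature=1.0, threshold=0.5):
    """Return the argmax class, or UNKNOWN if the top probability is too low.

    A higher temperature flattens the distribution, so more samples
    fall below the threshold and are rejected as unknown.
    """
    probs = softmax(logits, temperature)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else UNKNOWN


def h_score(y_true, y_pred):
    """Harmonic mean of known-sample accuracy and unknown-sample accuracy."""
    known = [(t, p) for t, p in zip(y_true, y_pred) if t != UNKNOWN]
    unknown = [(t, p) for t, p in zip(y_true, y_pred) if t == UNKNOWN]
    if not known or not unknown:
        raise ValueError("need at least one known and one unknown sample")
    acc_known = sum(t == p for t, p in known) / len(known)
    acc_unknown = sum(t == p for t, p in unknown) / len(unknown)
    if acc_known + acc_unknown == 0:
        return 0.0
    return 2 * acc_known * acc_unknown / (acc_known + acc_unknown)
```

With the logits `[2.0, 0.0]` and a threshold of 0.7, the sample is accepted as class 0 at temperature 1 but rejected as unknown at temperature 4. The same classifier can therefore behave very differently depending on how the temperature is set, which is why a threshold-dependent score such as H-score is sensitive to this choice.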