Abstract: Foundation models (e.g., CLIP or DINOv2) have shown impressive learning and transfer capabilities across a wide range of visual tasks, by training on a large corpus of data and adapting to specific downstream tasks. Surprisingly, however, foundation models have not been fully explored for universal domain adaptation (UniDA), which learns models from labeled data in a source domain and unlabeled data in a target domain, such that the learned models adapt successfully to the target data. In this paper, we conduct a comprehensive empirical study of state-of-the-art UniDA methods using foundation models. We first observe that, unlike fine-tuning from ImageNet pre-trained models as previous methods do, fine-tuning from foundation models yields significantly poorer results, sometimes even worse than training from scratch. With the backbones frozen, we demonstrate that although foundation models greatly improve the performance of the baseline method that trains on the source data alone, existing UniDA methods generally fail to improve over this baseline. This suggests that new research efforts are needed for UniDA using foundation models. Based on these findings, we introduce \textit{CLIP distillation}, a parameter-free method specifically designed to distill target knowledge from CLIP models. The core of \textit{CLIP distillation} is a self-calibration technique for automatic temperature scaling, which significantly enhances the baseline's out-class detection capability. Although simple, our method outperforms previous approaches on most benchmark tasks, excelling in evaluation metrics including the H-score/H$^3$-score and the newly proposed universal classification rate (UCR) metric. We hope that our investigation and the proposed simple framework can serve as a strong baseline to facilitate future studies in this field.
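The abstract names the H-score and H$^3$-score without defining them. The sketch below follows the definitions commonly used in the UniDA literature, which are an assumption here rather than something the abstract states: the H-score as the harmonic mean of common-class accuracy and unknown-class accuracy, and the H$^3$-score as the harmonic mean of those two values plus the normalized mutual information (NMI) of the clustered target-private samples. The function and argument names are hypothetical, and UCR is not sketched because the abstract gives no definition for it.

```python
def h_score(common_acc: float, unknown_acc: float) -> float:
    """Harmonic mean of common-class and unknown-class accuracy (assumed definition)."""
    if common_acc + unknown_acc == 0:
        return 0.0
    return 2 * common_acc * unknown_acc / (common_acc + unknown_acc)


def h3_score(common_acc: float, unknown_acc: float, nmi: float) -> float:
    """Harmonic mean of the two accuracies and NMI (assumed definition)."""
    values = (common_acc, unknown_acc, nmi)
    # A harmonic mean is zero as soon as any component is zero.
    if any(v == 0 for v in values):
        return 0.0
    return 3 / sum(1 / v for v in values)


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    print(h_score(0.8, 0.6))
    print(h3_score(0.8, 0.6, 0.7))
```

Because both scores are harmonic means, a method that classifies common classes well but never rejects unknown samples scores close to zero, which is why they are preferred over plain accuracy in this setting.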

What problem does this paper attempt to address?

The paper addresses the challenges that arise when foundation models (such as CLIP or DINOv2) are used for Universal Domain Adaptation (UniDA). Specifically, it focuses on the following aspects:

1. **Performance of foundation models in UniDA**: Compared with the conventional practice of fine-tuning ImageNet pre-trained models, fine-tuning directly from foundation models often yields worse results, sometimes even worse than training from scratch.
2. **Frozen backbones**: When the backbone of the foundation model is frozen and only the classifier head is updated, the baseline trained on source data alone improves greatly, especially with CLIP. However, existing UniDA methods generally fail to improve over this baseline.
3. **A new method**: Based on these observations, the paper proposes CLIP distillation, a parameter-free method that distills target-domain knowledge from CLIP models. Its core is a self-calibration technique for automatic temperature scaling, which enhances the baseline's ability to detect out-of-class (unknown) samples.
4. **Evaluation metrics**: The paper also introduces a new evaluation metric, the Universal Classification Rate (UCR), which is threshold-agnostic and therefore does not depend on a particular choice of threshold.

Overall, the paper aims to fill the research gap in applying foundation models to UniDA tasks and provides a strong baseline framework for future research.
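To make the role of temperature in unknown-class detection concrete, here is a minimal sketch of temperature-scaled softmax followed by confidence thresholding. It is a generic illustration, not the paper's self-calibration procedure: the temperature and threshold are fixed, hand-picked values, and the names `softmax_with_temperature`, `predict_or_reject`, and `UNKNOWN` are hypothetical.

```python
import math

UNKNOWN = -1  # hypothetical label for samples rejected as out-of-class


def softmax_with_temperature(logits, temperature):
    """Softmax over logits divided by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_reject(logits, temperature, threshold):
    """Return the top class index, or UNKNOWN if its probability is below threshold."""
    probs = softmax_with_temperature(logits, temperature)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else UNKNOWN


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5]  # illustrative values only
    for t in (0.1, 1.0, 10.0):
        print(t, predict_or_reject(logits, t, threshold=0.5))
```

With the same logits and threshold, a low temperature sharpens the distribution so the sample is accepted, while a high temperature flattens it so the sample is rejected. This sensitivity is why choosing the temperature automatically, as the paper's self-calibration is described as doing, matters for unknown-class detection.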

Universal Domain Adaptation from Foundation Models: A Baseline Study

MCKD: Mutually Collaborative Knowledge Distillation for Federated Domain Adaptation and Generalization

Class-Level Adaptation Network with Self Training for Unsupervised Domain Adaptation

Dual Contrastive Universal Adaptation Network

A New Learning Paradigm for Foundation Model-Based Remote-Sensing Change Detection

Source-Free Domain Adaptation with Frozen Multimodal Foundation Model

Domain consensual contrastive learning for few-shot universal domain adaptation

Learning to Detect Open Classes for Universal Domain Adaptation

On Universal Black-Box Domain Adaptation

UDA-Bench: Revisiting Common Assumptions in Unsupervised Domain Adaptation Using a Standardized Framework

Domain Consensus Clustering for Universal Domain Adaptation

HyUniDA: Breaking Label Set Constraints for Universal Domain Adaptation in Cross-Scene Hyperspectral Image Classification

Domain-Aware Fine-Tuning of Foundation Models

Upcycling Models under Domain and Category Shift

CLDA: Collaborative Learning for Enhanced Unsupervised Domain Adaptation

Universal Semi-Supervised Domain Adaptation by Mitigating Common-Class Bias

Prediction of Common Labels for Universal Domain Adaptation

Universal Domain Adaptation for Hyperspectral Image Classification

Uni3DA: Universal 3D Domain Adaptation for Object Recognition

COCA: Classifier-Oriented Calibration via Textual Prototype for Source-Free Universal Domain Adaptation