DomainVerse: A Benchmark Towards Real-World Distribution Shifts For Tuning-Free Adaptive Domain Generalization

Feng Hou, Jin Yuan, Ying Yang, Yang Liu, Yang Zhang, Cheng Zhong, Zhongchao Shi, Jianping Fan, Yong Rui, Zhiqiang He
2024-03-05
Abstract: Traditional cross-domain tasks, including domain adaptation and domain generalization, rely heavily on training models on source-domain data. With the recent advances in vision-language models (VLMs), which can be viewed as natural source models, the cross-domain task becomes one of directly adapting the pre-trained source model to arbitrary target domains equipped with prior domain knowledge; we name this task Adaptive Domain Generalization (ADG). However, current cross-domain datasets have many limitations, such as unrealistic domains, unclear domain definitions, and the inability to support fine-grained domain decomposition, which drives us to establish a novel dataset, DomainVerse, for ADG. Benefiting from the introduced hierarchical definition of domain shifts, DomainVerse consists of about 0.5 million images from 390 fine-grained realistic domains. With the help of the constructed DomainVerse and VLMs, we propose two methods, called Domain CLIP and Domain++ CLIP, for tuning-free adaptive domain generalization. Extensive experiments demonstrate the significance of the dataset and the effectiveness of the proposed methods.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses distribution shift in cross-domain tasks, in particular Adaptive Domain Generalization (ADG) in real-world scenarios. It targets three issues:
1. **Problems with existing datasets**: Existing cross-domain datasets have numerous limitations, such as a lack of realism, unclear domain definitions, an insufficient number of domains, and sample imbalance. These limitations make it hard to judge whether models generalize effectively in real-world scenarios.
2. **Model adaptability issues**: Vision-language models (VLMs) often require fine-tuning or annotated data when dealing with unknown target domains, which is costly and impractical in real applications. The paper therefore targets adaptation to arbitrary target domains without any fine-tuning.
3. **Utilization of domain knowledge**: The paper asks how prior knowledge of the target domain can be used to improve model performance. By introducing detailed domain descriptions, the model can better account for the characteristics of different domains.
To address these issues, the authors construct a new large-scale dataset, DomainVerse, and propose two CLIP-based methods, Domain CLIP and Domain++ CLIP. Both methods are tuning-free: they perform domain generalization without any fine-tuning. The reported experiments show that these methods outperform existing methods on the evaluation metrics used.
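The idea of giving a tuning-free CLIP-style classifier a textual description of the target domain can be illustrated with a minimal sketch. This is generic zero-shot prompting with a domain description appended to each class prompt, not the paper's Domain CLIP or Domain++ CLIP. The prompt template, the domain description, the function names, and the hand-written embedding vectors are all assumptions made for illustration; in practice the embeddings would come from a pre-trained image and text encoder.

```python
import math

def build_prompts(class_names, domain_description):
    """Compose one text prompt per class, appending a domain description.

    The template is a hypothetical example, not the one used in the paper.
    """
    return [f"a photo of a {name}, {domain_description}" for name in class_names]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def zero_shot_predict(image_emb, text_embs, temperature=0.01):
    """Softmax over temperature-scaled cosine similarities.

    Returns the index of the most likely class and the class probabilities.
    No parameters are trained or updated, so the procedure is tuning-free.
    """
    logits = [cosine(image_emb, t) / temperature for t in text_embs]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs.index(max(probs)), probs

classes = ["dog", "car"]
# Hypothetical domain description for the target domain.
prompts = build_prompts(classes, "taken on a snowy street at night")

# Toy stand-ins for encoder outputs (assumed values, one text vector per prompt).
text_embs = [[0.9, 0.1, 0.2], [0.1, 0.9, 0.3]]
image_emb = [0.8, 0.2, 0.1]

best, probs = zero_shot_predict(image_emb, text_embs)
print(prompts[best], probs)
```

Only the prompts change when the target domain changes, which is what makes this kind of adaptation possible without fine-tuning or labeled target data.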