Unified Language-driven Zero-shot Domain Adaptation

Senqiao Yang, Zhuotao Tian, Li Jiang, Jiaya Jia
2024-04-11
Abstract: This paper introduces Unified Language-driven Zero-shot Domain Adaptation (ULDA), a novel task setting that enables a single model to adapt to diverse target domains without explicit domain-ID knowledge. We identify the constraints in the existing language-driven zero-shot domain adaptation task, particularly the requirement for domain IDs and domain-specific models, which may restrict flexibility and scalability. To overcome these issues, we propose a new framework for ULDA, consisting of Hierarchical Context Alignment (HCA), Domain Consistent Representation Learning (DCRL), and Text-Driven Rectifier (TDR). These components work synergistically to align simulated features with target text across multiple visual levels, retain semantic correlations between different regional representations, and rectify biases between simulated and real target visual features, respectively. Our extensive empirical evaluations demonstrate that this framework achieves competitive performance in both settings, surpassing even the model that requires domain-ID, showcasing its superiority and generalization ability. The proposed method is not only effective but also maintains practicality and efficiency, as it does not introduce additional computational costs during inference. Our project page is
Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper addresses key limitations of zero-shot domain adaptation, in particular the limited flexibility and scalability of models when no target-domain data is available in practical applications. Specifically:

1. **Limitations of existing methods**:
   - Existing language-driven zero-shot domain adaptation methods (such as PØDA) require a domain ID to select the corresponding model, which may limit flexibility and scalability in practical applications.
   - They rely on domain-specific data for fine-tuning during training, which may not be directly accessible in some cases due to privacy constraints or data scarcity.

2. **Proposed new task setting**:
   - The paper introduces a new task setting, Unified Language-driven Zero-shot Domain Adaptation (ULDA), in which a single model adapts to multiple target domains at test time without explicit domain IDs.
   - ULDA uses only source-domain data and textual descriptions of the target domains for training, so it does not need direct access to target-domain images.

3. **Methods to address new challenges**:
   - To overcome these challenges, the authors propose a framework with three main components: Hierarchical Context Alignment (HCA), Domain Consistent Representation Learning (DCRL), and Text-Driven Rectifier (TDR).
   - Respectively, these components align simulated features with target text at multiple visual levels, preserve semantic correlations between different regional representations, and correct biases between simulated features and real target visual features.

With these components, the paper reports that the proposed framework performs well under the traditional setting and is also competitive under the new ULDA setting, which the authors present as evidence of its superiority and generalization capability.
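The idea of aligning simulated features with a target text description at more than one visual level can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`cosine_similarity`, `alignment_loss`), the level names (`"global"`, `"regional"`), the choice of cosine distance, and the two-dimensional vectors are all assumptions made for illustration. In practice the features and text embedding would come from a vision-language encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def alignment_loss(features_per_level, text_embedding):
    """Mean cosine distance between every feature vector and the text embedding.

    features_per_level is a hypothetical mapping from a level name to a list
    of feature vectors at that level. All vectors are pooled and averaged
    together; per-level weighting is deliberately left out of this sketch.
    """
    distances = []
    for level_features in features_per_level.values():
        for feature in level_features:
            distances.append(1.0 - cosine_similarity(feature, text_embedding))
    return sum(distances) / len(distances)


# Toy data (assumed): one global vector and two regional vectors.
text_embedding = [1.0, 0.0]
simulated_features = {
    "global": [[1.0, 0.0]],
    "regional": [[0.0, 1.0], [1.0, 0.0]],
}

loss = alignment_loss(simulated_features, text_embedding)
print(f"alignment loss: {loss:.4f}")
```

A lower value means the simulated features point in nearly the same direction as the text embedding. Because the loss depends only on the text description and not on a domain ID, the same function could be applied to any target description, which is the property the unified setting relies on.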