Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation

Xinyao Li, Yuke Li, Zhekai Du, Fengling Li, Ke Lu, Jingjing Li
DOI: https://doi.org/10.1109/cvpr52733.2024.02205
2024-01-01
Computer Vision and Pattern Recognition
Abstract: Large vision-language models (VLMs) like CLIP have demonstrated good zero-shot learning performance in the unsupervised domain adaptation task. Yet, most transfer approaches for VLMs focus on either the language or visual branches, overlooking the nuanced interplay between both modalities. In this work, we introduce a Unified Modality Separation (UniMoS) framework for unsupervised domain adaptation. Leveraging insights from modality gap studies, we craft a nimble modality separation network that distinctly disentangles CLIP's features into language-associated and vision-associated components. Our proposed Modality-Ensemble Training (MET) method fosters the exchange of modality-agnostic information while maintaining modality-specific nuances. We align features across domains using a modality discriminator. Comprehensive evaluations on three benchmarks reveal our approach sets a new state-of-the-art with minimal computational costs. Code: https://github.com/TL-UESTC/UniMoS