Unsupervised Domain Adaptation for Referring Semantic Segmentation

Haonan Shi,Wenwen Pan,Zhou Zhao,Mingmin Zhang,Fei Wu
DOI: https://doi.org/10.1145/3581783.3611879
2023-01-01
Abstract:In this paper, we study the task of referring semantic segmentation in a highly practical setting, in which labeled visual data with corresponding text descriptions are available in the source, but only unlabeled visual data (without text descriptions) are available in the target. It is a challenging task that has many difficulties: (1) how to obtain proper queries for the target domain; (2) how to adapt visual-text joint distribution shifts; (3) how to maintain the original segmentation performance. Thus, we propose a cycle-consistent vision-language matching network to narrow down the domain gap and ease adaptation difficulty. Our model has significant practical applications since they are capable generalising to new data sources without requiring corresponding text annotations. First, a pseudo-text selector is devised to handle the missing modality, through the pre-trained clip model to measure the gap between query features of the source and visual features of the target. Next, a cross-domain segmentation predictor is adopted, which prompts the joint representations to be domain invariant and minimize the discrepancy between two domains. Then, we present a cycle-consistent query matcher to learn discriminative features via reconstructing visual features from masks. Instead of doing the textual comparison, we match the visual features to the pseudo queries. Extensive experiments show the effectiveness of our method.
What problem does this paper attempt to address?