Cross-Class Domain Adaptive Semantic Segmentation with Visual Language Models

Wenqi Ren, Ruihao Xia, Meng Zheng, Ziyan Wu, Yang Tang, Nicu Sebe
DOI: https://doi.org/10.1145/3664647.3681122
2024-01-01
Abstract: This paper addresses cross-class domain adaptation (CCDA) in semantic segmentation, where the target domain contains both shared classes and novel classes that are either unlabeled or unseen in the source domain. The problem is challenging because the absence of labels for novel classes hinders effective solutions to both the cross-domain and the cross-class problems. Since Visual Language Models (VLMs) generalize well across diverse data distributions and can generate zero-shot predictions without task-specific training examples, we propose a label alignment method that leverages VLMs to relabel pseudo labels for novel classes. Because VLMs typically provide only image-level predictions, we adopt a two-stage method to enable fine-grained semantic segmentation and design a threshold based on the uncertainty of pseudo labels to exclude noisy VLM predictions. To further strengthen the supervision of novel classes, we devise memory banks with an adaptive update scheme to manage accurate VLM predictions, which are then resampled to increase the sampling probability of novel classes. Comprehensive experiments demonstrate the effectiveness and versatility of the proposed method across various CCDA scenarios.
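The uncertainty-gated relabeling idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how the threshold is computed, so the normalized-entropy measure, the two thresholds `tau_uncertainty` and `tau_vlm`, the region dictionary keys, and the function names are all assumptions made here for illustration. The sketch keeps a confident pseudo label and accepts a VLM label only for a region whose pseudo label is uncertain and whose VLM prediction is itself confident.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of a probability vector, scaled to [0, 1]."""
    if len(probs) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def relabel_regions(regions, tau_uncertainty=0.5, tau_vlm=0.7):
    """Return one final label per region.

    Each region is a dict with hypothetical keys (assumed, not from the paper):
      'pseudo_probs': class probabilities from the segmentation model
      'pseudo_label': the pseudo label derived from those probabilities
      'vlm_label', 'vlm_conf': zero-shot VLM prediction for the region crop
    """
    final = []
    for r in regions:
        uncertainty = normalized_entropy(r["pseudo_probs"])
        # Assumed rule: trust the VLM only where the pseudo label is
        # uncertain and the VLM prediction is confident.
        if uncertainty > tau_uncertainty and r["vlm_conf"] >= tau_vlm:
            final.append(r["vlm_label"])
        else:
            final.append(r["pseudo_label"])
    return final


regions = [
    # Confident pseudo label: kept, even though the VLM disagrees.
    {"pseudo_probs": [0.97, 0.02, 0.01], "pseudo_label": "road",
     "vlm_label": "train", "vlm_conf": 0.9},
    # Uncertain pseudo label, confident VLM: relabeled.
    {"pseudo_probs": [0.40, 0.35, 0.25], "pseudo_label": "car",
     "vlm_label": "train", "vlm_conf": 0.9},
    # Uncertain pseudo label, low-confidence VLM: treated as noisy, kept.
    {"pseudo_probs": [0.40, 0.35, 0.25], "pseudo_label": "car",
     "vlm_label": "train", "vlm_conf": 0.3},
]
print(relabel_regions(regions))  # ['road', 'train', 'car']
```

In this toy setup the class names and probabilities are made up; in practice the regions would come from the first stage of a two-stage pipeline, and the thresholds would need to be tuned or derived from the pseudo-label statistics.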