Cross-Domain Semantic Segmentation on Inconsistent Taxonomy using VLMs

Jeongkee Lim, Yusung Kim
2024-08-05
Abstract: The challenge of semantic segmentation in Unsupervised Domain Adaptation (UDA) emerges not only from domain shifts between source and target images but also from discrepancies in class taxonomies across domains. Traditional UDA research assumes a consistent taxonomy between the source and target domains, which limits its ability to recognize and adapt to the taxonomy of the target domain. This paper introduces a novel approach, Cross-Domain Semantic Segmentation on Inconsistent Taxonomy using Vision Language Models (CSI), which effectively performs domain-adaptive semantic segmentation even when source and target classes are mismatched. CSI leverages the semantic generalization potential of Vision Language Models (VLMs) to create synergy with previous UDA methods. It combines segment reasoning obtained through traditional UDA methods with the rich semantic knowledge embedded in VLMs to relabel new classes in the target domain. This approach allows for effective adaptation to extended taxonomies without requiring any ground-truth labels for the target domain. Our method has been shown to be effective across various benchmarks under inconsistent taxonomy settings (coarse-to-fine taxonomy and open taxonomy) and demonstrates consistent synergy effects when integrated with previous state-of-the-art UDA methods. The implementation is available at http://github.com/jkee58/CSI.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses Unsupervised Domain Adaptation (UDA) for semantic segmentation when the class taxonomies of the source and target domains are inconsistent. Traditional UDA methods assume that both domains share the same taxonomy, an assumption that often fails in practice because different scenarios or requirements lead to different class definitions. The paper proposes Cross-Domain Semantic Segmentation on Inconsistent Taxonomy using Vision Language Models (VLMs), abbreviated as CSI. CSI uses the semantic generalization capability of VLMs to perform semantic segmentation even when source and target classes do not match, and it adapts to an extended taxonomy without ground-truth labels in the target domain. The authors report that the method is effective across various benchmarks in coarse-to-fine and open taxonomy settings and produces consistent synergistic effects when combined with existing UDA methods.
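As a rough illustration of the relabeling idea in the coarse-to-fine setting, the sketch below assigns each segment the fine-grained class that a VLM scores highest among the children of the segment's coarse class. It is not the authors' implementation: the function name `relabel_segments`, the per-pixel score dictionaries, the segment ids, and the example classes ("person" split into "pedestrian" and "rider") are all hypothetical and stand in for whatever a UDA model and a VLM would actually produce.

```python
from collections import Counter, defaultdict


def relabel_segments(coarse_map, segment_ids, vlm_scores, children):
    """Relabel each segment with the best-scoring fine class (illustrative only).

    coarse_map:  2D list of coarse class names per pixel (assumed UDA prediction)
    segment_ids: 2D list of integer segment ids per pixel (assumed given)
    vlm_scores:  2D list of {fine_class: score} dicts per pixel (assumed VLM output)
    children:    dict mapping a coarse class to its candidate fine classes
    """
    # Group pixel coordinates by segment id.
    segments = defaultdict(list)
    for r, row in enumerate(segment_ids):
        for c, seg in enumerate(row):
            segments[seg].append((r, c))

    fine_map = [list(row) for row in coarse_map]
    for pixels in segments.values():
        # Majority coarse label inside the segment.
        coarse = Counter(coarse_map[r][c] for r, c in pixels).most_common(1)[0][0]
        # Classes without children keep their coarse name.
        candidates = children.get(coarse, [coarse])
        # Sum the VLM score of each candidate over the whole segment.
        totals = {
            name: sum(vlm_scores[r][c].get(name, 0.0) for r, c in pixels)
            for name in candidates
        }
        best = max(candidates, key=lambda name: totals[name])
        for r, c in pixels:
            fine_map[r][c] = best
    return fine_map


# Toy 2x3 image: one "person" segment (id 0) and one "road" segment (id 1).
coarse_map = [["person", "person", "road"],
              ["person", "person", "road"]]
segment_ids = [[0, 0, 1],
               [0, 0, 1]]
vlm_scores = [
    [{"pedestrian": 0.7, "rider": 0.3}, {"pedestrian": 0.4, "rider": 0.6}, {}],
    [{"pedestrian": 0.8, "rider": 0.2}, {"pedestrian": 0.6, "rider": 0.4}, {}],
]
children = {"person": ["pedestrian", "rider"]}  # hypothetical coarse-to-fine split

fine_map = relabel_segments(coarse_map, segment_ids, vlm_scores, children)
```

In this toy input one pixel individually favors "rider", but the segment-level sum favors "pedestrian", so the whole segment is relabeled consistently. Aggregating over a segment rather than deciding per pixel is a design choice of this sketch; the paper may combine the two sources of information differently.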