Abstract: Unsupervised domain adaptive segmentation aims to improve the segmentation accuracy of models on target domains without relying on labeled data from those domains. This approach is crucial when labeled target-domain data is scarce or unavailable. It seeks to align the feature representations of the source domain (where labeled data is available) and the target domain (where only unlabeled data is present), thus enabling the model to generalize well to the target domain. Image- and video-level domain adaptation are currently addressed with different, specialized frameworks, training strategies, and optimizations, despite their underlying connections. In this paper, we propose a unified framework, PiPa++, which leverages the core idea of "comparing" to (1) explicitly encourage learning of discriminative pixel-wise features with intra-class compactness and inter-class separability, (2) promote robust feature learning of the identical patch against different contexts or fluctuations, and (3) enable the learning of temporal continuity under dynamic environments. With the designed task-smart contrastive sampling strategy, PiPa++ enables the mining of more informative training samples according to the task demand. Extensive experiments demonstrate the effectiveness of our method on both image-level and video-level domain adaptation benchmarks. Moreover, the proposed method is compatible with other UDA approaches and can further improve their performance without introducing extra parameters.

What problem does this paper attempt to address?

This paper aims to improve a model's segmentation accuracy on the target domain without requiring labeled target-domain data. Specifically, it focuses on semantic segmentation under Unsupervised Domain Adaptation (UDA) and seeks a unified framework for image-level and video-level domain adaptation through self-supervised learning. The framework reduces the discrepancy in feature representation between the source domain and the target domain, thereby improving the model's generalization to the target domain.

### Main Problems and Solutions

1. **Problems**:
   - **High cost of data annotation**: In real-world applications, obtaining large datasets with pixel-level annotations is expensive and time-consuming.
   - **Domain gap**: There is a significant domain gap between synthetic data and real data, which degrades the model's performance on the target domain.
   - **Limitations of existing methods**: Existing image-level and video-level UDA methods usually design task-specific training paradigms and optimization strategies, and therefore lack generality and flexibility.
2. **Solutions**:
   - **Propose the PiPa++ framework**: The framework provides a unified architecture for image-level and video-level UDA tasks through self-supervised learning.
   - **Multi-granularity contrastive learning**: Pixel-level and patch-level contrastive learning improve the model's understanding of local context and its robustness.
   - **Task-smart sampling strategy**: A sample mining strategy tailored to the requirements of each task captures more informative training samples.
   - **Temporal continuity**: In dynamic scenes, cross-frame temporal contrastive learning maintains temporal continuity.

### Specific Methods

1. **Basic Segmentation Loss**:
   - **Source domain segmentation loss \( L_S^{ce} \)**:
     \[ L_S^{ce} = \mathbb{E} \left[ -p_S^u \log h_{cls}(g_\theta(x_S^u)) \right] \]
     where \( p_S^u \) is the one-hot vector of the label \( y_S^u \), \( g_\theta \) is the visual backbone network, and \( h_{cls} \) is the classification head.
   - **Target domain segmentation loss \( L_T^{ce} \)**:
     \[ L_T^{ce} = \mathbb{E} \left[ -\bar{p}_T^v \log h_{cls}(g_\theta(x_T^v)) \right] \]
     where \( \bar{p}_T^v \) is the one-hot vector of the pseudo-label \( \bar{y}_T^v \), and the pseudo-label is generated by the teacher network \( g_{\bar{\theta}} \).
2. **Multi-granularity contrastive learning**:
   - **Pixel-level contrast loss \( L_{Pixel} \)**:
     \[ L_{Pixel} = -\sum_{C(i) = C(j)} \log \frac{r(e_i, e_j)}{\sum_{k = 1}^{N_{pixel}} r(e_i, e_k)} \]
     where \( e \) is the feature map extracted through the projection head \( h_{pixel} \), \( r(e_i, e_j) = \exp \left( \frac{s(e_i, e_j)}{\tau} \right) \), \( s(e_i, e_j) \) is the cosine similarity of two pixel features, and \( \tau \) is the temperature parameter.
   - **Patch-level contrast loss \( L_{Patch} \)**:
     \[ L_{Patch} = -\sum_{O_1(i) = O_2(j)} \log \frac{r(f_i, f_j)}{\sum_{k =
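As a rough illustration of the pixel-level contrast loss \( L_{Pixel} \) defined above, the following pure-Python sketch evaluates the formula on a handful of toy feature vectors. It is not the authors' implementation: the function names, the default temperature `tau=0.1`, the exclusion of `i == j` from the positive pairs, and the choice to sum the denominator over all `k` (including `k == i`) are assumptions made only to obtain something runnable from the summary's notation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity s(e_i, e_j); assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pixel_contrast_loss(features, labels, tau=0.1):
    """Sketch of L_Pixel = -sum_{C(i)=C(j)} log(r(e_i,e_j) / sum_k r(e_i,e_k)).

    features: list of per-pixel feature vectors e_i (hypothetical toy input)
    labels:   list of class ids C(i), one per pixel
    tau:      temperature; the default value is an assumption
    """
    n = len(features)
    # r[i][k] = exp(s(e_i, e_k) / tau) for every pair of pixels
    r = [
        [math.exp(cosine_similarity(features[i], features[k]) / tau)
         for k in range(n)]
        for i in range(n)
    ]
    loss = 0.0
    for i in range(n):
        denom = sum(r[i])  # sum over k = 1..N_pixel, as written in the formula
        for j in range(n):
            # Assumption: a pixel is not treated as its own positive pair.
            if i != j and labels[i] == labels[j]:
                loss -= math.log(r[i][j] / denom)
    return loss


if __name__ == "__main__":
    # Two tight clusters of toy 2-D features.
    feats = [[1.0, 0.0], [1.0, 0.05], [0.0, 1.0], [0.05, 1.0]]
    print("labels match clusters:", pixel_contrast_loss(feats, [0, 0, 1, 1]))
    print("labels cut across clusters:", pixel_contrast_loss(feats, [0, 1, 0, 1]))
```

The loss is smaller when pixels sharing a class label also have similar features, which is the intra-class compactness and inter-class separability that the abstract describes. A practical version would operate on batched tensors and sampled pixels rather than nested Python lists.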

PiPa++: Towards Unification of Domain Adaptive Semantic Segmentation via Self-supervised Learning
