Abstract: Panoramic semantic segmentation has received widespread attention recently due to its comprehensive 360° field of view. However, labeling such images demands more resources than labeling pinhole images. As a result, many unsupervised domain adaptation methods for panoramic semantic segmentation have emerged, utilizing either real pinhole images or low-cost synthetic panoramic images. Yet when only real pinhole images are used, the segmentation model lacks an understanding of the panoramic structure, and when only synthetic panoramic images are used, it lacks perception of real-world scenes. Therefore, in this paper, we propose a new task of multi-source domain adaptation for panoramic semantic segmentation, aiming to utilize both real pinhole and synthetic panoramic images in the source domains, enabling the segmentation model to perform well on unlabeled real panoramic images in the target domain. Further, we propose Deformation Transform Aligner for Panoramic Semantic Segmentation (DTA4PASS), which converts all pinhole images in the source domains into panoramic-like images and then aligns the converted source domains with the target domain. Specifically, DTA4PASS consists of two main components: Unpaired Semantic Morphing (USM) and Distortion Gating Alignment (DGA). First, in USM, the Semantic Dual-view Discriminator (SDD) assists in training the diffeomorphic deformation network, enabling effective transformation of pinhole images without paired panoramic views. Second, DGA assigns pinhole-like and panoramic-like features to each image by gating, and aligns these two features through uncertainty estimation. DTA4PASS outperforms the previous state-of-the-art methods by 1.92% and 2.19% in the outdoor and indoor multi-source domain adaptation scenarios, respectively. The source code will be released.
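The abstract states that USM trains a *diffeomorphic* deformation network to warp pinhole images into panoramic-like ones. One common way to guarantee a diffeomorphic warp, borrowed from image registration, is to treat the network output as a stationary velocity field, integrate it by scaling and squaring, and then resample the image as \( x \circ \phi \). The NumPy sketch below illustrates only that idea under stated assumptions: the deformation network itself is omitted (the velocity field is passed in directly), whether DTA4PASS parameterizes its deformation field this way is not stated here, and all function names are hypothetical.

```python
import numpy as np


def identity_grid(h, w):
    """Pixel coordinates (row, col) of an h x w image, shape (h, w, 2)."""
    rows, cols = np.meshgrid(np.arange(h, dtype=float),
                             np.arange(w, dtype=float), indexing="ij")
    return np.stack([rows, cols], axis=-1)


def bilinear_sample(field, coords):
    """Bilinearly sample `field` (H, W, C) at float (row, col) coords (H, W, 2).

    Coordinates outside the image are clamped to the border.
    """
    h, w = field.shape[:2]
    r = np.clip(coords[..., 0], 0, h - 1)
    c = np.clip(coords[..., 1], 0, w - 1)
    r0, c0 = np.floor(r).astype(int), np.floor(c).astype(int)
    r1, c1 = np.minimum(r0 + 1, h - 1), np.minimum(c0 + 1, w - 1)
    dr, dc = (r - r0)[..., None], (c - c0)[..., None]
    top = field[r0, c0] * (1 - dc) + field[r0, c1] * dc
    bottom = field[r1, c0] * (1 - dc) + field[r1, c1] * dc
    return top * (1 - dr) + bottom * dr


def integrate_velocity(velocity, steps=6):
    """Scaling and squaring: phi = exp(v) as a displacement u, phi(p) = p + u(p).

    Start from u = v / 2**steps, then square the map `steps` times using
    (phi o phi)(p) = p + u(p) + u(p + u(p)).
    """
    grid = identity_grid(*velocity.shape[:2])
    disp = velocity / (2 ** steps)
    for _ in range(steps):
        disp = disp + bilinear_sample(disp, grid + disp)
    return disp


def warp(image, disp):
    """Backward warp (x o phi)(p) = x(p + u(p)) for an image of shape (H, W, C)."""
    grid = identity_grid(*image.shape[:2])
    return bilinear_sample(image, grid + disp)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((32, 64, 3))  # stand-in for a pinhole image x_i
    # Hypothetical network output: a smooth velocity field that bends rows
    # up and down as a function of the column index.
    velocity = np.zeros((32, 64, 2))
    velocity[..., 0] = 2.0 * np.sin(2 * np.pi * np.arange(64) / 64)
    warped = warp(image, integrate_velocity(velocity))
    print(warped.shape)
```

Integrating a smooth velocity field, rather than predicting the displacement directly, keeps the resulting map (approximately) invertible and free of fold-overs, which is the usual motivation for the diffeomorphic parameterization.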

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper addresses multi-source domain adaptation for panoramic semantic segmentation (MSDA4PASS). Specifically, it focuses on two main problems:

1. **High annotation cost for panoramic images**: Compared with conventional pinhole images, annotating panoramic images requires more resources and time, so large-scale annotated panoramic datasets are relatively scarce.
2. **Limitations of existing methods**:
   - **Using only real pinhole images**: the model lacks an understanding of the panoramic structure.
   - **Using only synthetic panoramic images**: the model lacks perception of real-world scenes.

To address these issues, the paper proposes a new task, multi-source domain adaptation for panoramic semantic segmentation (MSDA4PASS), which uses both abundant labeled pinhole images and low-cost synthetic panoramic images as source domains to improve the model's performance on unlabeled real panoramic images.

### Solution

To achieve this goal, the paper proposes a new framework, Deformation Transform Aligner for Panoramic Semantic Segmentation (DTA4PASS), which consists of two main components:

1. **Unpaired Semantic Morphing (USM)**:
   - **Objective**: Transform all pinhole images into panoramic-like images through adversarial training, reducing the deformation gap between pinhole and panoramic images.
   - **Method**: A deformation network \( F \) generates a deformation field \( \phi_{i2a} \) so that the transformed pinhole image \( x_i \circ \phi_{i2a} \) resembles a panoramic image as closely as possible. A Semantic Dual-view Discriminator (SDD) drives the adversarial learning, and segmentation information is fed back into the training of the deformation network to improve panoramic semantic understanding, without requiring paired panoramic views.
2. **Distortion Gating Alignment (DGA)**:
   - **Objective**: Align the multiple transformed source domains with the target domain through uncertainty estimation, reducing the texture gap between the source and target domains.
   - **Method**: A gating module \( g \) assigns pinhole-like and panoramic-like features to each input image, and an uncertainty estimation module reduces the discrepancy between these two features.

### Experimental Results

The paper conducts extensive experiments in both outdoor and indoor scenarios to validate the proposed method. DTA4PASS outperforms the previous state-of-the-art methods by 1.92% and 2.19% in the outdoor and indoor multi-source settings, respectively.

### Conclusion

By leveraging abundant labeled pinhole images and low-cost synthetic panoramic images, DTA4PASS addresses multi-source domain adaptation for panoramic semantic segmentation and improves the model's performance on unlabeled real panoramic images.
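The DGA description leaves the exact gating and uncertainty formulations open. The NumPy sketch below fills them in with illustrative stand-ins, not the paper's formulation: a hypothetical gating module outputs two logits per image, which are softmax-normalized into weights for a pinhole-like and a panoramic-like feature branch; "uncertainty estimation" is approximated by the normalized prediction entropy, which down-weights pixels where the branches are unsure when their disagreement is penalized. The entropy weighting, the squared-L2 discrepancy, and all function names are assumptions.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy (nats) of probability vectors along `axis`."""
    return -(p * np.log(p + eps)).sum(axis=axis)


def gate_features(feat_pin, feat_pan, gate_logits):
    """Mix pinhole-like and panoramic-like features with a per-image soft gate.

    feat_pin, feat_pan: (B, D) features from the two hypothetical branches.
    gate_logits: (B, 2) output of a hypothetical gating module g.
    Returns the mixed features (B, D) and the gate weights (B, 2).
    """
    weights = softmax(gate_logits, axis=-1)
    mixed = weights[:, :1] * feat_pin + weights[:, 1:] * feat_pan
    return mixed, weights


def uncertainty_weighted_alignment(logits_pin, logits_pan):
    """Penalize disagreement between the two branches' per-pixel predictions.

    logits_*: (B, N, K) class logits for N pixels and K >= 2 classes.
    Pixels with high averaged normalized entropy (a stand-in uncertainty
    estimate) contribute less to the loss.
    """
    p_pin, p_pan = softmax(logits_pin), softmax(logits_pan)
    num_classes = p_pin.shape[-1]
    uncertainty = 0.5 * (entropy(p_pin) + entropy(p_pan)) / np.log(num_classes)
    confidence = np.clip(1.0 - uncertainty, 0.0, 1.0)    # (B, N)
    disagreement = ((p_pin - p_pan) ** 2).sum(axis=-1)   # (B, N)
    return float((confidence * disagreement).sum() / max(confidence.sum(), 1e-12))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mixed, w = gate_features(rng.normal(size=(4, 16)),
                             rng.normal(size=(4, 16)),
                             rng.normal(size=(4, 2)))
    loss = uncertainty_weighted_alignment(rng.normal(size=(4, 100, 8)),
                                          rng.normal(size=(4, 100, 8)))
    print(mixed.shape, w.sum(axis=1), loss)
```

A soft gate lets each image sit anywhere between the two feature types instead of being hard-assigned to one, and the confidence weighting keeps the alignment term from forcing agreement on pixels where neither branch has a reliable prediction.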

Multi-source Domain Adaptation for Panoramic Semantic Segmentation
