Multi-source Domain Adaptation for Panoramic Semantic Segmentation

Jing Jiang, Sicheng Zhao, Jiankun Zhu, Wenbo Tang, Zhaopan Xu, Jidong Yang, Pengfei Xu, Hongxun Yao
2024-08-29
Abstract: Panoramic semantic segmentation has received widespread attention recently due to its comprehensive 360° field of view. However, labeling such images demands more resources than labeling pinhole images. As a result, many unsupervised domain adaptation methods for panoramic semantic segmentation have emerged, utilizing real pinhole images or low-cost synthetic panoramic images. Yet the segmentation model lacks understanding of the panoramic structure when only real pinhole images are used, and it lacks perception of real-world scenes when only synthetic panoramic images are used. Therefore, in this paper, we propose a new task of multi-source domain adaptation for panoramic semantic segmentation, which aims to utilize both real pinhole and synthetic panoramic images in the source domains so that the segmentation model performs well on unlabeled real panoramic images in the target domain. Further, we propose the Deformation Transform Aligner for Panoramic Semantic Segmentation (DTA4PASS), which converts all pinhole images in the source domains into panoramic-like images and then aligns the converted source domains with the target domain. Specifically, DTA4PASS consists of two main components: Unpaired Semantic Morphing (USM) and Distortion Gating Alignment (DGA). First, in USM, the Semantic Dual-view Discriminator (SDD) assists in training the diffeomorphic deformation network, enabling the effective transformation of pinhole images without paired panoramic views. Second, DGA assigns pinhole-like and panoramic-like features to each image by gating, and aligns these two features through uncertainty estimation. DTA4PASS outperforms the previous state-of-the-art methods by 1.92% and 2.19% on the outdoor and indoor multi-source domain adaptation scenarios, respectively. The source code will be released.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper aims to address the issue of multi-source domain adaptation for panoramic semantic segmentation (MSDA4PASS). Specifically, the paper focuses on the following two main problems:

1. **High annotation cost for panoramic images**: Compared to traditional pinhole images, annotating panoramic images requires more resources and time, resulting in relatively fewer large-scale annotated datasets for panoramic images.
2. **Limitations of existing methods**:
   - **Using only real pinhole images**: These methods lack an understanding of the panoramic structure.
   - **Using only synthetic panoramic images**: These methods lack perception of real scenes.

To address these issues, the paper proposes a new task, multi-source domain adaptation for panoramic semantic segmentation (MSDA4PASS), which leverages the abundant annotated pinhole images and low-cost generated synthetic panoramic images to improve the model's performance on unannotated real panoramic images.

### Solution

To achieve this goal, the paper proposes a new framework, the Deformation Transform Aligner for Panoramic Semantic Segmentation (DTA4PASS), which includes two main components:

1. **Unpaired Semantic Morphing (USM)**:
   - **Objective**: To transform all pinhole images into panoramic-like images through adversarial training, reducing the deformation gap between pinhole images and panoramic images.
   - **Method**: Introduces a deformation network \( F \) to generate a deformation field \( \phi_{i2a} \), making the transformed pinhole image \( x_i \circ \phi_{i2a} \) as close as possible to the panoramic image. A Semantic Dual-view Discriminator (SDD) is used for adversarial learning, and segmentation information is fed back into the learning process of the deformation network to improve the accuracy of panoramic semantic understanding.
2. **Distortion Gating Alignment (DGA)**:
   - **Objective**: To align multiple transformed source domains with the target domain through uncertainty estimation, reducing the texture gap between the source and target domains.
   - **Method**: Introduces a gating module \( g \) to assign pinhole-like and panoramic-like features to each input image, and an uncertainty estimation module to reduce the differences between these two features.

### Experimental Results

The paper conducts extensive experiments in both outdoor and indoor scenarios to validate the effectiveness of the proposed method. The experimental results show that DTA4PASS improves performance by 1.92% and 2.19% over the existing state-of-the-art methods in outdoor and indoor scenarios, respectively.

### Conclusion

By leveraging the abundant annotated pinhole images and low-cost generated synthetic panoramic images, DTA4PASS successfully addresses the issue of multi-source domain adaptation for panoramic semantic segmentation, significantly improving the model's performance on unannotated real panoramic images.
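
### Illustrative Sketch: Applying a Deformation Field

The USM component describes a transformed pinhole image \( x_i \circ \phi_{i2a} \), that is, the image \( x_i \) resampled through a deformation field. The NumPy sketch below shows one common way such a composition can be computed, as a backward warp with bilinear interpolation. It is only an illustration under stated assumptions: the function name `warp_image`, the pixel-offset convention of the field, and the border clamping are all hypothetical choices, not details taken from the paper. It also takes the field as given and does not reproduce the deformation network \( F \) or enforce that the field is diffeomorphic.

```python
import numpy as np


def warp_image(image, displacement):
    """Backward-warp an (H, W, C) image with a dense (H, W, 2) displacement field.

    Assumed convention (hypothetical, not from the paper):
    displacement[..., 0] is the x (column) offset and displacement[..., 1]
    is the y (row) offset, both in pixels. Output pixel (r, c) samples the
    input at (r + dy, c + dx); sample positions are clamped to the border.
    """
    h, w = image.shape[:2]
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Source coordinates for every output pixel, clamped to the image.
    src_r = np.clip(rows + displacement[..., 1], 0, h - 1)
    src_c = np.clip(cols + displacement[..., 0], 0, w - 1)

    # Integer neighbours surrounding each fractional source coordinate.
    r0 = np.floor(src_r).astype(int)
    c0 = np.floor(src_c).astype(int)
    r1 = np.minimum(r0 + 1, h - 1)
    c1 = np.minimum(c0 + 1, w - 1)

    # Fractional parts act as bilinear weights (broadcast over channels).
    wr = (src_r - r0)[..., None]
    wc = (src_c - c0)[..., None]

    top = (1 - wc) * image[r0, c0] + wc * image[r0, c1]
    bottom = (1 - wc) * image[r1, c0] + wc * image[r1, c1]
    return (1 - wr) * top + wr * bottom


# Usage: a zero field leaves the image unchanged.
pinhole = np.random.default_rng(0).random((4, 5, 3))
identity_field = np.zeros((4, 5, 2))
warped = warp_image(pinhole, identity_field)
```

In a trainable pipeline the same resampling would need to be differentiable with respect to the field so that gradients from the discriminator and segmentation losses can reach the deformation network; this sketch only covers the forward resampling step.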