Multimodal Contrastive Learning for Unpaired and Depth-privileged Semantic Segmentation.

Xiaobo Yang, Danning Ke, Xiaojin Gong
DOI: https://doi.org/10.1109/WCSP58612.2023.10405240
2023-01-01
Abstract: Depth-privileged semantic segmentation assumes that both RGB and depth information are available for training, but that depth is unavailable at test time. In this work, we propose a multimodal contrastive learning approach that learns a depth-privileged segmentation model from unpaired color and depth images. We first construct a two-stream network in which each stream performs semantic segmentation on a single modality, either RGB or depth. Meanwhile, the features of each modality are projected into a common embedding space, where the embedded features are attracted or repelled under the supervision of an inter-modal contrastive learning (CL) loss. Additionally, we integrate a tunable self-attention module to make the sparse CL supervision more effective. By mining modality-invariant features, the proposed inter-modal contrastive learning approach makes our model robust to illumination changes even though no depth information is used at test time. Experiments on two datasets show that our approach performs competitively with previous RGB-D semantic segmentation methods that require paired, well-aligned color and depth images for both training and testing.
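
The abstract does not give the form of the inter-modal CL loss, so the following is only a minimal sketch of one plausible formulation, not the authors' implementation. It assumes a supervised, InfoNCE-style loss in which an RGB embedding is attracted to depth embeddings that share its semantic label and repelled from the rest; because the images are unpaired, using class labels to define positives is an assumption made here. The function name, the `temperature` value, and the use of NumPy are all hypothetical choices for illustration.

```python
import numpy as np


def inter_modal_contrastive_loss(rgb_emb, depth_emb, rgb_labels, depth_labels,
                                 temperature=0.1):
    """Hypothetical supervised InfoNCE-style loss across two modalities.

    rgb_emb:      (N, D) embeddings from the RGB stream
    depth_emb:    (M, D) embeddings from the depth stream
    rgb_labels:   (N,) integer semantic labels of the RGB embeddings
    depth_labels: (M,) integer semantic labels of the depth embeddings
    """
    # L2-normalise so that dot products are cosine similarities.
    rgb = rgb_emb / np.linalg.norm(rgb_emb, axis=1, keepdims=True)
    dep = depth_emb / np.linalg.norm(depth_emb, axis=1, keepdims=True)

    # Similarity of every RGB anchor to every depth sample, shape (N, M).
    logits = rgb @ dep.T / temperature

    # Numerically stable log-softmax over the depth samples.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Assumption: positives are cross-modal pairs with the same class label.
    pos_mask = (rgb_labels[:, None] == depth_labels[None, :]).astype(float)
    pos_count = pos_mask.sum(axis=1)

    # Anchors without any positive in the other modality are skipped.
    valid = pos_count > 0
    if not valid.any():
        return 0.0

    per_anchor = -(pos_mask * log_prob).sum(axis=1)[valid] / pos_count[valid]
    return float(per_anchor.mean())
```

Under these assumptions the loss is small when same-class embeddings from the two streams point in similar directions and large when they do not, which is one way to encourage modality-invariant features. In practice the embeddings would come from projection heads on the two segmentation streams and the loss would be written in a framework that supports automatic differentiation.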