CFCI-Net: Cross-modality Feature Calibration and Integration Network for RGB-D Semantic Segmentation

Hao Zhou, Xu Yang, Lu Qi, Haojie Chen, Hai Huang, Hongde Qin
DOI: https://doi.org/10.1109/tiv.2024.3442915
IF: 8.2
2024-01-01
IEEE Transactions on Intelligent Vehicles
Abstract: RGB-D semantic segmentation has proven effective in many real-world applications by incorporating additional depth information. However, current methods often treat depth and RGB images as perfectly calibrated and directly design sophisticated architectures to fuse them. In reality, RGB and depth images frequently suffer from noise and defects due to hardware constraints. Fusing noisy depth and RGB features without prior calibration diminishes their representation ability. Moreover, current methods integrate RGB and depth features at a single feature level, which fails to effectively close the semantic gap between them. To overcome these challenges, we propose a novel feature calibration and integration network called CFCI-Net, which comprises a global-local feature calibration module (GL-FCM) and a base-shape feature integration module (BS-FIM). The GL-FCM calibrates the RGB and depth features using their complementarity from both global and local perspectives. The BS-FIM integrates the calibrated RGB and depth features by extracting and aggregating their base features and shape features, where base features provide the geometry location while shape features provide the texture information. The proposed CFCI-Net is inserted into vision-transformer-based two-stream backbones for RGB-D semantic segmentation. We conduct extensive experiments on three widely explored RGB-D semantic segmentation datasets, NYUDv2, SUN RGB-D, and 2D-3D-S, confirming the state-of-the-art performance of CFCI-Net.
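The abstract describes a calibrate-then-integrate pipeline without giving implementation details. The following is a minimal, illustrative sketch of that general idea only, not the authors' GL-FCM or BS-FIM. Everything in it is an assumption made for illustration: the function names (`global_calibrate`, `split_base_shape`, `integrate`), the sigmoid gate driven by a global average of the other modality, the moving-average/residual split standing in for "base" and "shape" features, and the fusion rule. It operates on toy 1-D feature channels, and the local calibration branch is omitted.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def global_calibrate(target, guide):
    """Hypothetical global calibration: rescale each channel of `target`
    by a gate computed from the global average of the matching channel
    in `guide` (the other modality)."""
    out = []
    for t_ch, g_ch in zip(target, guide):
        gate = sigmoid(sum(g_ch) / len(g_ch))
        out.append([gate * v for v in t_ch])
    return out

def split_base_shape(channel, k=3):
    """Assumed decomposition: base = moving average (smooth component),
    shape = residual (fine detail), so base + shape reconstructs the input."""
    half = k // 2
    n = len(channel)
    base = []
    for i in range(n):
        window = channel[max(0, i - half): min(n, i + half + 1)]
        base.append(sum(window) / len(window))
    shape = [c - b for c, b in zip(channel, base)]
    return base, shape

def integrate(rgb, depth):
    """Assumed fusion rule: average the two base parts, sum the two shape parts."""
    fused = []
    for r_ch, d_ch in zip(rgb, depth):
        rb, rs = split_base_shape(r_ch)
        db, ds = split_base_shape(d_ch)
        fused.append([(a + b) / 2 + (s + t)
                      for a, b, s, t in zip(rb, db, rs, ds)])
    return fused

# Toy single-channel features; values are made up for illustration.
rgb = [[1.0, 2.0, 3.0, 4.0]]
depth = [[0.0, 0.0, 0.0, 0.0]]

rgb_cal = global_calibrate(rgb, depth)    # gate = sigmoid(0) = 0.5
depth_cal = global_calibrate(depth, rgb)  # all-zero depth stays zero
fused = integrate(rgb_cal, depth_cal)
```

The point of the sketch is the ordering: each modality is first rescaled using information from the other, and only then are the two streams decomposed and merged. A real implementation would use learned layers on 2-D feature maps inside a two-stream backbone.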