Multi-modality information refinement fusion network for RGB-D salient object detection

Hua Bao, Bo Fan
DOI: https://doi.org/10.1007/s00371-023-03076-6
IF: 2.835
2023-09-22
The Visual Computer
Abstract: RGB-D salient object detection (SOD) has attracted growing research interest in recent years. Because RGB and depth images are produced by different imaging mechanisms, the two modalities carry different information. Effectively fusing multi-modality features and aggregating multi-scale features to produce accurate saliency predictions therefore remain open problems. In this article, we present a Multi-Modality Information Refinement Fusion Network (MIRFNet) for RGB-D SOD to address them. Specifically, a Feature-Enhancement and Cross-Refinement Module (FCM) is proposed to reduce redundant features and narrow the gap between cross-modality data, enabling effective multi-modality feature fusion. In FCM, the Feature-Enhancement step uses attention mechanisms to obtain enhanced features that contain less redundant information and more common salient information, and the Cross-Refinement step uses the enhanced features to reduce the gap between cross-modality features and fuse them. Then, we propose an Edge Guidance Module (EGM) to extract edge information from RGB features. Finally, to aggregate multi-level features effectively and achieve accurate saliency prediction, a Feature-Aggregation and Edge-Refinement Module (FEM) is designed, which introduces modality-specific information and edge information to enable sufficient information interaction. In FEM, the Feature-Aggregation step aggregates multi-scale features with modality-specific information, and the Edge-Refinement step uses edge information to refine the aggregated features. Extensive experiments on five datasets demonstrate that MIRFNet achieves performance comparable to 12 other state-of-the-art (SOTA) methods.
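The two-step idea behind FCM (enhance each modality with attention, then let the modalities refine each other before fusing) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the attention design, so the parameter-free channel and spatial attention, the residual cross-weighting, the summation fusion, and all function names below are assumptions chosen only to make the data flow concrete.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat):
    """Per-channel weights in (0, 1) from global average pooling.

    feat has shape (C, H, W); the result has shape (C, 1, 1).
    Assumption: no learned layers, unlike a real attention block.
    """
    return sigmoid(feat.mean(axis=(1, 2), keepdims=True))


def spatial_attention(feat):
    """Per-location weights in (0, 1) from channel-wise average pooling.

    feat has shape (C, H, W); the result has shape (1, H, W).
    """
    return sigmoid(feat.mean(axis=0, keepdims=True))


def enhance(feat):
    """Hypothetical Feature-Enhancement step: re-weight channels and locations."""
    return feat * channel_attention(feat) * spatial_attention(feat)


def cross_refine(rgb, depth):
    """Hypothetical Cross-Refinement step.

    Each enhanced modality is re-weighted by the other modality's spatial
    attention (with a residual connection), then the two are summed.
    """
    rgb_e, depth_e = enhance(rgb), enhance(depth)
    rgb_r = rgb_e + rgb_e * spatial_attention(depth_e)
    depth_r = depth_e + depth_e * spatial_attention(rgb_e)
    return rgb_r + depth_r


# Toy features standing in for one backbone level of each modality.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 16, 16))
depth = rng.standard_normal((8, 16, 16))

fused = cross_refine(rgb, depth)
print(fused.shape)  # (8, 16, 16)
```

The fused map keeps the input shape, so a block of this kind could in principle be applied at every backbone level before multi-scale aggregation. A trained network would replace the fixed pooling-plus-sigmoid weights with learned convolutional or fully connected layers.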
computer science, software engineering