Light-TBFNet: RGB-D Salient Detection Based on a Lightweight Two-Branch Fusion Strategy

Yun Wu, Yucheng Shi, Huaiyan Shen, Yaya Tan, Yu Wang
DOI: https://doi.org/10.1007/s11042-022-14230-y
IF: 2.577
2023-01-01
Multimedia Tools and Applications
Abstract: Current models for salient object detection are large, which leads to a poor balance between performance and efficiency. We therefore propose a lightweight and accurate salient object detection framework that adopts a dual-stream encoding network to extract depth and RGB features. In the depth feature extraction stream, a depth feature enhancement module enhances the depth features and extracts valid information before layer-by-layer fusion with the RGB feature extraction stream, which mitigates the influence of low-quality depth features on the fused features. Then, with a lightweight design in mind, semantic information is used to locate salient regions and spatial detail information is used to refine them, so the decoding network is divided into a spatial detail branch and a semantic information branch. The first three layers of fusion features produced by the encoding network are used to extract spatial detail features, and the last three layers are used to extract semantic features. A two-branch fusion strategy is then proposed to fuse these two levels of features through feature interaction and reconstruction. The framework avoids the traditional top-down fusion of U-shaped structures, which increases computational complexity and decreases inference speed because of the large resolution of low-level features, and in which high-level features may be gradually diluted during top-down propagation. Finally, a hybrid loss function that incorporates Dice and SSIM losses is proposed to supervise network training. Light-TBFNet performs favorably against state-of-the-art methods on six challenging RGB-D SOD datasets with much faster speed (30 FPS for an input size of 384 × 384) and fewer parameters (3.79M).
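The abstract names Dice and SSIM terms in the hybrid loss but gives neither the exact formulation nor the weights. The following is a minimal NumPy sketch of how two such terms could be combined, not the authors' implementation. The function names, the equal default weights, and the use of a single global SSIM over the whole map (instead of the usual sliding window) are all assumptions made for illustration.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """1 - Dice coefficient between a saliency map in [0, 1] and a binary mask."""
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

def global_ssim_loss(pred, target, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - SSIM over the whole map (assumed simplification: no sliding window)."""
    mu_p, mu_t = pred.mean(), target.mean()
    var_p, var_t = pred.var(), target.var()
    cov = np.mean((pred - mu_p) * (target - mu_t))
    ssim = ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)
    )
    return 1.0 - ssim

def hybrid_loss(pred, target, w_dice=1.0, w_ssim=1.0):
    """Weighted sum of the two terms; the weights are hypothetical."""
    return w_dice * dice_loss(pred, target) + w_ssim * global_ssim_loss(pred, target)

# Toy example: an 8x8 mask with a 4x4 salient square.
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0
loss_perfect = hybrid_loss(mask, mask)         # prediction equals ground truth
loss_inverted = hybrid_loss(1.0 - mask, mask)  # prediction is the complement
```

In this toy case the loss is close to zero when the prediction matches the mask and grows when the prediction is inverted. The Dice term responds to region overlap and the SSIM term to structural agreement, which is the usual motivation for combining them.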