Abstract: Vision-based measurement techniques are required in the quality inspection of many products. However, most existing methods rely on a single modality, either a red-green-blue (RGB) image or a depth map, for defect detection. In this article, we propose a potential technique for defect detection that uses RGB-depth (RGB-D) salient object detection (SOD) as the measurement method, and we present a hierarchical fusion and multilevel decoder network (HFMDNet). The key to multimodal SOD lies in effectively acquiring cross-modal complementary information and realizing the interaction between cross-level information. Most existing methods employ various strategies for cross-modal fusion or apply feature enhancement before fusion. However, these methods ignore the hierarchical distinctions between RGB and depth features during cross-modal fusion, which leads to suboptimal performance in some challenging scenes. HFMDNet accounts for cross-level information interaction in both the fusion and decoding stages. Specifically, we design a hierarchical fusion module (HFM), comprising a low-level feature fusion (LFF) module and a high-level feature fusion (HFF) module, to compensate for the differences between modalities. Then, a multilevel refinement decoder (MRD) enhances, refines, and decodes the fused features to generate high-quality saliency maps. In addition, we introduce edge features in the decoding phase as auxiliary information so that the predicted salient objects have clear boundaries. Extensive experiments on nine publicly available datasets demonstrate that HFMDNet delivers competitive performance.
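The hierarchical fusion idea can be illustrated with a minimal NumPy sketch. The abstract does not specify the internals of the LFF and HFF modules, so everything below is an assumption made for illustration: the function names, the depth-derived spatial gate used for low-level features, the channel gate used for high-level features, and the `num_low` split between levels are all hypothetical and are not the authors' design.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def low_level_fusion(rgb, depth):
    """Hypothetical LFF stand-in (assumption, not the paper's module).

    A spatial gate computed from the depth features modulates the RGB
    features, and the depth features are added back. Inputs are (C, H, W).
    """
    spatial_gate = sigmoid(depth.mean(axis=0, keepdims=True))  # (1, H, W)
    return rgb * spatial_gate + depth


def high_level_fusion(rgb, depth):
    """Hypothetical HFF stand-in (assumption, not the paper's module).

    A per-channel gate from globally pooled features blends the two
    modalities as a convex combination. Inputs are (C, H, W).
    """
    pooled = (rgb + depth).mean(axis=(1, 2), keepdims=True)  # (C, 1, 1)
    channel_gate = sigmoid(pooled)
    return channel_gate * rgb + (1.0 - channel_gate) * depth


def hierarchical_fusion(rgb_feats, depth_feats, num_low=2):
    """Fuse each encoder level with a rule chosen by its depth in the hierarchy.

    The first `num_low` levels use the low-level rule; the rest use the
    high-level rule. `num_low` is an illustrative parameter.
    """
    fused = []
    for level, (r, d) in enumerate(zip(rgb_feats, depth_feats)):
        fuse = low_level_fusion if level < num_low else high_level_fusion
        fused.append(fuse(r, d))
    return fused


# Toy four-level feature pyramid with random values in place of encoder outputs.
rng = np.random.default_rng(0)
shapes = [(8, 32, 32), (16, 16, 16), (32, 8, 8), (64, 4, 4)]
rgb_feats = [rng.standard_normal(s) for s in shapes]
depth_feats = [rng.standard_normal(s) for s in shapes]
fused = hierarchical_fusion(rgb_feats, depth_feats)
print([f.shape for f in fused])
```

The sketch only shows that low-level and high-level features can be fused with different rules while preserving each level's shape. A decoder in the spirit of the MRD would then consume `fused` from the coarsest level to the finest.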

A Deep Multimodal Feature Learning Network for RGB-D Salient Object Detection

Depth Cue Enhancement and Guidance Network for RGB-D Salient Object Detection

Discriminative feature fusion for RGB-D salient object detection

Attention-guided cross-modal multiple feature aggregation network for RGB-D salient object detection

Lightweight Multi-modal Representation Learning for RGB Salient Object Detection

SLMSF-Net: A Semantic Localization and Multi-Scale Fusion Network for RGB-D Salient Object Detection

CNN-Based RGB-D Salient Object Detection: Learn, Select, and Fuse

M2RNet: Multi-modal and Multi-scale Refined Network for RGB-D Salient Object Detection

Multi-modal Deep Feature Learning for RGB-D Object Detection

HFMDNet: Hierarchical Fusion and Multilevel Decoder Network for RGB-D Salient Object Detection

Compensated Attention Feature Fusion and Hierarchical Multiplication Decoder Network for RGB-D Salient Object Detection

Feature interaction and two-stage cross-modal fusion for RGB-D salient object detection

MMNet: Multi-Stage and Multi-Scale Fusion Network for RGB-D Salient Object Detection

Discriminative Cross-Modal Transfer Learning and Densely Cross-Level Feedback Fusion for RGB-D Salient Object Detection

RGB-D salient object detection via cross-modal joint feature extraction and low-bound fusion loss

Middle-Level Feature Fusion for Lightweight RGB-D Salient Object Detection

JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection

Cross-Modal Weighting Network for RGB-D Salient Object Detection

Coordinate Attention Filtering Depth-Feature Guide Cross-Modal Fusion RGB-Depth Salient Object Detection

MFUR-Net

Discriminative unimodal feature selection and fusion for RGB-D salient object detection