Cross-modal multi-scale feature fusion-based RGB-T saliency object detection method

Guangyu Zhang, Lianqiang Niu
DOI: https://doi.org/10.1088/1742-6596/2562/1/012032
2023-08-18
Journal of Physics: Conference Series
Abstract: To address the challenge of salient object detection in complex scenes, this study proposes an RGB-T salient object detection method called CMFF. The method makes full use of both RGB and thermal infrared images and combines an encoder-decoder structure with cross-modal multi-scale feature fusion. In the encoding stage, two VGG16 backbone networks extract multi-level features, a CBAM attention module enhances them, and the enhanced features are fused stepwise. Meanwhile, an L1-norm fusion strategy assigns weights to the two modalities to strengthen their complementarity. In the decoding stage, a pyramid pooling module (PPM) extracts global features from the high-level fused features, and the low-level fused features are combined with multi-scale features from the up-sampling and encoding stages to enrich the global and local information of the feature map. Finally, comparison experiments on the publicly available VT5000 dataset show that the method achieves an F-measure of 0.863 and a mean absolute error (MAE) of 0.062, a significant improvement in overall detection performance over six existing methods.
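The abstract does not spell out how the L1-norm strategy weights the two modalities. A common form of this strategy in infrared/visible fusion (DenseFuse-style) computes a per-pixel activity level as the L1 norm of the feature vector across channels, optionally averages it over a small window, and normalizes the two activity maps into weights that sum to 1. The sketch below assumes that form; the function name l1_norm_fusion, the window radius r, and the C x H x W NumPy layout are illustrative choices, not taken from the paper.

```python
import numpy as np


def l1_norm_fusion(feat_rgb, feat_t, r=1, eps=1e-8):
    """Fuse two C x H x W feature maps with L1-norm activity weights.

    Assumption: DenseFuse-style weighting; not confirmed as CMFF's exact rule.
    """
    def activity(feat):
        # Per-pixel L1 norm across channels -> H x W activity map
        a = np.abs(feat).sum(axis=0).astype(float)
        if r == 0:
            return a
        # Average over a (2r+1) x (2r+1) window with edge padding
        h, w = a.shape
        padded = np.pad(a, r, mode="edge")
        out = np.zeros_like(a)
        for dy in range(2 * r + 1):
            for dx in range(2 * r + 1):
                out += padded[dy:dy + h, dx:dx + w]
        return out / (2 * r + 1) ** 2

    a_rgb, a_t = activity(feat_rgb), activity(feat_t)
    w_rgb = a_rgb / (a_rgb + a_t + eps)  # weight map for the RGB branch
    w_t = 1.0 - w_rgb                     # weights sum to 1 at every pixel
    # Broadcast the H x W weight maps over the channel axis
    fused = w_rgb[None] * feat_rgb + w_t[None] * feat_t
    return fused, w_rgb


rng = np.random.default_rng(0)
f_rgb = rng.standard_normal((64, 20, 20))
f_t = rng.standard_normal((64, 20, 20))
fused, w_rgb = l1_norm_fusion(f_rgb, f_t)
print(fused.shape)  # (64, 20, 20)
```

Because the weights are computed per pixel, the modality with stronger feature responses at a location (for example, thermal features at a warm object in a dark scene) dominates the fused map there, which is one way to realize the complementarity the abstract describes.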
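CBAM refines a feature map with channel attention followed by spatial attention. The following NumPy sketch follows the published CBAM design with placeholder random weights; the reduction ratio of 16 and the 7 x 7 spatial kernel are the original CBAM defaults and are assumptions here, since the abstract does not state CMFF's settings. Biases and the batch dimension are omitted, and the helper names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x, w1, w2):
    """x: C x H x W. Shared two-layer MLP w2 @ relu(w1 @ v), biases omitted."""
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)

    avg, mx = x.mean(axis=(1, 2)), x.max(axis=(1, 2))  # C-dim descriptors
    return sigmoid(mlp(avg) + mlp(mx))                  # per-channel weights


def spatial_attention(x, kernel):
    """kernel: 2 x k x k filter applied to [mean over C, max over C]."""
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])  # 2 x H x W
    k = kernel.shape[-1]
    p = k // 2
    padded = np.pad(desc, ((0, 0), (p, p), (p, p)))    # zero padding keeps H x W
    h, w = x.shape[1:]
    out = np.zeros((h, w))
    # Cross-correlation, as computed by deep-learning "conv" layers
    for dy in range(k):
        for dx in range(k):
            out += np.einsum("c,chw->hw", kernel[:, dy, dx],
                             padded[:, dy:dy + h, dx:dx + w])
    return sigmoid(out)                                 # per-pixel weights


def cbam(x, w1, w2, kernel):
    x = x * channel_attention(x, w1, w2)[:, None, None]  # channel refinement
    return x * spatial_attention(x, kernel)[None]        # spatial refinement


rng = np.random.default_rng(0)
C, reduction, k = 64, 16, 7
x = rng.standard_normal((C, 24, 24))
w1 = rng.standard_normal((C // reduction, C)) * 0.1
w2 = rng.standard_normal((C, C // reduction)) * 0.1
kernel = rng.standard_normal((2, k, k)) * 0.1
print(cbam(x, w1, w2, kernel).shape)  # (64, 24, 24)
```

In a trained network w1, w2, and kernel are learned; in CMFF one such module would presumably sit on each backbone level of each modality before fusion, although the abstract does not say exactly where.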
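The pyramid pooling module originates in PSPNet: it average-pools the feature map into several grid sizes, upsamples each pooled map back to the input resolution, and concatenates the results with the input to inject global context. The sketch below assumes PSPNet's bin sizes (1, 2, 3, 6), omits the 1 x 1 convolutions that normally reduce each branch's channels, and uses nearest-neighbour rather than bilinear upsampling for brevity; the abstract does not give CMFF's exact PPM configuration, and the function names are hypothetical.

```python
import numpy as np


def adaptive_avg_pool(x, bins):
    """Average-pool C x H x W into C x bins x bins (PyTorch-style bin edges)."""
    c, h, w = x.shape
    out = np.zeros((c, bins, bins))
    for i in range(bins):
        y0, y1 = (i * h) // bins, -(-(i + 1) * h // bins)  # floor, ceil
        for j in range(bins):
            x0, x1 = (j * w) // bins, -(-(j + 1) * w // bins)
            out[:, i, j] = x[:, y0:y1, x0:x1].mean(axis=(1, 2))
    return out


def upsample_nearest(p, h, w):
    """Nearest-neighbour resize of C x b x b to C x h x w."""
    b = p.shape[1]
    rows = (np.arange(h) * b) // h
    cols = (np.arange(w) * b) // w
    return p[:, rows[:, None], cols[None, :]]


def pyramid_pooling(x, bin_sizes=(1, 2, 3, 6)):
    h, w = x.shape[1:]
    branches = [x] + [upsample_nearest(adaptive_avg_pool(x, b), h, w)
                      for b in bin_sizes]
    return np.concatenate(branches, axis=0)  # C * (1 + len(bin_sizes)) channels


feat = np.random.default_rng(0).standard_normal((512, 14, 14))
print(pyramid_pooling(feat).shape)  # (2560, 14, 14)
```

The 1 x 1 bin gives every pixel the image-wide average, which is the "global feature" the abstract refers to; the finer bins add progressively more local context.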