Semantic feature-guided and correlation-aggregated salient object detection

Jincheng Luo, Yongjun Li, Bo Li, Xinru Zhang, Chaoyue Li, Zhimin Chenjin, Dongming Zhang
DOI: https://doi.org/10.1007/s10489-023-05141-y
IF: 5.3
2023-11-14
Applied Intelligence
Abstract: Most current salient object detection (SOD) methods employ an encoder-decoder architecture based on fully convolutional neural networks. However, the subjective nature of the salient object detection task and the local nature of convolutional processing may cause global contextual information to be missed. In addition, fusing features without filtering may introduce additional noise and thus weaken the ability to localize salient objects. Therefore, we propose a transformer-based semantic feature-guided and correlation-aggregated salient object detection (SFC-SOD) method. Specifically, the method uses a pyramid vision transformer (PVT) as the encoder backbone to extract features and designs a top-level feature guidance (TFG) module in the decoder to explore the correlation between the highest-level features and the low-level features; the low-level features are then guided along the channel dimension to enhance their expressiveness. Based on the features obtained from TFG, an adaptive feature fusion (AFF) module is designed to fuse the essential features of different layers efficiently, capturing critical salient information while reducing redundant features. After feature fusion, a top-down correlation-aggregation (TCA) module further enhances and refines the salient features by using the high-level outputs to guide the lower-level features and establish global dependencies, thus achieving better saliency results. Extensive experiments on six widely used datasets show that SFC-SOD outperforms several state-of-the-art methods.
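The abstract names the decoder components but gives no formulas. As a rough illustration only, here is one plausible reading of the channel-guidance (TFG) and adaptive-fusion (AFF) ideas in plain Python. The correlation-based channel weighting, the softmax fusion weights, the list-of-channels data layout, and the names `channel_guidance` and `adaptive_fusion` are all hypothetical assumptions, not the SFC-SOD formulation; the TCA module is not sketched.

```python
import math
from typing import List

# Hypothetical representation: a feature map is a list of channels, each a
# flat list of H*W activations. All maps are assumed to share the same channel
# count and to have been resized to a common spatial size beforehand.
FeatureMap = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_guidance(top: FeatureMap, low: FeatureMap) -> FeatureMap:
    """TFG-like step (assumed design): each channel's weight is the sigmoid of
    the mean element-wise product between the top-level and low-level channel,
    a crude per-channel correlation. The low-level channel is rescaled by
    (1 + weight), i.e. a residual re-weighting along the channel dimension."""
    guided = []
    for top_ch, low_ch in zip(top, low):
        corr = sum(t * lo for t, lo in zip(top_ch, low_ch)) / len(low_ch)
        w = sigmoid(corr)
        guided.append([v * (1.0 + w) for v in low_ch])
    return guided


def adaptive_fusion(maps: List[FeatureMap], scores: List[float]) -> FeatureMap:
    """AFF-like step (assumed design): a softmax over per-layer scores yields
    fusion weights that sum to 1, and the maps are blended element-wise."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    fused = [[0.0] * len(ch) for ch in maps[0]]
    for alpha, fmap in zip(alphas, maps):
        for c, ch in enumerate(fmap):
            for i, v in enumerate(ch):
                fused[c][i] += alpha * v
    return fused


# Toy usage: two channels, two spatial positions each.
top = [[1.0, 1.0], [-1.0, -1.0]]
low = [[2.0, 0.0], [1.0, 1.0]]
guided = channel_guidance(top, low)
fused = adaptive_fusion([low, guided], scores=[0.0, 0.0])
```

In a real network the per-channel weights and per-layer scores would come from learned layers operating on tensors; plain lists are used here only so the arithmetic is easy to follow. With equal scores the fusion reduces to a simple average, and a strongly dominant score makes it select a single layer.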
computer science, artificial intelligence
What problem does this paper attempt to address?