Aggregating Transformers and CNNs for Salient Object Detection in Optical Remote Sensing Images.

Liuxin Bao, Xiaofei Zhou, Bolun Zheng, Haibing Yin, Zunjie Zhu, Jiyong Zhang, Chenggang Yan
DOI: https://doi.org/10.1016/j.neucom.2023.126560
IF: 6
2023-01-01
Neurocomputing
Abstract: Salient object detection (SOD) in optical remote sensing images (RSIs) plays a significant role in many areas such as agriculture, environmental protection, and the military. However, because RSIs differ from natural scene images (NSIs) in imaging mode and image complexity, it is difficult to achieve remarkable results by directly extending saliency methods designed for NSIs to RSIs. Besides, we note that the U-Net based on convolutional neural networks (CNNs) cannot effectively capture global long-range dependencies, and the Transformer does not adequately characterize the local spatial details of each patch. Therefore, to conduct salient object detection in RSIs, we propose a novel two-branch network that aggregates Transformers and CNNs, namely ATC-Net, where the local spatial details and the global semantic information are fused into the final high-quality saliency map. Specifically, our saliency model adopts an encoder-decoder architecture comprising two parallel encoder branches and a decoder. Firstly, the two parallel encoder branches extract global and local features by using a Transformer and CNNs, respectively. Then, the decoder employs a series of feature-enhanced fusion (FF) modules to aggregate multi-level global and local features through interactive guidance and to enhance the fused feature via an attention mechanism. Finally, the decoder deploys the read out (RO) module to fuse the aggregated feature from the FF modules with the low-level CNN feature, steering the feature to focus more on local spatial details. Extensive experiments are performed on two public optical RSI datasets, and the results show that our saliency model consistently outperforms 30 state-of-the-art methods.
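The data flow described in the abstract can be illustrated with a shape-only sketch. It is not the authors' implementation. It only traces feature-map shapes through two parallel encoders, a chain of FF modules, and the RO module. Everything concrete here is an assumption made for illustration: the names `Feat`, `encoder`, `ff_module`, `ro_module`, and `atc_net_shapes`, the four levels, the channel widths, the 256x256 input, the halving of resolution per level, and the deep-to-shallow order of fusion. The interactive guidance and attention inside the FF module are not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Feat:
    """Shape-only stand-in for a feature map (channels x height x width)."""
    source: str
    channels: int
    height: int
    width: int


def encoder(source: str, size: int, channels: Tuple[int, ...]) -> List[Feat]:
    """Return one feature per level; resolution halves per level (assumed)."""
    feats = []
    h = w = size
    for c in channels:
        h, w = h // 2, w // 2
        feats.append(Feat(source, c, h, w))
    return feats


def ff_module(global_f: Feat, local_f: Feat,
              previous: Optional[Feat] = None,
              out_channels: int = 64) -> Feat:
    """Placeholder for feature-enhanced fusion (FF).

    Only checks that the global and local features of one level are
    spatially aligned and returns the fused shape. A previous (deeper)
    fused feature, if given, is assumed to be upsampled to this level.
    """
    assert (global_f.height, global_f.width) == (local_f.height, local_f.width)
    if previous is not None:
        # Assumed: the deeper fused feature has a coarser resolution.
        assert previous.height <= local_f.height
    return Feat("fused", out_channels, local_f.height, local_f.width)


def ro_module(fused: Feat, low_level_cnn: Feat, size: int) -> Feat:
    """Placeholder for the read out (RO) module.

    Combines the aggregated feature with the low-level CNN feature and
    yields a one-channel saliency map at the input resolution (assumed).
    """
    assert low_level_cnn.source == "cnn"
    return Feat("saliency", 1, size, size)


def atc_net_shapes(size: int = 256,
                   channels: Tuple[int, ...] = (64, 128, 256, 512)):
    """Trace shapes through the two-branch encoder and the decoder."""
    local_feats = encoder("cnn", size, channels)           # local details
    global_feats = encoder("transformer", size, channels)  # global context
    fused = None
    # Assumed decoding order: deepest level first, shallowest last.
    for g, l in zip(reversed(global_feats), reversed(local_feats)):
        fused = ff_module(g, l, fused)
    return ro_module(fused, local_feats[0], size), fused


if __name__ == "__main__":
    saliency, last_fused = atc_net_shapes()
    print(saliency)
    print(last_fused)
```

Under these assumptions the last FF output sits at the shallowest encoder resolution, and the RO module is the only step that touches the low-level CNN feature, which matches the abstract's statement that the RO module steers the result toward local spatial details.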