Remote sensing object detection based on a combination of a CNN and the Swin transformer

Liu Yang, Junhong Liang, Liang Guo, Yang Long, Kaiyan Ding, Qingfang He, Zhihang Zhang
DOI: https://doi.org/10.1080/2150704X.2023.2215896
IF: 2.369
2023-05-25
Remote Sensing Letters
Abstract: Objects in remote sensing images typically exhibit varied appearances composed of complex spatial and spectral information, which makes stable feature representation of objects difficult. To address this issue, we propose a new remote sensing object detection method that combines a CNN (Convolutional Neural Network) with the Swin Transformer. Specifically, we first propose the SCCA (Spatial-Channel Coordinate Attention) module, which highlights the essential features of objects in an image by fusing spatial, channel, and location information. We then design a new remote sensing object detection network called SCCA-YOLOv5, which integrates the advantages of the CNN and the Transformer: the former learns local features of objects, and the latter captures global information. Next, we add a light detection head of different scales, which significantly improves detection accuracy, especially for tiny objects. Finally, extensive experiments demonstrate the effectiveness of the proposed object detection method. The experimental results show that the proposed model better highlights the object region of interest and improves the average accuracy by 0.82%, 1.57% and 7.47% compared with CBAM, TPH-YOLOv5 and YOLOv5, respectively, which confirms the superiority of the proposed method.
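The abstract does not spell out how the SCCA module is built, so the following is only a rough, parameter-free sketch of the general idea of fusing channel, spatial, and location (coordinate) cues into one attention gate. The function name `scca_like_gate`, the use of plain average pooling, and the sigmoid gating are all assumptions made for illustration; the published module presumably uses learned convolutions and may combine the branches differently.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def scca_like_gate(feat):
    """Re-weight a feature map given as a nested list of shape [C][H][W].

    Hypothetical illustration only: no learned weights, not the paper's SCCA.
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])

    # Channel cue: one gate per channel from global average pooling.
    channel_gate = [
        sigmoid(sum(sum(row) for row in feat[c]) / (H * W)) for c in range(C)
    ]

    # Spatial cue: one gate per pixel from the mean across channels.
    spatial_gate = [
        [sigmoid(sum(feat[c][i][j] for c in range(C)) / C) for j in range(W)]
        for i in range(H)
    ]

    # Location cue: pool along one axis while keeping the position on the
    # other axis, giving a per-row and a per-column gate for each channel.
    row_gate = [[sigmoid(sum(feat[c][i]) / W) for i in range(H)] for c in range(C)]
    col_gate = [
        [sigmoid(sum(feat[c][i][j] for i in range(H)) / H) for j in range(W)]
        for c in range(C)
    ]

    # Fuse the three cues multiplicatively and apply them to the input.
    return [
        [
            [
                feat[c][i][j]
                * channel_gate[c]
                * spatial_gate[i][j]
                * row_gate[c][i]
                * col_gate[c][j]
                for j in range(W)
            ]
            for i in range(H)
        ]
        for c in range(C)
    ]


# Tiny example: two 3x3 channels with a single active pixel.
feat = [[[0.0] * 3 for _ in range(3)] for _ in range(2)]
feat[0][1][1] = 4.0
out = scca_like_gate(feat)
```

Because every gate lies strictly between 0 and 1, the output keeps the input's shape and sign while scaling each activation down according to how strongly its channel, pixel, row, and column respond.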
imaging science & photographic technology, remote sensing