Abstract: Convolutional neural networks (CNNs) have achieved milestones in object detection in synthetic aperture radar (SAR) images. Recently, vision transformers and their variants have shown great promise in detection tasks. However, ship detection in SAR images remains a substantial challenge because ship targets in SAR images exhibit strong scattering, span multiple scales, and appear against complex backgrounds. To address these problems, this paper proposes an enhanced Swin transformer detection network, named ESTDNet, for ship detection in SAR images. ESTDNet adopts Cascade R-CNN with a Swin transformer backbone (Cascade R-CNN Swin) as its baseline model. On this basis, we build two modules into ESTDNet: the feature enhancement Swin transformer (FESwin) module, which improves feature extraction, and the adjacent feature fusion (AFF) module, which optimizes the feature pyramid. First, the FESwin module serves as the backbone network and uses CNN layers before and after the Swin transformer to aggregate local contextual information. Building on the visual dependencies captured by self-attention, it performs scale fusion with point-wise (single-point) channel information interaction as the primary operation and local spatial information interaction as the secondary one. This improves spatial-to-channel feature expression and makes better use of the ship information in SAR images. Second, the AFF module performs a weighted selective fusion of each high-level feature in the feature pyramid with its adjacent shallow-level features using learnable adaptive weights, so that ship information is emphasized in feature maps at more scales, improving the recognition and localization of ships in SAR images. Finally, an ablation study on the SSDD dataset validates the effectiveness of the two components proposed in the ESTDNet detector.
Moreover, experiments on two public datasets, SSDD and SARShip, demonstrate that the ESTDNet detector outperforms state-of-the-art methods, providing a new approach to ship detection in SAR images.
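The AFF idea of fusing each pyramid level with its adjacent shallower level through learnable weights can be illustrated with a small NumPy sketch. This is not the ESTDNet implementation: the abstract does not say how the weights are normalized or how resolutions are matched, so the sketch assumes BiFPN-style fast normalized fusion (non-negative weights divided by their sum) and 2×2 average pooling to bring the shallow map down to the deeper level's resolution. The function names and the `raw_weights` array (a stand-in for trained parameters) are hypothetical.

```python
import numpy as np

def avg_pool2x(x):
    """Downsample a (C, H, W) map by 2 with 2x2 average pooling (H, W assumed even).

    Assumption: the paper may instead use strided convolution or resample the other way.
    """
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def normalized_weights(raw, eps=1e-4):
    """Fast normalized fusion (as in BiFPN): clamp weights to >= 0, then divide by their sum."""
    w = np.maximum(raw, 0.0)
    return w / (w.sum() + eps)

def adjacent_fusion(pyramid, raw_weights):
    """Fuse each deeper level with its adjacent shallower level (hypothetical AFF sketch).

    pyramid: list of (C, H, W) arrays ordered shallow -> deep, each level half
             the spatial size of the previous one.
    raw_weights: array of shape (len(pyramid) - 1, 2); stand-ins for learnable parameters.
    Level 0 is returned unchanged because it has no shallower neighbour.
    """
    fused = [pyramid[0]]
    for i in range(1, len(pyramid)):
        w = normalized_weights(raw_weights[i - 1])
        shallow = avg_pool2x(pyramid[i - 1])  # match the deeper level's resolution
        fused.append(w[0] * pyramid[i] + w[1] * shallow)
    return fused

# Usage: a 4-level pyramid with 8 channels, equal raw weights as before training.
rng = np.random.default_rng(0)
pyramid = [rng.standard_normal((8, 64 // 2**k, 64 // 2**k)) for k in range(4)]
out = adjacent_fusion(pyramid, np.ones((3, 2)))
print([o.shape for o in out])  # [(8, 64, 64), (8, 32, 32), (8, 16, 16), (8, 8, 8)]
```

In a trained network the raw weights would be learned by backpropagation; the clamp keeps every weight non-negative, so the fusion can down-weight a level toward zero but never subtract it.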

EDSD: Efficient Driving Scenes Detection Based on Swin Transformer

Real-time traffic sign detection network based on Swin Transformer

SwinNet: Swin Transformer drives edge-aware RGB-D and RGB-T salient object detection

Swin Transformer-Based Edge Guidance Network for RGB-D Salient Object Detection

SwinSOD: Salient object detection using swin-transformer

An Improved Swin Transformer-Based Model for Remote Sensing Object Detection and Instance Segmentation

SwinHCST: a deep learning network architecture for scene classification of remote sensing images based on improved CNN and Transformer

SwinV2DNet: Pyramid and Self-Supervision Compounded Feature Learning for Remote Sensing Images Change Detection

Ship Detection in SAR Images Based on Feature Enhancement Swin Transformer and Adjacent Feature Fusion

Combining Swin Transformer and Attention-Weighted Fusion for Scene Text Detection

Adaptive enhanced swin transformer with U-net for remote sensing image segmentation

Target detection based on improved swin transformer and cascade RCNN

SwinSUNet: Pure Transformer Network for Remote Sensing Image Change Detection

A Siamese Swin-Unet for image change detection

TSDet: A new method for traffic sign detection based on YOLOv5-SwinT

Swin Resnetswin Transformers for Change Detection in Remote Sensing Images

Swin Transformer coupling CNNs Makes Strong Contextual Encoders for VHR Image Road Extraction

Diverse Features Discovery Transformer for Pedestrian Attribute Recognition

DSC-Net: Enhancing Blind Road Semantic Segmentation with Visual Sensor Using a Dual-Branch Swin-CNN Architecture