Ship object detection in one-stage framework based on Swin-Transformer.

Xu Gao, Wenli Sun
DOI: https://doi.org/10.1145/3556384.3556413
2022-01-01
Abstract: Practicality matters in maritime computer vision: both shore-based surveillance systems and autonomous ships require accurate recognition and fast detection. Swin-Transformer, a recent neural network architecture in computer vision, has demonstrated powerful feature extraction capabilities and has become a new option for detection tasks. Building on our previous results applying Swin-Transformer to ship object detection in a two-stage framework, this paper pursues a more practical goal: we incorporate Swin-Transformer into one-stage frameworks to enable real-time detection of ship targets in the maritime environment. Trained on the widely used Seaships dataset, Swin-Transformer outperforms CNN-based models, achieving a mean average precision of 92.37% as the backbone of YOLOv3 and 80.59% as the backbone of SSD. Notably, detection speed can reach 23 frames per second (fps) and 27 fps, much faster than the 9 fps of the two-stage framework. This work lays a foundation for developing Swin-Transformer in ship object detection.
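To put the reported speeds in perspective, the short Python sketch below converts each frame rate into an average per-frame latency and a speed-up over the two-stage baseline. It uses only the three fps figures from the abstract. The dictionary keys are illustrative labels, since the abstract does not explicitly state which one-stage framework corresponds to which speed, and the helper name `frame_latency_ms` is hypothetical.

```python
def frame_latency_ms(fps: float) -> float:
    """Average time spent per frame, in milliseconds, at a given throughput."""
    return 1000.0 / fps


# Frame rates quoted in the abstract; the keys are illustrative labels only.
reported_fps = {
    "one-stage (23 fps)": 23.0,
    "one-stage (27 fps)": 27.0,
    "two-stage (9 fps)": 9.0,
}
baseline_fps = reported_fps["two-stage (9 fps)"]

# Per-frame latency and throughput ratio relative to the two-stage baseline.
latency_ms = {name: frame_latency_ms(fps) for name, fps in reported_fps.items()}
speedup = {name: fps / baseline_fps for name, fps in reported_fps.items()}

for name in reported_fps:
    print(f"{name}: {latency_ms[name]:.1f} ms/frame, {speedup[name]:.2f}x the baseline")
```

Under these figures, the one-stage detectors spend roughly 37 to 43 ms per frame versus about 111 ms for the two-stage framework.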