High Performance Ship Detection via Transformer and Feature Distillation

Yani Zhang, Meng Joo Er, Wenxiao Gao, Jianguo Wu
DOI: https://doi.org/10.1109/ICoIAS56028.2022.9931223
2022-01-01
Abstract: The Detection Transformer (DETR), an encoder-decoder architecture, has demonstrated that Transformers can achieve superior performance in object detection. However, its slow convergence and high computational complexity degrade performance when it is applied to ship detection. The primary reason is that ship images are rarely available in large-scale public datasets for training a model. In addition, DETR requires high-performance computing platforms to deploy and run, which makes it ill-suited to ship detection applications. To this end, we propose an Efficient Ship Detection Transformer, termed ESDT, which comprises three parts: a backbone, an encoder, and a decoder. The backbone is implemented with ResNet50 so that deep features can be extracted. Next, the extracted features are fed to the encoder, which is implemented with multi-scale self-attention to capture long-range dependencies among features. Finally, the enhanced features are sent to the decoder for final ship detection. To speed up convergence, we introduce a feature distillation mechanism into ESDT so that it learns from the large pretrained DETR. Extensive experiments are performed on the commonly used ship detection dataset SeaShips. Qualitative and quantitative results demonstrate the effectiveness and efficiency of the proposed method.
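The abstract does not state the form of the feature distillation objective. As a minimal sketch, assuming a mean-squared-error penalty between student (ESDT) and teacher (DETR) features that is added to the detection loss, the idea could look like the following. The function names, the flat-list feature representation, and the `weight` parameter are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def feature_distillation_loss(student_feats, teacher_feats):
    """Mean squared error between two equally sized flat feature vectors.

    Assumption: an MSE penalty is used here for illustration only;
    the abstract does not specify the distillation loss.
    """
    if len(student_feats) != len(teacher_feats):
        raise ValueError("feature vectors must have the same length")
    if not student_feats:
        raise ValueError("feature vectors must be non-empty")
    squared_errors = ((s - t) ** 2 for s, t in zip(student_feats, teacher_feats))
    return sum(squared_errors) / len(student_feats)


def total_loss(detection_loss, student_feats, teacher_feats, weight=1.0):
    """Detection loss plus a weighted distillation term.

    `weight` is a hypothetical balancing coefficient, not a value from the paper.
    """
    return detection_loss + weight * feature_distillation_loss(
        student_feats, teacher_feats
    )
```

In such a setup the teacher features would be computed with the pretrained model held fixed, so that only the student is updated by the combined loss.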