An Image Segmentation Method Based on Transformer and Multi-Scale Feature Fusion for UAV Marine Environment Monitoring

Yingying Liu, Fengqin Yao, Laihui Ding, Zhiwei Xu, Xiaogang Yang, Shengke Wang
DOI: https://doi.org/10.1109/icivc58118.2023.10270243
2023-01-01
Abstract: UAVs are widely used in marine environment monitoring and ecological protection, and accurate segmentation of UAV images is the basis for these applications. However, marine images captured by UAVs often exhibit large intra-class scale variation and high inter-class similarity, which sharply degrades the performance of traditional semantic segmentation methods on this task. To alleviate this problem, we construct a new semantic segmentation network (EMFNet) that combines a transformer with multi-scale feature fusion to achieve accurate segmentation of UAV ocean images. To handle large intra-class variation, we introduce transformer blocks built from external attention (EA) to capture potential correlations between data samples. To handle the high similarity between semantic categories, we construct dedicated feature fusion branches that learn diverse feature information. Inspired by camouflaged object detection, we design, within the fusion branch, a mixed convolutional attention (MCA) module to enlarge the receptive field and a dual attention fusion module (DAFM) to realize cross-level learning. In addition, we apply auxiliary segmentation heads to the fused features to refine the segmentation results layer by layer, and we discard all auxiliary segmentation heads at inference time to maintain real-time speed. Experimental results show that EMFNet achieves 77.13% mIoU on the public Cityscapes dataset and 61.41% mIoU on our UAV-OUC-SEG dataset, while maintaining a real-time inference speed of 61.6 FPS.
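The external attention (EA) mechanism named in the abstract can be illustrated with a small sketch. This is a minimal plain-Python illustration of external attention as it is generally described in the literature (two small memory matrices M_k and M_v shared across all samples, followed by a double normalization), not EMFNet's actual transformer block. The function names, the sizes N, d and S, and the random memory values are hypothetical stand-ins; in a real network M_k and M_v would be learned parameters. Because the memory size S is fixed, the cost grows linearly with the number of pixels N, rather than quadratically as in self-attention.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b (a: n x k, b: k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def double_normalize(scores: Matrix) -> Matrix:
    """Softmax over the N (pixel) axis, then L1-normalize over the S (memory) axis."""
    n, s = len(scores), len(scores[0])
    out = [[0.0] * s for _ in range(n)]
    # Softmax down each column (over pixels), subtracting the column max for stability.
    for j in range(s):
        col_max = max(scores[i][j] for i in range(n))
        exps = [math.exp(scores[i][j] - col_max) for i in range(n)]
        total = sum(exps)
        for i in range(n):
            out[i][j] = exps[i] / total
    # L1-normalize each row so every pixel's weights over the memory units sum to 1.
    for i in range(n):
        row_sum = sum(out[i])
        out[i] = [v / row_sum for v in out[i]]
    return out


def external_attention(features: Matrix, m_k: Matrix, m_v: Matrix) -> Matrix:
    """features: N x d, m_k: S x d, m_v: S x d -> output: N x d."""
    scores = matmul(features, transpose(m_k))  # N x S affinities to the external memory
    attn = double_normalize(scores)            # N x S attention map
    return matmul(attn, m_v)                   # N x d re-weighted features


# Usage with hypothetical sizes and random stand-ins for the learned memories.
random.seed(0)
N, d, S = 6, 4, 3
feats = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(N)]
m_k = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(S)]
m_v = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(S)]
out = external_attention(feats, m_k, m_v)
print(len(out), len(out[0]))  # 6 4
```

Since M_k and M_v are shared across the whole dataset rather than computed from a single image, they act as a compact memory of dataset-level patterns, which is one way to read the abstract's claim that EA captures correlations between data samples.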