A transformer-based UAV instance segmentation model TF-YOLOv7

Li Tan, Zikang Liu, Xiaokai Huang, Dongfang Li, Feifei Wang
DOI: https://doi.org/10.1007/s11760-023-02992-3
IF: 1.583
2024-02-10
Signal, Image and Video Processing
Abstract: In dense urban scenes, UAV instance segmentation must label many distinct targets efficiently while coping with the mutual occlusion that dense targets cause. To address this occlusion problem, this paper proposes TF-YOLOv7, a model for UAV instance segmentation. The model introduces the Swin Transformer structure into the backbone network to construct hierarchical feature maps by fusing deep network feature blocks, which is well suited to dense recognition tasks such as instance segmentation. In addition, the Bottleneck Transformer structure is introduced in the detection stage: convolution extracts abstract information from the underlying features, and the higher-level information produced by the convolution layers is then processed with a self-attention mechanism, which allows large-resolution images to be handled effectively. Finally, the Focal-EIoU loss function is introduced to improve mask quality for small, mutually occluded targets and thereby the segmentation of occluded targets. In experiments on the UAV aerial photography dataset VisDroneDET, the proposed model achieves a 2.2% performance improvement over the baseline YOLOv7, indicating that it is suitable for UAV instance segmentation tasks.
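As an illustration of the loss named in the abstract, the following is a minimal sketch of Focal-EIoU for a single pair of axis-aligned boxes, written from the commonly cited definition (an EIoU loss made of IoU, centre-distance, width and height terms, reweighted by IoU raised to a power gamma). It is not taken from the TF-YOLOv7 implementation: the function name `focal_eiou_loss`, the `(x1, y1, x2, y2)` box format, and the default `gamma=0.5` are assumptions made for this example.

```python
def focal_eiou_loss(pred, target, gamma=0.5, eps=1e-9):
    """Focal-EIoU loss for one box pair (illustrative sketch).

    Boxes are (x1, y1, x2, y2) tuples with x1 < x2 and y1 < y2.
    The name, box format and gamma default are assumptions for this example.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Widths and heights of the two boxes.
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared centre distance, normalised by the enclosing box diagonal.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Squared width and height differences, normalised by the enclosing box.
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    eiou = 1.0 - iou + center_term + width_term + height_term

    # Focal reweighting: scale the EIoU loss by IoU ** gamma.
    return (iou ** gamma) * eiou


if __name__ == "__main__":
    print(focal_eiou_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

With this weighting, a box pair with zero overlap receives zero weight, so the sketch returns 0 for disjoint boxes; a training implementation would need to decide how to treat that case.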
Engineering, Electrical & Electronic; Imaging Science & Photographic Technology