FusFormer: global and detail feature fusion transformer for semantic segmentation of small objects

Zheng Li, Houjin Chen, Jupeng Li, Song Peng, Zhenhao Zhang, Baozheng Wang, Changyong Wang
DOI: https://doi.org/10.1007/s11042-024-18911-8
IF: 2.577
2024-03-26
Multimedia Tools and Applications
Abstract: Improving the segmentation accuracy of small objects is essential for tasks such as autonomous driving and remote sensing. However, current mainstream semantic segmentation methods are inadequate for small objects. Segmenting small objects accurately requires both long-range global information and fine local details, and neither pure Convolutional Neural Networks (CNNs) nor Vision Transformers (ViTs) can effectively provide these two types of information simultaneously. In this paper, we introduce a novel model, FusFormer, which contains a global branch and a detailed branch to fully capture the long-range features and spatial detail features of the input image. The global branch is based on MiT-B2 to efficiently acquire global context, while the detailed branch acquires rich local detail information through a Spatial Prior Module (SPM) and a Multi-scale Module (MSM). A Feature Interaction Module (FIM) is proposed to fuse information across features at dual scales. In addition, a Multi-scale Edge Extraction Module (MSEEM) is used to supplement the missing edge information during model training, helping the model enhance the intra-class consistency of small objects. Extensive experiments on Cityscapes, ADE20K and PASCAL VOC 2012 show that our model achieves competitive overall segmentation accuracy, especially on small objects. FusFormer achieves 82.6%, 47.3% and 82.4% mIoU on the Cityscapes, ADE20K and PASCAL VOC 2012 validation sets, respectively. Compared with other state-of-the-art methods, the proposed model significantly improves the IoU on small objects by 2% to 4%.
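The abstract reports results as per-class IoU and mean IoU (mIoU). As a reading aid, the following is a minimal sketch of how these metrics are commonly computed from a confusion matrix. It is not the authors' evaluation code: the function names, the flattened label lists, and the ignore label 255 (a convention used by Cityscapes-style annotations) are assumptions made for illustration.

```python
# Illustrative sketch only: hypothetical helper names, not the paper's code.

def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count pixels with cm[true_class][predicted_class].

    pred and target are flat sequences of integer class labels.
    Pixels whose ground-truth label equals ignore_index are skipped
    (assumed convention, as in Cityscapes-style annotations).
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        cm[t][p] += 1
    return cm


def per_class_iou(cm):
    """IoU_c = TP / (TP + FP + FN); None if class c never appears."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                         # true c, predicted other
        fp = sum(cm[r][c] for r in range(n)) - tp    # predicted c, true other
        union = tp + fp + fn
        ious.append(tp / union if union else None)
    return ious


def mean_iou(ious):
    """Average IoU over the classes that actually occur."""
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid)


# Toy example: six pixels, three classes, last pixel ignored.
pred = [0, 1, 1, 1, 2, 2]
target = [0, 0, 1, 1, 2, 255]
cm = confusion_matrix(pred, target, num_classes=3)
ious = per_class_iou(cm)
miou = mean_iou(ious)
```

Because mIoU weights every class equally regardless of pixel count, a class made up of small objects contributes as much as a large background class, which is why per-class IoU gains on small objects are reported separately.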
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering
What problem does this paper attempt to address?