ZMNet: feature fusion and semantic boundary supervision for real-time semantic segmentation

Ya Li,Ziming Li,Huiwang Liu,Qing Wang
DOI: https://doi.org/10.1007/s00371-024-03448-6
IF: 2.835
2024-06-22
The Visual Computer
Abstract:Feature fusion module is an essential component of real-time semantic segmentation networks to bridge the semantic gap among different feature layers. However, many networks are inefficient in multi-level feature fusion. In this paper, we propose a simple yet effective decoder that consists of a series of multi-level attention feature fusion modules (MLA-FFMs) aimed at fusing multi-level features in a top-down manner. Specifically, MLA-FFM is a lightweight attention-based module. Therefore, it can not only efficiently fuse features to bridge the semantic gap at different levels, but also be applied to real-time segmentation tasks. In addition, to solve the problem of low accuracy of existing real-time segmentation methods at semantic boundaries, we propose a semantic boundary supervision module (BSM) to improve the accuracy by supervising the prediction of semantic boundaries. Extensive experiments demonstrate that our network achieves a state-of-the-art trade-off between segmentation accuracy and inference speed on both Cityscapes and CamVid datasets. On a single NVIDIA GeForce 1080Ti GPU, our model achieves 77.4% mIoU with a speed of 97.5 FPS on the Cityscapes test dataset, and 74% mIoU with a speed of 156.6 FPS on the CamVid test dataset, which is superior to most state-of-the-art real-time methods.
computer science, software engineering
What problem does this paper attempt to address?