Feature Fusion Network Based on Hybrid Attention for Semantic Segmentation

Xie Xinchen, Chen Li, Lihua Tian
DOI: https://doi.org/10.1109/AIIOT54504.2022.9817347
2022-01-01
Abstract: Real-time semantic segmentation with deep learning places high demands on network inference speed. Lightweight backbones have few parameters and therefore usually run fast enough for real-time tasks. However, their feature-extraction ability is relatively weak, so their segmentation accuracy is much worse than that of large models. How to make full use of a lightweight network to extract more image information and achieve better segmentation performance has therefore become a key problem. Here, we propose an efficient feature fusion network based on an attention mechanism. First, the widely used MobileNetV2 is selected as the lightweight backbone. Spatial attention and channel attention are then computed for both the high-resolution low-level features and the low-resolution high-level features, so that the final feature map has a global receptive field. In addition, multi-level supervision is applied to each stage of the backbone, and the resulting multi-stage auxiliary loss enables the network to be trained more effectively. Finally, on the Cityscapes dataset, the proposed network reaches 74.12% mIoU while maintaining an inference speed of 110 fps.
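For readers unfamiliar with the reported metric, mean intersection over union (mIoU) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name `mean_iou`, the flattened-list input format, and the optional `ignore_label` parameter are assumptions made for the example, and the abstract does not state which labels, if any, were excluded from evaluation.

```python
def mean_iou(pred, target, num_classes, ignore_label=None):
    """Mean IoU over classes that occur in the prediction or the ground truth.

    pred, target: flat sequences of integer class ids, one per pixel
                  (hypothetical input format chosen for this sketch).
    ignore_label: ground-truth id to skip, if any (assumption).
    """
    intersection = [0] * num_classes  # true positives per class
    union = [0] * num_classes         # TP + FP + FN per class

    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # pixel excluded from evaluation
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class

    # Classes absent from both prediction and ground truth are skipped.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)  # per-class IoU: 1/3, 2/3, 1/2
```

In this toy example the three per-class IoU values average to 0.5. In practice the counts would be accumulated over every pixel of the whole validation set before the per-class ratios are taken.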