Cross-CBAM: a lightweight network for real-time scene segmentation

Zhengbin Zhang, Zhenhao Xu, Xingsheng Gu, Juan Xiong
DOI: https://doi.org/10.1007/s11554-024-01414-y
IF: 2.293
2024-02-25
Journal of Real-Time Image Processing
Abstract: Real-time semantic segmentation poses a significant challenge in scene parsing. Although traditional semantic segmentation networks have made remarkable leaps forward in accuracy, their inference speed remains unsatisfactory. This paper introduces the Cross-CBAM network, a novel lightweight architecture designed for real-time semantic segmentation. Specifically, a Squeeze-and-Excitation Atrous Spatial Pyramid Pooling Module (SE-ASPP) is proposed to obtain a variable field of view and multiscale information. Additionally, we propose a Cross Convolutional Block Attention Module (CCBAM), in which a cross-multiply operation uses high-level semantic information to guide low-level detail information. Unlike previous approaches that use attention to concentrate on relevant information within the backbone, CCBAM applies cross-attention to feature fusion within the Feature Pyramid Network (FPN) structure. Extensive experiments on the Cityscapes and CamVid datasets demonstrate the effectiveness of the proposed Cross-CBAM model, which achieves a promising trade-off between segmentation accuracy and inference speed. On the Cityscapes test set, we achieve 73.4% mIoU at 240.9 FPS and 77.2% mIoU at 88.6 FPS on an NVIDIA GTX 1080Ti.
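The abstract only names the modules, so the following is a minimal NumPy sketch of what "cross-multiply" fusion can look like, not the paper's published CCBAM. It assumes that an SE-style channel attention (squeeze by global average pooling, excite by FC, ReLU, FC, sigmoid) computed from the high-level map rescales the low-level map, and that a simplified CBAM-style spatial attention computed from the low-level map rescales the upsampled high-level map. The function names (`channel_attention`, `spatial_attention`, `upsample_nearest`, `cross_fuse`), the nearest-neighbour upsampling, the equal channel counts, and the summation of the two guided branches are all hypothetical illustration choices.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """SE-style channel weights for feat of shape (C, H, W).

    w1: (C // r, C) and w2: (C, C // r) stand in for the two FC layers.
    Returns weights in (0, 1) with shape (C, 1, 1) for broadcasting.
    """
    squeezed = feat.mean(axis=(1, 2))           # squeeze: global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)     # excite: FC + ReLU -> (C // r,)
    return sigmoid(w2 @ hidden)[:, None, None]  # FC + sigmoid -> (C, 1, 1)


def spatial_attention(feat):
    """Simplified CBAM-style spatial weights with shape (1, H, W).

    CBAM convolves the stacked channel-mean and channel-max maps with a
    7x7 kernel; this sketch just averages the two maps instead.
    """
    avg_map = feat.mean(axis=0)
    max_map = feat.max(axis=0)
    return sigmoid(0.5 * (avg_map + max_map))[None, :, :]


def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)


def cross_fuse(low, high, w1, w2):
    """Hypothetical cross-guided fusion of a low-level and a high-level map.

    low: (C, H, W) detail features; high: (C, H // s, W // s) semantic features.
    """
    high_up = upsample_nearest(high, low.shape[1] // high.shape[1])
    low_guided = low * channel_attention(high, w1, w2)  # semantics guide details
    high_guided = high_up * spatial_attention(low)      # details guide semantics
    return low_guided + high_guided


rng = np.random.default_rng(0)
C, r = 8, 4
low = rng.standard_normal((C, 32, 32))   # stand-in for a shallow, high-resolution map
high = rng.standard_normal((C, 8, 8))    # stand-in for a deep, low-resolution map
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
out = cross_fuse(low, high, w1, w2)
print(out.shape)  # (8, 32, 32)
```

With all excitation weights set to zero, every channel weight becomes sigmoid(0) = 0.5, which is a quick sanity check. A trainable version would replace the weight matrices with learned 1×1 convolutions and use the 7×7 convolution of the original CBAM for the spatial branch.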
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Imaging Science & Photographic Technology
What problem does this paper attempt to address?