Cross-CBAM: A Lightweight network for Scene Segmentation

Zhengbin Zhang, Zhenhao Xu, Xingsheng Gu, Juan Xiong

2023-06-04

Abstract: Scene parsing is a great challenge for real-time semantic segmentation. Although traditional semantic segmentation networks have made remarkable leaps forward in semantic accuracy, their inference speed is unsatisfactory. Moreover, this progress has been achieved with fairly large networks and powerful computational resources. Extremely large models are difficult to run on edge computing devices with limited computing power, which poses a huge challenge for real-time semantic segmentation tasks. In this paper, we present the Cross-CBAM network, a novel lightweight network for real-time semantic segmentation. Specifically, we propose a Squeeze-and-Excitation Atrous Spatial Pyramid Pooling Module (SE-ASPP) to obtain a variable field of view and multi-scale information. We also propose a Cross Convolutional Block Attention Module (CCBAM), in which a cross-multiply operation lets high-level semantic information guide low-level detail information. Unlike previous work, which uses attention to focus on the desired information within the backbone, CCBAM uses cross-attention for feature fusion in the FPN structure. Extensive experiments on the Cityscapes and CamVid datasets demonstrate the effectiveness of the proposed Cross-CBAM model, which achieves a promising trade-off between segmentation accuracy and inference speed. On the Cityscapes test set, we achieve 73.4% mIoU at 240.9 FPS and 77.2% mIoU at 88.6 FPS on an NVIDIA GTX 1080Ti.
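The abstract describes CCBAM only at a high level: a cross-multiply operation lets high-level semantics guide low-level detail during FPN fusion. The sketch below illustrates one plausible reading in pure Python (feature maps as nested lists shaped `[C][H][W]`), under loudly stated assumptions that are not taken from the paper: (i) channel and spatial attention follow the standard CBAM formulation; (ii) the high-level map has already been upsampled and projected to the low-level map's channel count and resolution; (iii) "cross-multiply" means channel weights computed from the high-level map rescale the low-level map, while spatial weights computed from the low-level map rescale the high-level map, and the two products are summed. The function names and weight arguments (`channel_attention`, `spatial_attention`, `cross_fuse`, `w0`, `w1`, `kernel`) are hypothetical, not the authors' code.

```python
import math

# Feature maps are nested lists shaped [C][H][W] (channels, rows, columns).


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mlp(vec, w0, w1):
    """Shared two-layer MLP: hidden = ReLU(w0 @ vec), output = w1 @ hidden."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w0]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w1]


def channel_attention(feat, w0, w1):
    """CBAM-style channel attention: sigmoid(MLP(avg-pool) + MLP(max-pool)), one weight per channel."""
    avg = [sum(sum(r) for r in ch) / (len(ch) * len(ch[0])) for ch in feat]
    mx = [max(max(r) for r in ch) for ch in feat]
    return [sigmoid(a + m) for a, m in zip(mlp(avg, w0, w1), mlp(mx, w0, w1))]


def spatial_attention(feat, kernel):
    """CBAM-style spatial attention: conv over the [mean; max] channel-pooled maps, then sigmoid.

    kernel has shape [2][k][k] with odd k; zero padding keeps the H x W size.
    """
    h, w, k = len(feat[0]), len(feat[0][0]), len(kernel[0])
    pad = k // 2
    mean_map = [[sum(ch[i][j] for ch in feat) / len(feat) for j in range(w)] for i in range(h)]
    max_map = [[max(ch[i][j] for ch in feat) for j in range(w)] for i in range(h)]
    stacked = [mean_map, max_map]
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for c in range(2):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:
                            acc += kernel[c][di][dj] * stacked[c][y][x]
            out[i][j] = sigmoid(acc)
    return out


def cross_fuse(low, high, w0, w1, kernel):
    """Hypothetical cross-multiply fusion (assumption, not the paper's exact design).

    Channel weights from the high-level map rescale the low-level map;
    spatial weights from the low-level map rescale the high-level map; the results are summed.
    """
    ca = channel_attention(high, w0, w1)  # semantic ("what") guidance from the deep branch
    sa = spatial_attention(low, kernel)   # positional ("where") guidance from the shallow branch
    return [[[low[c][i][j] * ca[c] + high[c][i][j] * sa[i][j]
              for j in range(len(low[0][0]))]
             for i in range(len(low[0]))]
            for c in range(len(low))]
```

With all-zero weights, every attention value is `sigmoid(0) = 0.5`, so the fused map is simply the average of the two inputs. This makes a convenient sanity check before plugging in learned weights. In a real network the MLP and the spatial kernel would be trained convolution layers in a deep-learning framework.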

Computer Vision and Pattern Recognition, Machine Learning, Image and Video Processing

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper addresses the challenges of real-time semantic segmentation, in particular running on edge computing devices with limited computing power while maintaining both high segmentation accuracy and fast inference.

#### Main Contributions

1. **A new feature fusion module (CCBAM)**: This module uses a cross-multiply operation so that high-level semantic information guides low-level detail information, and it enhances feature representation through channel attention and spatial attention.
2. **A lightweight squeeze-and-excitation atrous spatial pyramid pooling module (SE-ASPP)**: This module obtains a larger, variable receptive field and multi-scale information while reducing model complexity and preserving segmentation accuracy.
3. **A lightweight real-time semantic segmentation network**: Experiments with Cross-CBAM on the Cityscapes and CamVid datasets show a good balance between segmentation accuracy and inference speed.

#### Experimental Results

- On the Cityscapes test set, the Cross-CBAM-L1 network achieved 75.1% mIoU at 187.9 FPS.
- The Cross-CBAM-L2 network achieved 77.2% mIoU at 88.6 FPS.

These results indicate that Cross-CBAM offers a favorable trade-off between accuracy and speed for real-time semantic segmentation.
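SE-ASPP combines two known ingredients: parallel atrous (dilated) convolutions, whose spacing between kernel taps widens the field of view without adding parameters, and a squeeze-and-excitation (SE) block that reweights channels. A k x k kernel with dilation d spans k + (k - 1)(d - 1) pixels per side, so a 3 x 3 kernel at d = 6 covers 13 x 13. The pure-Python sketch below shows these mechanics on a single-channel input, assuming (not taken from the paper) that each dilation rate gets its own 3 x 3 kernel, that the branch outputs are stacked as channels, and that a single SE block then rescales them. The function names (`effective_kernel`, `atrous_conv`, `squeeze_excite`, `se_aspp`) are hypothetical; the paper's actual module may also include image-level pooling, 1 x 1 branches, projection layers, or lighter convolutions that are omitted here.

```python
import math


def effective_kernel(k, d):
    """Side length covered by a k x k kernel with dilation d: k + (k - 1) * (d - 1)."""
    return k + (k - 1) * (d - 1)


def atrous_conv(img, kernel, d):
    """Single-channel 2D atrous (dilated) convolution with zero 'same' padding.

    kernel is k x k with odd k; neighbouring taps are d pixels apart.
    """
    h, w, k = len(img), len(img[0]), len(kernel)
    c = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    y, x = i + (a - c) * d, j + (b - c) * d
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[a][b] * img[y][x]
            out[i][j] = acc
    return out


def squeeze_excite(maps, w0, w1):
    """SE block: global-average-pool each channel, FC -> ReLU -> FC -> sigmoid, rescale channels."""
    z = [sum(sum(r) for r in m) / (len(m) * len(m[0])) for m in maps]
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w0]
    s = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w1]
    return [[[v * s[c] for v in r] for r in m] for c, m in enumerate(maps)]


def se_aspp(img, kernels, rates, w0, w1):
    """Hypothetical SE-ASPP: one atrous branch per dilation rate, stacked as channels, then SE reweighting."""
    branches = [atrous_conv(img, k, d) for k, d in zip(kernels, rates)]
    return squeeze_excite(branches, w0, w1)
```

For example, `effective_kernel(3, 2)` is 5 and `effective_kernel(3, 6)` is 13, which is how ASPP-style modules gather multi-scale context from the same feature map. The SE step is cheap (two small fully connected layers per module), which fits the paper's emphasis on keeping model complexity low.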
