Abstract: Semantic segmentation is crucial for computer vision applications such as autonomous driving and intelligent surveillance, yet balancing real-time performance against segmentation accuracy remains a significant challenge. Fast-SCNN is favored for its efficiency and low computational complexity, but it still struggles with complex street scene images. To address this issue, this paper presents an improved Fast-SCNN that incorporates a novel attention mechanism and an enhanced feature extraction module to raise both the accuracy and the efficiency of semantic segmentation. First, an integrated SimAM (Simple, Parameter-Free Attention Module) increases the network's sensitivity to critical image regions and adjusts the feature weights across channels. Second, a refined pyramid pooling module in the global feature extraction module captures a broader range of contextual information through its refined pooling levels. Third, in the feature fusion stage, an enhanced DAB (Depthwise Asymmetric Bottleneck) block and SE (Squeeze-and-Excitation) attention improve the network's ability to process multi-scale information. Finally, the classifier module is extended with deeper and more complex convolutional structures, further improving model performance. Together, these enhancements significantly improve the model's ability to capture details and its overall segmentation performance. Experimental results show that the proposed method handles complex street scene images well, achieving a mean Intersection over Union (mIoU) of 71.7% on Cityscapes and 69.4% on CamVid at inference speeds of 81.4 fps and 113.6 fps, respectively. These results indicate that the proposed model improves segmentation quality in complex street scenes while retaining real-time processing capability.
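
For readers unfamiliar with the reported metric, the following is a minimal sketch of how mIoU can be computed from predicted and ground-truth labels. It is not the authors' evaluation code: the function name `mean_iou`, the flat label-list input, and the `ignore_index=255` default (a common convention for unlabeled pixels in Cityscapes-style annotations) are assumptions made for illustration. Benchmark scores are normally accumulated over the whole validation set rather than per image.

```python
from collections import Counter


def mean_iou(pred, target, num_classes, ignore_index=255):
    """Return (mIoU, per-class IoU list) for two flat label sequences.

    Hypothetical helper for illustration. Pixels whose ground-truth label
    equals ignore_index are skipped (assumed convention).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = Counter()   # pixels where prediction and truth agree, per class
    pred_count = Counter()     # pixels predicted as each class
    target_count = Counter()   # pixels labeled as each class

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        # A class absent from both prediction and ground truth has no defined IoU.
        ious.append(intersection[c] / union if union else None)

    valid = [v for v in ious if v is not None]
    miou = sum(valid) / len(valid) if valid else float("nan")
    return miou, ious


if __name__ == "__main__":
    # Toy example with three classes; the last pixel is unlabeled and ignored.
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 255]
    miou, per_class = mean_iou(pred, target, num_classes=3)
    print(per_class)          # per-class IoU: 1/2, 2/3, 1
    print(round(miou, 4))     # 0.7222
```

In the toy example, class 0 has one correct pixel out of a union of two, class 1 has two out of three, and class 2 has one out of one, so the mean is 13/18. Averaging over classes rather than pixels keeps rare classes from being swamped by large ones such as road or sky.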
