Abstract: Semantic segmentation is central to computer vision applications such as autonomous driving and intelligent surveillance, yet balancing real-time performance against segmentation accuracy remains a significant challenge. Fast-SCNN is favored for its efficiency and low computational complexity, but it struggles with complex street scene images. To address this issue, this paper presents an improved Fast-SCNN that incorporates a novel attention mechanism and enhanced feature extraction modules to improve both the accuracy and the efficiency of semantic segmentation. First, an integrated SimAM (Simple, Parameter-Free Attention Module) increases the network's sensitivity to critical image regions and adjusts the feature weights across channels. Second, a refined pyramid pooling module in the global feature extraction module captures a broader range of contextual information through refined pooling levels. Third, in the feature fusion stage, an enhanced DAB (Depthwise Asymmetric Bottleneck) block and SE (Squeeze-and-Excitation) attention improve the network's ability to process multi-scale information. Finally, the classifier module is extended with deeper and more complex convolutional structures, which further improves model performance. Together, these enhancements improve the model's ability to capture details and its overall segmentation performance. Experimental results show that the proposed method achieves a mean Intersection over Union (mIoU) of 71.7% on Cityscapes and 69.4% on CamVid, at inference speeds of 81.4 fps and 113.6 fps, respectively. These results indicate that the proposed model improves segmentation quality in complex street scenes while retaining real-time processing capability.
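For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean Intersection over Union is commonly computed from a confusion matrix. It is not the authors' evaluation code: the function names `confusion_matrix` and `mean_iou`, the flat label lists, and the use of 255 as an ignore label (a common Cityscapes convention) are assumptions made for illustration.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count (true class, predicted class) pairs over flattened label lists.

    ignore_index=255 is an assumed convention for unlabeled pixels.
    Predictions are assumed to lie in range(num_classes).
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip pixels without a valid ground-truth label
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                  # true class c, predicted otherwise
        fp = sum(row[c] for row in cm) - tp   # predicted c, true class otherwise
        denom = tp + fp + fn
        if denom > 0:                         # skip classes absent from both maps
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes, one ignored pixel.
target = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 0]
miou = mean_iou(confusion_matrix(pred, target, num_classes=3))
print(f"mIoU = {miou:.4f}")  # prints "mIoU = 0.7222"
```

In the toy example the per-class IoUs are 1/2, 2/3, and 1, so the mean is 13/18. Benchmark evaluations typically accumulate one confusion matrix over the whole validation set before averaging, rather than averaging per-image scores.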
