Abstract: Semantic segmentation is a core computer vision task for applications such as autonomous driving and intelligent surveillance, yet balancing real-time performance against segmentation accuracy remains a significant challenge. Fast-SCNN is favored for its efficiency and low computational complexity, but it still struggles with complex street scene images. To address this issue, this paper presents an improved Fast-SCNN that aims to raise segmentation accuracy while preserving efficiency by incorporating a novel attention mechanism and an enhanced feature extraction module. First, the integrated SimAM (Simple, Parameter-Free Attention Module) increases the network's sensitivity to critical regions of the image and adjusts the feature weights across channels. Second, the refined pyramid pooling module in the global feature extraction module captures a broader range of contextual information through finer pooling levels. Third, in the feature fusion stage, an enhanced DAB (Depthwise Asymmetric Bottleneck) block and SE (Squeeze-and-Excitation) attention improve the network's ability to process multi-scale information. Finally, the classifier module is extended with deeper and more complex convolutional structures, which further improves model performance. Together, these changes improve both the capture of fine detail and overall segmentation performance. Experiments show that the proposed method achieves a mean Intersection over Union (mIoU) of 71.7% on Cityscapes and 69.4% on CamVid, with inference speeds of 81.4 fps and 113.6 fps, respectively. These results indicate that the proposed model improves segmentation quality in complex street scenes while retaining real-time processing capability.
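For readers unfamiliar with the reported metric, the following is a minimal sketch of how mIoU is commonly computed from a confusion matrix. It is not the authors' evaluation code. The function names `confusion_matrix` and `mean_iou` are hypothetical, and the ignore label of 255 is an assumption borrowed from common Cityscapes practice.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count (true class, predicted class) pairs over flat label lists.

    ignore_index=255 is an assumed convention for unlabeled pixels.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not contribute to the score
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                       # class c missed
        fp = sum(matrix[r][c] for r in range(n)) - tp  # wrongly predicted as c
        denom = tp + fp + fn
        if denom == 0:
            continue  # class absent from both prediction and ground truth
        ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes; the last pixel is unlabeled and skipped.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(confusion_matrix(pred, target, num_classes=3))
print(f"mIoU = {score:.4f}")
```

In this toy case the per-class IoUs are 1/2, 2/3, and 1, so the mean is 13/18. Benchmark evaluations typically accumulate the confusion matrix over the whole validation set before averaging, rather than averaging per-image scores.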
