Abstract: Real-time semantic segmentation plays a significant role in industry applications, such as autonomous driving, robotics and so on. It is a challenging task as both efficiency and performance need to be considered simultaneously. To address such a complex task, this paper proposes an efficient CNN called Multiply Spatial Fusion Network (MSFNet) to achieve fast and accurate perception. The proposed MSFNet uses Class Boundary Supervision to process the relevant boundary information based on our proposed Multi-features Fusion Module which can obtain spatial information and enlarge receptive field. Therefore, the final upsampling of the feature maps of 1/8 original image size can achieve impressive results while maintaining a high speed. Experiments on Cityscapes and CamVid datasets show an obvious advantage of the proposed approach compared with the existing approaches. Specifically, it achieves 77.1% Mean IOU on the Cityscapes test dataset with the speed of 41 FPS for a 1024×2048 input, and 75.4% Mean IOU with the speed of 91 FPS on the CamVid test dataset.
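The Mean IOU figures quoted above average the per-class intersection over union between predicted and ground-truth label maps. As a minimal sketch of that metric (not the paper's evaluation code), the function below computes it for a single image in plain Python. The name `mean_iou`, the `ignore_index` value of 255, and the choice to average only over classes that actually occur are assumptions made for illustration; benchmark evaluations typically accumulate the counts over a whole test set rather than per image.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes present in the prediction or the ground truth.

    pred and target are 2D lists of integer class ids with the same shape.
    Pixels whose ground-truth label equals ignore_index are skipped
    (the value 255 is an assumption, not taken from the paper).
    """
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t == ignore_index:
                continue
            pred_count[p] += 1
            target_count[t] += 1
            if p == t:
                intersection[p] += 1

    ious = []
    for c in range(num_classes):
        # Union = predicted pixels + ground-truth pixels - overlap.
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)

    # No valid pixels at all: the metric is undefined.
    if not ious:
        return float("nan")
    return sum(ious) / len(ious)


if __name__ == "__main__":
    prediction = [[0, 0], [1, 1]]
    ground_truth = [[0, 1], [1, 1]]
    # Class 0: IoU 1/2, class 1: IoU 2/3.
    print(mean_iou(prediction, ground_truth, num_classes=2))
```

In the small example, one pixel of class 1 is mislabeled as class 0, so both classes lose some overlap and the mean falls below 1.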

What problem does this paper attempt to address?

The paper addresses the problem of achieving high efficiency and high accuracy at the same time in real-time semantic segmentation. Real-time semantic segmentation is widely used in fields such as autonomous driving and robotics, but it is challenging because segmentation accuracy must improve without sacrificing speed. Existing methods often speed up inference by reducing the input resolution or the number of feature channels, which loses spatial information, especially edge information, and so degrades accuracy. To address this, the paper proposes an efficient convolutional neural network (CNN) called Multiply Spatial Fusion Network (MSFNet). Its main contributions are:

1. **Multi-features Fusion Module (MFM)**: Enlarges the receptive field and recovers lost spatial information through the proposed Spatial Aware Pooling (SAP), at a relatively small computational cost.
2. **Class Boundary Supervision (CBS)**: Prevents the loss of edge-related spatial information, especially during upsampling.
3. **Experimental results**: On the Cityscapes and CamVid datasets, MSFNet outperforms most existing real-time segmentation methods in both accuracy and inference speed. On the Cityscapes test set it achieves 77.1% Mean Intersection over Union (Mean IOU) at 41 frames per second (FPS) for 1024×2048 inputs; on the CamVid test set it achieves 75.4% Mean IOU at 91 FPS.

In conclusion, the paper aims to resolve the tension between efficiency and accuracy in real-time semantic segmentation by proposing MSFNet. Through its network structure and supervision mechanism, it delivers efficient and accurate semantic segmentation on high-resolution images.
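The summary does not say how the targets for Class Boundary Supervision are built. One common way to obtain a boundary target from a segmentation label map, shown here purely as an illustrative assumption and not as the paper's procedure, is to mark every pixel that has a 4-connected neighbor with a different class. The function name `class_boundary_map` is hypothetical.

```python
def class_boundary_map(labels):
    """Return a binary map: 1 where a pixel touches a different class.

    labels is a 2D list of integer class ids. A pixel is marked as a
    boundary pixel if any of its up/down/left/right neighbors carries a
    different label. This rule is an assumption for illustration only.
    """
    height = len(labels)
    width = len(labels[0]) if height else 0
    boundary = [[0] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < height and 0 <= nx < width
                if inside and labels[ny][nx] != labels[y][x]:
                    boundary[y][x] = 1
                    break
    return boundary


if __name__ == "__main__":
    label_map = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [2, 2, 2, 2],
    ]
    for row in class_boundary_map(label_map):
        print(row)
```

A map like this could serve as the ground truth for an auxiliary boundary loss during training; how MSFNet actually defines and weights that loss would need to be checked against the paper itself.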

Real-Time Semantic Segmentation via Multiply Spatial Fusion Network

Real-time Semantic Segmentation with Weighted Factorized-Depthwise Convolution

Enhanced Multi-Scale Feature Adaptive Fusion Sparse Convolutional Network for Large-Scale Scenes Semantic Segmentation

Real-Time Semantic Segmentation Algorithm for Street Scenes Based on Attention Mechanism and Feature Fusion

Multiscale Fusion Convolutional Network in Real-time Semantic Segmentation

ZMNet: feature fusion and semantic boundary supervision for real-time semantic segmentation

DARSegNet: A Real-Time Semantic Segmentation Method Based on Dual Attention Fusion Module and Encoder-Decoder Network

MFAFNet: A Lightweight and Efficient Network with Multi-Level Feature Adaptive Fusion for Real-Time Semantic Segmentation

MLFNet: Multi-Level Fusion Network for Real-Time Semantic Segmentation of Autonomous Driving

Real-Time Semantic Segmentation With Fast Attention

MCFNet: Multi-scale Covariance Feature Fusion Network for Real-time Semantic Segmentation

Asymmetric-Convolution-Guided Multipath Fusion for Real-Time Semantic Segmentation Networks

MSCFNet: A Lightweight Network with Multi-Scale Context Fusion for Real-Time Semantic Segmentation

MAFNet: dual-branch fusion network with multiscale atrous pyramid pooling aggregate contextual features for real-time semantic segmentation

BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation

AM‐MulFSNet: A Fast Semantic Segmentation Network Combining Attention Mechanism and Multi‐branch

Real-Time Semantic Segmentation via Spatial-Detail Guided Context Propagation

FBSNet: A Fast Bilateral Symmetrical Network for Real-Time Semantic Segmentation

Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes

Based on cross-scale fusion attention mechanism network for semantic segmentation for street scenes