Abstract: Semantic segmentation is a fundamental task in computer vision that aims to assign a category label to each pixel of the input image. Many semantic segmentation networks have complex structures, high computational costs, and massive numbers of parameters. As a result, they introduce considerable latency when performing pixel-level scene understanding on high-resolution images. These limitations greatly restrict the applicability of these methods in resource-constrained scenarios, such as autonomous driving, medical applications, and mobile devices. Therefore, real-time semantic segmentation methods, which produce high-precision segmentation masks at fast inference speeds, have received widespread attention. This study provides a systematic and critical review of deep-learning-based real-time semantic segmentation algorithms to trace the development of the field in recent years. Moreover, it covers three key aspects of real-time semantic segmentation: real-time semantic segmentation networks, mainstream datasets, and common evaluation indicators. In addition, this study conducts a quantitative evaluation of the real-time semantic segmentation methods discussed and provides some insights into future development in this field. First, semantic segmentation and real-time semantic segmentation tasks are introduced, along with their application scenarios and challenges. The key challenge in real-time semantic segmentation lies in extracting high-quality semantic information efficiently. Second, preliminary knowledge for studying real-time semantic segmentation algorithms is introduced in detail. Specifically, this study introduces four kinds of general model compression methods: network pruning, neural architecture search, knowledge distillation, and parameter quantization. It also introduces some popular efficient CNN modules used in real-time semantic segmentation networks, such as MobileNet, ShuffleNet, and EfficientNet, as well as efficient
Transformer modules, such as external attention, SeaFormer, and MobileViT. Then, existing real-time semantic segmentation algorithms are organized and summarized. According to the characteristics of the overall network structure, existing works are grouped into five categories: single-branch, two-branch, multibranch, U-shaped, and neural architecture search networks. Specifically, the encoder of a single-branch network is a single-branch hierarchical backbone network, and its decoder is usually lightweight and does not involve complex fusion of multiscale features. The two-branch network adopts a two-branch encoder structure, using one branch to capture spatial detail information and the other to model semantic context information. Multibranch networks are characterized by a multibranch structure in the encoder part of the network or by multiresolution inputs, where the input at each resolution passes through a different subnetwork. The U-shaped network has a contracting encoder and an expansive decoder that is roughly symmetrical to the encoder. Most works in the first four categories are manually designed, while neural architecture search networks are obtained by applying network architecture search technology to these four types of architectures. These five categories cover almost all deep-learning-based real-time semantic segmentation algorithms, including CNN-based, Transformer-based, and hybrid-architecture-based segmentation networks. Moreover, commonly used datasets and evaluation indicators of accuracy, speed, and model size are introduced for real-time segmentation. Popular datasets are divided into autonomous driving scene datasets and general scene datasets, and the evaluation indicators are divided into accuracy indicators and efficiency descriptors. In addition, this study quantitatively evaluates the real-time semantic segmentation algorithms mentioned on multiple datasets by using relevant
evaluation indicators. To avoid interference from different devices when quantitatively comparing real-time semantic segmentation algorithms, this study compares the performance of advanced methods from each category on the same devices and configuration and establishes a fair and complete real-time semantic segmentation evaluation system for subsequent research, thereby contributing to a unified standard for comparison. Finally, current challenges in real-time semantic segmentation are discussed, and possible future directions for improvement are envisioned (e.g., utilization of Transformers, applications on edge devices, knowledge transfer from visual foundation models, diversity of evaluation indicators, variety of datasets, utilization of multimodal data and weakly supervised methods, and combination with incremental learning). The algorithms, datasets, and evaluation indicators mentioned in this paper are summarized at for the convenience of subsequent researchers.
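The abstract refers to accuracy indicators without naming one. As an illustration only, the sketch below computes mean intersection over union (mIoU), a widely used accuracy indicator for semantic segmentation; whether the survey uses exactly this formulation is an assumption. The function names, the toy label maps, and the ignore label 255 (a common convention for unlabeled pixels) are hypothetical choices made for this example.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count pixels per (ground-truth class, predicted class) pair.

    Rows index the ground-truth class, columns the predicted class.
    Pixels whose ground-truth label equals ignore_index are skipped
    (255 is an assumed convention, not taken from the abstract).
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix: List[List[int]]) -> float:
    """Average the per-class IoU = TP / (TP + FP + FN).

    Classes that appear in neither the prediction nor the ground truth
    are left out of the average.
    """
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                       # class c pixels predicted as another class
        fp = sum(matrix[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: flattened label maps with three classes and one ignored pixel.
target = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 2]
cm = confusion_matrix(pred, target, num_classes=3)
miou = mean_iou(cm)
print(f"mIoU = {miou:.4f}")
```

In this toy case the per-class IoU values are 1/2, 2/3, and 1, so the average is 13/18. Efficiency descriptors such as inference speed and model size are measured separately and, as the abstract notes, are only comparable when obtained on the same devices and configuration.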

Are We Ready for Real-Time LiDAR Semantic Segmentation in Autonomous Driving?

A Scalable Real-time Semantic Segmentation Network for Autonomous Driving

Real-Time Semantic Segmentation of LiDAR Point Clouds on Edge Devices for Unmanned Systems

Unifying Terrain Awareness Through Real-Time Semantic Segmentation

Real-Time Semantic Segmentation of 3D Point Cloud for Autonomous Driving

LiDAR Panoptic Segmentation for Autonomous Driving

A Novel Real-Time Edge-Guided LiDAR Semantic Segmentation Network for Unstructured Environments

Ground-Aware Point Cloud Semantic Segmentation for Autonomous Driving

Real-Time Joint Semantic Segmentation and Depth Estimation Using Asymmetric Annotations

SyS3DS: Systematic Sampling of Large-Scale LiDAR Point Clouds for Semantic Segmentation in Forestry Robotics

Deep Learning-Based Real-Time Semantic Segmentation: A Survey

Real-Time Semantic Image Segmentation with Deep Learning for Autonomous Driving: A Survey

A Transformer-based Real-time LiDAR Semantic Segmentation Method for Restricted Mobile Devices

Scan-based Semantic Segmentation of LiDAR Point Clouds: An Experimental Study

SegNet4D: Efficient Instance-Aware 4D Semantic Segmentation for LiDAR Point Cloud

A Real-Time Semantic Segmentation Approach for Autonomous Driving Scenes

PointSeg: Real-Time Semantic Segmentation Based on 3D LiDAR Point Cloud

Moving Object Segmentation in 3D LiDAR Data: A Learning-based Approach Exploiting Sequential Data

On Deep Learning for Geometric and Semantic Scene Understanding Using On-Vehicle 3D LiDAR

LU-Net: A Simple Approach to 3D LiDAR Point Cloud Semantic Segmentation

LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception Network for Autonomous Driving