Abstract: Lightweight semantic segmentation promotes the application of semantic segmentation on tiny devices. Existing lightweight semantic segmentation networks (LSNets) suffer from low precision and a large number of parameters. To address these problems, we designed a fully 1D convolutional LSNet. The network's performance is attributed to three modules: the 1D multi-layer space module (1D-MS), the 1D multi-layer channel module (1D-MC), and the flow alignment module (FA). The 1D-MS and 1D-MC modules add global feature extraction operations based on the multi-layer perceptron (MLP) idea. They use 1D convolutional encoding, which is more flexible than an MLP, and the added global information operations improve the encoding ability of the features. The FA module fuses high-level and low-level semantic information, which solves the problem of precision loss caused by feature misalignment. We designed a 1D-mixer encoder based on the transformer structure. It performs fusion encoding of the spatial information extracted by the 1D-MS module and the channel information extracted by the 1D-MC module. The 1D-mixer obtains high-quality encoded features with very few parameters, which is the key to the network's success. The attention pyramid with FA (AP-FA) uses an attention pyramid (AP) to decode features and adds an FA module to solve the problem of feature misalignment. Our network requires no pre-training and needs only a single 1080Ti GPU for training. It achieved 72.6 mIoU and 95.6 FPS on the Cityscapes dataset and 70.5 mIoU and 122 FPS on the CamVid dataset. We ported the network trained on the ADE20K dataset to mobile devices, and the latency of 224 ms demonstrates the application value of the network on mobile devices. The results on the three datasets show that the network we designed generalizes well.
Compared to state-of-the-art lightweight semantic segmentation algorithms, our network achieves the best balance between segmentation accuracy and parameter count. Our LSNet has only 0.62 M parameters and is currently the network with the highest segmentation accuracy among those with fewer than 1 M parameters.
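
To give a rough sense of why 1D convolutions can reduce the parameter count, the sketch below compares the weights in a standard k x k 2D convolution with those in a pair of 1D convolutions (k x 1 followed by 1 x k). This is only illustrative arithmetic under stated assumptions: the abstract does not specify the exact layer layout of 1D-MS or 1D-MC, and the function names and the channel width of 64 are hypothetical.

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Parameter count of a standard k x k 2D convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def conv1d_pair_params(c_in, c_out, k, bias=True):
    """Parameter count of a k x 1 convolution followed by a 1 x k convolution.

    Assumption (not from the abstract): the first convolution maps
    c_in -> c_out channels and the second maps c_out -> c_out channels.
    """
    first = c_in * c_out * k + (c_out if bias else 0)
    second = c_out * c_out * k + (c_out if bias else 0)
    return first + second


if __name__ == "__main__":
    channels = 64  # hypothetical channel width, chosen only for illustration
    for k in (3, 5, 7):
        full = conv2d_params(channels, channels, k)
        pair = conv1d_pair_params(channels, channels, k)
        print(f"k={k}: 2D={full}, 1D pair={pair}, ratio={pair / full:.3f}")
```

Without bias terms and with equal input and output widths, the ratio is 2k / k^2 = 2 / k, so the saving grows with kernel size. The actual 0.62 M figure reported above depends on the full architecture, which this sketch does not attempt to reproduce.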
