Abstract: Recently, the integration of the local modeling capabilities of Convolutional Neural Networks (CNNs) with the global dependency strengths of Transformers has created a sensation in the semantic segmentation community. However, substantial computational workloads and high hardware memory demands remain major obstacles to their further application in real-time scenarios. In this work, we propose a lightweight multiple-information interaction network for real-time semantic segmentation, called LMIINet, which effectively combines CNNs and Transformers while reducing redundant computations and memory footprint. It features Lightweight Feature Interaction Bottleneck (LFIB) modules comprising efficient convolutions that enhance context integration. Additionally, improvements are made to the Flatten Transformer by enhancing local and global feature interaction to capture detailed semantic information. The incorporation of a combination coefficient learning scheme in both LFIB and Transformer blocks facilitates improved feature interaction. Extensive experiments demonstrate that LMIINet excels in balancing accuracy and efficiency. With only 0.72M parameters and 11.74G FLOPs, LMIINet achieves 72.0% mIoU at 100 FPS on the Cityscapes test set and 69.94% mIoU at 160 FPS on the CamVid test dataset using a single RTX2080Ti GPU.

What problem does this paper attempt to address?

This paper addresses how to combine the local modeling ability of convolutional neural networks (CNNs) with the global dependency modeling of Transformers in semantic segmentation, while reducing the computational burden and memory usage enough to run efficiently in real time.

### Specific problems and solutions:

1. **High computational complexity and hardware requirements**:
   - **Problem**: Existing models that combine CNNs and Transformers perform well in semantic segmentation, but their heavy computational workloads and high hardware memory requirements limit their use in real-time scenarios.
   - **Solution**: A lightweight multiple-information interaction network (LMIINet) is proposed, which achieves efficient real-time semantic segmentation by reducing redundant computations and memory usage.
2. **Insufficient fusion of local and global information**:
   - **Problem**: Traditional CNNs are good at extracting local features but limited in capturing global semantic information, while Transformers capture global dependencies well but are weaker at processing local spatial information.
   - **Solution**: A Lightweight Feature Interaction Bottleneck (LFIB) module is designed. It combines depthwise separable convolution, asymmetric convolution, and dilated convolution to significantly reduce the computational load, and introduces a combination coefficient learning scheme to promote efficient feature interaction. In addition, the Flatten Transformer is improved to enhance the interaction of local and global features and capture more detailed semantic information.
3. **Balance between model performance and efficiency**:
   - **Problem**: Existing models often increase computational complexity as they improve performance, which makes it difficult to reach the efficiency needed in practical applications.
   - **Solution**: LMIINet greatly reduces the number of parameters and the amount of computation while maintaining high accuracy, by optimizing the network structure and introducing multiple lightweight techniques. Experimental results show that LMIINet achieves 72.0% mIoU on the Cityscapes test set with only 0.72M parameters and 11.74G FLOPs, at an inference speed of 100 FPS on an RTX2080Ti GPU.

### Summary:

This paper aims to combine the advantages of CNNs and Transformers effectively. By designing a lightweight multiple-information interaction network (LMIINet) that reduces redundant computations and memory usage, it achieves efficient and accurate semantic segmentation in real-time scenarios.
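To make the parameter savings behind the convolution types named in the LFIB description concrete, the sketch below counts weights for a standard 3x3 convolution, a depthwise separable one, and an asymmetric (3x1 followed by 1x3) depthwise variant, and shows how dilation widens the receptive field without adding weights. This is only an illustration under stated assumptions: the channel width of 64, the bias-free layers, and all function names are hypothetical and not taken from the paper, and the exact layer arrangement inside LFIB is not described in this summary.

```python
def conv_params(c_in, c_out, kh, kw, groups=1):
    """Weight count of a bias-free 2D convolution: (c_in / groups) * c_out * kh * kw."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * kh * kw


def standard_3x3(c):
    """Plain 3x3 convolution mapping c channels to c channels."""
    return conv_params(c, c, 3, 3)


def depthwise_separable_3x3(c):
    """Depthwise 3x3 (one filter per channel) followed by a pointwise 1x1."""
    return conv_params(c, c, 3, 3, groups=c) + conv_params(c, c, 1, 1)


def asymmetric_depthwise_3x3(c):
    """Depthwise 3x1, then depthwise 1x3, then a pointwise 1x1 (assumed layout)."""
    return (
        conv_params(c, c, 3, 1, groups=c)
        + conv_params(c, c, 1, 3, groups=c)
        + conv_params(c, c, 1, 1)
    )


def effective_kernel(k, dilation):
    """Spatial extent covered by a k-tap kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


if __name__ == "__main__":
    c = 64  # hypothetical channel width, chosen only for illustration
    print("standard 3x3:           ", standard_3x3(c))              # 36864
    print("depthwise separable 3x3:", depthwise_separable_3x3(c))   # 4672
    print("asymmetric depthwise:   ", asymmetric_depthwise_3x3(c))  # 4480
    print("3-tap kernel, dilation 2 covers", effective_kernel(3, 2), "pixels")  # 5
```

At this assumed width, the factorized variants need roughly an eighth of the weights of the standard convolution, and a dilated kernel covers a larger area with the same weight count. These are the kinds of savings the summary attributes to LFIB, although the actual figures for LMIINet depend on its real configuration.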

Efficient Semantic Segmentation via Lightweight Multiple-Information Interaction Network

Real-time Semantic Segmentation with Weighted Factorized-Depthwise Convolution

A Scalable Real-time Semantic Segmentation Network for Autonomous Driving

LACTNet: A Lightweight Real-Time Semantic Segmentation Network Based on an Aggregated Convolutional Neural Network and Transformer

Lightweight Real-time Semantic Segmentation Network with Efficient Transformer and CNN

LRNNet: A Light-Weighted Network with Efficient Reduced Non-Local Operation for Real-Time Semantic Segmentation

LMANet: A Lightweight Asymmetric Semantic Segmentation Network Based on Multi-Scale Feature Extraction

Lightweight and Progressively-Scalable Networks for Semantic Segmentation

MIFNet: A Lightweight Multiscale Information Fusion Network

Lightweight medical image segmentation network with multi-scale feature-guided fusion

ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-high Resolution Segmentation

When Humans Meet Machines: Towards Efficient Segmentation Networks

MFAFNet: A Lightweight and Efficient Network with Multi-Level Feature Adaptive Fusion for Real-Time Semantic Segmentation

Rethinking 1D convolution for lightweight semantic segmentation

ESDINet: Efficient Shallow-Deep Interaction Network for Semantic Segmentation of High-Resolution Aerial Images

Real-Time Semantic Segmentation via Multiply Spatial Fusion Network

MLFNet: Multi-Level Fusion Network for Real-Time Semantic Segmentation of Autonomous Driving

EMFANet: a lightweight network with efficient multi-scale feature aggregation for real-time semantic segmentation

LFFNet: lightweight feature-enhanced fusion network for real-time semantic segmentation of road scenes

Feature Fusion Network Based on Hybrid Attention for Semantic Segmentation

MSCFNet: A Lightweight Network with Multi-Scale Context Fusion for Real-Time Semantic Segmentation