Efficient Semantic Segmentation via Lightweight Multiple-Information Interaction Network

Yangyang Qiu, Guoan Xu, Guangwei Gao, Zhenhua Guo, Yi Yu, Chia-Wen Lin
2024-10-03
Abstract: Recently, the integration of the local modeling capabilities of Convolutional Neural Networks (CNNs) with the global dependency strengths of Transformers has created a sensation in the semantic segmentation community. However, substantial computational workloads and high hardware memory demands remain major obstacles to their further application in real-time scenarios. In this work, we propose a lightweight multiple-information interaction network for real-time semantic segmentation, called LMIINet, which effectively combines CNNs and Transformers while reducing redundant computations and memory footprint. It features Lightweight Feature Interaction Bottleneck (LFIB) modules comprising efficient convolutions that enhance context integration. Additionally, improvements are made to the Flatten Transformer by enhancing local and global feature interaction to capture detailed semantic information. The incorporation of a combination coefficient learning scheme in both LFIB and Transformer blocks facilitates improved feature interaction. Extensive experiments demonstrate that LMIINet excels in balancing accuracy and efficiency. With only 0.72M parameters and 11.74G FLOPs, LMIINet achieves 72.0% mIoU at 100 FPS on the Cityscapes test set and 69.94% mIoU at 160 FPS on the CamVid test dataset using a single RTX2080Ti GPU.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to combine the local modeling ability of convolutional neural networks (CNNs) with the global dependency modeling of Transformers in semantic segmentation, while reducing computational cost and memory usage enough to run in real time.

### Specific problems and solutions:

1. **High computational complexity and hardware requirements**:
   - **Problem**: Existing models that combine CNNs and Transformers perform well in semantic segmentation, but their heavy computational workloads and high hardware memory requirements limit their use in real-time scenarios.
   - **Solution**: A lightweight multiple-information interaction network (LMIINet) is proposed, which achieves efficient real-time semantic segmentation by reducing redundant computations and memory usage.
2. **Insufficient fusion of local and global information**:
   - **Problem**: Traditional CNNs are good at extracting local features but limited in capturing global semantic information, while Transformers capture global dependencies well but are weak at processing local spatial information.
   - **Solution**: A Lightweight Feature Interaction Bottleneck (LFIB) module is designed. It combines depth-wise separable convolution, asymmetric convolution, and dilated convolution to significantly reduce the computational load, and introduces a combination coefficient learning scheme to promote efficient feature interaction. In addition, the Flatten Transformer is improved to strengthen the interaction of local and global features and capture more detailed semantic information.
3. **Balance between model performance and efficiency**:
   - **Problem**: Existing models often increase computational complexity as they improve performance, which makes it difficult to reach the efficiency needed in practical applications.
   - **Solution**: LMIINet greatly reduces the number of parameters and the amount of computation while maintaining high accuracy, by optimizing the network structure and introducing multiple lightweight techniques. Experimental results show that LMIINet achieves 72.0% mIoU on the Cityscapes test set with only 0.72M parameters and 11.74G FLOPs, at an inference speed of 100 FPS on an RTX2080Ti GPU.

### Summary:

This paper aims to combine the advantages of CNNs and Transformers effectively. By designing a lightweight multiple-information interaction network (LMIINet), it reduces redundant computations and memory usage, thereby achieving efficient and accurate semantic segmentation in real-time scenarios.
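
To make the parameter savings of the efficient convolutions named above more concrete, the following is a minimal sketch of the standard weight-count arithmetic for a plain 3x3 convolution, a depth-wise separable one, and an asymmetric (3x1 followed by 1x3) pair, plus the receptive field covered by a dilated kernel. The channel width `C = 64` and the helper names `conv_params` and `effective_kernel` are illustrative assumptions; this is not the paper's LFIB layout, which the text does not specify.

```python
def conv_params(c_in, c_out, k_h, k_w, groups=1):
    """Weight count of a 2D convolution without bias.

    Each of the c_out filters sees c_in / groups input channels
    through a k_h x k_w window.
    """
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * k_h * k_w


def effective_kernel(k, dilation):
    """Side length of the window spanned by a k-tap kernel with the given dilation."""
    return dilation * (k - 1) + 1


C = 64  # hypothetical channel width, chosen only for illustration

# Plain 3x3 convolution: every output channel mixes every input channel.
standard = conv_params(C, C, 3, 3)

# Depth-wise separable: per-channel 3x3 (groups=C) followed by a 1x1 pointwise mix.
depthwise_separable = conv_params(C, C, 3, 3, groups=C) + conv_params(C, C, 1, 1)

# Asymmetric: a 3x1 convolution followed by a 1x3 convolution.
asymmetric = conv_params(C, C, 3, 1) + conv_params(C, C, 1, 3)

print("standard 3x3:        ", standard)
print("depth-wise separable:", depthwise_separable)
print("asymmetric 3x1 + 1x3:", asymmetric)

# Dilation widens the spanned window without adding any weights.
for d in (1, 2, 4):
    print(f"3-tap kernel, dilation {d}: spans {effective_kernel(3, d)} pixels")
```

Under these assumptions the depth-wise separable variant needs roughly an eighth of the weights of the plain 3x3 layer, and dilation enlarges the context a kernel sees at no extra parameter cost, which is consistent with the summary's claim that these techniques reduce the computational load.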