Efficient Parallel Multi-Scale Detail and Semantic Encoding Network for Lightweight Semantic Segmentation

Xiao Liu, Xiuya Shi, Lufei Chen, Linbo Qing, Chao Ren
DOI: https://doi.org/10.1145/3581783.3611848
2023-01-01
Abstract: In this work, we propose PMSDSEN, a parallel multi-scale encoder-decoder network architecture for semantic segmentation, inspired by the human visual perception system's ability to aggregate contextual information across various contexts and scales. Our approach introduces the efficient Parallel Multi-Scale Detail and Semantic Encoding (PMSDSE) unit to extract detailed local information and coarse long-range relationships in parallel, enabling the recognition of both object boundaries and object-level areas. By stacking multiple PMSDSE units, our network learns fine-grained details and textures along with abstract category and semantic information, effectively utilizing a larger range of surrounding context for robust segmentation. To further enlarge the network's receptive field without increasing computational complexity, the Multi-Scale Semantic Extractor (MSSE) at the end of the encoder is used for multi-scale semantic context extraction and detailed information encoding. Additionally, the Dynamic Weighted Feature Fusion (DWFF) strategy is employed to integrate shallow-layer detail information and deep-layer semantic information in the decoder stage. Our method obtains multi-scale context from local to global, efficiently spanning low-level feature extraction through high-level semantic interpretation at different scales and in different contexts. Without bells and whistles, PMSDSEN obtains a better trade-off between accuracy and complexity on popular benchmarks, including Cityscapes and CamVid. Specifically, PMSDSEN attains 73.2% mIoU with only 0.9M parameters on the Cityscapes test set. Code and supplementary materials: https://github.com/liux520/PMSDSEN.
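The accuracy figure above is reported as mIoU (mean intersection over union). For readers unfamiliar with the metric, the following is a minimal sketch of how it is commonly computed, not the authors' evaluation code. The function name `mean_iou`, the flattened-list input format, and the `ignore_index=255` default are illustrative assumptions; benchmark suites typically accumulate intersections and unions over the whole dataset rather than per image.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that occur in pred or target.

    pred, target: flat sequences of integer class labels of equal length.
    ignore_index: label in `target` to skip (assumed value, for illustration).
    """
    inter = [0] * num_classes  # per-class intersection counts
    union = [0] * num_classes  # per-class union counts
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel: excluded from the metric
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    # Average only over classes that actually appear.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")
```

For example, with predictions `[0, 0, 1, 1]` and ground truth `[0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.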