FCPFNet: Feature Complementation Network with Pyramid Fusion for Semantic Segmentation

Jingsheng Lei, Chente Shu, Qiang Xu, Yunxiang Yu, Shengying Yang
DOI: https://doi.org/10.1007/s11063-024-11464-9
IF: 2.565
2024-02-21
Neural Processing Letters
Abstract: Traditional pyramid pooling modules have shown effective improvements in semantic segmentation tasks by capturing multi-scale feature information. However, their shallow structure fails to fully extract contextual information, and the fused multi-scale feature information lacks distinctiveness, which reduces the discriminability of the final segmentation. To address these issues, we propose FCPFNet, which is based on a global contextual prior for deep feature extraction of detailed information. Specifically, we introduce a novel deep feature aggregation module that extracts semantic information from the output feature map of each layer through deep aggregation of context information and expands the effective perception range. Additionally, we propose an Efficient Pyramid Pooling Module (EPPM) that captures distinctive features by communicating information between different sub-features and performs multi-scale fusion; it is integrated as a branch within the network to compensate for the information lost through downsampling operations. Furthermore, to preserve rich image detail while maintaining a large receptive field for more contextual information, EPPM concatenates the input feature map and the output feature map of the pyramid pooling module to acquire more comprehensive global contextual information. Experiments show that the proposed method achieves competitive performance on the challenging scene segmentation datasets Pascal VOC 2012, Cityscapes, and COCO-Stuff, with mIoU of 81.0%, 78.8%, and 40.1%, respectively.
computer science, artificial intelligence
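The abstract's description of EPPM, concatenating the input feature map with the output of the pyramid pooling module, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the bin sizes `(1, 2, 4)`, and nearest-neighbour upsampling are assumptions chosen for illustration, and the convolutions, channel shuffling, and attention that a real module would apply are omitted.

```python
def adaptive_avg_pool(channel, bins):
    """Average-pool one H x W channel (a list of rows) down to bins x bins."""
    h, w = len(channel), len(channel[0])
    pooled = []
    for i in range(bins):
        # Window rows: floor of the start, ceiling of the end.
        r0, r1 = (i * h) // bins, ((i + 1) * h + bins - 1) // bins
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, ((j + 1) * w + bins - 1) // bins
            cells = [channel[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(channel, h, w):
    """Resize a pooled channel back to H x W by nearest-neighbour lookup."""
    ph, pw = len(channel), len(channel[0])
    return [[channel[r * ph // h][c * pw // w] for c in range(w)] for r in range(h)]


def pyramid_pool_concat(feature_map, bin_sizes=(1, 2, 4)):
    """Return the input channels followed by one pooled-and-upsampled copy
    of every channel per bin size (hypothetical sizes, for illustration)."""
    h, w = len(feature_map[0]), len(feature_map[0][0])
    # Keep the input channels first so fine detail survives the fusion.
    out = [[row[:] for row in ch] for ch in feature_map]
    for bins in bin_sizes:
        for ch in feature_map:
            out.append(upsample_nearest(adaptive_avg_pool(ch, bins), h, w))
    return out
```

With `C` input channels and `k` bin sizes, the result has `C * (1 + k)` channels: the untouched input carries local detail, while the coarser pooled copies carry progressively more global context.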
What problem does this paper attempt to address?
This paper addresses the limitations of traditional pyramid pooling modules in semantic segmentation tasks. Specifically, these limitations include:

1. **Shallow Structure**: Because of its shallow structure, the traditional pyramid pooling module cannot fully extract context information.
2. **Lack of Discrimination in Multi-scale Feature Fusion**: The fused multi-scale feature information lacks discrimination, which reduces the discriminative ability of the final segmentation.

To overcome these problems, the authors propose a new method named FCPFNet (Feature Complementation Network with Pyramid Fusion for Semantic Segmentation). FCPFNet improves on the traditional pyramid pooling module through the following two main modules:

1. **Deep Feature Aggregation Module (DFAM)**:
   - Uses a multi-layer fusion strategy to jointly model and complement various features, expanding the receptive field.
   - Captures global and local feature information to improve the accuracy of multi-scale object segmentation.
2. **Efficient Pyramid Pooling Module (EPPM)**:
   - Uses channel shuffling operations and attention mechanisms to capture spatial attention and channel attention simultaneously.
   - Establishes long-distance dependencies between pixels, extracts more discriminative multi-level features, and provides richer context information, which especially improves the recognition accuracy of small targets at low resolutions.

With these improvements, FCPFNet achieves competitive performance on challenging scene segmentation datasets such as Pascal VOC 2012, Cityscapes, and COCO-Stuff, reaching mIoU (mean Intersection over Union) scores of 81.0%, 78.8%, and 40.1%, respectively.
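The channel shuffling operation mentioned for EPPM can be sketched as follows. The summary does not say how the paper groups channels, so this shows the commonly used group-interleaving form of channel shuffle as an assumption; the function name and the `groups` parameter are illustrative only.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups.

    Conceptually: view the C channels as a (groups, C // groups) grid,
    transpose it, and flatten it again, so that each output neighbourhood
    mixes channels that came from different groups.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    # Output position (i, g) reads from input position (g, i).
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


# Six channels in two groups: [a, b, c] and [d, e, f].
shuffled = channel_shuffle(["a", "b", "c", "d", "e", "f"], groups=2)
# shuffled == ["a", "d", "b", "e", "c", "f"]
```

After the shuffle, any later operation that processes adjacent channels together sees features from both original groups, which is one way sub-features can exchange information.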