Abstract: Traditional pyramid pooling modules have improved semantic segmentation by capturing multi-scale feature information. However, their shallow structure fails to fully extract contextual information, and the fused multi-scale features lack distinctiveness, which weakens the discriminability of the final segmentation. To address these issues, we propose FCPFNet, which uses a global contextual prior for deep extraction of detailed feature information. Specifically, we introduce a novel deep feature aggregation module that extracts semantic information from the output feature map of each layer by deeply aggregating context information, expanding the effective perception range. Additionally, we propose an Efficient Pyramid Pooling Module (EPPM) that captures distinctive features by communicating information between different sub-features and performing multi-scale fusion; it is integrated as a branch within the network to compensate for the information loss caused by downsampling. Furthermore, to preserve rich image detail while maintaining a large receptive field for more contextual information, EPPM concatenates the input feature map with the output feature map of the pyramid pooling module to obtain more comprehensive global contextual information. Experiments show that the proposed method achieves competitive performance on the challenging scene segmentation datasets Pascal VOC 2012, Cityscapes and COCO-Stuff, with mIoU of 81.0%, 78.8% and 40.1%, respectively.
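The concatenation step described in the abstract (joining the input feature map with the pyramid pooling outputs) can be illustrated with a minimal pure-Python sketch. This is not the authors' EPPM: it omits the sub-feature communication and attention, works on a single-channel map stored as nested lists, and uses nearest-neighbour upsampling as a simplification. The function names and the bin sizes `(1, 2)` are assumptions chosen for illustration.

```python
def adaptive_avg_pool(fmap, bins):
    """Average-pool a 2D map (list of rows) down to a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(bins):
        # Floor for the window start, ceiling for the window end.
        r0, r1 = (i * h) // bins, -(-((i + 1) * h) // bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, -(-((j + 1) * w) // bins)
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out


def upsample_nearest(pooled, h, w):
    """Nearest-neighbour upsampling of a bins x bins grid back to h x w."""
    bins = len(pooled)
    return [[pooled[(r * bins) // h][(c * bins) // w] for c in range(w)]
            for r in range(h)]


def pyramid_pool_concat(fmap, bin_sizes=(1, 2)):
    """Return the input map followed by one upsampled map per pyramid level.

    The returned list plays the role of a channel-wise concatenation:
    channel 0 keeps the original detail, later channels carry pooled context.
    """
    h, w = len(fmap), len(fmap[0])
    branches = [upsample_nearest(adaptive_avg_pool(fmap, b), h, w)
                for b in bin_sizes]
    return [fmap] + branches


# Hypothetical 4x4 single-channel feature map with values 0..15.
fmap = [[4 * r + c for c in range(4)] for r in range(4)]
channels = pyramid_pool_concat(fmap)
print(len(channels))  # number of channels after concatenation
```

Keeping the unpooled input as the first channel is what lets the fused result retain fine detail alongside the coarser context from each pyramid level.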

What problem does this paper attempt to address?

This paper addresses the limitations of traditional pyramid pooling modules in semantic segmentation tasks. Specifically, these limitations include:

1. **Shallow Structure**: Because of its shallow structure, the traditional pyramid pooling module cannot fully extract contextual information.
2. **Lack of Discrimination in Multi-scale Feature Fusion**: The fused multi-scale feature information lacks distinctiveness, which reduces the discriminative ability of the final segmentation.

To overcome these problems, the authors propose FCPFNet (Feature Complementation Network with Pyramid Fusion for Semantic Segmentation). FCPFNet improves on the traditional pyramid pooling module through two main modules:

1. **Deep Feature Aggregation Module (DFAM)**:
   - Uses a multi-layer fusion strategy to jointly model and complement various features, expanding the receptive field.
   - Captures global and local feature information to improve the accuracy of multi-scale object segmentation.
2. **Efficient Pyramid Pooling Module (EPPM)**:
   - Uses channel shuffling and attention mechanisms to capture spatial attention and channel attention simultaneously.
   - Establishes long-range dependencies between pixels, extracts more discriminative multi-level features, provides richer contextual information, and in particular improves recognition accuracy for small targets at low resolution.

With these improvements, FCPFNet achieves competitive performance on challenging scene segmentation datasets such as Pascal VOC 2012, Cityscapes, and COCO-Stuff, reaching mIoU (mean Intersection over Union) of 81.0%, 78.8%, and 40.1%, respectively.
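The channel shuffling mentioned for EPPM can be sketched as follows. The summary does not specify how the paper implements it, so this assumes the common ShuffleNet-style formulation (view the channels as a groups-by-channels-per-group grid, transpose, flatten). The function name `channel_shuffle` and the toy channel labels are hypothetical.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups (ShuffleNet-style, assumed here).

    `channels` is a list with one entry per channel. Conceptually it is
    reshaped to (groups, per_group), transposed, and flattened again, so
    that each output neighbourhood mixes channels from every group.
    """
    n = len(channels)
    if groups <= 0 or n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


# Two groups (a, b) of three channels each.
labels = ["a0", "a1", "a2", "b0", "b1", "b2"]
shuffled = channel_shuffle(labels, groups=2)
print(shuffled)  # ['a0', 'b0', 'a1', 'b1', 'a2', 'b2']
```

After the shuffle, a subsequent grouped operation sees channels from both original groups, which is one way sub-features can exchange information.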

FCPFNet: Feature Complementation Network with Pyramid Fusion for Semantic Segmentation

SAFPN: a Full Semantic Feature Pyramid Network for Object Detection

Research of improving semantic image segmentation based on a feature fusion model

Chemical signalling in the nervous system.

Enhanced Multi-Scale Feature Adaptive Fusion Sparse Convolutional Network for Large-Scale Scenes Semantic Segmentation

CMPF-UNet: a ConvNeXt multi-scale pyramid fusion U-shaped network for multi-category segmentation of remote sensing images

A Pooling-Based Feature Pyramid Network for Salient Object Detection

Pyramid Fusion Transformer for Semantic Segmentation

CIMFNet: Cross-layer Interaction and Multiscale Fusion Network for Semantic Segmentation of High-Resolution Remote Sensing Images

Adaptive Pyramid Context Fusion for Point Cloud Perception

MAPPING NEUTRAL HYDROGEN IN EXTERNAL GALAXIES

PyramidMamba: Rethinking Pyramid Feature Fusion with Selective Space State Model for Semantic Segmentation of Remote Sensing Imagery

Semantic segmentation based on double pyramid network with improved global attention mechanism

CE-FPN: enhancing channel information for object detection

Ppednet: Pyramid Pooling Encoder-Decoder Network For Real-Time Semantic Segmentation

PCANet: Pyramid convolutional attention network for semantic segmentation

APPFNet: Adaptive point-pixel fusion network for 3D semantic segmentation with neighbor feature aggregation

ELKPPNet: An Edge-aware Neural Network with Large Kernel Pyramid Pooling for Learning Discriminative Features in Semantic Segmentation

DPNet: Dual-Pyramid Semantic Segmentation Network Based on Improved Deeplabv3 Plus

PPNet: Pooling Position Attention Network for Semantic Segmentation

Multilevel feature fusion dilated convolutional network for semantic segmentation