Abstract: Benefiting from the development of pyramidal feature learning, the current state-of-the-art multi-scale detection paradigm has become proficient in detecting objects of varying scales. However, the feature pyramid network (FPN), despite constructing multi-scale features with strong semantics, still suffers from limited performance caused by insufficient detail exploitation, information loss, limited receptive fields and hard proposal assignment. These issues fall mainly into two categories: semantic level and instance level. To address these limitations, this paper analyzes the structural components that inhibit multi-scale feature representation and then presents a multi-stage progressive FPN (ProFPN) along with a novel RoI feature representation method called soft proposal assignment. At the semantic level, a bottom-up interaction module is first proposed to address the insufficient exploitation of high-resolution features. In this module, global context attention blocks let adjacent-level features interact with detail information in a bottom-up progressive manner. A top-down transfer module is then designed to mitigate the loss of semantic information in high-level features. In this module, multi-branch asymmetric dilated blocks are applied in a top-down progressive manner, which expands receptive fields to capture more object poses. At the instance level, to overcome the hard assignment of object proposals, a nonparametric strategy named soft proposal assignment uses the scale of each object proposal to generate dynamic weights for RoI features from adjacent levels. Comprehensive experiments conducted on the MS COCO dataset demonstrate the superiority of ProFPN. With negligible extra FLOPs, the proposed ProFPN outperforms most pyramid-based methods. Moreover, owing to the design of inherited feature utilization in ProFPN, transformer-based detectors show a substantial improvement in small-object detection while simultaneously achieving significant reductions in FLOPs. The source code of the proposed method is available at https://github.com/GingerCohle/ProFPN .
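
To make the idea of soft proposal assignment more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the weighting formula, so the sketch assumes the standard FPN level heuristic k = k0 + log2(sqrt(wh)/224) and uses the fractional part of k as a linear interpolation weight between the two adjacent pyramid levels. The function names, the level range 2 to 5, and the linear weighting are all illustrative assumptions.

```python
import math


def soft_assignment_weights(w, h, k_min=2, k_max=5, k0=4, canonical=224.0):
    """Return {level: weight} for one proposal of size w x h (in pixels).

    Assumption: weights come from linearly interpolating the continuous
    FPN level; the paper's actual weighting scheme may differ.
    """
    # Continuous pyramid level from the standard FPN heuristic (no rounding).
    k = k0 + math.log2(math.sqrt(w * h) / canonical)
    # Clamp to the range of levels that actually exist.
    k = min(max(k, k_min), k_max)
    lo = math.floor(k)
    hi = min(lo + 1, k_max)
    frac = k - lo
    if hi == lo or frac == 0.0:
        # Proposal sits exactly on one level: behaves like hard assignment.
        return {lo: 1.0}
    # No learned parameters: weights depend only on the proposal scale.
    return {lo: 1.0 - frac, hi: frac}


def blend_roi_features(feats_by_level, weights):
    """Weighted sum of per-level RoI feature vectors (lists of floats)."""
    length = len(next(iter(feats_by_level.values())))
    out = [0.0] * length
    for level, wgt in weights.items():
        for i, v in enumerate(feats_by_level[level]):
            out[i] += wgt * v
    return out
```

Under these assumptions, a 224 x 224 proposal maps entirely to level 4, while a 448 x 224 proposal splits its weight roughly equally between levels 4 and 5, instead of being forced onto a single level by rounding.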