RaFPN: Relation-aware Feature Pyramid Network for Dense Image Prediction

Zhuangzhuang Zhou, Yingying Zhu
DOI: https://doi.org/10.1109/tmm.2024.3371787
IF: 7.3
2024-01-01
IEEE Transactions on Multimedia
Abstract: Intuitively, relations among objects assist a model in performing inference under constrained environments. However, the top-down information flow in the feature pyramid network (FPN) dilutes the relation features contained in non-adjacent layers. This defect reduces detector accuracy, especially for small or occluded objects. To adequately exploit the relations among object instances, we propose the relation-aware feature pyramid network (RaFPN), a simple but effective balanced multi-scale feature module for dense image prediction. RaFPN models the relations among objects by computing the similarity between pixels located on cross-scale features. The result is then delivered to the FPN to guide the detector toward accurate inference. Specifically, we first generate a pair of cross-scale aggregated features based on the channel importance of the FPN output features. Next, the relations among cross-scale objects are extracted by a bi-directional interaction mechanism. Finally, the relation features are injected directly into each layer of the feature pyramid to avoid dilution. In this way, the relations among instances can adequately guide the detector for dense prediction. RaFPN improves Faster R-CNN by 2.0 AP (average precision), outperforming recent state-of-the-art FPN-based improvements. Notably, our method also brings consistent gains on other dense prediction tasks, including instance, semantic, and panoptic segmentation.
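
The relation step described in the abstract (pixel similarity across scales, followed by a bi-directional interaction) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity function or how results are fused, so the scaled dot-product similarity, the softmax weighting, the residual addition, and all names (`cross_scale_relation`, `bidirectional_interaction`) are assumptions chosen for illustration. Feature maps are represented as flat lists of pixel vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_scale_relation(query_pixels, key_pixels):
    """For each query pixel, aggregate key pixels weighted by similarity.

    Assumption: similarity is a scaled dot product; the paper may differ.
    """
    channels = len(query_pixels[0])
    scale = math.sqrt(channels)
    related = []
    for q in query_pixels:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in key_pixels]
        weights = softmax(scores)
        related.append([
            sum(w * k[c] for w, k in zip(weights, key_pixels))
            for c in range(channels)
        ])
    return related


def bidirectional_interaction(fine, coarse):
    """Relate fine -> coarse and coarse -> fine, then add residually.

    Assumption: a plain residual sum stands in for the injection step.
    """
    fine_rel = cross_scale_relation(fine, coarse)
    coarse_rel = cross_scale_relation(coarse, fine)
    fine_out = [[x + r for x, r in zip(p, rp)] for p, rp in zip(fine, fine_rel)]
    coarse_out = [[x + r for x, r in zip(p, rp)] for p, rp in zip(coarse, coarse_rel)]
    return fine_out, coarse_out


# Toy inputs: 4 fine-scale pixels and 2 coarse-scale pixels, 2 channels each.
fine = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
coarse = [[1.0, 0.0], [0.0, 1.0]]
fine_out, coarse_out = bidirectional_interaction(fine, coarse)
```

Each output keeps the pixel count and channel width of its input, so in this sketch the relation features can be added to the original features without resizing. In a real pyramid the levels differ in resolution, and the aggregation and injection steps would need the resampling that the abstract only alludes to.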
computer science, information systems, telecommunications, software engineering