Abstract: With advances in remote sensing, target detection in remote sensing images is of increasing interest. Unlike natural image processing, remote sensing image processing must deal with large variations in object size, which poses a great challenge to researchers. Although traditional multi-scale detection networks have been successful in handling such variations, they still have certain limitations: (1) Traditional multi-scale detection methods account for the scale of features but ignore the correlation between feature levels. Each feature map is represented by a single layer of the backbone network, so the extracted features are not comprehensive enough. For example, the SSD network uses the features extracted from the backbone network at different scales directly for detection, which loses a large amount of contextual information. (2) These methods rely on existing backbone classification networks to perform detection tasks. RetinaNet, for example, simply combines the ResNet-101 classification network with an FPN network to perform detection; however, object classification and object detection are different tasks. To address these issues, a cross-scale feature fusion pyramid network (CF2PN) is proposed. First, a cross-scale fusion module (CSFM) is introduced to extract sufficiently comprehensive semantic information from the features and perform multi-scale fusion. Second, a feature pyramid for target detection built from thinning U-shaped modules (TUMs) performs multi-level fusion of the features. Finally, a focal loss is used in the prediction section to control the large number of negative samples generated during the feature fusion process. The proposed network architecture is verified on the DIOR and RSOD datasets. The experimental results show that the performance of this method is improved by 2–12% on the DIOR and RSOD datasets compared with current state-of-the-art (SOTA) target detection methods.
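The abstract says a focal loss is used to control the large number of negative samples, but it does not give the formulation or the hyperparameters. As a rough illustration only, the sketch below implements the standard binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function name `binary_focal_loss` and the defaults `alpha=0.25` and `gamma=2.0` are assumptions chosen for illustration, not values taken from this paper.

```python
import math


def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Focal loss for one prediction (illustrative sketch, not the paper's code).

    p     -- predicted probability of the positive class, in [0, 1]
    y     -- ground-truth label, 1 (object) or 0 (background)
    alpha -- class-balancing weight for positives (assumed default)
    gamma -- focusing parameter; larger values down-weight easy examples more
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0).
    p_t = min(max(p_t, eps), 1.0)
    # (1 - p_t)^gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy negative (background predicted with p = 0.01) contributes far less
# than a hard negative (background predicted with p = 0.9).
easy_negative = binary_focal_loss(0.01, 0)
hard_negative = binary_focal_loss(0.9, 0)
```

With `gamma=0` the modulating factor disappears and the expression reduces to an alpha-weighted cross-entropy, which shows that the focusing term is what suppresses the many easy background samples.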

Receptive Field Fusion RetinaNet for Object Detection

Feature Combination Based On Receptive Fields And Cross-Fusion Feature Pyramid For Object Detection

An Adaptive Attention Fusion Mechanism Convolutional Network for Object Detection in Remote Sensing Images

Attention-based Fusion Factor in FPN for Object Detection

ℱ3-Net: Feature Fusion and Filtration Network for Object Detection in Optical Remote Sensing Images

Joint-attention feature fusion network and dual-adaptive NMS for object detection

Afmf: Adaptive Fusion of Multi-Scale Features for Pixel-Level Object Detection

Multi-branch feature fusion and refinement network for salient object detection

Fusion Object Detection with Convolutional Neural Network

A Vision Enhancement and Feature Fusion Multiscale Detection Network

MFC-Net: Multi-feature fusion cross neural network for salient object detection

Adaptive Multilevel Fusion Refinement Network for Object Detection in Remote Sensing Images

Improving Object Detection in YOLOv8n with the C2f-f Module and Multi-Scale Fusion Reconstruction

RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object Detection

Small Object Detection using Multi-scale Feature Fusion and Attention

Enhancement-fusion Feature Pyramid Network for Object Detection

NLFFTNet: A non-local feature fusion transformer network for multi-scale object detection

Multiscale Feature Fusion and Anchor Adaptive Object Detection Algorithm

M2RNet: Multi-modal and Multi-scale Refined Network for RGB-D Salient Object Detection

CF2PN: A Cross-Scale Feature Fusion Pyramid Network Based Remote Sensing Target Detection