Abstract: The large differences in target scale across remote sensing images lead to marked variations in visual features, posing substantial challenges for feature extraction, fusion, regression, and classification. For example, models often struggle to capture target features at all scales, inadequately account for the weights and importance of features at different scales during fusion, and suffer accuracy limitations when detecting targets of varying scales. To tackle these challenges, we propose a Scale-Robust Feature Aggregation and Diffusion Network (SRFAD-Net) for remote sensing target detection. The model comprises a Scale-Robust Feature Network (SRFN), an Adaptive Feature Aggregation and Diffusion (AFAD) module, and a Focaler-GIoU Loss. SRFN extracts scale-robust features by constructing a multi-scale pyramid. It includes a downsampling (ADown) module that combines the advantages of average pooling and max pooling, preserving both background information and salient features. This further enhances the network's ability to handle targets of varying scales and shapes. The introduced Deformable Attention (DAttention) mechanism captures target features effectively by adaptively adjusting the shape and size of the receptive field, reducing background clutter and substantially improving the model's performance in detecting distant objects. In the feature fusion stage, we propose the AFAD module, which uses a dimension-adaptive perceptual selection mechanism and parallel depthwise convolution operations to precisely aggregate multi-channel information. It then employs a diffusion mechanism to spread contextual information across scales, greatly improving the network's ability to extract and fuse multi-scale features.
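The abstract says ADown combines average pooling and max pooling to keep both background context and salient responses. The sketch below illustrates only that idea and makes several assumptions. Half of the channels are 2x2 average-pooled and the other half are 2x2 max-pooled, both with stride 2, and the results are concatenated. The learned convolutions of the actual module are omitted, and the kernel sizes are assumed, since the abstract does not give them. The names `pool2x2` and `adown_like` are hypothetical.

```python
from typing import List

Channel = List[List[float]]   # one H x W feature map
FeatureMap = List[Channel]    # C x H x W, as nested lists


def pool2x2(channel: Channel, mode: str) -> Channel:
    """2x2 pooling with stride 2 (odd trailing rows/cols are dropped).

    mode="avg" keeps smoothed context; mode="max" keeps the strongest response.
    """
    if mode not in ("avg", "max"):
        raise ValueError(f"unknown pooling mode: {mode}")
    h, w = len(channel), len(channel[0]) if channel else 0
    out: Channel = []
    for i in range(0, h - 1, 2):
        row = []
        for j in range(0, w - 1, 2):
            window = [channel[i][j], channel[i][j + 1],
                      channel[i + 1][j], channel[i + 1][j + 1]]
            row.append(sum(window) / 4.0 if mode == "avg" else max(window))
        out.append(row)
    return out


def adown_like(x: FeatureMap) -> FeatureMap:
    """Hypothetical ADown-style downsampling (pooling only, no learned convs).

    First half of the channels -> average pooling (background information);
    second half -> max pooling (salient features); then concatenate channels.
    """
    half = len(x) // 2
    avg_branch = [pool2x2(c, "avg") for c in x[:half]]
    max_branch = [pool2x2(c, "max") for c in x[half:]]
    return avg_branch + max_branch


if __name__ == "__main__":
    x = [
        [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
        [[0, 0, 0, 0], [0, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]],
    ]
    print(adown_like(x))  # each 4x4 channel becomes 2x2
```

In this toy input, the max branch keeps the isolated peak of 9 in the second channel. Average pooling would dilute that peak to 2.25, which is the complementarity the abstract describes.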
For the detection head, we adopt the Focaler-GIoU Loss, leveraging its advantages in handling non-overlapping bounding boxes to alleviate the localization difficulty caused by scale variations. We conduct experiments on two widely used aerial target datasets: the Remote Sensing Scene Object Detection Dataset (RSOD) and NWPU VHR-10, a high-resolution object detection dataset from Northwestern Polytechnical University. The results show that SRFAD-Net outperforms mainstream detectors.
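To make the loss more concrete, the sketch below computes GIoU for axis-aligned boxes and combines it with a Focaler-style piecewise-linear remapping of IoU. The sketch rests on a few assumptions. It uses the formulation published for Focaler-IoU: IoU_focaler = 0 below a threshold d, 1 above a threshold u, and linear in between, with L_Focaler-GIoU = L_GIoU + IoU - IoU_focaler. The defaults d = 0 and u = 0.95, the `eps` guard, and all function names are illustrative choices, not values taken from SRFAD-Net.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), x2 > x1, y2 > y1


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou_and_giou(a: Box, b: Box, eps: float = 1e-9) -> Tuple[float, float]:
    # Intersection rectangle (zero area if the boxes do not overlap).
    inter = (max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
             * max(0.0, min(a[3], b[3]) - max(a[1], b[1])))
    union = area(a) + area(b) - inter
    iou = inter / (union + eps)
    # Smallest enclosing box C: GIoU penalizes the empty part of C, so it
    # still changes with distance when the boxes are disjoint (IoU == 0).
    c_area = ((max(a[2], b[2]) - min(a[0], b[0]))
              * (max(a[3], b[3]) - min(a[1], b[1])))
    giou = iou - (c_area - union) / (c_area + eps)
    return iou, giou


def focaler(iou: float, d: float, u: float) -> float:
    """Piecewise-linear IoU remapping onto the interval [d, u] (assumed form)."""
    if iou < d:
        return 0.0
    if iou > u:
        return 1.0
    return (iou - d) / (u - d)


def focaler_giou_loss(pred: Box, target: Box,
                      d: float = 0.0, u: float = 0.95) -> float:
    """Assumed Focaler-GIoU: L_GIoU + IoU - IoU_focaler."""
    iou, giou = iou_and_giou(pred, target)
    return (1.0 - giou) + iou - focaler(iou, d, u)


if __name__ == "__main__":
    print(focaler_giou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # partial overlap
    print(focaler_giou_loss((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint, close
    print(focaler_giou_loss((0, 0, 1, 1), (4, 4, 5, 5)))  # disjoint, farther
```

For the two disjoint unit boxes, plain IoU is 0 in both cases. The loss here still grows from about 1.78 to 1.92 as the target moves farther away, which is the behavior on non-overlapping boxes that the abstract relies on.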

SRFAD-Net: Scale-Robust Feature Aggregation and Diffusion Network for Object Detection in Remote Sensing Images