SRFAD-Net: Scale-Robust Feature Aggregation and Diffusion Network for Object Detection in Remote Sensing Images

Jing Liu, Donglin Jing, Haijing Zhang, Chunyu Dong
DOI: https://doi.org/10.3390/electronics13122358
IF: 2.9
2024-06-17
Electronics
Abstract: The significant differences in target scales of remote sensing images lead to remarkable variations in visual features, posing significant challenges for feature extraction, fusion, regression, and classification. For example, models frequently struggle to capture features of targets across all scales, inadequately consider the weights and importance of features at different scales during fusion, and encounter accuracy limitations when detecting targets of varying scales. To tackle these challenges, we propose a Scale-Robust Feature Aggregation and Diffusion Network (SRFAD-Net) for remote sensing target detection. This model includes a Scale-Robust Feature Network (SRFN), an Adaptive Feature Aggregation and Diffusion (AFAD) module, and a Focaler-GIoU Loss. SRFN extracts scale-robust features by constructing a multi-scale pyramid. It includes a downsampling (ADown) module that combines the advantages of average pooling and max pooling, effectively preserving background information and salient features. This further enhances the network's ability to handle targets of varying scales and shapes. The introduced Deformable Attention (DAttention) mechanism captures target features effectively through adaptive adjustment of the receptive field's shape and size, reducing background clutter and substantially enhancing the model's performance in detecting distant objects. In the feature fusion stage, we propose the AFAD module, which utilizes a dimension-adaptive perceptual selection mechanism and parallel depthwise convolutional operations to precisely aggregate multi-channel information. It then employs a diffusion mechanism to spread contextual information across various scales, greatly improving the network's ability to extract and fuse features across multiple scales.
For the detection head, we adopt the Focaler-GIoU Loss, leveraging its advantages in handling non-overlapping bounding boxes to alleviate the localization difficulty caused by scale variations. We conducted experiments on two widely used aerial target datasets: the Remote Sensing Scene Object Detection Dataset (RSOD) and NWPU VHR-10, a high-resolution object detection dataset from Northwestern Polytechnical University. The results show that SRFAD-Net outperforms mainstream detectors.
Engineering, Electrical & Electronic; Computer Science, Information Systems; Physics, Applied
What problem does this paper attempt to address?
This paper primarily addresses the challenges posed by significant differences in target scales in remote sensing image object detection. Specifically:

1. **Feature Extraction**: The significant differences in target scales in remote sensing images lead to substantial variations in visual features such as texture, shape, and edge information. Traditional feature extraction methods struggle to capture target features at different scales, and existing multi-scale pyramid construction methods tend to dilute or lose information when dealing with small targets, while large targets exhibit complexity and diversity.
2. **Feature Fusion**: Features at different scales differ significantly in form of expression, resolution, and semantic information. Traditional simple fusion methods (such as stacking or concatenation) cannot effectively utilize the information from these features. Moreover, most methods perform feature fusion at only one level, failing to fully leverage the semantic information and resolution available at different levels.
3. **Regression and Classification**: When target scales differ significantly, regression algorithms struggle to adapt to targets of different scales, leading to discrepancies between predicted positions and sizes and the actual targets. Classifiers also find it challenging to identify targets of different scales accurately, which reduces classification performance.

To address the above issues, the paper proposes a new network architecture, the Scale-Robust Feature Aggregation and Diffusion Network (SRFAD-Net), which includes the following key components:

- **Scale-Robust Feature Network (SRFN)**: Utilizes a special ADown module that combines the advantages of average pooling and max pooling to highlight important features while retaining background information. It also introduces a DAttention mechanism to dynamically adjust sampling positions and attention weights, enhancing the ability to capture distant targets.
- **Adaptive Feature Aggregation and Diffusion (AFAD) Module**: Precisely selects and integrates multi-channel information through a channel-aware selection mechanism and propagates features containing rich contextual information to different scales through a unique diffusion mechanism, enhancing feature fusion capabilities.
- **Focaler-GIoU Loss**: Combines the advantages of Focaler-IoU, which emphasizes difficult samples, and GIoU, which handles non-overlapping bounding boxes, to improve the accuracy of target position and shape description.

Through these designs, SRFAD-Net effectively addresses the issues of feature extraction, fusion, regression, and classification in remote sensing image object detection, enhancing detection accuracy and generalization capability.
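
To make the Focaler-GIoU idea more concrete, here is a minimal pure-Python sketch for a single pair of axis-aligned boxes. It is not the authors' implementation. It assumes the commonly cited Focaler-IoU formulation, in which the IoU is linearly remapped onto the interval `[d, u]` and clipped to `[0, 1]`, and the combined loss is `L_GIoU + IoU - IoU_focaler`. The function names, the `(x1, y1, x2, y2)` box format, and the default thresholds `d=0.0`, `u=0.95` are illustrative assumptions and are not taken from the text above.

```python
def iou_and_giou(pred, target):
    """Return (IoU, GIoU) for two boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle (zero area when the boxes do not overlap).
    iw = max(0.0, min(pred[2], target[2]) - max(pred[0], target[0]))
    ih = max(0.0, min(pred[3], target[3]) - max(pred[1], target[1]))
    inter = iw * ih

    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    union = area_p + area_t - inter
    iou = inter / union if union > 0 else 0.0

    # Smallest axis-aligned box enclosing both inputs.
    cw = max(pred[2], target[2]) - min(pred[0], target[0])
    ch = max(pred[3], target[3]) - min(pred[1], target[1])
    c_area = cw * ch
    # GIoU subtracts the fraction of the enclosing box not covered by the union.
    giou = iou - (c_area - union) / c_area if c_area > 0 else iou
    return iou, giou


def focaler_iou(iou, d=0.0, u=0.95):
    """Linearly remap IoU from [d, u] onto [0, 1], clipping outside (assumed form)."""
    return min(1.0, max(0.0, (iou - d) / (u - d)))


def focaler_giou_loss(pred, target, d=0.0, u=0.95):
    """Assumed combination: L_GIoU + IoU - IoU_focaler, with L_GIoU = 1 - GIoU."""
    iou, giou = iou_and_giou(pred, target)
    return (1.0 - giou) + iou - focaler_iou(iou, d, u)


if __name__ == "__main__":
    gt = (0.0, 0.0, 1.0, 1.0)
    # Non-overlapping predictions: IoU is 0 for both, yet the loss still
    # grows as the prediction moves farther from the ground truth.
    print(focaler_giou_loss((2.0, 0.0, 3.0, 1.0), gt))
    print(focaler_giou_loss((4.0, 0.0, 5.0, 1.0), gt))
```

The two printed values illustrate the GIoU property mentioned above: a plain IoU loss would be identical for both non-overlapping predictions, whereas the enclosing-box term still distinguishes a near miss from a far one. The thresholds `d` and `u` control which IoU range the regression focuses on, so in practice they would be tuned per dataset.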