Abstract: With the advancement of deep neural networks, several methods leveraging convolutional neural networks (CNNs) have gained prominence in the field of remote sensing object detection. Acquiring accurate feature representations from feature maps is a critical step in CNN-based object detection methods. Previously, region of interest (RoI)-based methods have been widely used, but of late, deformable convolutional network (DCN)-based approaches have started receiving considerable attention. A significant challenge in the use of DCN-based methods is the inefficient distribution patterns of sampling points, stemming from a lack of effective and flexible guidance. To address this, our study introduces Saliency-Guided RepPoints (SGR), an innovative framework designed to enhance feature representation quality in remote sensing object detection. SGR employs a dynamic dual-domain alignment (DDA) training strategy to mitigate potential misalignment issues between spatial and feature domains during the learning process. Furthermore, we propose an interpretable visualization method to assess the alignment between feature representation and classification performance in DCN-based methods, providing theoretical analysis and validation for the effectiveness of sampling points. We assessed the proposed SGR framework through a series of experiments conducted on four varied and rigorous datasets: DOTA, HRSC2016, DIOR-R, and UCAS-AOD, all of which are widely employed in remote sensing object detection. The outcomes of these experiments substantiate the effectiveness of the SGR framework, underscoring its potential to enhance the accuracy of object detection within remote sensing imagery.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is the inefficiency in the sampling point distribution pattern of methods based on Deformable Convolutional Networks (DCN) in remote sensing object detection. Specifically:

1. **Limitations of Existing Methods**: Traditional Region-of-Interest (RoI)-based methods and existing DCN methods, although they have improved the quality of feature representation to a certain extent, still have deficiencies when dealing with remote sensing objects in arbitrary directions and complex backgrounds. In particular, these methods lack effective internal information guidance, resulting in an inflexible and inefficient sampling point distribution pattern.
2. **Alignment Problem in DCN**: The offsets in DCN represent the set of sampling points. However, current methods, when guiding these sampling points, often only focus on the external information of the bounding box and ignore the internal key information. This may lead to alignment problems between the spatial domain and the feature domain, thus affecting the final detection results.

To solve these problems, the paper introduces a new framework, Saliency-Guided RepPoints (SGR), aiming to improve the quality of feature representation in remote sensing object detection in the following ways:

- **Introducing Saliency Map**: Use the saliency map to provide internal key information to guide the sampling point distribution in DCN, making it more flexible and efficient.
- **Dynamic Dual-Domain Alignment (DDA) Training Strategy**: Dynamically adjust label assignment and loss functions to alleviate potential alignment problems between the spatial domain and the feature domain.
- **Explanatory Visualization Method**: Propose an explanatory visualization method to verify the effectiveness of sampling points and provide theoretical analysis.

Through these improvements, the SGR framework can extract object features more accurately and improve the accuracy of object detection in remote sensing images. Experimental results show that SGR performs well on multiple remote sensing object detection datasets, verifying its effectiveness and potential.

### Mathematical Formulas

When describing the sampling point distribution pattern, the paper uses the following formula:

\[ P_i(x,y) = I(F_i(x,y)) = \{(\Delta x_j,\Delta y_j)\}_{j=1}^{n},\quad i = 3,4,5,6,7 \]

where:

- \(F_i(x,y)\) represents the \(i\)-th layer feature map,
- \(I\) is a function that predicts sampling points based on feature map pixels,
- \(P_i\) represents the sampling point distribution pattern obtained on the \(i\)-th layer feature map.

The formula for feature representation is as follows:

\[ R_i(x,y) = \sum_{j=1}^{K} w_j \cdot F_i(x+\Delta x_j,\, y+\Delta y_j) \]

where:

- \(R_i\) represents the feature representation,
- \(w_j\) represents the importance weight of the feature contribution of each neighboring pixel,
- \(F_i(x+\Delta x_j,\, y+\Delta y_j)\) represents the feature contribution obtained by evaluating the feature map at the displacement position \((x+\Delta x_j,\, y+\Delta y_j)\).

The process of generating the saliency map can be represented by the following formula:

\[ s_n = S(t_n),\quad n = 1,2,\ldots \]

where:

- \(t_n\) represents the \(n\)-th real-target image,
- \(s_n\) represents the saliency map of the corresponding real-target image,
- \(S\) is an algorithm that converts RGB images into saliency maps.

The formula for extracting the peak points of the saliency map is:

\[ P_{\text{guidance}} = \text{FindPeak}(s) = \{p_n \mid n = 1,2,\ldots,K\} \]

where:

- \(P_{\text{guidance}}\) represents the top \(K\) peak points with the highest response values on the saliency map.

Through these formulas, the paper details...
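
The feature representation and peak extraction formulas can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation. The names `bilinear`, `feature_representation`, and `find_peaks` are hypothetical, the bilinear interpolation at fractional positions is an assumption borrowed from common deformable-convolution practice, and the strict 8-neighbour local-maximum rule is only one plausible reading of FindPeak, which the summary does not specify.

```python
import math
from typing import List, Sequence, Tuple

Grid = Sequence[Sequence[float]]  # indexed as grid[y][x]


def bilinear(fmap: Grid, x: float, y: float) -> float:
    """Bilinearly interpolate fmap at (x, y); positions outside the map contribute 0.

    Assumption: the summary does not say how fractional positions are read.
    """
    h, w = len(fmap), len(fmap[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < w and 0 <= yi < h:
                weight = (1 - abs(x - xi)) * (1 - abs(y - yi))
                value += weight * fmap[yi][xi]
    return value


def feature_representation(
    fmap: Grid,
    x: float,
    y: float,
    offsets: Sequence[Tuple[float, float]],
    weights: Sequence[float],
) -> float:
    """R(x, y) = sum_j w_j * F(x + dx_j, y + dy_j) for a single-channel map."""
    return sum(
        w * bilinear(fmap, x + dx, y + dy)
        for (dx, dy), w in zip(offsets, weights)
    )


def find_peaks(saliency: Grid, k: int) -> List[Tuple[int, int]]:
    """Return up to k strict local maxima (x, y), strongest response first.

    Assumption: a peak is a pixel strictly greater than its 8 neighbours.
    """
    h, w = len(saliency), len(saliency[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = saliency[y][x]
            neighbours = [
                saliency[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
                if (xx, yy) != (x, y)
            ]
            if all(v > n for n in neighbours):
                peaks.append((v, x, y))
    peaks.sort(reverse=True)
    return [(x, y) for _, x, y in peaks[:k]]


# Toy 3x3 feature map with F(x, y) = 3*y + x.
fmap = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
r = feature_representation(
    fmap, 1, 1, offsets=[(0, 0), (0.5, 0), (0, -1)], weights=[0.5, 0.25, 0.25]
)

# Toy 4x4 saliency map with two clear peaks.
saliency = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.2, 0.1],
    [0.1, 0.2, 0.1, 0.6],
    [0.0, 0.1, 0.3, 0.2],
]
guidance = find_peaks(saliency, k=2)
```

In this toy setup `r` combines the values sampled at (1, 1), (1.5, 1), and (1, 0), and `guidance` holds the two strongest saliency peaks as `(x, y)` pairs. How SGR turns such peaks into supervision for the predicted offsets is not described in the summary, so that step is left out.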

SGR: An Improved Point-Based Method for Remote Sensing Object Detection via Dual-Domain Alignment Saliency-Guided RepPoints
