SGR: An Improved Point-Based Method for Remote Sensing Object Detection via Dual-Domain Alignment Saliency-Guided RepPoints

Shuhua Mai, Yanan You, Yunxiang Feng
DOI: https://doi.org/10.3390/rs16020250
IF: 5
2024-01-09
Remote Sensing
Abstract: With the advancement of deep neural networks, several methods leveraging convolutional neural networks (CNNs) have gained prominence in the field of remote sensing object detection. Acquiring accurate feature representations from feature maps is a critical step in CNN-based object detection methods. Previously, region of interest (RoI)-based methods have been widely used, but of late, deformable convolutional network (DCN)-based approaches have started receiving considerable attention. A significant challenge in the use of DCN-based methods is the inefficient distribution patterns of sampling points, stemming from a lack of effective and flexible guidance. To address this, our study introduces Saliency-Guided RepPoints (SGR), an innovative framework designed to enhance feature representation quality in remote sensing object detection. SGR employs a dynamic dual-domain alignment (DDA) training strategy to mitigate potential misalignment issues between spatial and feature domains during the learning process. Furthermore, we propose an interpretable visualization method to assess the alignment between feature representation and classification performance in DCN-based methods, providing theoretical analysis and validation for the effectiveness of sampling points. In this study, we assessed the proposed SGR framework through a series of experiments conducted on four varied and rigorous datasets: DOTA, HRSC2016, DIOR-R, and UCAS-AOD, all of which are widely employed in remote sensing object detection. The outcomes of these experiments substantiate the effectiveness of the SGR framework, underscoring its potential to enhance the accuracy of object detection within remote sensing imagery.
environmental sciences, imaging science & photographic technology, remote sensing, geosciences, multidisciplinary
What problem does this paper attempt to address?
The paper addresses the inefficient distribution of sampling points in methods based on deformable convolutional networks (DCN) for remote sensing object detection. Specifically:

1. **Limitations of Existing Methods**: Traditional region of interest (RoI)-based methods and existing DCN-based methods improve the quality of feature representation to some extent, but they still fall short on arbitrarily oriented remote sensing objects and complex backgrounds. In particular, they lack effective guidance from information inside the object, which makes the sampling point distribution inflexible and inefficient.
2. **Alignment Problem in DCN**: The offsets in DCN define the set of sampling points. Current methods usually guide these points using only the external information of the bounding box and ignore key information inside it. This can cause misalignment between the spatial domain and the feature domain, which in turn degrades the final detection results.

To solve these problems, the paper introduces a new framework, Saliency-Guided RepPoints (SGR), which aims to improve the quality of feature representation in remote sensing object detection in the following ways:

- **Introducing Saliency Map**: Use the saliency map to provide key internal information that guides the sampling point distribution in DCN, making it more flexible and efficient.
- **Dynamic Dual-Domain Alignment (DDA) Training Strategy**: Dynamically adjust label assignment and loss functions to alleviate potential misalignment between the spatial domain and the feature domain.
- **Interpretable Visualization Method**: Propose an interpretable visualization method to verify the effectiveness of sampling points and support the theoretical analysis.

Through these improvements, the SGR framework can extract object features more accurately and improve the accuracy of object detection in remote sensing images. Experimental results show that SGR performs well on multiple remote sensing object detection datasets, verifying its effectiveness and potential.

### Mathematical Formulas

When describing the sampling point distribution pattern, the paper uses the following formula:

\[ P_i(x,y) = I(F_i(x,y)) = \{(\Delta x_j,\Delta y_j)\}_{j = 1}^n,\quad i = 3,4,5,6,7 \]

where:

- \(F_i(x,y)\) represents the \(i\)-th layer feature map,
- \(I\) is a function that predicts sampling points based on feature map pixels,
- \(P_i\) represents the sampling point distribution pattern obtained on the \(i\)-th layer feature map.

The formula for feature representation is as follows:

\[ R_i(x,y) = \sum_{j = 1}^K w_j \cdot F_i(x+\Delta x_j, y+\Delta y_j) \]

where:

- \(R_i\) represents the feature representation,
- \(w_j\) represents the importance weight of the feature contribution of each neighboring pixel,
- \(F_i(x+\Delta x_j,y+\Delta y_j)\) represents the feature contribution obtained by evaluating the feature map at the displaced position \((x+\Delta x_j,y+\Delta y_j)\).

The process of generating the saliency map can be represented by the following formula:

\[ s_n = S(t_n),\quad n = 1,2,\ldots \]

where:

- \(t_n\) represents the \(n\)-th real-target image,
- \(s_n\) represents the saliency map of the corresponding real-target image,
- \(S\) is an algorithm that converts RGB images into saliency maps.

The formula for extracting the peak points of the saliency map is:

\[ P_{\text{guidance}} = \text{FindPeak}(s) = \{p_n \mid n = 1,2,\ldots,K\} \]

where:

- \(P_{\text{guidance}}\) represents the top \(K\) peak points with the highest response values on the saliency map.

Through these formulas, the paper details...
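
To make the feature-representation and peak-extraction formulas more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation. The function names (`bilinear_sample`, `feature_representation`, `find_peaks`) are hypothetical. Bilinear interpolation is assumed for fractional sampling positions, and FindPeak is assumed to mean "strict local maxima in an 8-neighbourhood, ranked by response". The summary above does not specify either detail.

```python
import math


def bilinear_sample(feature_map, x, y):
    """Sample a 2D feature map (list of rows) at a fractional (x, y).

    Assumption: bilinear interpolation, with positions outside the map
    contributing zero.
    """
    h, w = len(feature_map), len(feature_map[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= xx < w and 0 <= yy < h:
                weight = (1 - abs(x - xx)) * (1 - abs(y - yy))
                value += weight * feature_map[yy][xx]
    return value


def feature_representation(feature_map, x, y, offsets, weights):
    """R(x, y) = sum_j w_j * F(x + dx_j, y + dy_j)."""
    return sum(
        w * bilinear_sample(feature_map, x + dx, y + dy)
        for w, (dx, dy) in zip(weights, offsets)
    )


def find_peaks(saliency, k):
    """Return the (x, y) positions of the k strongest peaks.

    Assumption: a peak is a pixel strictly greater than all of its
    8-connected neighbours; peaks are ranked by response value.
    """
    h, w = len(saliency), len(saliency[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = saliency[y][x]
            neighbours = [
                saliency[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            ]
            if all(v > n for n in neighbours):
                peaks.append((v, x, y))
    peaks.sort(reverse=True)
    return [(x, y) for _, x, y in peaks[:k]]


# Toy feature map with F[y][x] = 3*y + x, and three sampling offsets.
toy_features = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
toy_offsets = [(0, 0), (0.5, 0), (-1, -1)]
toy_weights = [0.5, 0.25, 0.25]
r = feature_representation(toy_features, 1, 1, toy_offsets, toy_weights)

# Toy saliency map with two clear local maxima.
toy_saliency = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.2, 0.1],
    [0.1, 0.2, 0.1, 0.6],
    [0.0, 0.1, 0.3, 0.2],
]
guidance_points = find_peaks(toy_saliency, 2)
```

In this toy setup, the three offsets play the role of \((\Delta x_j, \Delta y_j)\) and the weights play the role of \(w_j\), while `guidance_points` corresponds to \(P_{\text{guidance}}\) for \(K = 2\). How SGR actually uses those peak points to supervise the learned offsets is not covered by this sketch.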