Abstract: With the improvement in the quantity and quality of remote sensing images, content-based remote sensing object retrieval (CBRSOR) has become an increasingly important topic. However, existing CBRSOR methods neglect the utilization of global statistical information during both training and test stages, which leads to the overfitting of neural networks to simple sample pairs during training and to suboptimal metric performance. Inspired by the Neyman-Pearson theorem, we propose a generalized likelihood ratio test-based metric learning (GLRTML) approach, which can estimate the relative difficulty of sample pairs by incorporating global data distribution information during training and test phases. This guides the network to focus more on difficult samples during the training process, thereby encouraging the network to learn more discriminative feature embeddings. In addition, GLRT is a more effective metric space than traditional ones due to its utilization of global data distribution information. Accurately estimating the distribution of embeddings is critical for GLRTML. However, in real-world applications, there is often a distribution shift between the training and target domains, which diminishes the effectiveness of directly using the distribution estimated on training data. To address this issue, we propose the clustering pseudo-labels-based fast parameter adaptation (CPLFPA) method. CPLFPA efficiently estimates the distribution of embeddings in the target domain by clustering target domain instances and re-estimating the distribution parameters for GLRTML. We reorganize datasets for CBRSOR tasks based on fine-grained ship remote sensing image slices (FGSRSI-23) and military aircraft recognition (MAR20) datasets. Extensive experiments on these datasets demonstrate the effectiveness of our proposed GLRTML and CPLFPA.
What problem does this paper attempt to address?
The paper addresses the following problem: existing content-based remote sensing object retrieval (CBRSOR) methods fail to use global statistical information during the training and testing phases, so the model overfits to simple sample pairs and achieves suboptimal metric performance. Specifically:
1. **Limitations of existing methods**:
- Existing CBRSOR methods ignore the utilization of global statistical information during the training and testing phases.
- Due to GPU memory limitations, current methods can only process a limited number of data batches, so the loss function is calculated from local data relationships only and the model easily overfits to simple sample pairs.
- This overfitting makes it difficult for the network to learn more discriminative feature embeddings from more challenging sample pairs, which hurts the generalization ability of the model.
2. **Importance of introducing global statistical information**:
- The paper points out that by introducing global statistical information, the relative difficulty of sample pairs can be estimated more accurately, making the network pay more attention to difficult samples during the training process.
- This helps to improve the generalization ability and overall performance of the model and to avoid overfitting.
3. **The proposed new method**:
- Inspired by the Neyman-Pearson theorem, the authors propose a metric learning method based on the generalized likelihood ratio test (GLRTML).
- GLRTML estimates the relative difficulty of sample pairs during the training and testing phases by incorporating global data distribution information, guiding the network to learn more discriminative feature embeddings.
- In addition, to handle possible domain differences between the training set and the test set, the authors also propose a fast parameter adaptation method based on clustering pseudo-labels (CPLFPA), which efficiently re-estimates the distribution parameters in the target domain.
In summary, this paper aims to improve the performance of the CBRSOR task and the generalization ability of the model by introducing global statistical information and solving the domain difference problem.
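The adaptation idea in item 3 can be sketched roughly as follows. This is only an illustration under assumptions that the summary does not state: pair embeddings are taken to be plain differences of two instance embeddings, clustering is a basic k-means written in NumPy, and the names `kmeans` and `adapt_parameters` are hypothetical. It is not the authors' implementation. The two returned matrices play the role of the Sigma_0 and Sigma_1 parameters of the Gaussian form listed in the formula section.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Tiny k-means (assumed clustering step); returns one pseudo-label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # keep the old center if a cluster is empty
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def adapt_parameters(X_target, k, eps=1e-6):
    """Re-estimate pair-difference second moments from cluster pseudo-labels."""
    labels = kmeans(X_target, k)
    i, j = np.triu_indices(len(X_target), k=1)   # all unordered pairs
    diff = X_target[i] - X_target[j]             # assumed pair embedding
    same = labels[i] == labels[j]                # pseudo-positive pairs
    dim = X_target.shape[1]

    def second_moment(D):
        # Zero-mean estimate (1/N) * sum of outer products, lightly regularized.
        return D.T @ D / len(D) + eps * np.eye(dim)

    sigma1 = second_moment(diff[same])    # same-cluster pairs
    sigma0 = second_moment(diff[~same])   # different-cluster pairs
    return sigma0, sigma1, labels
```

No target-domain labels are needed in this sketch: the cluster assignments stand in for class labels when deciding which pairs count as positive or negative.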
### Formula display
The formulas involved in the paper are as follows:
- Definition of likelihood ratio:
\[
s(i, j)=\log \left(\frac{p\left(I_{i}, I_{j} \mid H_{1}\right)}{p\left(I_{i}, I_{j} \mid H_{0}\right)}\right)
\]
Calculated in the embedding space:
\[
s(i, j)=\log \left(\frac{p\left(x_{\theta, i}, x_{\theta, j} \mid H_{1}\right)}{p\left(x_{\theta, i}, x_{\theta, j} \mid H_{0}\right)}\right)
\]
- Differential embedding representation:
\[
s(i, j)=\log \left(\frac{p\left(x_{\theta, i j} \mid H_{1}\right)}{p\left(x_{\theta, i j} \mid H_{0}\right)}\right)
\]
- Likelihood ratio under the multivariate Gaussian distribution assumption:
\[
s(i, j)=\frac{1}{2}\left(x_{\theta, i j}-\mu_{0}\right)^{T} \Sigma_{0}^{-1}\left(x_{\theta, i j}-\mu_{0}\right)-\frac{1}{2}\left(x_{\theta, i j}-\mu_{1}\right)^{T} \Sigma_{1}^{-1}\left(x_{\theta, i j}-\mu_{1}\right)+C_{0}
\]
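For reference, the expression above follows from writing out the two Gaussian log-densities and subtracting them. The summary does not define the constant C_0, so the value below is derived here from the Gaussian assumption (with d the embedding dimension) rather than quoted from the paper:
\[
\log p\left(x \mid H_{k}\right)=-\frac{1}{2}\left(x-\mu_{k}\right)^{T} \Sigma_{k}^{-1}\left(x-\mu_{k}\right)-\frac{1}{2} \log \left|\Sigma_{k}\right|-\frac{d}{2} \log (2 \pi), \quad k \in\{0,1\}
\]
\[
s(i, j)=\log p\left(x_{\theta, i j} \mid H_{1}\right)-\log p\left(x_{\theta, i j} \mid H_{0}\right) \quad \Longrightarrow \quad C_{0}=\frac{1}{2} \log \frac{\left|\Sigma_{0}\right|}{\left|\Sigma_{1}\right|}
\]
The (d/2) log(2π) terms cancel, and C_0 does not depend on the pair (i, j), so it has no effect on how pairs are ranked.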
- Final simplified similarity score (MG-GLRTML):
\[
s(i, j)=x_{\theta, i j}^{T}\left(\Sigma_{0}^{-1}-\Sigma_{1}^{-1}\right) x_{\theta, i j}
\]
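As a minimal sketch, the simplified score can be evaluated as below. The simplification presumably takes both means to be zero and drops the factor 1/2 and the additive constant, neither of which changes the ranking of pairs; that reading is an assumption, as are the function names `glrt_score` and `glrt_scores` and the toy covariance values.

```python
import numpy as np

def glrt_score(x_ij, sigma0, sigma1):
    """Score x^T (Sigma0^{-1} - Sigma1^{-1}) x for a single pair embedding."""
    # Solve linear systems rather than forming explicit inverses.
    a = x_ij @ np.linalg.solve(sigma0, x_ij)
    b = x_ij @ np.linalg.solve(sigma1, x_ij)
    return float(a - b)

def glrt_scores(X_ij, sigma0, sigma1):
    """Same score for a batch of pair embeddings of shape (n_pairs, d)."""
    M = np.linalg.inv(sigma0) - np.linalg.inv(sigma1)
    return np.einsum("nd,de,ne->n", X_ij, M, X_ij)

# Toy values (hypothetical): negative pairs spread more widely than positive ones.
sigma0 = 4.0 * np.eye(2)   # covariance under H0
sigma1 = np.eye(2)         # covariance under H1
near = np.array([0.1, 0.0])
far = np.array([2.0, 0.0])
print(glrt_score(near, sigma0, sigma1), glrt_score(far, sigma0, sigma1))
```

With these toy covariances the score is never positive, and the pair with the smaller difference receives the higher (less negative) score, which matches reading s(i, j) as evidence for H_1.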
- Maximum likelihood estimation of the covariance matrix:
\[
\hat{\Sigma}_{1}=\frac{1}{N_{1}} \sum_{l = 0}^{N_{1}-1}\left(x_{\theta, l}^{+}\right)\left(x_{\theta, l}^{+}\right)^{T}, \quad x_{\theta, l}^{+} \in X_{1}
\]
\[
\hat{\Sigma}_{0}=\frac{1}{N_{0}} \sum_{