Abstract: Recent advancements in keypoint detection and descriptor extraction have shown impressive performance in local feature learning tasks. However, existing methods generally exhibit suboptimal performance under extreme conditions such as significant appearance changes and domain shifts. In this study, we introduce a multi-level feature aggregation network that incorporates two pivotal components to facilitate the learning of robust and accurate features with domain adaptation. First, we employ domain adaptation supervision to align high-level feature distributions across different domains to achieve invariant domain representations. Second, we propose a Transformer-based booster that enhances descriptor robustness by integrating visual and geometric information through wave position encoding concepts, effectively handling complex conditions. To ensure the accuracy and robustness of features, we adopt a hierarchical architecture to capture comprehensive information and apply meticulous targeted supervision to keypoint detection, descriptor extraction, and their coupled processing. Extensive experiments demonstrate that our method, RADA, achieves excellent results in image matching, camera pose estimation, and visual localization tasks.

What problem does this paper attempt to address?

The problem this paper addresses is the poor performance of existing keypoint detection and descriptor extraction methods under extreme conditions, such as significant appearance changes and domain shifts. Specifically:

1. **Unstable performance of existing methods across domains**: When an image shifts from one domain (e.g., daytime) to another (e.g., nighttime), the keypoint detection stage is easily affected by changes in low-level image statistics, so the subsequent feature descriptors cannot maintain invariance.
2. **Insufficient robustness under complex conditions**: Under drastic changes in illumination, viewpoint, and similar factors, local visual information becomes unreliable and hard to distinguish, which degrades descriptor quality.

To solve these problems, the authors propose RADA (Robust and Accurate Feature Learning with Domain Adaptation), a multi-level feature aggregation network that combines domain adaptation supervision with a Transformer-based booster. The main contributions of the paper are:

- **Domain adaptation supervision**: Achieves invariant domain representations by minimizing the discrepancy between the high-level feature distributions of different domains. Specifically, it uses the maximum mean discrepancy (MMD) metric and a gradient reversal layer to align the feature distributions across domains:

  \[ L_{\text{MMD}}(X_S, X_T) = \left\| \frac{1}{|X_S|} \sum_{x_s \in X_S} \phi(x_s) - \frac{1}{|X_T|} \sum_{x_t \in X_T} \phi(x_t) \right\| \]

  \[ L_{\text{adv}}(X_S, X_T) = \frac{1}{N} \sum_{i=1}^{N} \left( -l_i \log(s_i) - (1 - l_i) \log(1 - s_i) \right) \]

  \[ L_{\text{da}} = L_{\text{adv}} + \lambda L_{\text{MMD}} \]

  Here \(X_S\) and \(X_T\) are the feature sets of the source and target domains, \(\phi(\cdot)\) is a feature mapping, \(l_i\) is the domain label of the \(i\)-th of \(N\) samples, \(s_i\) is the domain classifier's predicted probability for that sample, and \(\lambda\) weights the MMD term. The gradient reversal layer flips the sign of the gradient that the domain classifier sends back into the feature extractor, so the features are pushed toward making the two domains indistinguishable.
- **Transformer-based booster**: Enhances descriptor robustness by integrating visual and geometric information, in particular through the concept of wave position encoding.
  Descriptors and positional information are fused through amplitude and phase relationships to produce position-aware descriptors:

  \[ w_j = A_j \odot e^{i\theta_j} = A_j \odot (\cos \theta_j + i \cdot \sin \theta_j), \quad j = 1, 2, \ldots, n \]

  \[ d^{\text{PE}}_j = d_j + \text{MLP}_F\left([A_j \odot \cos \theta_j,\ A_j \odot \sin \theta_j]\right) \]

  where \(A_j\) and \(\theta_j\) are the amplitude and phase associated with the \(j\)-th of \(n\) descriptors \(d_j\), and \([\cdot, \cdot]\) denotes concatenation.
- **Hierarchical feature aggregation network**: Ensures the accuracy of keypoint detection, descriptor extraction, and their coupled processing through carefully designed loss functions.

These improvements allow RADA to perform well in image matching, camera pose estimation, and visual localization, especially under cross-domain conditions.

### Summary

The core problem of the paper is how to improve the robustness and accuracy of keypoint detection and descriptor extraction under extreme domain changes. By introducing domain adaptation supervision and a Transformer-based booster, RADA addresses this challenge and shows strong performance on multiple benchmarks.
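The domain adaptation losses \(L_{\text{MMD}}\), \(L_{\text{adv}}\), and \(L_{\text{da}}\) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not RADA's implementation: the feature mapping \(\phi\) defaults to the identity (the summary does not specify it), features are plain lists rather than tensors, the domain classifier's probabilities \(s_i\) are passed in directly, and the gradient reversal layer is shown only as its backward rule because there is no autograd here. All function names are hypothetical.

```python
import math


def mean_vector(features):
    """Average a list of equal-length feature vectors."""
    dim = len(features[0])
    return [sum(f[k] for f in features) / len(features) for k in range(dim)]


def mmd_loss(source, target, phi=lambda x: x):
    """Empirical MMD: Euclidean norm of the difference of mean phi-embeddings.

    Assumption: phi defaults to the identity, which reduces MMD to the
    distance between the two domain means.
    """
    mu_s = mean_vector([phi(x) for x in source])
    mu_t = mean_vector([phi(x) for x in target])
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mu_s, mu_t)))


def adversarial_loss(labels, probs, eps=1e-12):
    """Binary cross-entropy of a domain classifier.

    labels[i] is the domain label l_i (0 or 1); probs[i] is the predicted s_i.
    """
    total = 0.0
    for l, s in zip(labels, probs):
        s = min(max(s, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += -l * math.log(s) - (1 - l) * math.log(1 - s)
    return total / len(labels)


def grad_reverse_backward(grad, alpha=1.0):
    """Backward rule of a gradient reversal layer (its forward pass is the identity):
    the incoming gradient is negated and scaled by alpha (hypothetical parameter)."""
    return [-alpha * g for g in grad]


def da_loss(source, target, labels, probs, lam=1.0):
    """Combined objective L_da = L_adv + lambda * L_MMD."""
    return adversarial_loss(labels, probs) + lam * mmd_loss(source, target)
```

For example, source features `[[1.0, 0.0], [3.0, 2.0]]` and target features `[[0.0, 1.0], [2.0, 1.0]]` have means `[2.0, 1.0]` and `[1.0, 1.0]`, so `mmd_loss` returns 1.0, and a classifier that outputs 0.5 for one sample of each domain gives an `adversarial_loss` of \(\log 2\), i.e. it cannot tell the domains apart. In a deep learning framework the reversal would typically be written as a custom autograd function rather than a standalone helper.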
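The two wave position encoding equations can be illustrated with a minimal pure-Python sketch. It is not RADA's implementation: the summary does not say how the amplitude \(A_j\) and phase \(\theta_j\) are derived from keypoint positions, so they are simply passed in, and \(\text{MLP}_F\) is replaced by a hypothetical two-layer ReLU network with hand-picked weights. The helper names are likewise hypothetical.

```python
import cmath
import math


def wave_encode(amplitude, phase):
    """Complex wave w_j = A_j * exp(i * theta_j), applied element-wise."""
    return [a * cmath.exp(1j * t) for a, t in zip(amplitude, phase)]


def wave_features(amplitude, phase):
    """Real-valued MLP input [A_j * cos(theta_j), A_j * sin(theta_j)].

    By Euler's formula these are the real and imaginary parts of wave_encode().
    """
    cos_part = [a * math.cos(t) for a, t in zip(amplitude, phase)]
    sin_part = [a * math.sin(t) for a, t in zip(amplitude, phase)]
    return cos_part + sin_part  # list concatenation plays the role of [., .]


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def mlp_fuse(x, w1, b1, w2, b2):
    """Hypothetical stand-in for MLP_F: two dense layers with a ReLU in between."""
    hidden = [max(0.0, h) for h in linear(w1, b1, x)]
    return linear(w2, b2, hidden)


def position_aware_descriptor(d, amplitude, phase, params):
    """d_j^PE = d_j + MLP_F([A_j * cos(theta_j), A_j * sin(theta_j)])."""
    delta = mlp_fuse(wave_features(amplitude, phase), *params)
    return [di + ei for di, ei in zip(d, delta)]


# Toy usage with hand-picked (hypothetical) weights: 1-D amplitude/phase, 2-D descriptor.
params = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],  # layer 1: identity
          [[1.0, 1.0], [0.5, 0.0]], [0.0, 0.0])  # layer 2
print(position_aware_descriptor([1.0, -1.0], [2.0], [0.0], params))  # prints [3.0, 0.0]
```

Feeding the real and imaginary parts as separate channels lets an ordinary real-valued MLP consume the complex wave \(w_j\), and the residual form of \(d^{\text{PE}}_j\) adds positional information to the visual descriptor rather than replacing it.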

RADA: Robust and Accurate Feature Learning with Domain Adaptation

Joint Feature-Level And Pixel-Level Domain Adaption For Object Detection In The Wild

Selective Transfer with Reinforced Transfer Network for Partial Domain Adaptation

Trust-aware Conditional Adversarial Domain Adaptation with Feature Norm Alignment

Hierarchical Domain Adaptation with Local Feature Patterns

RFA-Net: Reconstructed Feature Alignment Network for Domain Adaptation Object Detection in Remote Sensing Imagery

Multi-Level Domain Adaptive Learning For Cross-Domain Detection

Gotta Adapt 'Em All: Joint Pixel and Feature-Level Domain Adaptation for Recognition in the Wild

CADA: Multi-scale Collaborative Adversarial Domain Adaptation for unsupervised optic disc and cup segmentation

An Adversarial Domain Adaptation Network for Cross-Domain Fine-Grained Recognition

Multitarget Domain Adaptation Building Instance Extraction of Remote Sensing Imagery With Domain-Common Approximation Learning

Deep Reconstruction-Classification Networks for Unsupervised Domain Adaptation

Deep Residual Correction Network for Partial Domain Adaptation

Multicomponent Adversarial Domain Adaptation: A General Framework

Deep ladder reconstruction-classification network for unsupervised domain adaptation

Domain Invariant and Class Discriminative Feature Learning for Visual Domain Adaptation

Domain Adaptation for Underwater Image Enhancement

Transferable Attention for Domain Adaptation

Domain Adaptation for Remote Sensing Image Semantic Segmentation: An Integrated Approach of Contrastive Learning and Adversarial Learning

When Unsupervised Domain Adaptation Meets Tensor Representations

Locality Robust Domain Adaptation for cross-scene hyperspectral image classification