RADA: Robust and Accurate Feature Learning with Domain Adaptation

Jingtai He, Gehao Zhang, Tingting Liu, Songlin Du
2024-07-23
Abstract: Recent advancements in keypoint detection and descriptor extraction have shown impressive performance in local feature learning tasks. However, existing methods generally exhibit suboptimal performance under extreme conditions such as significant appearance changes and domain shifts. In this study, we introduce a multi-level feature aggregation network that incorporates two pivotal components to facilitate the learning of robust and accurate features with domain adaptation. First, we employ domain adaptation supervision to align high-level feature distributions across different domains to achieve invariant domain representations. Second, we propose a Transformer-based booster that enhances descriptor robustness by integrating visual and geometric information through wave position encoding concepts, effectively handling complex conditions. To ensure the accuracy and robustness of features, we adopt a hierarchical architecture to capture comprehensive information and apply meticulous targeted supervision to keypoint detection, descriptor extraction, and their coupled processing. Extensive experiments demonstrate that our method, RADA, achieves excellent results in image matching, camera pose estimation, and visual localization tasks.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the poor performance of existing keypoint detection and descriptor extraction methods under extreme conditions, such as significant appearance changes and domain shifts. Specifically:

1. **Unstable performance across domains**: When an image shifts from one domain (e.g., daytime) to another (e.g., nighttime), the keypoint detection stage is easily affected by changes in low-level image statistics, so the descriptors extracted afterwards cannot remain invariant.
2. **Insufficient robustness under complex conditions**: Under drastic changes in illumination, viewpoint, and similar factors, local visual information becomes unreliable and hard to distinguish, which degrades descriptor quality.

To solve these problems, the authors propose RADA (Robust and Accurate feature Learning with Domain Adaptation), a multi-level feature aggregation network that combines domain-adaptive supervision with a Transformer-based booster. The main contributions of the paper are:

- **Domain-adaptive supervision**: Achieves domain-invariant representations by minimizing the discrepancy between the high-level feature distributions of different domains. Specifically, it aligns these distributions using the maximum mean discrepancy (MMD) metric together with a gradient reversal layer:

  \[ L_{\text{MMD}}(X_S, X_T) = \left\| \frac{1}{|X_S|} \sum_{x_s \in X_S} \phi(x_s) - \frac{1}{|X_T|} \sum_{x_t \in X_T} \phi(x_t) \right\| \]

  \[ L_{\text{adv}}(X_S, X_T) = \frac{1}{N} \sum_{i=1}^{N} \left( -l_i \log s_i - (1 - l_i) \log (1 - s_i) \right) \]

  \[ L_{\text{da}} = L_{\text{adv}} + \lambda L_{\text{MMD}} \]

  Here \(X_S\) and \(X_T\) are the feature sets of the source and target domains, \(\phi\) maps each feature into the space where the two domain means are compared, \(l_i \in \{0, 1\}\) is the domain label of sample \(i\), \(s_i\) is the domain classifier's predicted probability for that sample, \(N\) is the number of classified samples, and \(\lambda\) weights the MMD term.

- **Transformer-based booster**: Enhances descriptor robustness by integrating visual and geometric information, in particular through the concept of wave position encoding.
  Descriptors and position information are fused through amplitude and phase relationships to generate position-aware descriptors:

  \[ w_j = A_j \odot e^{i\theta_j} = A_j \odot (\cos \theta_j + i \sin \theta_j), \quad j = 1, 2, \ldots, n \]

  \[ d^{\text{PE}}_j = d_j + \text{MLP}_F([A_j \odot \cos \theta_j, A_j \odot \sin \theta_j]) \]

  Here \(A_j\) and \(\theta_j\) are the amplitude and phase associated with the \(j\)-th of the \(n\) descriptors \(d_j\). The input to \(\text{MLP}_F\) concatenates the real and imaginary parts of \(w_j\), so the fused descriptor stays real-valued.

- **Hierarchical feature aggregation network**: A hierarchical architecture captures comprehensive multi-level information, and carefully targeted losses supervise keypoint detection, descriptor extraction, and their coupled processing to ensure feature accuracy.

According to the authors, these components give RADA strong results in image matching, camera pose estimation, and visual localization, especially under cross-domain conditions.

### Summary

The core problem of the paper is how to make keypoint detection and descriptor extraction robust and accurate under extreme domain changes. RADA tackles it by introducing domain-adaptive supervision and a Transformer-based booster, and reports strong performance across multiple benchmarks.
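The MMD term of the domain-adaptive supervision can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not specify \(\phi\), so the sketch defaults to the identity map (which compares plain feature means) and adds a hypothetical `quadratic_map` that also compares second moments; the norm is assumed to be Euclidean; features are plain lists of floats.

```python
import math
from typing import Callable, List, Sequence

# A feature map phi: takes one feature vector, returns its embedding.
FeatureMap = Callable[[Sequence[float]], List[float]]


def identity_map(x: Sequence[float]) -> List[float]:
    """Default phi (an assumption): with the identity, L_MMD compares plain feature means."""
    return list(x)


def quadratic_map(x: Sequence[float]) -> List[float]:
    """Hypothetical richer phi: appends squared entries, so second moments are compared too."""
    return list(x) + [v * v for v in x]


def mean_embedding(samples: Sequence[Sequence[float]], phi: FeatureMap) -> List[float]:
    """Empirical mean of phi(x) over one domain's feature set."""
    if not samples:
        raise ValueError("each domain needs at least one sample")
    embedded = [phi(x) for x in samples]
    dim = len(embedded[0])
    return [sum(e[k] for e in embedded) / len(embedded) for k in range(dim)]


def mmd_loss(
    source: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
    phi: FeatureMap = identity_map,
) -> float:
    """L_MMD(X_S, X_T): Euclidean distance between the two mean embeddings (not squared)."""
    mu_s = mean_embedding(source, phi)
    mu_t = mean_embedding(target, phi)
    if len(mu_s) != len(mu_t):
        raise ValueError("source and target embeddings must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mu_s, mu_t)))


# Usage: the two toy domains differ only in variance, which the identity map cannot see.
source = [[-1.0], [1.0]]
target = [[0.0], [0.0]]
print(mmd_loss(source, target))                 # 0.0
print(mmd_loss(source, target, quadratic_map))  # 1.0
```

The example shows why the choice of \(\phi\) matters: with the identity map, two domains with equal means but different spreads look identical, while a richer feature map exposes the gap.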
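The adversarial term and the combined objective \(L_{\text{da}} = L_{\text{adv}} + \lambda L_{\text{MMD}}\) can be sketched in the same style. Assumptions: \(l_i \in \{0, 1\}\) marks the domain of sample \(i\), \(s_i\) is a domain classifier's predicted probability that the label is 1, and `GradientReversal` with its `alpha` scale is a hypothetical hand-written stand-in for the gradient reversal layer (identity in the forward pass, negated and scaled gradient in the backward pass). A real model would rely on an autodiff framework; here the backward pass is written out by hand for clarity.

```python
import math
from typing import List, Sequence


def domain_bce_loss(labels: Sequence[int], scores: Sequence[float], eps: float = 1e-12) -> float:
    """L_adv = (1/N) * sum_i [-l_i * log(s_i) - (1 - l_i) * log(1 - s_i)].

    labels: domain label l_i in {0, 1} for each sample (assumed encoding).
    scores: domain classifier's predicted probability s_i that the label is 1.
    """
    if not labels or len(labels) != len(scores):
        raise ValueError("labels and scores must be non-empty and of equal length")
    total = 0.0
    for l, s in zip(labels, scores):
        s = min(max(s, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += -l * math.log(s) - (1 - l) * math.log(1.0 - s)
    return total / len(labels)


class GradientReversal:
    """Hypothetical hand-written gradient reversal layer.

    Forward: identity. Backward: multiplies the incoming gradient by -alpha, so the
    feature extractor upstream is pushed to increase the domain classifier's loss.
    """

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def forward(self, features: Sequence[float]) -> List[float]:
        return list(features)

    def backward(self, grad_output: Sequence[float]) -> List[float]:
        return [-self.alpha * g for g in grad_output]


def domain_adaptation_loss(l_adv: float, l_mmd: float, lam: float) -> float:
    """L_da = L_adv + lambda * L_MMD."""
    return l_adv + lam * l_mmd


# Usage: a classifier that outputs 0.5 for every sample is maximally unsure.
l_adv = domain_bce_loss([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
print(round(l_adv, 4))  # 0.6931 (= ln 2)
grl = GradientReversal(alpha=0.5)
print(grl.backward([2.0, -4.0]))  # [-1.0, 2.0]
print(round(domain_adaptation_loss(l_adv, l_mmd=1.0, lam=0.1), 4))  # 0.7931
```

Because the reversal layer flips the sign of the gradient flowing back into the feature extractor, the extractor is pushed to make the domains indistinguishable while the domain classifier itself still minimizes \(L_{\text{adv}}\).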
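The wave position encoding and its residual fusion into a position-aware descriptor can be sketched for a single keypoint \(j\). The summary does not say how \(A_j\) and \(\theta_j\) are produced, so this sketch simply takes them as given vectors; `make_mlp` (one linear layer followed by ReLU) is a hypothetical stand-in for \(\text{MLP}_F\), and all function names are illustrative rather than taken from the paper.

```python
import cmath
import math
from typing import Callable, List, Sequence

Mlp = Callable[[List[float]], List[float]]


def wave_encoding(amplitude: Sequence[float], phase: Sequence[float]) -> List[complex]:
    """w_j = A_j * exp(i * theta_j), element-wise, for one keypoint j."""
    if len(amplitude) != len(phase):
        raise ValueError("amplitude and phase must have the same length")
    return [a * cmath.exp(1j * t) for a, t in zip(amplitude, phase)]


def make_mlp(weights: Sequence[Sequence[float]], bias: Sequence[float]) -> Mlp:
    """Hypothetical stand-in for MLP_F: one linear layer followed by ReLU."""

    def mlp(x: List[float]) -> List[float]:
        return [
            max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)
        ]

    return mlp


def position_aware_descriptor(
    descriptor: Sequence[float],
    amplitude: Sequence[float],
    phase: Sequence[float],
    mlp_f: Mlp,
) -> List[float]:
    """d_j^PE = d_j + MLP_F([A_j * cos(theta_j), A_j * sin(theta_j)])."""
    w = wave_encoding(amplitude, phase)
    # A*cos(theta) and A*sin(theta) are exactly Re(w_j) and Im(w_j).
    features = [z.real for z in w] + [z.imag for z in w]
    update = mlp_f(features)
    if len(update) != len(descriptor):
        raise ValueError("MLP_F output must match the descriptor dimension")
    return [d + u for d, u in zip(descriptor, update)]


# Usage with toy values: MLP_F picks out the real part of the first channel
# and the imaginary part of the second channel.
mlp_f = make_mlp(weights=[[1, 0, 0, 0], [0, 0, 0, 1]], bias=[0.0, 0.0])
d_pe = position_aware_descriptor([0.1, 0.2], [1.0, 2.0], [0.0, math.pi / 2], mlp_f)
print([round(v, 6) for v in d_pe])  # [1.1, 2.2]
```

Feeding \(\text{MLP}_F\) the real and imaginary parts separately keeps the whole computation real-valued, so an ordinary MLP can consume the complex-valued wave encoding.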