Relative Distance Guided Dynamic Partition Learning for Scale-Invariant UAV-View Geo-Localization

Quan Chen, Tingyu Wang, Rongfeng Lu, Bolun Zheng, Zhedong Zheng, Chenggang Yan
2024-12-23
Abstract: UAV-view Geo-Localization (UVGL) presents substantial challenges, particularly due to the disparity in visual appearance between drone-captured imagery and satellite perspectives. Existing methods usually assume a consistent scaling factor across different views. Therefore, they adopt predefined partition alignment and extract viewpoint-invariant representations by constructing a variety of part-level features. However, the scaling assumption does not always hold in real-world scenarios, where variations in the UAV flight state lead to a scale mismatch between views, resulting in serious performance degradation. To overcome this issue, we propose a partition learning framework based on relative distance, which alleviates the dependence on scale consistency while mining fine-grained features. Specifically, we propose a distance guided dynamic partition learning strategy (DGDPL), consisting of a square partition strategy and a distance-guided adjustment strategy. The former is used to extract fine-grained features and global features in a simple manner. The latter calculates the relative distance ratio between the drone view and the satellite view to adjust the partition size, thereby explicitly aligning the semantic information between partition pairs. Furthermore, we propose a saliency-guided refinement strategy to refine part-level features, so as to further improve the retrieval accuracy. Extensive experiments show that our approach achieves superior geo-localization accuracy across various scale-inconsistent scenarios and exhibits remarkable robustness against scale variations. The code will be released.
Computer Vision and Pattern Recognition
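The distance-guided adjustment mentioned in the abstract can be sketched in a few lines of Python, using the relative distance ratio and the adjustment factor as they are written in the formula summary later on this page. This is only an illustrative sketch under stated assumptions, not the authors' released code: the function names, the distances 300 and 200, and the settings λ = 2, α = 4 and N_SPS = 4 are hypothetical values chosen for the example, and Python's `int()` truncation is assumed to stand in for the "rounding operation".

```python
from typing import Tuple


def relative_distance_ratio(h_drone: float, h_sat: float) -> float:
    """beta = (H_D - H_S) / H_S, following the formula summary."""
    if h_sat <= 0:
        raise ValueError("h_sat must be positive")
    return (h_drone - h_sat) / h_sat


def adjustment_factor(beta: float, lam: int, alpha: float) -> int:
    """theta = int(beta * lambda * alpha).

    Assumption: int() truncates toward zero; the summary only says
    'rounding operation', so the exact rule is not confirmed.
    """
    return int(beta * lam * alpha)


def partition_sides(n: int, n_total: int, lam: int, theta: int) -> Tuple[int, int]:
    """Return (drone_side, satellite_side) of the n-th square partition.

    The satellite side is 32 * lambda * n / N; the drone side is the same
    value minus 2 * theta. Integer division assumes 32 * lambda * n is
    divisible by N (true for the hypothetical values used below).
    """
    base = 32 * lam * n // n_total
    drone = base - 2 * theta
    if drone <= 0:
        # Guard added for this sketch only; the summary does not say
        # how a non-positive size is handled.
        raise ValueError("adjusted drone partition has non-positive size")
    return drone, base


# Hypothetical example values, not taken from the paper.
beta = relative_distance_ratio(300.0, 200.0)      # 0.5
theta = adjustment_factor(beta, lam=2, alpha=4.0)  # 4
sides = [partition_sides(n, 4, 2, theta) for n in range(1, 5)]
print(sides)  # [(8, 16), (24, 32), (40, 48), (56, 64)]
```

With a positive β (the drone distance larger than the satellite reference distance), θ is positive and every drone-view partition shrinks by 2θ per side, while the satellite-view partitions keep their size.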
What problem does this paper attempt to address?
This paper attempts to solve the scale-variation problem in UAV-view Geo-Localization (UVGL). Specifically, existing methods usually assume a consistent scaling factor between viewpoints and adopt a predefined partition-alignment strategy to extract viewpoint-invariant feature representations. In real scenarios, however, changes in the UAV's flight state cause a scale mismatch between the two views, which significantly degrades performance. To solve this problem, the authors propose a Distance Guided Dynamic Partition Learning (DGDPL) framework based on relative distance, which reduces the dependence on scale consistency while mining fine-grained features. The DGDPL framework mainly includes the following parts:

1. **Square Partition Strategy (SPS)**: Extracts fine-grained features and global features in a simple manner.
2. **Distance-Guided Adjustment Strategy (DGAS)**: Calculates the relative distance ratio between the UAV view and the satellite view to adjust the partition size, thus explicitly aligning the semantic information between partition pairs.
3. **Saliency-Guided Refinement Strategy (SGRS)**: Further refines the part-level features to improve retrieval accuracy.

Through these strategies, the DGDPL framework achieves superior geo-localization accuracy in various scale-inconsistent scenarios and shows significant robustness to scale changes.

### Formula Summary

1. **Relative Distance Ratio Formula**:
   \[ \beta = \frac{H_D - H_S}{H_S} \]
   where \(H_D\) and \(H_S\) represent the distances from the UAV view and the satellite view to the ground, respectively.
2. **Adjustment Factor Formula**:
   \[ \theta = \text{int}(\beta \cdot \lambda \cdot \alpha) \]
   where \(\text{int}(\cdot)\) represents the rounding operation, \(\lambda\) is the up-sampling factor, and \(\alpha\) is a hyper-parameter that regulates the degree of partition change.
3. **Partition-Adjusted Shape Formula**:
   \[ f_{n_{SPS}}^D \in \mathbb{R}^{2048 \times \left(32\lambda\frac{n_{SPS}}{N_{SPS}} - 2\theta\right) \times \left(32\lambda\frac{n_{SPS}}{N_{SPS}} - 2\theta\right)} \]
   \[ f_{n_{SPS}}^S \in \mathbb{R}^{2048 \times \left(32\lambda\frac{n_{SPS}}{N_{SPS}}\right) \times \left(32\lambda\frac{n_{SPS}}{N_{SPS}}\right)} \]
4. **Saliency Heat Map Generation Formula**:
   \[ heat_{n_{SPS}}^i = \frac{N(M(f_{n_{SPS}}^i)) + CM_{n_{SPS}}^i}{2} \]
   where \(N(\cdot)\) represents the normalization operation and \(M(\cdot)\) represents the average pooling operation.
5. **Cross-Entropy Loss Function**:
   \[ L_{CE} = \sum_{i,j,n_{SPS}} -\log\left(\frac{\exp(z_{n_{SPS}}^{i,j}(y))}{\sum_{c=1}^{C}\exp(z_{n_{SPS}}^{i,j}(c))}\right) \]
   where