GeoViewMatch: A Multi-Scale Feature-Matching Network for Cross-View Geo-Localization Using Swin-Transformer and Contrastive Learning

Wenhui Zhang, Zhinong Zhong, Hao Chen, Ning Jing

DOI: https://doi.org/10.3390/rs16040678

IF: 5

2024-02-15

Remote Sensing

Abstract: Cross-view geo-localization aims to locate street-view images by matching them with a collection of GPS-tagged remote sensing (RS) images. Due to the significant viewpoint and appearance differences between street-view images and RS images, this task is highly challenging. While deep learning-based methods have shown their dominance in the cross-view geo-localization task, existing models have difficulties in extracting comprehensive meaningful features from both domains of images. This limitation results in not establishing accurate and robust dependencies between street-view images and the corresponding RS images. To address the aforementioned issues, this paper proposes a novel and lightweight neural network for cross-view geo-localization. Firstly, in order to capture more diverse information, we propose a module for extracting multi-scale features from images. Secondly, we introduce contrastive learning and design a contrastive loss to further enhance the robustness in extracting and aligning meaningful multi-scale features. Finally, we conduct comprehensive experiments on two open benchmarks. The experimental results have demonstrated the superiority of the proposed method over the state-of-the-art methods.

environmental sciences; imaging science & photographic technology; remote sensing; geosciences, multidisciplinary

What problem does this paper attempt to address?

The paper addresses the key challenge in cross-view geo-localization: determining the location of a street-view image by matching it against a collection of remote sensing (RS) images tagged with Global Positioning System (GPS) coordinates. The task is highly challenging because of the significant viewpoint and appearance differences between street-view images and RS images. To tackle it, the paper proposes a novel lightweight neural network named GeoViewMatch. The main contributions of the method are as follows:

1. **Multi-scale Feature Extraction**: To capture more diverse information, the paper proposes a module that extracts multi-scale features from images. This helps establish more accurate and robust dependencies between street-view images and their corresponding RS images.
2. **Application of Contrastive Learning**: Contrastive learning is introduced, and a contrastive loss function is designed to further enhance the robustness of extracting and aligning meaningful multi-scale features from the two image domains.
3. **Swin-Transformer-based Model**: By leveraging the global modeling capability and self-attention mechanism of Swin-Transformer, the model can handle the viewpoint differences between street-view images and RS images and extract multi-scale features from them.
4. **Experimental Validation**: Extensive experiments were conducted on two public benchmark datasets, and the results show that the proposed GeoViewMatch method outperforms existing state-of-the-art methods in terms of accuracy and efficiency.

In summary, the study improves feature representation for cross-view geo-localization by combining Swin-Transformer with contrastive learning, thereby achieving more accurate localization.
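To make the contrastive-alignment idea concrete, the following is a minimal NumPy sketch, not the loss defined in the paper. It assumes a symmetric InfoNCE-style objective over a batch in which row `i` of the street-view embeddings and row `i` of the RS embeddings describe the same location. The function names, the temperature value of 0.07, and the use of already-extracted embedding vectors are all illustrative assumptions; the summary above does not specify the exact form of GeoViewMatch's loss.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(street_emb, rs_emb, temperature=0.07):
    # Assumed loss: InfoNCE in both directions (street -> RS and RS -> street).
    # Row i of street_emb is the positive match for row i of rs_emb.
    s = l2_normalize(street_emb)
    r = l2_normalize(rs_emb)
    logits = s @ r.T / temperature
    idx = np.arange(len(s))
    loss_s2r = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_r2s = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_s2r + loss_r2s)

def recall_at_1(street_emb, rs_emb):
    # Fraction of street-view queries whose most similar RS image is the true match.
    sims = l2_normalize(street_emb) @ l2_normalize(rs_emb).T
    return float((sims.argmax(axis=1) == np.arange(len(sims))).mean())

# Toy data standing in for network outputs (hypothetical, for illustration only).
rng = np.random.default_rng(0)
rs = rng.normal(size=(8, 16))
street = rs + 0.01 * rng.normal(size=(8, 16))
aligned_loss = symmetric_info_nce(street, rs)
misaligned_loss = symmetric_info_nce(street, np.roll(rs, 1, axis=0))
```

In this toy setup the loss is lower when the pairs are correctly aligned than when the RS embeddings are shifted by one position, which is the behavior a contrastive objective is meant to reward. In a real pipeline the embeddings would come from the image encoders, and multi-scale features could, for example, be pooled and concatenated before normalization; that fusion step is likewise an assumption here.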

Geo-Localization with Transformer-Based 2D-3D Match Network

Leveraging Local Planar Motion Property for Robust Visual Matching and Localization

Each Part Matters: Local Patterns Facilitate Cross-View Geo-Localization

Mutual Relative Position Learning Transformer for Cross-View Geo-Localization

ConGeo: Robust Cross-view Geo-localization across Ground View Variations

Direction-Guided Multiscale Feature Fusion Network for Geo-Localization

Cross-view Geo-localization with Evolving Transformer

A Contrastive Learning Based Multiview Scene Matching Method for UAV View Geo-Localization

UAV-Satellite View Synthesis for Cross-view Geo-Localization

Cross-view Geo-localization via Learning Disentangled Geometric Layout Correspondence

Learning Cross-View Visual Geo-Localization Without Ground Truth

Learning Cross-view Geo-localization Embeddings via Dynamic Weighted Decorrelation Regularization

A Satellite-Drone Image Cross-View Geolocalization Method Based on Multi-Scale Information and Dual-Channel Attention Mechanism

IML-Net: A Framework for Cross-View Geo-Localization with Multi-Domain Remote Sensing Data

TransFG: A Cross-View Geo-Localization of Satellite and UAVs Imagery Pipeline Using Transformer-Based Feature Aggregation and Gradient Guidance

Ground–Satellite Coupling for Cross-View Geolocation Combined With Multiscale Fusion of Spatial Features

A Global-Matching Framework For Multi-View Stereopsis

Multibranch Joint Representation Learning Based on Information Fusion Strategy for Cross-View Geo-Localization

Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching