Abstract: Satellite-drone cross-view image geolocalization has wide applications. Because the visual appearance of 3D objects varies markedly with viewing angle, satellite-drone cross-view geolocalization remains an unresolved challenge. The key to successful cross-view geolocalization lies in extracting crucial spatial structure information across different scales in the image. Recent studies improve image matching accuracy by introducing an attention mechanism that establishes global associations among local features. However, existing methods primarily rely on single-scale features and employ a single-channel attention mechanism to correlate local convolutional features from different locations. This approach does not adequately explore or exploit the multi-scale spatial structure information within the image, and in particular neglects the extraction and use of locally valuable information. In this paper, we propose a cross-view image geolocalization method based on multi-scale information and a dual-channel attention mechanism. The multi-scale information comprises features extracted at different scales using various convolutional slices and makes extensive use of shallow network features. The dual-channel attention mechanism, through successive local and global feature associations, effectively learns deep discriminative features across different scales. Experiments were conducted on existing satellite and drone image datasets, with additional validation on an independent self-made dataset. The results indicate that our approach outperforms existing methods, particularly in exploiting multi-scale spatial structure information and extracting locally valuable information.
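The abstract describes the method only at a high level. As a rough illustration of the two ideas it names, the sketch below pools a feature map over several grid scales and then applies two successive self-attention stages before matching by cosine similarity. It is not the paper's architecture: the grid partitioning (standing in for the "convolutional slices"), the reading of "local" as attention within one scale and "global" as attention across all scales, the parameter-free attention without learned projections, and all function names (`multi_scale_tokens`, `local_then_global`, `descriptor`) are assumptions made for this example.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # H rows x W columns x C channels


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention where Q, K and V are the tokens
    themselves (no learned projections: a simplifying assumption)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def multi_scale_tokens(fmap: FeatureMap, scales: List[int]) -> List[List[Vector]]:
    """Hypothetical stand-in for multi-scale slicing: for each scale s,
    average-pool the map over an s x s grid and return one group of s*s
    tokens per scale. Assumes every s is at most H and at most W."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    groups = []
    for s in scales:
        group = []
        for i in range(s):
            for j in range(s):
                rows = range(i * h // s, (i + 1) * h // s)
                cols = range(j * w // s, (j + 1) * w // s)
                cell = [fmap[r][q] for r in rows for q in cols]
                group.append([sum(v[k] for v in cell) / len(cell) for k in range(c)])
        groups.append(group)
    return groups


def local_then_global(groups: List[List[Vector]]) -> List[Vector]:
    """Two successive association stages (assumed interpretation):
    attention within each scale's tokens, then attention over all tokens."""
    local = []
    for group in groups:
        local.extend(self_attention(group))  # stage 1: within one scale
    return self_attention(local)  # stage 2: across all scales


def descriptor(fmap: FeatureMap, scales: List[int]) -> Vector:
    """Mean-pool the attended tokens and L2-normalise into one descriptor."""
    tokens = local_then_global(multi_scale_tokens(fmap, scales))
    d = len(tokens[0])
    pooled = [sum(t[k] for t in tokens) / len(tokens) for k in range(d)]
    norm = math.sqrt(sum(x * x for x in pooled)) or 1.0
    return [x / norm for x in pooled]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two descriptors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


# Example: a toy 4 x 4 map with 3 channels, pooled at scales 1, 2 and 4.
toy = [[[float(r), float(q), 1.0] for q in range(4)] for r in range(4)]
query = descriptor(toy, [1, 2, 4])
print(len(query))  # 3
```

In a retrieval setting, satellite-image descriptors would be ranked by `cosine` against the drone query descriptor. A trained model would presumably replace the toy feature map with CNN backbone features and add learned query, key and value projections to each attention stage.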

Attention-Enhanced Cross-modal Localization Between Spherical Images and Point Clouds

Persistent Stereo Visual Localization on Cross-Modal Invariant Map

3D LiDAR-Based Global Localization Using Siamese Neural Network

Multimodal Localization: Stereo over LiDAR Map

LiDAR-Based Global Localization Using Histogram of Orientations of Principal Normals

2-Entity RANSAC for Robust Visual Localization in Changing Environment

LocNet: Global Localization in 3D Point Clouds for Mobile Robots

Leveraging Local Planar Motion Property for Robust Visual Matching and Localization

Monocular Visual Place Recognition in LiDAR Maps via Cross-Modal State Space Model and Multi-View Matching

C2L-PR: Cross-modal Camera-to-LiDAR Place Recognition Via Modality Alignment and Orientation Voting

LHMap-loc: Cross-Modal Monocular Localization Using LiDAR Point Cloud Heat Map

(LC)²: LiDAR-Camera Loop Constraints For Cross-Modal Place Recognition

From Satellite to Ground: Satellite Assisted Visual Localization with Cross-view Semantic Matching

A Hybrid Approach for Cross-modality Pose Estimation Between Image and Point Cloud

Cross-Modal Visual Relocalization in Prior LiDAR Maps Utilizing Intensity Textures

Monocular Camera Localization in Prior LiDAR Maps with 2D-3D Line Correspondences

A Satellite-Drone Image Cross-View Geolocalization Method Based on Multi-Scale Information and Dual-Channel Attention Mechanism

ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place Recognition

SphereVLAD++: Attention-Based and Signal-Enhanced Viewpoint Invariant Descriptor

Global Localization in Large-scale Point Clouds via Roll-pitch-yaw Invariant Place Recognition and Low-overlap Global Registration

CVLNet: Cross-View Semantic Correspondence Learning for Video-based Camera Localization