Abstract: Cross-view geo-localization determines the geographic location of a query ground-view image by matching it, one by one, against the aerial-view images in a reference dataset. The task is extremely challenging because the change in viewing angle between the two views produces large differences in geometry and appearance between image pairs. Introducing generative networks into matching models has recently been shown to work well on the CVUSA (Cross-View USA) dataset, and the latest models establish a paradigm of end-to-end generative cross-view image matching. However, this result relies on an assumption about the dataset: for every query ground image, there must exist a reference aerial image exactly centered on the location of that image. This assumption is clearly inconsistent with real-world application scenarios, and the performance of state-of-the-art generative models degrades significantly when center alignment no longer holds. To address this problem, this paper proposes a generative model (atten-ganCV) for non-center-aligned datasets. The model feeds the query ground image directly into a generative adversarial network to obtain a generated aerial-view image; its generator, atten-UNet, introduces an attention mechanism. The model then matches the synthesized image against the real aerial images in the reference dataset one by one and returns the match with the highest similarity, thus determining the geographic location of the query. The model is tested on both the center-aligned CVUSA dataset and the non-center-aligned VIGOR (Cross-view Image Geo-localization beyond One-to-one Retrieval) dataset. On the VIGOR dataset, it achieves approximately the same accuracy as the state-of-the-art model at 3 times the inference speed.
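
The final matching step described in the abstract (compare the synthesized aerial image with each reference aerial image and keep the most similar one) can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes that each image has already been reduced to a feature vector by some encoder, and it assumes cosine similarity as the similarity measure. The function names `cosine_similarity` and `retrieve_best_match` are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors.

    Assumption: cosine similarity stands in for whatever similarity
    measure the actual model uses.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve_best_match(query_feature, reference_features):
    """Compare the query with every reference vector, one by one.

    query_feature: feature vector of the synthesized aerial image
        (hypothetical; produced by an encoder not shown here).
    reference_features: list of feature vectors of real aerial images.
    Returns (index, score) of the most similar reference, or
    (-1, -inf) if the reference list is empty.
    """
    best_index, best_score = -1, float("-inf")
    for index, ref in enumerate(reference_features):
        score = cosine_similarity(query_feature, ref)
        if score > best_score:
            best_index, best_score = index, score
    return best_index, best_score


if __name__ == "__main__":
    # Toy 2-D features; real features would be high-dimensional.
    query = [1.0, 0.0]
    references = [[0.0, 1.0], [2.0, 0.1], [-1.0, 0.0]]
    index, score = retrieve_best_match(query, references)
    print(index, round(score, 4))
```

In such a setup, the geographic location attached to the best-matching reference image would be taken as the estimated location of the query. A linear scan is used here only because it mirrors the one-by-one comparison in the text.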

Retrieval-guided Cross-view Image Synthesis

Benchmarking Large-Scale Multi-View 3D Reconstruction Using Realistic Synthetic Images

Generative View Synthesis: From Single-view Semantics to Novel-view Images

CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis

Cross-View Image Translation Based on Local and Global Information Guidance

Geometry-guided Cross-view Diffusion for One-to-many Cross-view Image Synthesis

Cross-View Meets Diffusion: Aerial Image Synthesis with Geometry and Text Guidance

Cross-View Image Synthesis Using Conditional GANs

Atten-ganCV: an End-to-End Close-Coupled Image-Generating Cross-View Network

View Synthesis with Multi-scale Cost Aggregation and Confidence Prior

Cross-View Image Retrieval -- Ground to Aerial Image Retrieval through Deep Learning

View Independent Generative Adversarial Network for Novel View Synthesis

Efficient Depth-Guided Urban View Synthesis

Deep View Synthesis Via Self-Consistent Generative Network

CVGSR: Stereo image Super-Resolution with Cross-View guidance

Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network

Multi-Channel Attention Selection GAN With Cascaded Semantic Guidance for Cross-View Image Translation

Cross-view Self-localization from Synthesized Scene-graphs

Cross View Capture for Stereo Image Super-Resolution

Deep Cross-View Reconstruction GAN Based on Correlated Subspace for Multi-View Transformation

Coming Down to Earth: Satellite-to-Street View Synthesis for Geo-Localization