Abstract: This article studies image-based localization. Traditional localization methods are mature in both technique and application, while visual image-based localization is an emerging field with considerable research potential. Deep learning performs strongly in image processing, and large-scale visual models in particular have advanced visual navigation and localization. This paper introduces a scene image localization technique based on large models for environments with a very large spatial sample space. Convolutional neural networks were trained on millions of geographically labeled images, image position information was extracted with large model algorithms, and sample data were collected under various conditions in elastic scene space. The shooting position of a photo is then inferred through visual computation to obtain the user's approximate position. The method classifies images by geographic location and combines this with landmarks, natural features, and architectural styles to determine where an image was taken. The experimental results show that positioning accuracy varies among models, with the best model obtained by training on a large-scale dataset, which underlines the importance of large-scale training data. They also show that the positioning error for urban street images is relatively small, whereas performance on outdoor and local scenes, especially in large-scale spatial environments, is limited.
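The idea of classifying images by geographic location can be illustrated with a minimal sketch. The abstract does not say how the Earth's surface is partitioned into classes, so the fixed latitude/longitude grid, the 10-degree cell size, and all function names below are assumptions made for illustration only. The sketch shows how a geotag could be turned into a class label for training, how a predicted class could be mapped back to an approximate position, and how the positioning error could be measured as a great-circle distance.

```python
import math

CELL_DEG = 10.0  # hypothetical cell size in degrees (an assumption, not from the paper)


def latlon_to_cell(lat, lon, cell_deg=CELL_DEG):
    """Map a geotag to the index of the grid cell that contains it (the class label)."""
    n_cols = int(360 / cell_deg)
    n_rows = int(180 / cell_deg)
    # Clamp so that lat = 90 and lon = 180 fall into the last row/column.
    row = min(int((lat + 90.0) // cell_deg), n_rows - 1)
    col = min(int((lon + 180.0) // cell_deg), n_cols - 1)
    return row * n_cols + col


def cell_to_latlon(cell, cell_deg=CELL_DEG):
    """Map a predicted class back to the center of its grid cell."""
    n_cols = int(360 / cell_deg)
    row, col = divmod(cell, n_cols)
    return (row + 0.5) * cell_deg - 90.0, (col + 0.5) * cell_deg - 180.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers, using a mean Earth radius of 6371 km."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Usage: label a geotagged photo, then score a prediction of that same cell.
true_lat, true_lon = 48.8566, 2.3522          # example geotag
label = latlon_to_cell(true_lat, true_lon)    # class a classifier would be trained to predict
pred_lat, pred_lon = cell_to_latlon(label)    # position implied by a correct prediction
error_km = haversine_km(true_lat, true_lon, pred_lat, pred_lon)
```

Even a correct class prediction leaves a residual error bounded by the cell size, which is one reason such a scheme yields an approximate rather than an exact position. A finer or adaptive partition would reduce that residual at the cost of more classes.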
