Abstract: This article studies image-based localization technology. Traditional localization methods have advanced significantly in both technology and applications, while visual image-based localization is an emerging field with considerable research potential. Deep learning performs strongly in image processing, particularly in visual navigation and localization techniques built on large-scale visual models. This paper introduces a scene image localization technique based on large models for environments with a vast spatial sample space. Convolutional neural networks were trained on millions of geographically labeled images, image position information was extracted using large model algorithms, and sample data were collected under various conditions in elastic scene space. Through visual computation, the shooting position of a photo is inferred to obtain the approximate position of the user. The method classifies images by geographic location and combines this classification with landmarks, natural features, and architectural styles to determine where the images were taken. The experimental results show that positioning accuracy varies among models, with the best model obtained by training on a large-scale dataset. They also indicate that the positioning error for urban street-based images is relatively small, whereas the positioning effect for outdoor and local scenes, especially in large-scale spatial environments, is limited.
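The abstract describes localization as classifying images by geographic location. As a minimal sketch of that idea, and not the authors' implementation, the Python below assumes the globe is divided into a regular latitude/longitude grid, that each grid cell is one class label, and that a trained network (not shown) returns one score per cell. The 5-degree cell size and the function names latlon_to_cell, cell_to_latlon, haversine_km, and localize are all hypothetical choices made for illustration.

```python
import math

# Assumption: a regular 5-degree grid; the source does not state how cells are defined.
CELL_DEG = 5.0
N_COLS = int(360 / CELL_DEG)  # cells along longitude
N_ROWS = int(180 / CELL_DEG)  # cells along latitude


def latlon_to_cell(lat, lon):
    """Map a geotag to a class label (the index of its grid cell)."""
    row = min(int((lat + 90.0) / CELL_DEG), N_ROWS - 1)
    col = min(int((lon + 180.0) / CELL_DEG), N_COLS - 1)
    return row * N_COLS + col


def cell_to_latlon(cell):
    """Decode a class label back to the center of its grid cell."""
    row, col = divmod(cell, N_COLS)
    lat = -90.0 + (row + 0.5) * CELL_DEG
    lon = -180.0 + (col + 0.5) * CELL_DEG
    return lat, lon


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers, used as the positioning error."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def localize(class_scores):
    """Pick the highest-scoring cell and return its center as the position estimate."""
    best = max(range(len(class_scores)), key=class_scores.__getitem__)
    return cell_to_latlon(best)
```

With this kind of scheme the estimate can never be more precise than the cell size, which matches the abstract's description of the output as an approximate position.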

Object Localization Based on Natural Language Descriptions for Fine-Grained Image

LocNet: Global Localization in 3D Point Clouds for Mobile Robots

3D LiDAR-Based Global Localization Using Siamese Neural Network

DesCo: Learning Object Recognition with Rich Language Descriptions

From Satellite to Ground: Satellite Assisted Visual Localization with Cross-view Semantic Matching

Global Localization with Object-Level Semantics and Topology

LocLoc: Low-level Cues and Local-area Guides for Weakly Supervised Object Localization

CurriculumLoc: Enhancing Cross-Domain Geolocalization through Multi-Stage Refinement

Multimodal Query-guided Object Localization

FLsM: Fuzzy Localization of Image Scenes Based on Large Models

Multi-scale discriminative Region Discovery for Weakly-Supervised Object Localization

Natural Language Object Retrieval

Learning Semantic-Aware Local Features for Long Term Visual Localization

Learning Local Features with Context Aggregation for Visual Localization

Few-shot Object Localization

Object Localization Based on Proposal Fusion

Semantic R-CNN for Natural Language Object Detection

LOC-ZSON: Language-driven Object-Centric Zero-Shot Object Retrieval and Navigation

An End-to-End Approach to Natural Language Object Retrieval Via Context-Aware Deep Reinforcement Learning