FLsM: Fuzzy Localization of Image Scenes Based on Large Models

Weiyi Chen,Lingjuan Miao,Jinchao Gui,Yuhao Wang,Yiran Li

DOI: https://doi.org/10.3390/electronics13112106

IF: 2.9

2024-05-29

Electronics

Abstract:This article primarily focuses on the study of image-based localization technology. While traditional methods have made significant advancements in technology and applications, the emerging field of visual image-based localization technology demonstrates tremendous potential for research. Deep learning has exhibited a strong performance in image processing, particularly in developing visual navigation and localization techniques using large-scale visual models. This paper introduces a sophisticated scene image localization technique based on large models in a vast spatial sample environment. The study involved training convolutional neural networks using millions of geographically labeled images, extracting image position information using large model algorithms, and collecting sample data under various conditions in elastic scene space. Through visual computation, the shooting position of photos was inferred to obtain the approximate position information of users. This method utilizes geographic location information to classify images and combines it with landmarks, natural features, and architectural styles to determine their locations. The experimental results show variations in positioning accuracy among different models, with the most optimal model obtained through training on a large-scale dataset. They also indicate that the positioning error in urban street-based images is relatively small, whereas the positioning effect in outdoor and local scenes, especially in large-scale spatial environments, is limited. This suggests that the location information of users can be effectively determined through the utilization of geographic data, to classify images and incorporate landmarks, natural features, and architectural styles. The study's experimentation indicates the variation in positioning accuracy among different models, highlighting the significance of training on a large-scale dataset for optimal results. Furthermore, it highlights the contrasting impact on urban street-based images versus outdoor and local scenes in large-scale spatial environments.

engineering, electrical & electronic,physics, applied,computer science, information systems

What problem does this paper attempt to address?

The paper primarily focuses on the research of image localization technology and proposes a fuzzy localization method for image scenes based on large models (FLsM). This method aims to address the problem of achieving visual localization in the absence of or inability to use satellite navigation signals. Specifically, the study attempts to solve the following key issues: 1. **Achieving Fuzzy Image Localization**: The paper proposes a technique for providing geographic range localization based on the extraction of semantic information from the visual image itself, known as visual fuzzy localization. This method can provide localization information when satellite navigation is unavailable. 2. **Constructing an Elastic Scale Space Environment**: To meet the localization needs of different scale scenarios, the paper introduces the concept of an "elastic scale space." This space covers variations from large-scale to fine-grained scales, emphasizing the variability and uncertainty of the environment, and allows localization through the information contained in the image itself. 3. **Utilizing Large-Scale Datasets to Train Models**: The paper uses a large number of geo-tagged images to train convolutional neural networks and employs large model algorithms to extract image location information. This helps improve the efficiency and accuracy of image matching, especially in fuzzy localization tasks. 4. **Improving Localization Accuracy**: By comparing different models, the study finds that the scale of the training dataset is crucial for optimizing localization results. Additionally, the paper discusses the differences in localization performance in various types of scenarios (such as urban streets, outdoor, and local scenes). In summary, the main goal of this research is to explore a lightweight visual localization method in environments where satellite navigation systems are limited or fail, to meet the needs of various scenarios and applications. By combining the advantages of deep learning technology and geographic information systems, the proposed solution in the paper can achieve reliable image localization in complex and variable environments.

FLsM: Fuzzy Localization of Image Scenes Based on Large Models

3D Model-free Visual Localization System from Essential Matrix under Local Planar Motion

Long-Term Map-Based Visual Localization: Analysis of Individual Components of a Hierarchical Pipeline

Leveraging Local Planar Motion Property for Robust Visual Matching and Localization.

Visual Localizer: Outdoor Localization Based on ConvNet Descriptor and Global Optimization for Visually Impaired Pedestrians

3D LiDAR-Based Global Localization Using Siamese Neural Network

From Satellite to Ground: Satellite Assisted Visual Localization with Cross-view Semantic Matching

LSFB: A Low-cost and Scalable Framework for Building Large-Scale Localization Benchmark

FM-Loc: Using Foundation Models for Improved Vision-based Localization

Robust and accurate mobile visual localization and its applications

Visual Localization via Few-Shot Scene Region Classification

A Survey on Deep Learning for Localization and Mapping: Towards the Age of Spatial Machine Intelligence

iSimLoc: Visual Global Localization for Previously Unseen Environments With Simulated Images

Deep Learning for Visual Localization and Mapping: A Survey

Local Supports Global: Deep Camera Relocalization With Sequence Enhancement

Visual-Marker-Based Localization for Flat-Variation Scene

Self-Supervised Camera Relocalization with Hierarchical Fern Encoding

Deep 6-DoF camera relocalization in variable and dynamic scenes by multitask learning

Image-Based Geolocation Using Large Vision-Language Models