A Novel Multimodal Feature-Level Fusion Scheme for High-Accurate Indoor Localization

Siyu Tang, Kaixuan Huang, Shunqing Zhang
DOI: https://doi.org/10.1109/jsen.2024.3397793
IF: 4.3
2024-06-19
IEEE Sensors Journal
Abstract: Smartphone-based indoor localization has been widely explored to meet the demand for high-precision, cost-effective indoor positioning systems. Prior work has concentrated mainly on improving the accuracy of single-sensor localization, which potentially constrains its applications. In this article, we introduce a wireless fidelity (WiFi)-visual multimodal framework for high-precision, low-cost indoor localization. First, two modality-specific encoders extract features. Then, we propose a multimodal fusion transformer to incorporate the global context and adopt elementwise summation to fuse the two deep features. Finally, a task-specific decoder predicts the position. During training, WiFi-aided learning is adopted to make the labeling process more reliable. During testing, we propose backend optimization to obtain smooth and globally consistent location predictions. The method is evaluated in a corridor scenario and a laboratory scenario of a typical building. Experimental results show that the proposed WiFi-visual multimodal approach attains a localization error of less than half a meter while remaining efficient at runtime.
Categories: Engineering, Electrical & Electronic; Instruments & Instrumentation; Physics, Applied