Image Retrieval for Visual Localization via Scene Text Detection and Logo Filtering

Xiaotian Tang, Dongxiao Li, Ming Zhang
DOI: https://doi.org/10.1109/icivc55077.2022.9886770
2022-01-01
Abstract: Visual localization is a core component of technologies such as autonomous driving and augmented reality. Image retrieval is a crucial step in state-of-the-art visual localization approaches. This paper presents a text-based image retrieval and filtering algorithm designed specifically for shopping mall scenarios. Compared with the repetitive architectural elements common in shopping malls, e.g. walls, floors, and columns, shop logos are more informative and distinctive. The characteristics of text regions are fully utilized: (1) features extracted by a CNN are used to filter out texts that carry little information, (2) image patches centered on text areas are used for global feature extraction, (3) the size and contact of text areas are used to rank the retrieved images when multiple texts are detected in one query image. Our approach can be incorporated into most existing retrieval networks that generate a global descriptor for the input image. Extensive experiments are carried out on ground-truth image databases, verifying the effectiveness of our approach.
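Steps (2) and (3) of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names neither the retrieval network nor the ranking formula, so `toy_descriptor` (an intensity histogram) stands in for a learned global descriptor, and weighting each text patch by its box area is only one assumed way of using text size for ranking. The function names, the `(x0, y0, x1, y1)` box format, and the `patch_size` parameter are all hypothetical. Step (1), the CNN-based filtering of uninformative text, is omitted.

```python
import numpy as np

def crop_patch(image, box, patch_size):
    """Crop a square patch centered on a text box (x0, y0, x1, y1), clamped to the image."""
    h, w = image.shape[:2]
    cx, cy = (box[0] + box[2]) // 2, (box[1] + box[3]) // 2
    half = patch_size // 2
    x0 = min(max(cx - half, 0), max(w - patch_size, 0))
    y0 = min(max(cy - half, 0), max(h - patch_size, 0))
    return image[y0:y0 + patch_size, x0:x0 + patch_size]

def toy_descriptor(patch, bins=8):
    """Assumed stand-in for a learned global descriptor: L2-normalized intensity histogram."""
    hist, _ = np.histogram(patch, bins=bins, range=(0, 256))
    hist = hist.astype(np.float64)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def rank_database(query_image, text_boxes, db_descriptors, patch_size=64):
    """Rank database images by area-weighted cosine similarity over the query's text patches."""
    scores = np.zeros(db_descriptors.shape[0])
    total_area = 0.0
    for box in text_boxes:
        # Assumption: larger text regions contribute more to the final score.
        area = float((box[2] - box[0]) * (box[3] - box[1]))
        desc = toy_descriptor(crop_patch(query_image, box, patch_size))
        scores += area * (db_descriptors @ desc)
        total_area += area
    if total_area > 0:
        scores /= total_area
    # Indices of database images, best match first, plus the raw scores.
    return np.argsort(-scores), scores
```

Here `db_descriptors` is assumed to be an array with one L2-normalized descriptor per database image, so that the dot product acts as a cosine similarity. In practice the histogram would be replaced by the output of whichever retrieval network is in use.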