A Fast Image Matching Algorithm Based on Locality-sensitive Hashing

Zhijian Song, Wei Li, Yuping Song, Jun Sun, Weiwei Zhuang
DOI: https://doi.org/10.21203/rs.3.rs-3193302/v1
2023-01-01
Abstract: Advances in Internet technology and the portability of mobile communication devices have made it far more convenient to build an information society. Now that networked and digital platforms are an important means of socializing, information is increasingly carried by images, videos, and similar media. The explosive growth of this multimedia data poses a major challenge for efficient data storage and fast information retrieval. We design an algorithm, called ViT-LSH, that combines Locality-sensitive Hashing (LSH) and the Vision Transformer (ViT) to greatly improve the speed and accuracy of image matching. The principle of LSH is to assign the same hash value, with high probability, to points in the data set that lie close to one another. The results show that the LSH algorithm reduces space consumption and improves query processing efficiency compared with five other schemes. LSH therefore helps to obtain approximate nearest-neighbor query results quickly and in a probabilistic manner, preserving the accuracy of the query results along with efficiency in time and space. The ViT divides the image into fixed-size blocks (patches), flattens each patch, and uses a linear projection to map it to a specified dimension. The resulting token sequence is used as the feature input, which yields a new segmentation mode. The ViT does not employ down-sampling, so the image resolution is not reduced, and its modeling of global information provides a new mode of semantic segmentation. The steps of ViT-LSH are as follows: first, preprocess the image (for example, by denoising); then segment the image; then extract features with the ViT; and then reduce the dimension of the features. LSH then hash-codes the image features. Once the hash codes have been computed, the similarity between images can be calculated directly from the Hamming distance between their codes, and the images are sorted by similarity to obtain the matching results. The experimental results show that ViT-LSH can greatly improve the efficiency and accuracy of image matching.
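The hashing and ranking stage described above can be illustrated with a minimal sketch. The abstract does not name the hash family, so the code assumes sign random projection (random hyperplane) hashing, one common LSH scheme. The function names are hypothetical, and the plain lists of floats stand in for the dimension-reduced ViT features; this is not the authors' implementation.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in dim-dimensional space.

    Assumption: sign-random-projection LSH; the abstract does not say
    which hash family ViT-LSH actually uses.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(feature, hyperplanes):
    """Pack the signs of the projections into one integer hash code."""
    code = 0
    for plane in hyperplanes:
        dot = sum(f * p for f, p in zip(feature, plane))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of bit positions in which two integer codes differ."""
    return bin(a ^ b).count("1")


def rank_matches(query, database, hyperplanes):
    """Return database indices sorted by Hamming distance to the query code."""
    q = hash_code(query, hyperplanes)
    codes = [hash_code(v, hyperplanes) for v in database]
    return sorted(range(len(database)), key=lambda i: hamming(q, codes[i]))
```

With `planes = make_hyperplanes(dim=4, n_bits=16)`, calling `rank_matches(query, database, planes)` returns the indices of the stored feature vectors ordered from most to least similar. Each comparison is a single XOR and bit count, which is why ranking by Hamming distance over compact codes is cheap in both time and space.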