A novel approach for image retrieval in remote sensing using vision-language-based image caption generation
Prem Shanker Yadav, Dinesh Kumar Tyagi, Santosh Kumar Vipparthi
DOI: https://doi.org/10.1007/s11042-024-20447-w
IF: 2.577
2024-12-04
Multimedia Tools and Applications
Abstract: Recent advancements in satellite technologies have led to the widespread availability of Remote Sensing (RS) images. Designing a precise retrieval model that returns the images most pertinent to a query has therefore become a primary research problem. Present Remote Sensing Image Retrieval (RSIR) systems use visual descriptors to characterize the primitives (such as various land-cover types) that are visible in the images. However, visual descriptors are inadequate for describing the complicated content of RS images. To solve this problem, a new image retrieval model based on image captions is devised. The goal is to generate textual descriptions, in the form of captions, that precisely define the relations among objects. Image captioning is performed with a vision-language pre-training model. Features such as term frequency-inverse document frequency (TF-IDF), text length, and Bag of Words are extracted from the image captions, and the same features are extracted from the query text. The similarity between the query text features and the image caption features is computed with a hybrid similarity measure whose weights are tuned with the proposed Honey Badger Political Optimizer (HBPO) to retrieve the image. The proposed HBPO achieved a precision of 93.3%, recall of 93.7%, F1-score of 93.5%, and a Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score of 0.441.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering
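
The retrieval step described in the abstract (TF-IDF, text length, and Bag of Words features compared through a weighted hybrid similarity) can be illustrated with a minimal Python sketch. The abstract does not specify how each feature is compared or what weights HBPO finds, so everything below is an assumption made for illustration: cosine similarity for the TF-IDF and Bag of Words vectors, a shorter-to-longer length ratio for text length, a smoothed IDF formula, and fixed placeholder weights in place of HBPO tuning. The function names (`tokenize`, `rank_captions`, and so on) are hypothetical, and the captions are assumed to have already been produced by a captioning model.

```python
import math
from collections import Counter

PUNCT = ".,;:!?()\"'"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    stripped = (t.strip(PUNCT) for t in text.lower().split())
    return [t for t in stripped if t]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def idf_table(token_lists):
    """Smoothed IDF over the caption corpus (assumed formula)."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf(tokens, idf):
    """TF-IDF vector; terms unseen in the caption corpus are dropped."""
    counts = Counter(tokens)
    return {t: c / len(tokens) * idf[t] for t, c in counts.items() if t in idf}


def length_similarity(a, b):
    """Ratio of shorter to longer token count, in [0, 1] (assumed measure)."""
    return min(len(a), len(b)) / max(len(a), len(b)) if a and b else 0.0


def rank_captions(query, captions, weights=(0.6, 0.3, 0.1)):
    """Rank captions by a weighted hybrid similarity to the query.

    `weights` are hypothetical placeholders; the paper tunes them with HBPO.
    Returns (indices sorted best first, per-caption scores).
    """
    cap_tokens = [tokenize(c) for c in captions]
    q_tokens = tokenize(query)
    idf = idf_table(cap_tokens)
    q_tfidf, q_bow = tfidf(q_tokens, idf), Counter(q_tokens)
    w_tfidf, w_bow, w_len = weights
    scores = [
        w_tfidf * cosine(q_tfidf, tfidf(tokens, idf))
        + w_bow * cosine(q_bow, Counter(tokens))
        + w_len * length_similarity(q_tokens, tokens)
        for tokens in cap_tokens
    ]
    order = sorted(range(len(captions)), key=lambda i: scores[i], reverse=True)
    return order, scores
```

Calling `rank_captions("airplane on a runway", captions)` on a list of caption strings returns the caption indices ordered from most to least similar, together with the raw scores. Because the placeholder weights sum to 1 and each component lies in [0, 1], the scores stay in that range; an optimizer such as HBPO would search over `weights` against a retrieval metric rather than use fixed values.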