Semantic Completion: Enhancing Image-Text Retrieval with Information Extraction and Compression

Xue Chen, Yi Guo
DOI: https://doi.org/10.1007/978-981-97-2238-9_5
2024-01-01
Abstract: Image-text retrieval is an essential branch of information retrieval and faces the serious challenge of the cross-modal semantic gap. Although significant progress has been made in recent years, most research has overlooked a fundamental problem. Text, as a description of an image, is incomplete, so semantic loss between image and text persists. In this paper, we propose a novel image-text retrieval method based on information extraction and compression to alleviate this problem. The method aims to bridge the semantic gap between the two modalities. It uses an information extraction and compression module to generate rich, high-quality semantic descriptions from a set of related sentences. To validate the effectiveness of the method, we conducted extensive experiments on the Flickr30K and MSCOCO datasets. The experimental results show that, with an appropriate fusion ratio, our method achieves significant performance improvements in image-text retrieval. When pre-training on 4M images, our method improves the evaluation metrics by at least 4.22% and 3.69% over the baseline method. These results further confirm the advantages and potential of our method in addressing semantic loss in image-text retrieval.
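The abstract does not say how the "fusion ratio" combines the original caption with the generated semantic description. The sketch below illustrates one plausible reading and is not the authors' method. It assumes that the caption embedding and the completed-description embedding are blended as a convex combination weighted by a ratio in [0, 1]. Images are then ranked by cosine similarity to the fused query. It also assumes Recall@K as the evaluation metric, which is common on Flickr30K and MSCOCO but is not named in the abstract. The function names `fuse`, `rank_images`, and `recall_at_k` are hypothetical, and the 3-dimensional vectors stand in for real encoder outputs.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    return [x / norm for x in v]


def fuse(caption_emb: Sequence[float], completion_emb: Sequence[float], ratio: float) -> Vector:
    """Hypothetical fusion: convex combination of the normalized caption embedding
    and the normalized completed-description embedding, weighted by `ratio`."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    c = l2_normalize(caption_emb)
    s = l2_normalize(completion_emb)
    return l2_normalize([(1.0 - ratio) * a + ratio * b for a, b in zip(c, s)])


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def rank_images(query: Sequence[float], image_embs: Sequence[Sequence[float]]) -> List[int]:
    """Return image indices sorted from most to least similar to the text query."""
    scores = [cosine(query, img) for img in image_embs]
    return sorted(range(len(image_embs)), key=lambda i: scores[i], reverse=True)


def recall_at_k(rankings: Sequence[Sequence[int]], ground_truth: Sequence[int], k: int) -> float:
    """Fraction of queries whose ground-truth image appears in the top-k results."""
    hits = sum(1 for r, gt in zip(rankings, ground_truth) if gt in r[:k])
    return hits / len(rankings)


# Toy example: the caption alone sits closer to image 1, while the (hypothetical)
# completed description carries extra semantics that point to image 0.
images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
caption = [0.6, 0.8, 0.0]
completion = [1.0, 0.0, 0.0]

caption_only = rank_images(fuse(caption, completion, 0.0), images)
fused = rank_images(fuse(caption, completion, 0.5), images)
print(caption_only, recall_at_k([caption_only], [0], k=1))
print(fused, recall_at_k([fused], [0], k=1))
```

In this toy setup, the caption alone ranks the wrong image first (`[1, 0, 2]`, Recall@1 = 0.0). Fusing with a ratio of 0.5 moves the ground-truth image to the top (`[0, 1, 2]`, Recall@1 = 1.0). A ratio that is too high would let the generated description dominate the original caption, which is one way to read the abstract's emphasis on an "appropriate" fusion ratio.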