Improving Image-Text Matching by Integrating Word Sense Disambiguation

Xiao Pu, Ping Yang, Lin Yuan, Xinbo Gao
DOI: https://doi.org/10.1109/lsp.2024.3466992
2024-10-08
IEEE Signal Processing Letters
Abstract: This letter presents a novel approach to enhance image-text matching by incorporating word sense disambiguation (WSD) within the text encoder. Our method explicitly models the senses of potentially ambiguous words, refining the semantic understanding between images and text. We introduce a sense-aware mechanism for image-text alignment by integrating a lightweight WSD component into the matching framework and optimizing both tasks simultaneously. Our WSD module operates on extensive word contexts using graph attention networks (GAT), and distills knowledge from a substantially larger pre-trained WSD model through multi-task learning. Our experiments demonstrate the effectiveness of augmenting the original word embeddings with sense representations derived from our WSD approach. We systematically evaluate our method against several baselines and state-of-the-art approaches on two widely used image-text matching benchmarks: MS-COCO and Flickr30K. The results show significant improvements in matching accuracy, highlighting the efficacy of the proposed approach.
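The two ideas named in the abstract, augmenting word embeddings with sense representations and training the matching and WSD objectives jointly with distillation, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the letter: the additive fusion, the KL-divergence distillation term, the loss weight, and all function names are hypothetical, and the GAT-based context encoder is omitted entirely.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sense_augment(word_vec, sense_logits, sense_vecs):
    """Add the probability-weighted sense vector to the word embedding.

    Assumption: additive fusion. The abstract only says the word
    embeddings are augmented with sense representations.
    """
    probs = softmax(sense_logits)
    dim = len(word_vec)
    expected = [
        sum(p * s[i] for p, s in zip(probs, sense_vecs)) for i in range(dim)
    ]
    return [w + e for w, e in zip(word_vec, expected)], probs


def kl_divergence(teacher, student, eps=1e-12):
    """KL(teacher || student) between two sense distributions."""
    return sum(
        t * math.log((t + eps) / (s + eps)) for t, s in zip(teacher, student)
    )


def joint_loss(matching_loss, teacher_probs, student_probs, weight=0.5):
    """Multi-task objective: matching loss plus a weighted distillation term.

    Assumption: the weight of 0.5 is arbitrary.
    """
    return matching_loss + weight * kl_divergence(teacher_probs, student_probs)


# Toy example: an ambiguous word with two candidate senses.
word = [1.0, 0.0]
senses = [[1.0, 0.0], [0.0, 1.0]]
augmented, student = sense_augment(word, [0.0, 0.0], senses)
teacher = [0.9, 0.1]  # hypothetical output of a larger WSD model
loss = joint_loss(0.2, teacher, student)
```

In this toy setup the student is undecided between the two senses, so the distillation term is positive and pushes its sense distribution toward the teacher's while the matching loss is optimized at the same time.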
engineering, electrical & electronic