Abstract: The task of visual word sense disambiguation (VWSD) is to find, within a given candidate set of images, the images that accurately match a target word or phrase. It is a challenging problem that requires VWSD models to understand the relationships between images and texts and to resolve ambiguous information. To address this problem, we propose contrastive learning with soft negative sampling (CLSNS), which facilitates the optimization of VWSD models by comparing positive and soft negative samples. When calculating the text-image contrastive loss, the ambiguous target word is used as the soft negative sample, the target phrase with known semantics is used as the positive sample, and other irrelevant text-image pairs are used as negative samples. Bidirectional margin loss (BML) is used to distinguish between negative samples and soft negative samples, and InfoNCE (information noise-contrastive estimation) loss is used to distinguish between positive and negative samples. To train the model more effectively, we expand the target word and target phrase using prompts. Inserting a variety of prompts broadens the diversity of the training data, which improves the model's performance and generalizability. These augmentations can improve the model's comprehension of domain-specific content and its ability to produce it, resulting in more precise and focused responses. Moreover, an extended Wikipedia-based text model is introduced to enrich the input text information. Extensive experiments conducted on the VWSD dataset from SemEval 2023 demonstrate the effectiveness of the proposed method.
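The abstract names the two loss terms but does not give their formulas. The following is a minimal NumPy sketch of how such terms are commonly written, not the authors' implementation. It assumes cosine similarity between image and text embeddings, a batch-wise InfoNCE in which row i of the text batch is the positive for image i, and a bidirectional margin term that keeps the soft negative's similarity below the positive's by an amount between two margins. The function names, the margins `alpha` and `beta`, and the `temperature` value are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def info_nce(image_emb, text_emb, temperature=0.07):
    """InfoNCE over a batch (assumed form).

    Row i of text_emb is treated as the positive for row i of image_emb;
    every other row in the batch acts as a negative.
    """
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def bidirectional_margin(image_emb, pos_text_emb, soft_text_emb,
                         alpha=0.1, beta=0.3):
    """Bidirectional margin loss (assumed form).

    delta = sim(image, soft negative) - sim(image, positive).
    The loss is zero when -beta <= delta <= -alpha, i.e. the ambiguous
    target word scores lower than the full phrase, but not arbitrarily lower.
    """
    img = l2_normalize(image_emb)
    pos_sim = np.sum(img * l2_normalize(pos_text_emb), axis=1)
    soft_sim = np.sum(img * l2_normalize(soft_text_emb), axis=1)
    delta = soft_sim - pos_sim
    loss = np.maximum(0.0, delta + alpha) + np.maximum(0.0, -delta - beta)
    return float(loss.mean())
```

In this sketch the two terms would simply be summed (possibly with a weighting factor, which is also an assumption) to form the training objective. The two-sided margin reflects the idea that a soft negative shares surface form with the positive and so should sit between positives and ordinary negatives in similarity.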

Vision Meets Definitions: Unsupervised Visual Word Sense Disambiguation Incorporating Gloss Information

Unsupervised Word Sense Disambiguation Based on WordNet

Contrastive Learning with Soft Negative Sampling for Visual Word Sense Disambiguation.

Large Language Models and Multimodal Retrieval for Visual Word Sense Disambiguation

Language Models as Knowledge Bases for Visual Word Sense Disambiguation

Word Sense Disambiguation by Refining Target Word Embedding

SRCB at SemEval-2023 Task 1: Prompt Based and Cross-Modal Retrieval Enhanced Visual Word Sense Disambiguation.

GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge

Improving Image-Text Matching by Integrating Word Sense Disambiguation

Learning Sense Representation from Word Representation for Unsupervised Word Sense Disambiguation (Student Abstract).

Multi-sense Definition Modeling using Word Sense Decompositions

Incorporating Glosses into Neural Word Sense Disambiguation

Word Sense Disambiguation: A Comprehensive Knowledge Exploitation Framework

MTA: A Lightweight Multilingual Text Alignment Model for Cross-Language Visual Word Sense Disambiguation

Word Sense Disambiguation using Knowledge-based Word Similarity

Chinese word sense disambiguation with variable context window

Word Sense Disambiguation by Semantic Inference.

MG-BERT: A Multi-glosses BERT Model for Word Sense Disambiguation

WSD-GAN: Word Sense Disambiguation Using Generative Adversarial Networks.

A Unified Model for Word Sense Representation and Disambiguation.

Chinese Word Sense Disambiguation Based on Context Expansion.