Image-relevant Entities Knowledge-aware News Image Captioning

Sonali Ajankar, Tanima Dutta
DOI: https://doi.org/10.1109/mmul.2024.3363429
IF: 3.4911
2024-01-01
IEEE Multimedia
Abstract: News image captioning (NIC) generates entity-rich captions for news images using the context of the accompanying news article. However, the task poses several challenges, such as abstract semantic information tied to named entities, which weakens the relationship between a news image and its article. Because the relationship between images and articles is ambiguous, existing works struggle to exploit multimodal clues between text and images. To alleviate these limitations, we propose a novel framework, image-relevant entities knowledge-aware NIC (IEK-NIC). We adjust the model's output using a time-step-bounded, entity-constrained beam search algorithm that incorporates the entity knowledge produced by the image-relevant entities generation method. Efficient use of entities during caption generation plays a crucial role in enhancing performance. IEK-NIC improves the Consensus-based Image Description Evaluation (CIDEr) score by margins of 1.46 and 1.49 over the state of the art on the GoodNews and NYTimes800K datasets, respectively.
computer science, information systems, theory & methods, software engineering, hardware & architecture
What problem does this paper attempt to address?