Image Caption Generation Method Based on Knowledge Graph Guidance and Self-Attention Mechanism

Min Zhang, Zhiwei Wu, JingFan Tang, PengFei Li, Ming Jiang
DOI: https://doi.org/10.2139/ssrn.4263601
2022-01-01
SSRN Electronic Journal
Abstract: Generating accurate image captions in different environments usually requires combining information about the target's environment, the target's behavior, and so on. Knowledge graphs can model entities, concepts, attributes, and their relationships, using graphs as a basic, universal "language" to represent real-world relationships with high quality and to provide rich semantic relationships for an individual caption. Therefore, this paper proposes an image caption generation method based on knowledge graph guidance and a self-attention mechanism (KSA). First, guided by a knowledge graph, we generate the corresponding triples for target-target connections in images. Second, we build an adaptive attention mechanism model (KGAM) that automatically decides when the encoder-decoder framework relies on visual information and when it relies on the language model and knowledge graph information. Finally, we use the popular CNN+LSTM encoder-decoder method to generate image captions that carry target-target connections by combining the triples with the self-attention mechanism. The model was compared with the baseline method on the MSCOCO and Visual Genome datasets and obtained better results.
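The abstract does not give the KGAM equations, so the following is only a minimal sketch of one common way such an adaptive gate can work, not the authors' implementation. It assumes a single softmax over the image-region scores plus one extra score for an embedded knowledge-graph triple; the weight on that extra slot (called beta here) indicates how much the decoder leans on knowledge rather than visual features at a given step. The function names, the dot-product scoring, and the toy vectors are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def adaptive_context(hidden, regions, knowledge):
    """Blend region features with a knowledge vector using one softmax.

    hidden:    decoder state at the current step (assumed input)
    regions:   list of image-region feature vectors (assumed input)
    knowledge: embedding of a knowledge-graph triple (assumed input)
    All vectors share the same dimension.
    Returns (context, beta), where beta is the weight on the knowledge slot.
    """
    # Score every region and the knowledge vector against the decoder state.
    scores = [dot(hidden, r) for r in regions] + [dot(hidden, knowledge)]
    weights = softmax(scores)
    beta = weights[-1]

    # The context is the weighted sum of all candidates, so the visual part
    # shrinks automatically as beta grows.
    context = [0.0] * len(hidden)
    for weight, vec in zip(weights, regions + [knowledge]):
        for i, value in enumerate(vec):
            context[i] += weight * value
    return context, beta


# Toy usage: two image regions and one triple embedding.
context, beta = adaptive_context(
    hidden=[1.0, 0.0],
    regions=[[1.0, 0.0], [0.0, 1.0]],
    knowledge=[1.0, 0.0],
)
```

Because the weights come from a single softmax, beta always lies strictly between 0 and 1, and a decoder state that aligns strongly with the triple embedding pushes beta toward 1. A trained model would replace the dot-product scoring with learned projections.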