Stylized Image Captioning Model Based on Disentangle-Retrieve-Generate

CHEN Zhang-hui, XIONG Yun
DOI: https://doi.org/10.11896/jsjkx.211100129
2022-01-01
Computer Science
Abstract: Image captioning aims to generate text that accurately describes the content of an input image. Stylized image captioning goes a step further by also taking language style into account: the caption must appropriately express a specific language style, which makes the generated text more diverse. To better incorporate style into the description, a stylized image captioning model based on a disentangle-retrieve-generate framework is proposed. The model first splits the sentences in a stylized corpus into content and style parts and constructs a content-style memory module, then retrieves an appropriate style from the memory module according to the factual caption of the image. Finally, the factual caption and the retrieved style part are fed into a language model to generate the stylized caption. Experimental results on real datasets show that, compared with existing methods, the proposed model performs better on various evaluation metrics and can accurately describe the image content while expressing a specific style.
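The three stages named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the style lexicon `STYLE_WORDS`, the Jaccard-overlap retrieval, and the `[STYLE]` separator are all assumptions made for illustration, and the final step only assembles the input that a language model would receive rather than running one.

```python
# Toy sketch of a disentangle-retrieve-generate pipeline.
# The lexicon, the Jaccard retrieval, and the "[STYLE]" separator are
# illustrative assumptions, not details taken from the paper.

STYLE_WORDS = {"lovely", "adorable", "joyfully", "sweet"}  # hypothetical lexicon


def disentangle(sentence):
    """Split a stylized sentence into (content tokens, style tokens)."""
    tokens = sentence.lower().split()
    content = [t for t in tokens if t not in STYLE_WORDS]
    style = [t for t in tokens if t in STYLE_WORDS]
    return content, style


def build_memory(corpus):
    """Content-style memory: one (content, style) entry per stylized sentence."""
    return [disentangle(s) for s in corpus]


def jaccard(a, b):
    """Set overlap between two token lists, in [0, 1]."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def retrieve_style(memory, factual_caption):
    """Return the style part whose content part best matches the caption."""
    query = factual_caption.lower().split()
    _, style = max(memory, key=lambda entry: jaccard(entry[0], query))
    return style


def build_generator_input(factual_caption, style):
    """Stand-in for the language model input: caption plus retrieved style."""
    return f"{factual_caption} [STYLE] {' '.join(style)}"


corpus = [
    "a lovely dog runs joyfully on the grass",
    "a sweet child eats an adorable cake",
]
memory = build_memory(corpus)
style = retrieve_style(memory, "a dog runs on the beach")
model_input = build_generator_input("a dog runs on the beach", style)
print(model_input)  # a dog runs on the beach [STYLE] lovely joyfully
```

In a real system the lexicon-based split and the token-overlap retrieval would presumably be replaced by learned components; the sketch only shows how data flows between the three stages.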