To Be an Artist: Automatic Generation on Food Image Aesthetic Captioning

Xiaohan Zou, Cheng Lin, Yinjia Zhang, Qinpei Zhao
DOI: https://doi.org/10.1109/ictai50040.2020.00124
2020-01-01
Abstract: Image aesthetic captioning is a multi-modal task that aims to generate aesthetic critiques for images. In common image captioning tasks, different captions that give factual descriptions of the same image are typically similar; in aesthetic captioning, by contrast, captions addressing different aesthetic attributes of the same image can differ completely. Such inter-aspect differences are often overlooked, which leads to a lack of diversity and coherence in the captions generated by most existing image aesthetic captioning systems. In this paper, we propose a novel model to generate aesthetic captions for food images. Our model redefines food image aesthetic captioning as a compositional task consisting of two separate modules: single-aspect captioning and unsupervised text compression. The first module generates captions and learns feature representations for each aesthetic attribute. The second module then learns the associations among all feature representations and automatically aggregates the captions of all aesthetic attributes into a final sentence. We also collect a dataset of paired image-comment data covering six aesthetic attributes. Two new evaluation criteria are introduced to comprehensively assess the quality of the generated captions. Experiments on the dataset demonstrate the effectiveness of the proposed model.