Product promotion copywriting from multimodal data: New benchmark and model

Jinjin Ren, Wei Zheng, Liming Lin
DOI: https://doi.org/10.2139/ssrn.4592141
IF: 6
2024-01-11
Neurocomputing
Abstract: In our latest project, we devise a comprehensive corpus for product promotion text generation, named Video-Enabled Product Promotion Corpus (VPPC), which integrates multimodal and multi-structural information about products, such as visual spatial details and fine structural specifics. It is crucial to highlight that this is one of the largest datasets available in the field of video captioning. Notably, conventional multimodal text generation often focuses on regular descriptions of entities and events, which does not meet the real-world requirements of product promotion copywriting, as this task necessitates a more lively language style and a high degree of authenticity. Regrettably, there is an evident lack of reusable evaluation frameworks and sufficient datasets at the current stage. To address these challenges, we propose a unique baseline approach and an authenticity evaluation metric, both tailored to meet the realistic demands of our dataset. The results are promising, as our method surpasses previous approaches across all evaluation metrics.
computer science, artificial intelligence
What problem does this paper attempt to address?