Social Media Ready Caption Generation for Brands

Himanshu Maheshwari, Koustava Goswami, Apoorv Saxena, Balaji Vasan Srinivasan
2024-01-03
Abstract: Social media advertisements are key for brand marketing, aiming to attract consumers with captivating captions and pictures or logos. While previous research has focused on generating captions for general images, incorporating brand personalities into social media captioning remains unexplored. Brand personalities have been shown to affect consumers' behaviours and social interactions and are thus a key aspect of marketing strategies. Current open-source multimodal LLMs are not directly suited for this task. Hence, we propose a pipeline solution to assist brands in creating engaging social media captions that align with the image and the brand personalities. Our architecture has two parts: (a) the first part contains an image captioning model that takes in an image that the brand wants to post online and gives a plain English caption; (b) the second part takes in the generated caption along with the target brand personality and outputs a catchy personality-aligned social media caption. Along with brand personality, our system also gives users the flexibility to provide hashtags, Instagram handles, URLs, and named entities they want the caption to contain, making the captions more semantically related to the social media handles. Comparative evaluations against various baselines demonstrate the effectiveness of our approach, both qualitatively and quantitatively.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the problem of generating engaging social media captions for brands that match both the posted image and the brand's personality. Prior research mainly focuses on generating captions for general images and does not incorporate brand personality, even though brand personality influences consumer behavior and social interaction and is therefore central to marketing strategy. Current open-source multimodal large language models (LLMs) are not directly suited to this task. The paper therefore proposes a two-part pipeline: the first part is an image captioning model that takes the image the brand wants to post and outputs a plain English caption; the second part takes that caption together with the target brand personality and outputs a catchy, personality-aligned social media caption. Users can also supply hashtags, Instagram handles, URLs, and named entities that the caption should contain.
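
The two-part pipeline described above can be sketched in a few lines of Python. This is only an illustrative outline, not the authors' implementation: the names `CaptionRequest`, `build_prompt`, and `generate_caption` are hypothetical, the prompt wording is an assumption, and the two models are represented by placeholder callables because the text here does not specify which models or prompt format the paper uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CaptionRequest:
    """User-supplied constraints for the final caption (hypothetical structure)."""
    personality: str                                    # target brand personality
    hashtags: List[str] = field(default_factory=list)   # e.g. ["#launch"]
    handles: List[str] = field(default_factory=list)    # e.g. ["@acme"]
    urls: List[str] = field(default_factory=list)
    entities: List[str] = field(default_factory=list)   # named entities to mention


def build_prompt(plain_caption: str, req: CaptionRequest) -> str:
    """Combine the stage-1 caption with the brand constraints into one prompt.

    The prompt layout is an assumption made for this sketch.
    """
    lines = [
        f"Image description: {plain_caption}",
        f"Brand personality: {req.personality}",
    ]
    optional = (
        ("Hashtags", req.hashtags),
        ("Handles", req.handles),
        ("URLs", req.urls),
        ("Named entities", req.entities),
    )
    for label, items in optional:
        if items:  # only mention constraints the user actually supplied
            lines.append(f"{label} to include: {', '.join(items)}")
    lines.append("Write a catchy social media caption that matches the above.")
    return "\n".join(lines)


def generate_caption(
    image_path: str,
    req: CaptionRequest,
    captioner: Callable[[str], str],
    rewriter: Callable[[str], str],
) -> str:
    """Run both stages: image -> plain caption -> personality-aligned caption."""
    plain_caption = captioner(image_path)   # stage 1: image captioning model
    prompt = build_prompt(plain_caption, req)
    return rewriter(prompt)                 # stage 2: personality-aligned rewrite


# Usage with stand-in models; real models would replace these two lambdas.
request = CaptionRequest(personality="exciting", hashtags=["#launch"], handles=["@acme"])
caption = generate_caption(
    "post.jpg",
    request,
    captioner=lambda path: "a red sneaker on a white background",
    rewriter=lambda prompt: prompt,  # echoes the prompt so it can be inspected
)
print(caption)
```

Passing the two models in as callables keeps the stages decoupled, which mirrors the pipeline design in the abstract: either stage can be swapped without touching the other.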