Towards Real-World Stickers Use: A New Dataset for Multi-Tag Sticker Recognition

Bingbing Wang, Bin Liang, Chun-Mei Feng, Wangmeng Zuo, Zhixin Bai, Shijue Huang, Kam-Fai Wong, Xi Zeng, Ruifeng Xu
2024-06-16
Abstract: In real-world conversations, the diversity and ambiguity of stickers often lead to varied interpretations depending on the context, which calls for a comprehensive understanding of stickers and support for multi-tagging. To address this challenge, we introduce StickerTAG, the first multi-tag sticker dataset, comprising a collected tag set of 461 tags and 13,571 sticker-tag pairs, designed to provide a deeper understanding of stickers. Recognizing multiple tags for a sticker is particularly challenging because sticker tags are usually fine-grained and attribute-aware. Hence, we propose an Attentive Attribute-oriented Prompt Learning method, i.e., Att$^2$PL, to capture informative features of stickers in a fine-grained manner and better differentiate tags. Specifically, we first apply an Attribute-oriented Description Generation (ADG) module to obtain descriptions of stickers from four attributes. Then, a Local Re-attention (LoR) module is designed to perceive the importance of local information. Finally, we use prompt learning to guide the recognition process and adopt confidence penalty optimization to penalize overconfident output distributions. Extensive experiments show that our method achieves encouraging results on all commonly used metrics.
Multimedia
What problem does this paper attempt to address?
The paper primarily addresses the challenges brought by the complexity and diversity of sticker usage in the real world by proposing a new multi-label sticker dataset called StickerTAG and a method named "Attentive Attribute-oriented Prompt Learning" (Att$^2$PL).

### Problems Addressed

1. **Multi-label Sticker Understanding and Recognition**: In real conversations, stickers often generate multiple interpretations due to their diversity and ambiguity, requiring a system that can comprehensively understand stickers and support multi-label annotation.
2. **Building High-Quality Datasets**: Existing datasets usually assign a single emotion label to each sticker, limiting their ability to capture the diverse information that stickers may convey. Therefore, the authors created a sticker dataset, StickerTAG, which includes multiple labels.
3. **Proposing Effective Methods for Multi-label Sticker Recognition**: Facing the challenge of multi-label sticker recognition, especially the need to accurately distinguish the rich and subtle meanings in stickers, the authors designed a new method, Att$^2$PL, to capture the fine-grained features of stickers, thereby better distinguishing different labels.

### Method Overview

- **StickerTAG Dataset**: Contains 13,571 sticker-label pairs and 461 labels, making it the first sticker dataset for multi-label recognition.
- **Attentive Attribute-oriented Prompt Learning (Att$^2$PL)**:
  - **Attribute-oriented Description Generation (ADG)**: Utilizes large multimodal language models to generate descriptions based on attributes such as content, style, character, and action.
  - **Local Re-attention (LoR) Module**: Captures key area information in stickers through masked image modeling to obtain more detail-focused embedding representations.
  - **Prompt-based Classification**: Uses soft prompts initialized with attribute descriptions for classification to achieve multi-label prediction.
  - **Confidence Penalty Optimization**: Penalizes overly confident output distributions during optimization to improve model performance.

### Main Contributions

1. **First Multi-label Sticker Dataset**: StickerTAG, including 461 labels and 13,571 sticker-label pairs, aimed at more practical scenarios.
2. **Proposed a New Method**: Att$^2$PL, which helps capture the fine-grained features of stickers, enhancing the model's ability to distinguish different labels.
3. **Experimental Results**: Extensive experiments on the StickerTAG dataset show that this method significantly outperforms baseline models, validating its effectiveness.
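
The confidence penalty optimization mentioned in the method overview can be illustrated with a small sketch. The summary does not give the paper's exact loss, so the code below is an assumption: it uses the generic confidence-penalty idea of subtracting a weighted entropy term from the classification loss, adapted to the multi-label case by treating each tag as an independent Bernoulli output. The function names, the weight `beta`, and the toy logits are all hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _clamp(p: float, eps: float = 1e-12) -> float:
    """Keep probabilities away from 0 and 1 so log() stays finite."""
    return min(max(p, eps), 1.0 - eps)


def binary_cross_entropy(p: float, y: int) -> float:
    """Cross-entropy for one tag with predicted probability p and target y."""
    p = _clamp(p)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def bernoulli_entropy(p: float) -> float:
    """Entropy of one tag's output; largest at p = 0.5, near zero when confident."""
    p = _clamp(p)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def confidence_penalized_loss(logits, targets, beta=0.1):
    """Mean per-tag cross-entropy minus beta times mean per-tag entropy.

    Assumption: beta is a hypothetical weight, not a value from the paper.
    Low-entropy (overconfident) outputs receive less of the entropy bonus,
    so they are penalized relative to less peaked outputs.
    """
    probs = [sigmoid(z) for z in logits]
    n = len(probs)
    ce = sum(binary_cross_entropy(p, y) for p, y in zip(probs, targets)) / n
    entropy = sum(bernoulli_entropy(p) for p in probs) / n
    return ce - beta * entropy


if __name__ == "__main__":
    # Toy multi-hot target over four hypothetical tags: tags 0 and 2 apply.
    targets = [1, 0, 1, 0]
    logits = [2.0, -1.5, 0.3, -4.0]
    print("plain loss:    ", confidence_penalized_loss(logits, targets, beta=0.0))
    print("penalized loss:", confidence_penalized_loss(logits, targets, beta=0.1))
```

With `beta=0.0` the function reduces to ordinary multi-label cross-entropy, which makes it easy to compare training with and without the penalty. Larger values of `beta` reward less peaked predictions more strongly.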