Abstract: In real-world conversations, the diversity and ambiguity of stickers often lead to varied interpretations depending on the context, which calls for comprehensive sticker understanding and support for multi-tagging. To address this challenge, we introduce StickerTAG, the first multi-tag sticker dataset, comprising a collected tag set with 461 tags and 13,571 sticker-tag pairs, designed to provide a deeper understanding of stickers. Recognizing multiple tags for stickers is particularly challenging because sticker tags are usually fine-grained and attribute-aware. Hence, we propose an Attentive Attribute-oriented Prompt Learning method, i.e., Att$^2$PL, to capture informative features of stickers in a fine-grained manner and better differentiate tags. Specifically, we first apply an Attribute-oriented Description Generation (ADG) module to obtain descriptions of stickers from four attributes. Then, a Local Re-attention (LoR) module is designed to perceive the importance of local information. Finally, we use prompt learning to guide the recognition process and adopt confidence penalty optimization to penalize overly confident output distributions. Extensive experiments show that our method achieves encouraging results on all commonly used metrics.

What problem does this paper attempt to address?

The paper primarily addresses the challenges brought by the complexity and diversity of sticker usage in the real world by proposing a new multi-label sticker dataset called StickerTAG and a method named "Attentive Attribute-oriented Prompt Learning" (Att2PL).

### Problems Addressed

1. **Multi-label Sticker Understanding and Recognition**: In real conversations, stickers often generate multiple interpretations due to their diversity and ambiguity, requiring a system that can comprehensively understand stickers and support multi-label annotation.
2. **Building High-Quality Datasets**: Existing datasets usually assign a single emotion label to each sticker, limiting their ability to capture the diverse information that stickers may convey. Therefore, the authors created a sticker dataset, StickerTAG, which includes multiple labels.
3. **Proposing Effective Methods for Multi-label Sticker Recognition**: Facing the challenge of multi-label sticker recognition, especially the need to accurately distinguish the rich and subtle meanings in stickers, the authors designed a new method, Att2PL, to capture the fine-grained features of stickers, thereby better distinguishing different labels.

### Method Overview

- **StickerTAG Dataset**: Contains 13,571 sticker-label pairs and 461 labels, making it the first sticker dataset for multi-label recognition.
- **Attentive Attribute-oriented Prompt Learning (Att2PL)**:
  - **Attribute-oriented Description Generation (ADG)**: Utilizes large multimodal language models to generate descriptions based on attributes such as content, style, character, and action.
  - **Local Re-attention (LoR) Module**: Captures key area information in stickers through masked image modeling to obtain more detail-focused embedding representations.
  - **Prompt-based Classification**: Uses soft prompts initialized with attribute descriptions for classification to achieve multi-label prediction.
  - **Confidence Penalty Optimization**: Penalizes overly confident output distributions during optimization to improve model performance.

### Main Contributions

1. **First Multi-label Sticker Dataset**: StickerTAG, including 461 labels and 13,571 sticker-label pairs, aimed at more practical scenarios.
2. **Proposed a New Method**: Att2PL, which helps capture the fine-grained features of stickers, enhancing the model's ability to distinguish different labels.
3. **Experimental Results**: Extensive experiments on the StickerTAG dataset show that this method significantly outperforms baseline models, validating its effectiveness.
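
The summary does not give the exact loss used for confidence penalty optimization, so the following is only a minimal sketch of the general idea, under stated assumptions: each tag is scored independently with a sigmoid (a common multi-label setup), the base loss is binary cross-entropy, and the penalty takes the widely used form of subtracting `beta` times the entropy of the predicted distribution. The function names and the value of `beta` are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binary_entropy(p: float, eps: float = 1e-12) -> float:
    """Entropy of a Bernoulli(p) prediction; largest at p = 0.5."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def bce_with_confidence_penalty(logits, targets, beta=0.1, eps=1e-12):
    """Mean per-tag binary cross-entropy minus beta * mean per-tag entropy.

    Assumption: one logit per tag and a 0/1 target per tag. Subtracting the
    entropy term rewards less peaked predictions, i.e. it penalizes
    overconfident outputs.
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    bce = 0.0
    entropy = 0.0
    for z, y in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        bce += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        entropy += binary_entropy(p)
    n = len(logits)
    return bce / n - beta * entropy / n


# Hypothetical sticker with four candidate tags, two of which apply.
example_logits = [2.0, -1.5, 0.3, -3.0]
example_targets = [1, 0, 1, 0]
plain_loss = bce_with_confidence_penalty(example_logits, example_targets, beta=0.0)
penalized_loss = bce_with_confidence_penalty(example_logits, example_targets, beta=0.1)
```

With `beta=0.0` the function reduces to plain binary cross-entropy. A larger `beta` gives more weight to the entropy term and so pushes the model away from saturated probabilities near 0 or 1.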

Towards Real-World Stickers Use: A New Dataset for Multi-Tag Sticker Recognition
