Abstract: Stickers with vivid and engaging expressions are becoming increasingly popular in online messaging apps, and some works are dedicated to automatically selecting a sticker response by matching the sticker image with previous utterances. However, existing methods usually focus on measuring the matching degree between the dialog context and the sticker image, and ignore the user's preference in using stickers. Hence, in this article, we propose to recommend an appropriate sticker to the user based on the multi-turn dialog context and the user's sticker usage history. This task poses two main challenges. The first is to model the user's sticker preference based on their previous sticker selection history. The second is to jointly fuse the user preference and the matching between the dialog context and the candidate sticker into the final prediction. To tackle these challenges, we propose a Preference Enhanced Sticker Response Selector (PESRS) model. Specifically, PESRS first employs a convolution-based sticker image encoder and a self-attention-based multi-turn dialog encoder to obtain representations of the stickers and utterances. Next, a deep interaction network is proposed to conduct deep matching between the sticker and each utterance. Then, we model the user preference by using the recently selected stickers as input, and use a key-value memory network to store the preference representation. PESRS then learns the short-term and long-term dependencies between all interaction results with a fusion network and dynamically fuses the user preference representation into the final sticker selection prediction. Extensive experiments conducted on a large-scale real-world dialog dataset show that our model achieves state-of-the-art performance on all commonly used metrics. Experiments also verify the effectiveness of each component of PESRS.
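The two preference-related steps described above (reading a user preference from a key-value memory, then fusing it with the context-sticker matching score) can be illustrated with a small, dependency-free sketch. This is not the PESRS implementation: the function names (`memory_read`, `fused_score`), the choice of dot-product attention, and the scalar sigmoid gate are illustrative assumptions. Keys are assumed to be embeddings of the user's recently selected stickers, values are assumed preference vectors, and the query is the candidate sticker's embedding. The encoders and the deep interaction network are omitted, and the matching score is taken as given.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_read(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Key-value memory read (assumed formulation): attend over the keys
    (recent sticker embeddings) with the query (candidate sticker embedding)
    and return the attention-weighted sum of the value vectors."""
    if not keys or len(keys) != len(values):
        raise ValueError("need a non-empty memory with one value per key")
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fused_score(match_score: float, preference: Vector, candidate: Vector,
                w_match: float = 0.0, w_pref: float = 0.0,
                bias: float = 0.0) -> float:
    """Hypothetical gated fusion: a scalar gate in (0, 1) decides how much the
    final score relies on context matching versus user preference. In a real
    model the gate parameters would be learned; here they are fixed inputs."""
    pref_score = dot(preference, candidate)
    gate = sigmoid(w_match * match_score + w_pref * pref_score + bias)
    return gate * match_score + (1.0 - gate) * pref_score


if __name__ == "__main__":
    # Two remembered stickers with one-hot keys and values (toy data).
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[1.0, 0.0], [0.0, 1.0]]
    candidate = [1.0, 0.0]
    pref = memory_read(candidate, keys, values)
    print(pref)                                  # leans toward the first sticker
    print(fused_score(2.0, pref, candidate))     # gate fixed at 0.5 by defaults
```

With all gate parameters at zero the gate is exactly 0.5, so the final score is the plain average of the matching and preference scores; a learned gate would instead let the model lean on the user's history when the dialog context is ambiguous, which is the intuition behind dynamic fusion.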

Integrating Stickers into Multimodal Dialogue Summarization: A Novel Dataset and Approach for Enhancing Social Media Interaction

Impact of Stickers on Multimodal Chat Sentiment Analysis and Intent Recognition: A New Task, Dataset and Baseline

Selecting Stickers in Open-Domain Dialogue Through Multitask Learning

STICKERCONV: Generating Multimodal Empathetic Responses from Scratch

Topic-Oriented Spoken Dialogue Summarization for Customer Service with Saliency-Aware Topic Modeling

Sticker820K: Empowering Interactive Retrieval with Stickers

Reply with Sticker: New Dataset and Model for Sticker Retrieval

Towards Expressive Communication with Internet Memes: A New Multimodal Conversation Dataset and Benchmark

VSD2M: A Large-scale Vision-language Sticker Dataset for Multi-frame Animated Sticker Generation

MSCTD: A Multimodal Sentiment Chat Translation Dataset

Towards Real-World Stickers Use: A New Dataset for Multi-Tag Sticker Recognition

TODSum: Task-Oriented Dialogue Summarization with State Tracking

Dialogue Discourse-Aware Graph Model and Data Augmentation for Meeting Summarization

TGCA-PVT: Topic-Guided Context-Aware Pyramid Vision Transformer for Sticker Emotion Recognition

CSDS: A Fine-Grained Chinese Dataset for Customer Service Dialogue Summarization

JDDC 2.1: A Multimodal Chinese Dialogue Dataset with Joint Tasks of Query Rewriting, Response Generation, Discourse Parsing, and Summarization

CH-SIMS: A Chinese Multimodal Sentiment Analysis Dataset with Fine-Grained Annotations of Modality

Make Acoustic and Visual Cues Matter: CH-SIMS v2.0 Dataset and AV-Mixup Consistent Module

Learning to Respond with Your Favorite Stickers

StyleChat: Learning Recitation-Augmented Memory in LLMs for Stylized Dialogue Generation

ClidSum: A Benchmark Dataset for Cross-Lingual Dialogue Summarization