Abstract: Stickers with vivid and engaging expressions are becoming increasingly popular in online messaging apps, and several works automatically select a sticker response by matching the sticker image with previous utterances. However, existing methods usually focus on measuring the matching degree between the dialog context and the sticker image, and ignore the user's preferences in using stickers. Hence, in this article, we propose to recommend an appropriate sticker to the user based on the multi-turn dialog context and the user's sticker usage history. This task poses two main challenges. The first is to model the user's sticker preference based on the previous sticker selection history. The second is to fuse the user preference with the matching between the dialog context and the candidate sticker to make the final prediction. To tackle these challenges, we propose a Preference Enhanced Sticker Response Selector (PESRS) model. Specifically, PESRS first employs a convolution-based sticker image encoder and a self-attention-based multi-turn dialog encoder to obtain the representations of stickers and utterances. Next, a deep interaction network conducts deep matching between the sticker and each utterance. Then, we model the user preference by using the recently selected stickers as input and use a key-value memory network to store the preference representation. PESRS then learns the short-term and long-term dependencies among all interaction results with a fusion network and dynamically fuses the user preference representation into the final sticker selection prediction. Extensive experiments conducted on a large-scale real-world dialog dataset show that our model achieves state-of-the-art performance on all commonly used metrics. Experiments also verify the effectiveness of each component of PESRS.
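
To make the preference-fusion idea above concrete, the following is a minimal NumPy sketch, not the PESRS implementation. It assumes the memory keys are encodings of past dialog contexts and the memory values are encodings of the stickers the user chose then; the abstract does not specify this. The names `read_preference` and `fuse_scores`, the scalar gate, and the random vectors standing in for encoder outputs are all hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def read_preference(query, keys, values):
    """Key-value memory read: address slots by key, return weighted values.

    query:  (d,)          current dialog-context vector
    keys:   (n_slots, d)  assumed: encodings of past contexts
    values: (n_slots, d)  assumed: encodings of stickers chosen in those contexts
    """
    weights = softmax(keys @ query)   # attention over memory slots
    return weights @ values, weights  # preference vector, slot weights


def fuse_scores(match_score, pref_score, gate_logit):
    """Blend context matching and user preference with a sigmoid gate.

    In a trained model the gate would be predicted from the inputs;
    here it is passed in directly (an assumption for illustration).
    """
    gate = 1.0 / (1.0 + np.exp(-gate_logit))
    return gate * match_score + (1.0 - gate) * pref_score


# Random vectors stand in for the outputs of the sticker and dialog encoders.
rng = np.random.default_rng(0)
d, n_history = 8, 5
history_keys = rng.normal(size=(n_history, d))
history_values = rng.normal(size=(n_history, d))
context = rng.normal(size=d)
candidate = rng.normal(size=d)

preference, weights = read_preference(context, history_keys, history_values)
match_score = float(context @ candidate)   # context-to-sticker matching
pref_score = float(preference @ candidate) # preference-to-sticker matching
final = fuse_scores(match_score, pref_score, gate_logit=0.0)
print(f"match={match_score:.3f} pref={pref_score:.3f} fused={final:.3f}")
```

With `gate_logit=0.0` the two scores are weighted equally; a learned gate would let the model rely more on the dialog context or on the user's history depending on the input.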

TGCA-PVT: Topic-Guided Context-Aware Pyramid Vision Transformer for Sticker Emotion Recognition

Sticker820K: Empowering Interactive Retrieval with Stickers

Towards Real-World Stickers Use: A New Dataset for Multi-Tag Sticker Recognition

Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer

Emotion Recognition via Environmental Context and Human Body

Reply with Sticker: New Dataset and Model for Sticker Retrieval

Transformer-Based Multimodal Emotional Perception for Dynamic Facial Expression Recognition in the Wild

Emotion-Aware Transformer Encoder for Empathetic Dialogue Generation

Facial Expression Recognition Based on Multi-Scale Convolutional Vision Transformer

Impact of Stickers on Multimodal Chat Sentiment Analysis and Intent Recognition: A New Task, Dataset and Baseline

Selecting Stickers in Open-Domain Dialogue Through Multitask Learning

Emotion recognition using hierarchical spatial-temporal learning transformer from regional to global brain

EERCA-ViT: Enhanced Effective Region and Context-Aware Vision Transformers for Image Sentiment Analysis

VSD2M: A Large-scale Vision-language Sticker Dataset for Multi-frame Animated Sticker Generation

Learning to Respond with Your Favorite Stickers

Topic and Style-aware Transformer for Multimodal Emotion Recognition

Modality-collaborative Transformer with Hybrid Feature Reconstruction for Robust Emotion Recognition

PerSRV: Personalized Sticker Retrieval with Vision-Language Model

A Multi-Stage Visual Perception Approach for Image Emotion Analysis

Short and Long Range Relation Based Spatio-Temporal Transformer for Micro-Expression Recognition

A Simple and Interactive Transformer for Fine-Grained Emotion Detection