TGCA-PVT: Topic-Guided Context-Aware Pyramid Vision Transformer for Sticker Emotion Recognition

Jian Chen,Wei Wang,Yuzhu Hu,Junxin Chen,Han Liu,Xiping Hu
DOI: https://doi.org/10.1145/3664647.3680781
2024-01-01
Abstract:Online chatting has become an essential aspect of our daily interactions, with stickers emerging as a prevalent tool for conveying emotions more vividly than plain text. While conventional image emotion recognition focuses on global features, sticker emotion recognition necessitates incorporating both global and local features, along with additional modalities like text. To address this, we introduce a topic ID-guided transformer method to facilitate a more nuanced analysis of the stickers. Since each sticker will have a topic, and the same topic will have the same object, we introduce a topic ID as a flag to group images by theme. Our approach encompasses a novel topic-guided context-aware module and a topic-guided attention mechanism, enabling the extraction of comprehensive topic context features from stickers sharing the same topic ID, significantly enhancing emotion recognition accuracy. Moreover, we integrate a frequency linear attention module to leverage frequency domain information to capture better the object information of the stickers and a locally enhanced re-attention mechanism for improved local feature extraction. Extensive experiments and ablation studies on the large-scale sticker emotion dataset SER30k validate the efficacy of our method. Experimental results show that our proposed method obtains the best accuracy on both single-modal and multi-modal sticker emotion recognition.
What problem does this paper attempt to address?