Reply with Sticker: New Dataset and Model for Sticker Retrieval

Bin Liang, Bingbing Wang, Zhixin Bai, Qiwei Lang, Mingwei Sun, Kaiheng Hou, Lanjun Zhou, Ruifeng Xu, Kam-Fai Wong
2024-07-22
Abstract: Using stickers in online chatting is very prevalent on social media platforms, where the stickers used in the conversation can express someone's intention, emotion, or attitude in a vivid, tactful, and intuitive way. Existing sticker retrieval research typically retrieves stickers based on context and the current utterance delivered by the user. That is, the stickers serve as a supplement to the current utterance. However, in real-world scenarios, using stickers to express what we want to say, rather than only as a supplement to our words, is also important. Therefore, in this paper, we create a new dataset for sticker retrieval in conversation, called StickerInt, where stickers are used to reply to previous conversations or supplement our words. (We believe that the release of this dataset will provide a more complete paradigm than existing work for the research of sticker retrieval in open-domain online conversation.) Based on the created dataset, we present a simple yet effective framework for sticker retrieval in conversation, called Int-RA, based on the learning of intention and the cross-modal relationships between conversation context and stickers. Specifically, we first devise a knowledge-enhanced intention predictor to introduce the intention information into the conversation representations. Subsequently, a relation-aware sticker selector is devised to retrieve the response sticker via cross-modal relationships. Extensive experiments on the created dataset show that the proposed model achieves state-of-the-art performance in sticker retrieval. The dataset and source code of this work are released at https://github.com/HITSZ-HLT/Int-RA.
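The two-stage idea in the abstract (inject intention information into the conversation representation, then select a sticker through a cross-modal comparison) can be illustrated with a toy sketch. This is not the Int-RA model: the additive fusion, the cosine-similarity scoring, the hand-written vectors, and the names `fuse_intention` and `rank_stickers` are all assumptions made for illustration, standing in for the paper's learned intention predictor and relation-aware selector.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fuse_intention(
    context_vec: Sequence[float],
    intention_vec: Sequence[float],
    weight: float = 0.5,
) -> List[float]:
    """Hypothetical fusion step: add a weighted intention vector to the context vector.

    Assumption: the paper learns this step; simple addition is only a stand-in.
    """
    return [c + weight * i for c, i in zip(context_vec, intention_vec)]


def rank_stickers(
    query_vec: Sequence[float],
    sticker_vecs: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Score every candidate sticker against the query and sort best-first.

    Assumption: cosine similarity replaces the paper's relation-aware selector.
    """
    scored = [(sid, cosine(query_vec, vec)) for sid, vec in sticker_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up 3-dimensional embeddings, purely for demonstration.
    context = [1.0, 0.0, 0.0]
    intention = [0.0, 1.0, 0.0]
    stickers = {
        "agree": [1.0, 1.0, 0.0],
        "laugh": [1.0, 0.0, 0.0],
        "sad": [0.0, 0.0, 1.0],
    }
    query = fuse_intention(context, intention, weight=1.0)
    for sticker_id, score in rank_stickers(query, stickers):
        print(sticker_id, round(score, 3))
```

In a real system the context, intention, and sticker vectors would come from trained text and image encoders; the sketch only shows how an intention signal can shift which sticker ranks first.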
Multimedia
What problem does this paper attempt to address?
The paper addresses the problem of sticker retrieval in online conversations, focusing on two scenarios: using stickers to supplement the current utterance and using stickers to directly reply to previous conversations. The authors aim to improve upon existing sticker retrieval research, which typically treats stickers as supplementary to text, by considering stickers as a primary means of expression.

To achieve this goal, the authors create a new dataset called StickerInt, which includes conversations where stickers are used either to supplement the text or to directly respond to previous messages. They also propose a framework called Int-RA, which incorporates the prediction of user intentions and the use of cross-modal relationships between conversation context and stickers.

Key contributions of the paper include:

1. **Creation of StickerInt Dataset:** This dataset is designed to be more comprehensive than existing datasets, covering both the use of stickers as supplements and as direct replies. It includes annotated intention tags for each sticker, providing additional context for models to learn from.
2. **Proposed Int-RA Framework:** The framework consists of a knowledge-enhanced intention predictor and a relation-aware sticker selector. The intention predictor uses commonsense knowledge to infer the user's intention in the conversation, while the sticker selector leverages cross-modal attention to retrieve the most relevant sticker.
3. **Experimental