The New Modality: Emoji Challenges in Prediction, Anticipation, and Retrieval

Spencer Cappallo, Stacey Svetlichnaya, Pierre Garrigues, Thomas Mensink, Cees G. M. Snoek
DOI: https://doi.org/10.48550/arXiv.1801.10253
2018-02-02
Abstract: Over the past decade, emoji have emerged as a new and widespread form of digital communication, spanning diverse social networks and spoken languages. We propose to treat these ideograms as a new modality in their own right, distinct in their semantic structure from both the text in which they are often embedded as well as the images which they resemble. As a new modality, emoji present rich novel possibilities for representation and interaction. In this paper, we explore the challenges that arise naturally from considering the emoji modality through the lens of multimedia research, specifically the ways in which emoji can be related to other common modalities such as text and images. To do so, we first present a large-scale dataset of real-world emoji usage collected from Twitter. This dataset contains examples of both text-emoji and image-emoji relationships. We present baseline results on the challenge of predicting emoji from both text and images, using state-of-the-art neural networks. Further, we offer a first consideration of the problem of how to account for new, unseen emoji - a relevant issue as the emoji vocabulary continues to expand on a yearly basis. Finally, we present results for multimedia retrieval using emoji as queries.
Computation and Language, Information Retrieval, Multimedia
What problem does this paper attempt to address?
The main problem this paper attempts to address is treating emojis as a new modality, exploring the challenges in prediction, anticipation, and retrieval, and studying their relationship with other common modalities such as text and images. Specifically, the paper focuses on the following aspects:

1. **Emojis as an Independent Modality**: The paper posits that emojis are a new modality independent of text and images, with unique semantic structures and expressions. This perspective helps in understanding the richness and diversity of emojis in digital communication.
2. **Emoji Prediction**: The paper proposes a challenging task of predicting relevant emojis from text and/or images. This involves utilizing existing multimodal data to train models that can accurately predict emojis related to given inputs.
3. **Handling Emerging Emojis**: Since new emojis are added every year, the paper also proposes a "zero-shot" challenge task, which involves predicting new emojis without training data. This requires models to leverage external knowledge sources to infer the meanings of new emojis.
4. **Emoji-Based Multimedia Retrieval**: The paper explores how to use emojis as a query language for multimedia retrieval. This task leverages the cross-cultural and language-independent nature of emojis, making them an effective retrieval tool.

To support these research goals, the paper constructs a large-scale real-world emoji usage dataset (Twemoji) and proposes three specific challenge tasks along with their baseline results. These tasks include:

- Emoji Prediction
- Emoji Anticipation
- Query-by-Emoji

Through these tasks, the paper aims to advance the study of emojis as an independent modality and further explore their potential applications in the multimedia field.
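
To make the "zero-shot" idea above more concrete, here is a minimal sketch of one way an unseen emoji could be scored without any training examples: compare an embedding of the input (text or image) against an embedding of each emoji's name or description, and rank by cosine similarity. This is an illustrative assumption, not the paper's confirmed baseline. The function names (`cosine`, `rank_emoji`), the emoji names, and the hand-written three-dimensional vectors are all hypothetical stand-ins for embeddings that would, in practice, come from an external knowledge source such as pretrained word vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_emoji(input_vec, emoji_vecs):
    """Return emoji names sorted from most to least similar to the input vector."""
    scores = {name: cosine(input_vec, vec) for name, vec in emoji_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy embeddings of emoji descriptions. Real vectors would be
# produced by a pretrained embedding model, not written by hand.
EMOJI_VECS = {
    "pizza": [0.9, 0.1, 0.0],
    "soccer ball": [0.0, 0.9, 0.1],
    "crying face": [0.1, 0.0, 0.9],
}

# Hypothetical embedding of an input such as a food-related tweet.
input_vec = [0.8, 0.2, 0.1]

ranking = rank_emoji(input_vec, EMOJI_VECS)
print(ranking)
```

Because the ranking depends only on the description vectors, a newly released emoji can be added to `EMOJI_VECS` and scored immediately, which is the property the anticipation task asks for. The same similarity ranking could be reversed for query-by-emoji, with the emoji vector as the query and item vectors as the candidates.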