Emoji Prediction in Tweets using BERT

Muhammad Osama Nusrat, Zeeshan Habib, Mehreen Alam, Saad Ahmed Jamal
2023-08-26
Abstract: In recent years, the use of emojis in social media has increased dramatically, making them an important element in understanding online communication. However, predicting the meaning of emojis in a given text is a challenging task due to their ambiguous nature. In this study, we propose a transformer-based approach for emoji prediction using BERT, a widely used pre-trained language model. We fine-tuned BERT on a large corpus of tweets containing both text and emojis to predict the most appropriate emoji for a given text. Our experimental results demonstrate that our approach outperforms several state-of-the-art models in predicting emojis, with an accuracy of over 75 percent. This work has potential applications in natural language processing, sentiment analysis, and social media marketing.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper attempts to address the problem of predicting the most appropriate emoji in social media texts (such as tweets). Specifically, due to the increasing use of emojis in online communication and their often ambiguous meanings, predicting the most suitable emoji for a given text is a challenging task. The authors propose a BERT-based transformer approach to predict the most appropriate emoji for a given text by fine-tuning BERT on a large dataset of tweets containing both text and emojis.

### Main Contributions of the Paper:

1. **Proposed a BERT-based transformer approach**: Utilized a pre-trained BERT model and fine-tuned it to predict emojis in a given text.
2. **Experimental results demonstrate the effectiveness of the method**: Conducted experiments on two different datasets, showing that the method achieves over 75% accuracy in emoji prediction tasks.
3. **Explored the impact of different factors on model performance**: Including the size of the training data, the number of emojis, etc.
4. **Potential applications**: This research has potential applications in fields such as natural language processing, sentiment analysis, and social media marketing.

### Specific Problems Addressed:

- **Ambiguity of emojis**: Emojis can have different meanings in different contexts, making it challenging to accurately predict the most appropriate emoji for a given text.
- **Diversity and scale of datasets**: Lack of large-scale, diverse datasets to train and evaluate models.
- **Cross-linguistic and cultural diversity**: Emojis used in different languages and cultures vary significantly, necessitating the development of language and culture-specific models.

Through this research, the authors hope to improve the effectiveness of communication on social media platforms, particularly in cases of ambiguous text, by adding emojis to increase the clarity of information.
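
To make the task framing concrete, here is a minimal sketch of how tweets that contain both text and emojis might be turned into (text, label) pairs for a classifier, and how an accuracy figure such as the reported one is computed. The summary does not describe the paper's actual preprocessing, so everything here is an assumption for illustration: the five-emoji label set `EMOJI_LABELS`, the helper names `make_example` and `accuracy`, and the rule of labeling each tweet with its most frequent emoji are all hypothetical. The fine-tuning of BERT itself is not shown.

```python
import re
from collections import Counter

# Hypothetical label set; the emoji inventory used in the paper is not given here.
EMOJI_LABELS = ["😂", "😍", "🔥", "😭", "😊"]
EMOJI_PATTERN = re.compile("|".join(re.escape(e) for e in EMOJI_LABELS))


def make_example(tweet):
    """Turn a raw tweet into (text_without_emojis, label_id).

    Returns None when the tweet contains no emoji from the label set.
    Assumption: the label is the most frequent known emoji in the tweet,
    with ties going to the one that appears first.
    """
    found = EMOJI_PATTERN.findall(tweet)
    if not found:
        return None
    label = Counter(found).most_common(1)[0][0]
    # Remove the emojis so the classifier cannot simply read off the answer,
    # then collapse the leftover whitespace.
    text = " ".join(EMOJI_PATTERN.sub(" ", tweet).split())
    return text, EMOJI_LABELS.index(label)


def accuracy(predictions, labels):
    """Fraction of predicted label ids that match the gold label ids."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == g for p, g in zip(predictions, labels))
    return correct / len(labels)


if __name__ == "__main__":
    tweets = ["so funny 😂😂 can't stop", "love this 😍 🔥🔥", "no emoji here"]
    examples = [ex for ex in map(make_example, tweets) if ex is not None]
    for text, label_id in examples:
        print(repr(text), "->", EMOJI_LABELS[label_id])
```

Pairs produced this way could serve as input to a sequence classification model with one output class per emoji; under this framing, "over 75% accuracy" means that the predicted emoji matches the held-out emoji for more than three out of four test tweets.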