Leveraging distant supervision and deep learning for twitter sentiment and emotion classification

Muhamet Kastrati, Zenun Kastrati, Ali Shariq Imran, Marenglen Biba

DOI: https://doi.org/10.1007/s10844-024-00845-0

2024-03-24

Journal of Intelligent Information Systems

Abstract: Nowadays, various applications across industries, healthcare, and security have begun adopting automatic sentiment analysis and emotion detection in short texts, such as posts from social media. Twitter stands out as one of the most popular online social media platforms due to its easy, unique, and advanced accessibility using the API. On the other hand, supervised learning is the most widely used paradigm for tasks involving sentiment polarity and fine-grained emotion detection in short and informal texts, such as Twitter posts. However, supervised learning models are data-hungry and heavily reliant on abundant labeled data, which remains a challenge. This study aims to address this challenge by creating a large-scale real-world dataset of 17.5 million tweets. A distant supervision approach relying on emojis available in tweets is applied to label tweets corresponding to Ekman's six basic emotions. Additionally, we conducted a series of experiments using various conventional machine learning models and deep learning, including transformer-based models, on our dataset to establish baseline results. The experimental results and an extensive ablation analysis on the dataset showed that BiLSTM with FastText and an attention mechanism outperforms other models in both classification tasks, achieving an F1-score of 70.92% for sentiment classification and 54.85% for emotion detection.
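The emoji-based distant supervision described in the abstract can be illustrated with a minimal sketch. The emoji-to-emotion table below is hypothetical and is not the lexicon used in the paper; the function name `distant_label` and the rule of discarding tweets with zero or conflicting emotion emojis are likewise assumptions made for illustration.

```python
from typing import Optional, Tuple

# Hypothetical mapping from single-code-point emojis to Ekman's six basic
# emotions. The paper's actual emoji lexicon is not given in this summary.
EMOJI_TO_EMOTION = {
    "😠": "anger",
    "🤢": "disgust",
    "😱": "fear",
    "😂": "joy",
    "😢": "sadness",
    "😲": "surprise",
}


def distant_label(tweet: str) -> Optional[Tuple[str, str]]:
    """Return (cleaned_text, emotion), or None if no unambiguous label exists.

    Assumed rule: a tweet is kept only if all of its mapped emojis point to
    the same emotion. The emojis are stripped from the text so that a model
    trained on the result cannot simply memorize the labeling signal.
    """
    found = {EMOJI_TO_EMOTION[ch] for ch in tweet if ch in EMOJI_TO_EMOTION}
    if len(found) != 1:
        return None  # no mapped emoji, or conflicting emotions
    cleaned = "".join(ch for ch in tweet if ch not in EMOJI_TO_EMOTION)
    return " ".join(cleaned.split()), found.pop()
```

Under these assumptions, a tweet such as "I passed the exam 😂" would be labeled as joy with the emoji removed, while a tweet with no mapped emoji, or with both 😂 and 😢, would be dropped from the dataset.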

computer science, information systems, artificial intelligence

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper aims to address the issue of supervised learning models' high dependency on large amounts of labeled data in sentiment analysis and emotion detection tasks. Specifically, the goals of the paper are as follows:

1. **Automatically Create Large-Scale Sentiment Datasets**: Utilize emojis on Twitter for distant supervision to automatically label Twitter datasets for sentiment polarity and emotion classification tasks.
2. **Evaluate the Performance of Different Models**: Test various traditional machine learning models and deep learning models (including Transformer-based models) on the newly created dataset to establish benchmark results and explore suitable methods for sentiment polarity and emotion detection for this dataset.
3. **Improve Classifier Performance**: Propose a multi-layer BiLSTM model that combines pre-trained word embedding techniques and attention mechanisms for sentiment polarity and multi-class emotion classification tasks.

### Main Research Questions

- **RQ1**: How to automatically create a large-scale sentiment dataset using emojis on Twitter?
- **RQ2**: How do the amount of training data and class imbalance affect the performance of traditional machine learning algorithms and deep neural networks?
- **RQ3**: To what extent can pre-trained word embedding techniques and attention mechanisms improve performance in sentiment and emotion classification tasks?

### Core Contributions

- Collected and organized a large-scale real-world Twitter dataset, automatically labeled using emojis according to the Ekman model.
- Compared the performance of traditional machine learning algorithms and deep neural networks on sentiment polarity and emotion classification tasks.
- Proposed a multi-layer BiLSTM model that combines pre-trained word embeddings and attention mechanisms for sentiment polarity and multi-class emotion classification.
- Conducted ablation analysis to explore the impact of dataset size, number of classes, and class imbalance on classification performance.

Through these efforts, the paper aims to overcome the issues of data scarcity and limited model generalization ability in existing sentiment analysis and emotion detection tasks.
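Because the reported results are F1-scores and class imbalance is one of the factors studied, a small sketch of how per-class and macro-averaged F1 behave on imbalanced labels may help. This summary does not state which F1 averaging the paper uses, so macro averaging here is an assumption, and the helper names `per_class_f1` and `macro_f1` are illustrative only.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Compute F1 for each label from true/false positive and negative counts."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[t] += 1  # true label t was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy imbalanced example: a classifier that always predicts the majority class.
y_true = ["joy"] * 8 + ["fear"] * 2
y_pred = ["joy"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, macro_f1(y_true, y_pred))
```

In this toy case the accuracy is 0.8, but the macro F1 is only about 0.44 because the minority class is never predicted, which is one reason F1 is often preferred over accuracy when emotion classes are unevenly represented.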
