Abstract: Emotion detection (ED) and sentiment analysis (SA) play a vital role in identifying an individual's level of interest in any given field. Humans use facial expressions, voice pitch, gestures, and words to convey their emotions. Emotion detection and sentiment analysis in English and Chinese have received much attention in the last decade. Low-resource languages such as Urdu, however, have been mostly disregarded, and they are the primary focus of this research. Roman Urdu deserves the same investigation as other languages because it is frequently used for communication on social media platforms. A significant challenge for Roman Urdu is the absence of a corpus for emotion detection and sentiment analysis, since linguistic resources are vital for natural language processing. In this study, we create a corpus of 1,021 sentences for emotion detection and 20,251 sentences for sentiment analysis, both collected from various domains, and have human annotators label them with six and three classes, respectively. To learn word representations from large-scale unlabeled data, the bag-of-words, term frequency-inverse document frequency (TF-IDF), and Skip-gram models are employed, and the learned word vectors are then fed into the CNN-LSTM model. In addition to our proposed approach, we also use other fundamental algorithms for comparison, including a convolutional neural network, long short-term memory, artificial neural networks, and recurrent neural networks. The results indicate that the proposed CNN-LSTM method paired with Word2Vec is more effective than the other approaches for emotion detection and sentiment analysis in Roman Urdu. Furthermore, we compare our proposed model with previous work. Both tasks see significant improvements: accuracy rises from 85% to 95% for emotion detection and from 89% to 93.3% for sentiment analysis.
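The three text representations named in the abstract can be illustrated with a minimal, standard-library-only sketch. Everything specific here is an assumption made for illustration: the toy Roman Urdu sentences are not from the paper's corpus, the function names are hypothetical, and the unsmoothed TF-IDF formula is one common variant rather than the one the authors necessarily used. The sketch covers only feature extraction and Skip-gram training-pair generation; the CNN-LSTM classifier itself is not shown.

```python
import math
from collections import Counter

# Hypothetical toy sentences (assumption: not taken from the paper's corpus).
corpus = [
    "yeh film bohat achi hai",
    "yeh film bohat buri hai",
    "mujhe yeh khana pasand hai",
]
docs = [sentence.split() for sentence in corpus]
vocab = sorted({token for doc in docs for token in doc})


def bag_of_words(doc):
    """Return raw term counts for `doc`, ordered by the shared vocabulary."""
    counts = Counter(doc)
    return [counts[word] for word in vocab]


def tf_idf(doc):
    """Return TF-IDF weights using tf = count / len(doc), idf = log(N / df).

    Assumption: no smoothing, so a word present in every document gets 0.
    """
    counts = Counter(doc)
    n_docs = len(docs)
    weights = []
    for word in vocab:
        tf = counts[word] / len(doc)
        df = sum(1 for d in docs if word in d)
        weights.append(tf * math.log(n_docs / df))
    return weights


def skipgram_pairs(doc, window=2):
    """Return (center, context) pairs of the kind a Skip-gram model trains on."""
    pairs = []
    for i, center in enumerate(doc):
        start = max(0, i - window)
        end = min(len(doc), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, doc[j]))
    return pairs
```

In a pipeline like the one described, the Skip-gram pairs would be used to train dense word vectors (for example with a Word2Vec implementation), and those vectors, rather than the sparse count or TF-IDF vectors, would form the input sequence to the CNN-LSTM model.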

A Novel Approach for Emotion Detection and Sentiment Analysis for Low Resource Urdu Language Based on CNN-LSTM
