Transfer Learning for Low-Resource Sentiment Analysis

Razhan Hameed, Sina Ahmadi, Fatemeh Daneshfar
2023-04-11
Abstract: Sentiment analysis is the process of identifying and extracting subjective information from text. Despite advances in applying cross-lingual approaches automatically, implementing and evaluating sentiment analysis systems still requires language-specific data that reflects sociocultural and linguistic peculiarities. In this paper, we describe the collection and annotation of a dataset for sentiment analysis of Central Kurdish. We explore a few classical machine learning and neural network-based techniques for this task. Additionally, we employ a transfer learning approach that leverages pretrained models for data augmentation. We demonstrate that data augmentation achieves a high F$_1$ score and accuracy despite the difficulty of the task.
Computation and Language
What problem does this paper attempt to address?
This paper addresses data scarcity in sentiment analysis for Central Kurdish, and more broadly how to perform sentiment analysis effectively for a low-resource language. Because datasets and models have been lacking, previous studies have struggled to compare different methods. The paper's main contributions are a manually annotated dataset for evaluation and the augmentation of that dataset through transfer learning with pretrained models to counter data imbalance. The research also examines the impact of emojis on the task, and validates the approach experimentally using several classical machine learning models as well as a Bidirectional Long Short-Term Memory (BiLSTM) model. The results show that the proposed approach is effective for sentiment analysis in a low-resource language, and the paper provides benchmarks and a publicly available dataset for further research.
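The summary reports both F$_1$ and accuracy and names data imbalance as a concern. As a rough illustration of why the two metrics can diverge on imbalanced data, the sketch below computes accuracy and macro-averaged F$_1$ for a majority-class baseline. The labels, counts, and function names are hypothetical and are not taken from the paper's dataset or code, and macro averaging is an assumption made here, since the summary does not say which F$_1$ variant the authors report.

```python
def accuracy(gold, pred):
    """Fraction of examples whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def per_class_f1(gold, pred, label):
    """F1 for a single label, treating that label as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined),
        # so F1 is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over the labels present in gold."""
    labels = sorted(set(gold))
    return sum(per_class_f1(gold, pred, label) for label in labels) / len(labels)


# Hypothetical toy data: 8 negative and 2 positive examples.
gold = ["neg"] * 8 + ["pos"] * 2
# A baseline that always predicts the majority class.
pred = ["neg"] * 10

acc = accuracy(gold, pred)
mf1 = macro_f1(gold, pred)
print(f"accuracy={acc:.3f} macro_f1={mf1:.3f}")
```

On this toy input the baseline looks strong on accuracy while macro F$_1$ is much lower, because the minority class is never predicted. That gap is one reason to report both metrics when classes are imbalanced.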