Multidimensional Affective Analysis for Low-Resource Languages: A Use Case with Guarani-Spanish Code-Switching Language

Marvin M. Agüero-Torales,Antonio G. López-Herrera,David Vilares
DOI: https://doi.org/10.1007/s12559-023-10165-0
IF: 4.89
2023-07-01
Cognitive Computation
Abstract:This paper focuses on text-based affective computing for Jopara, a code-switching language that combines Guarani and Spanish. First, we collected a dataset of tweets primarily written in Guarani and annotated them for three widely used dimensions in sentiment analysis: (a) emotion recognition, (b) humor detection, and (c) offensive language identification. Then, we developed several neural network models, including large language models specifically designed for Guarani, and compared their performance against off-the-shelf multilingual and Spanish pre-trained models for the aforementioned dimensions. Our experiments show that language models incorporating Guarani during pre-training or pre-fine-tuning consistently achieve the best results, despite limited resources (a single 24-GB GPU and only 800K tokens). Notably, even a Guarani BERT model with just two layers of Transformers shows a favorable balance between accuracy and computational power, likely due to the inherent low-resource nature of the task. We present a comprehensive overview of corpus creation and model development for low-resource languages like Guarani, particularly in the context of its code-switching with Spanish, resulting in Jopara. Our findings shed light on the challenges and strategies involved in analyzing affective language in such linguistic contexts.
computer science, artificial intelligence,neurosciences
What problem does this paper attempt to address?