Fine-tuning BERT, DistilBERT, XLM-RoBERTa and Ukr-RoBERTa models for sentiment analysis of Ukrainian language reviews
Prytula M
DOI: https://doi.org/10.15407/jai2024.02.085
IF: 14.4
2024-06-28
Artificial Intelligence
Abstract: Sentiment analysis is a core task of natural language processing that involves recognizing emotions expressed in textual data from various fields of activity. Automated sentiment detection helps businesses increase profits by analyzing customer sentiment and responding quickly to customers' level of satisfaction with products or services. Developing tools for high-quality classification of text sentiment is therefore important, given how many reviews users leave on social networks, platforms, and websites. The study examines the fine-tuning of BERT, DistilBERT, XLM-RoBERTa, and Ukr-RoBERTa models for sentiment analysis of reviews in the Ukrainian language, as transformer models demonstrate a better understanding of context and show high efficiency in solving natural language processing tasks. The dataset used in this study comprised about 11,000 user comments in Ukrainian, covering a range of topics such as shops, restaurants, hotels, medical facilities, fitness clubs, and the provision of various services. The textual data was categorized into two classes: positive and negative. Following text preprocessing, the dataset was divided into training and test sets in an 80:20 ratio. The hyperparameters were selected to optimize the performance of the pre-trained models for comment sentiment classification, and their effectiveness was evaluated using accuracy, recall, precision, and F1-score. The results show that DistilBERT requires significantly fewer computing resources and is faster than the other models. The XLM-RoBERTa model achieved the highest accuracy, 91.32%. However, considering the time needed to train the model and all the classification metrics, Ukr-RoBERTa is the optimal choice.
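The evaluation protocol described in the abstract (an 80:20 train/test split followed by accuracy, precision, recall, and F1-score on a two-class task) can be illustrated with a minimal, standard-library-only sketch. It is not the study's code: the function names `split_80_20` and `binary_metrics`, the fixed random seed, the plain shuffled (non-stratified) split, and the encoding of the positive class as `1` are all assumptions made for illustration.

```python
import random


def split_80_20(samples, seed=0):
    """Shuffle a copy of `samples` and split it 80:20 into (train, test).

    Assumption: a simple shuffled split; the abstract does not say
    whether the original split was stratified by class.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Assumption: the positive class is encoded as `positive` (default 1).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical toy labels: 1 = positive review, 0 = negative review.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(binary_metrics(y_true, y_pred))

    # A dataset of 11,000 items splits into 8,800 training and 2,200 test items.
    train, test = split_80_20(range(11000))
    print(len(train), len(test))
```

Reporting all four metrics rather than accuracy alone matters when the positive and negative classes are imbalanced, which is one reason a model with slightly lower accuracy can still be the preferred choice overall.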
computer science, artificial intelligence