Large Language Models in Targeted Sentiment Analysis

Nicolay Rusnachenko, Anton Golubev, Natalia Loukachevitch
2024-04-19
Abstract: In this paper we investigate the use of decoder-based generative transformers for extracting sentiment towards named entities in Russian news articles. We study the sentiment analysis capabilities of instruction-tuned large language models (LLMs), using the RuSentNE-2023 dataset. The first group of experiments evaluates the zero-shot capabilities of both closed and open LLMs. The second covers the fine-tuning of Flan-T5 using the "chain-of-thought" (CoT) three-hop reasoning framework (THoR). We found that the results of the zero-shot approaches are similar to those achieved by baseline fine-tuned encoder-based transformers (BERT-base). Fine-tuning Flan-T5 with THoR improves results by at least 5% for the base-size model compared to the zero-shot experiment. The best results on RuSentNE-2023 were achieved by fine-tuned Flan-T5-xl, which surpassed the previous state-of-the-art transformer-based classifiers. Our CoT application framework is publicly available:
Computation and Language
What problem does this paper attempt to address?
The paper attempts to address the problem of Targeted Sentiment Analysis (TSA) in Russian news texts. Specifically, the researchers explored the following points:

1. **Sentiment Analysis using Large Language Models (LLMs)**:
   - Investigated the ability of decoder-based generative transformers to extract sentiment towards named entities in Russian news articles.
   - Evaluated the sentiment analysis capabilities of instruction-tuned large language models (LLMs) in zero-shot and few-shot settings.
2. **Dataset and Experimental Design**:
   - Conducted experiments using the RuSentNE-2023 dataset, which contains Russian texts annotated with sentiment.
   - The experiments were divided into two parts: the first part assessed the zero-shot capabilities of LLMs with different transparency levels ("closed models" and "open models"); the second part involved fine-tuning experiments using the Flan-T5 model combined with the Three-Hop Reasoning (THoR) framework.
3. **Model Comparison and Performance Improvement**:
   - Compared the performance of various LLMs (such as GPT-4, GPT-3.5, Mistral, DeciLM, etc.) in zero-shot settings and found that these models performed worse on Russian texts compared to their performance on translated English texts.
   - The fine-tuned Flan-T5 model demonstrated excellent performance under the THoR framework, surpassing the previous state-of-the-art encoder-based classifiers.
4. **Error Analysis**:
   - Analyzed the main types of discrepancies between model predictions and human annotations, including misjudgment of compound sentiment sentences (E1), incorrect sentiment direction towards multiple entities (E2), and sentiment recognition errors for single entities (E3).

In summary, this paper aims to improve the accuracy and robustness of sentiment analysis tasks by enhancing the application of large language models in Russian targeted sentiment analysis.
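To make the three-hop idea concrete, here is a minimal sketch of how a THoR-style prompt chain for one (sentence, entity) pair might be wired up. It is an illustration under stated assumptions, not the authors' implementation: the prompt wording, the function names `thor_predict` and `normalize_label`, the `generate` callable (any text-in, text-out model wrapper), and the fallback to "neutral" are all hypothetical choices made for this example.

```python
from typing import Callable, List, Tuple

# Assumed three-class label set for this sketch.
LABELS = ("positive", "negative", "neutral")


def normalize_label(text: str) -> str:
    """Map a free-form model answer to one label (assumed fallback: neutral)."""
    lowered = text.lower()
    for label in LABELS:
        if label in lowered:
            return label
    return "neutral"


def thor_predict(
    sentence: str,
    entity: str,
    generate: Callable[[str], str],
) -> Tuple[str, List[str]]:
    """Run three chained prompts; each hop sees the previous hops' answers.

    `generate` is a hypothetical wrapper around any instruction-tuned model.
    Returns the predicted label and the prompts that were sent.
    """
    # Hop 1: ask which aspect of the entity the sentence talks about.
    q1 = (
        f'Given the sentence "{sentence}", '
        f"which specific aspect of {entity} is possibly mentioned?"
    )
    aspect = generate(q1)

    # Hop 2: ask for the underlying opinion, conditioning on the hop-1 answer.
    q2 = (
        f"{q1} {aspect} Based on common sense, what is the implicit opinion "
        f"towards the mentioned aspect of {entity}, and why?"
    )
    opinion = generate(q2)

    # Hop 3: ask for the final polarity, conditioning on both earlier answers.
    q3 = (
        f"{q2} {opinion} Based on this opinion, what is the sentiment "
        f"polarity towards {entity}? Answer with one of: "
        f"{', '.join(LABELS)}."
    )
    answer = generate(q3)

    return normalize_label(answer), [q1, q2, q3]
```

Passing the model as a plain callable keeps the chain independent of any particular backend, so the same function can be pointed at a zero-shot model or at a fine-tuned one for comparison.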