Abstract: Recently, Large Language Models (LLMs) have garnered increasing attention in natural language processing, revolutionizing numerous downstream tasks with their powerful reasoning and generation abilities. For example, In-Context Learning (ICL) introduces a fine-tuning-free paradigm that allows out-of-the-box LLMs to perform downstream tasks by learning from analogy with a few demonstrations, without any fine-tuning. In the fine-tuning-dependent paradigm, where substantial training data exists, Parameter-Efficient Fine-Tuning (PEFT) methods offer a cost-effective way for LLMs to achieve performance comparable to full fine-tuning. However, these techniques have not been fully exploited in Aspect-Based Sentiment Analysis (ABSA). Previous works probe LLMs in ABSA merely by using randomly selected input-output pairs as ICL demonstrations, resulting in an incomplete and superficial evaluation. In this paper, we present a comprehensive evaluation of LLMs in ABSA, involving 13 datasets, 8 ABSA subtasks, and 6 LLMs. Specifically, we design a unified task formulation covering "multiple LLMs for multiple ABSA subtasks in multiple paradigms." For the fine-tuning-dependent paradigm, we efficiently fine-tune LLMs using instruction-based multi-task learning. For the fine-tuning-free paradigm, we propose 3 demonstration selection strategies to stimulate the few-shot abilities of LLMs. Our extensive experiments demonstrate that LLMs achieve new state-of-the-art performance compared to fine-tuned Small Language Models (SLMs) in the fine-tuning-dependent paradigm. More importantly, in the fine-tuning-free paradigm, where SLMs are ineffective, LLMs with ICL still show impressive potential and even compete with fine-tuned SLMs on some ABSA subtasks.
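
The unified task formulation casts every ABSA subtask as instruction-following text generation, so one prompt layout can serve both instruction-based multi-task fine-tuning and ICL. The sketch below is a minimal illustration under stated assumptions: the template layout, the three example subtasks (ATE, ATSC, ASTE), their instruction wording, and the linearized output format are all hypothetical, not the paper's actual templates or its full set of 8 subtasks.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Hypothetical instructions for three illustrative ABSA subtasks (assumption:
# not the paper's wording, and not its full set of 8 subtasks).
INSTRUCTIONS = {
    "ATE": "Extract all aspect terms from the sentence.",
    "ATSC": "Classify the sentiment polarity (positive, negative, neutral) of the given aspect.",
    "ASTE": "Extract all (aspect, opinion, polarity) triplets from the sentence.",
}


@dataclass
class Example:
    sentence: str
    target: str                   # linearized gold output, e.g. "(battery life, great, positive)"
    aspect: Optional[str] = None  # only used by aspect-conditioned subtasks such as ATSC


def format_input(task: str, ex: Example) -> str:
    """Render one example's input; ATSC additionally shows the queried aspect."""
    if task == "ATSC" and ex.aspect is not None:
        return f"Sentence: {ex.sentence}\nAspect: {ex.aspect}"
    return f"Sentence: {ex.sentence}"


def build_prompt(task: str, query: Example, demos: Sequence[Example] = ()) -> str:
    """Build one prompt layout shared by every subtask and both paradigms.

    Without demonstrations the prompt suits an instruction-tuned model or
    zero-shot use; with demonstrations it becomes a few-shot ICL prompt.
    """
    parts = [f"Instruction: {INSTRUCTIONS[task]}"]
    for demo in demos:
        parts.append(f"{format_input(task, demo)}\nOutput: {demo.target}")
    parts.append(f"{format_input(task, query)}\nOutput:")
    return "\n\n".join(parts)


def build_training_pair(task: str, ex: Example) -> Tuple[str, str]:
    """One (prompt, target) pair; pairs from all subtasks are pooled for multi-task tuning."""
    return build_prompt(task, ex), ex.target


if __name__ == "__main__":
    demo = Example("The battery life is great.", "(battery life, great, positive)")
    query = Example("The screen is too dim.", "")
    print(build_prompt("ASTE", query, [demo]))
```

Calling `build_prompt` without demonstrations yields a prompt for an instruction-tuned or zero-shot model; passing selected demonstrations yields an ICL prompt. `build_training_pair` shows how examples from all subtasks could be pooled into a single multi-task training set.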

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper aims to comprehensively evaluate the performance of large language models (LLMs) in aspect-based sentiment analysis (ABSA). Specifically, it addresses the following key problems:

1. **Overall performance of LLMs on ABSA tasks**:
   - Evaluate LLMs across datasets and subtasks through extensive experiments involving 13 datasets, 8 ABSA subtasks, and 6 LLMs.
   - Design a unified task formulation covering "multiple LLMs handling multiple ABSA subtasks in multiple paradigms".
2. **Performance of LLMs in the fine-tuning-dependent paradigm**:
   - Efficiently fine-tune open-source LLMs using instruction-based multi-task learning and low-rank adaptation (LoRA), and examine whether LLMs can outperform small language models (SLMs) in this paradigm.
3. **Performance of LLMs in the zero/few-shot learning paradigm**:
   - Study how LLMs perform in the zero/few-shot learning paradigm, in particular how different demonstration selection strategies (random, keyword, and semantic selection) affect their performance.
   - Examine whether LLMs can effectively replace fine-tuned SLMs in data-scarce scenarios.
4. **Effectiveness of demonstration selection strategies**:
   - Compare random, keyword, and semantic selection, and study how each strategy affects LLM performance on different ABSA subtasks.
5. **Performance differences across subtasks and models**:
   - Analyze how LLM performance varies across ABSA subtasks and across different LLMs.

### Main contributions

1. **A comprehensive evaluation of LLMs in ABSA**:
   - Covers 13 datasets, 8 subtasks, and 6 LLMs under a unified formulation for "multiple LLMs handling multiple ABSA subtasks in multiple paradigms".
2. **Evidence that efficiently fine-tuned LLMs are superior in the fine-tuning-dependent paradigm**:
   - LLMs fine-tuned efficiently via instruction-based multi-task learning comprehensively outperform fine-tuned SLMs.
3. **A study of demonstration selection strategies that significantly improves API-based LLMs in the zero/few-shot learning paradigm**:
   - In data-scarce scenarios where SLMs are completely ineffective, API-based LLMs combined with ICL still perform well and are even comparable to fine-tuned SLMs on some subtasks.

In conclusion, through comprehensive experiments and analysis, this paper shows that LLMs outperform fine-tuned SLMs in the fine-tuning-dependent paradigm and, in the zero/few-shot learning paradigm where SLMs are ineffective, remain competitive with fine-tuned SLMs on some subtasks. The results provide a valuable baseline for future research.
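
LoRA, the PEFT method used here to fine-tune open-source LLMs, freezes a pretrained weight matrix W and learns only a low-rank update ΔW = BA, where B is d_out × r, A is r × d_in, and the rank r is much smaller than either dimension; the update is scaled by α/r. Below is a minimal pure-Python sketch of one adapted linear layer. Initializing B to zero follows the original LoRA recipe; the dimensions, rank, and α in the example are arbitrary illustrations, not the paper's hyperparameters, and the gradient updates to A and B are omitted.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A.

    A (r x d_in) starts with small random values and B (d_out x r) starts at
    zero, so before training the adapted layer behaves exactly like W.
    """

    def __init__(self, weight: Matrix, r: int, alpha: float, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight, never updated
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x: Vector) -> Vector:
        # y = W x + (alpha / r) * B (A x)
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self) -> int:
        """Only A and B are trained."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def full_vs_lora_params(d_out: int, d_in: int, r: int) -> Tuple[int, int]:
    """Trainable parameters for one weight matrix: full fine-tuning vs. LoRA."""
    return d_out * d_in, r * (d_out + d_in)


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], r=1, alpha=2.0)
    print(layer.forward([1.0, 1.0]))  # B is zero at init, so this equals W x
    print(full_vs_lora_params(4096, 4096, 8))
```

For a 4096 × 4096 projection at rank 8, `full_vs_lora_params` gives 16,777,216 trainable parameters for full fine-tuning versus 65,536 for LoRA (about 0.4%), which is what makes fine-tuning several LLMs across many subtasks affordable.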
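
The three demonstration selection strategies (random, keyword, and semantic selection) can be sketched as ways of picking k demonstrations from a pool of labeled training sentences for a given query. Assumptions: keyword selection is approximated here by word-overlap (Jaccard) similarity, and semantic selection by cosine similarity between embeddings from a caller-supplied `embed` function. `toy_embed`, its word lists, and the sample pool are hypothetical stand-ins for a real sentence encoder and dataset, included only so the example runs without external libraries; the paper's exact scoring functions may differ.

```python
import math
import random
import re
from typing import Callable, List, Sequence, Set

Embed = Callable[[str], Sequence[float]]


def tokens(text: str) -> Set[str]:
    """Lowercased word set (a crude tokenizer for illustration)."""
    return set(re.findall(r"\w+", text.lower()))


def select_random(query: str, pool: Sequence[str], k: int, seed: int = 0) -> List[str]:
    """Random selection: k demonstrations sampled uniformly, ignoring the query."""
    return random.Random(seed).sample(list(pool), k)


def select_keyword(query: str, pool: Sequence[str], k: int) -> List[str]:
    """Keyword selection, assumed here to be word-overlap (Jaccard) ranking."""
    q = tokens(query)

    def jaccard(candidate: str) -> float:
        c = tokens(candidate)
        return len(q & c) / len(q | c) if q | c else 0.0

    # sorted() is stable, so ties keep their original pool order
    return sorted(pool, key=jaccard, reverse=True)[:k]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_semantic(query: str, pool: Sequence[str], k: int, embed: Embed) -> List[str]:
    """Semantic selection: rank candidates by embedding cosine similarity."""
    q = embed(query)
    return sorted(pool, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


# Hypothetical 2-d "embedding" (food words vs. service words) standing in for
# a real sentence encoder.
FOOD = {"pizza", "pasta", "delicious", "bland"}
SERVICE = {"waiter", "staff", "rude", "friendly"}


def toy_embed(text: str) -> List[float]:
    t = tokens(text)
    return [float(len(t & FOOD)), float(len(t & SERVICE))]


POOL = [
    "The pasta was bland.",
    "The staff were friendly.",
    "The waiter was rude to us.",
    "Great pizza, delicious crust.",
]

if __name__ == "__main__":
    query = "Our waiter was friendly."
    print(select_random(query, POOL, 2))
    print(select_keyword(query, POOL, 2))
    print(select_semantic(query, POOL, 2, toy_embed))
```

On this toy pool, keyword selection scores "The pasta was bland." level with "The staff were friendly." (each shares one word with the query) and, by pool order, returns it as the second demonstration, whereas semantic selection returns both service-related sentences. Contrasts of this kind are what make the choice of strategy matter for ICL.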

A Comprehensive Evaluation of Large Language Models on Aspect-Based Sentiment Analysis
