Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining

Nour Eddine Zekaoui, Siham Yousfi, Maryem Rhanoui, Mounia Mikram
DOI: https://doi.org/10.11591/ijai.v12.i4.pp1995-2010
2023-08-07
Abstract: Opinion mining, also known as sentiment analysis, is a subfield of natural language processing (NLP) that focuses on identifying and extracting subjective information from text. This can include determining the overall sentiment of a piece of text (e.g., positive or negative), as well as identifying specific emotions or opinions expressed in it, and typically involves advanced machine learning and deep learning techniques. Recently, transformer-based language models have made this task of human emotion analysis intuitive, thanks to the attention mechanism and parallel computation. These advantages make such models very powerful on linguistic tasks, unlike recurrent neural networks, which spend a lot of time on sequential processing and are therefore prone to fail on long text. This paper studies the behaviour of cutting-edge Transformer-based language models on opinion mining and provides a high-level comparison between them to highlight their key particularities. Additionally, our comparative study offers leads for production engineers regarding which approach to focus on, and is useful for researchers as it provides guidelines for future research subjects.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper studies how state-of-the-art Transformer-based language models perform on sentiment analysis (opinion mining) and uses comparative experiments to reveal the differences between them. Its main focuses are:

1. **Model Behavior Study**: Investigate how the latest Transformer-based pre-trained language models perform on text and reveal the differences between them.
2. **High-Level Comparison**: Provide a high-level comparative analysis that highlights the key features of these models.
3. **Guidance for Future Research**: Offer guidance for researchers and practical suggestions for production engineers.

The paper specifically covers the following types of models:

- **Encoder Models**: Such as BERT and RoBERTa.
- **Decoder Models**: Such as GPT and GPT-2.
- **Hybrid Models**: Such as BART and XLNet.

Through comparative experiments, the authors found that:

- **Autoregressive Models** (such as GPT, GPT-2) perform poorly on understanding tasks (such as sentiment classification) because they lack bidirectional contextual information.
- **Autoencoding Models** (such as BERT, RoBERTa) perform excellently because they can use both left and right context.
- **Hybrid Models** (such as XLNet) also perform well because they share the bidirectional-context advantage of autoencoding models.

The final results show that the ELECTRA model achieved the highest F1 score, 95.6, on the IMDb movie review dataset. The paper also discusses the impact of maximum sequence length and data cleaning on model performance, noting that excessive data cleaning may reduce model performance.
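
The contrast between autoregressive and autoencoding models comes down to which positions each token is allowed to attend to. The following is a minimal sketch in plain Python, not code from the paper: the helper names `attention_mask` and `context_for` and the example sentence are hypothetical, chosen only to illustrate why left-to-right context can miss sentiment cues that appear later in a review.

```python
from typing import List


def attention_mask(seq_len: int, causal: bool) -> List[List[int]]:
    """Return mask[i][j] = 1 if token i may attend to token j, else 0.

    causal=True  -> autoregressive (GPT-style): only positions j <= i.
    causal=False -> autoencoding (BERT-style): every position.
    """
    return [
        [1 if (not causal or j <= i) else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


def context_for(mask: List[List[int]], tokens: List[str], i: int) -> List[str]:
    """List the tokens visible to position i under the given mask."""
    return [tok for tok, keep in zip(tokens, mask[i]) if keep]


# Hypothetical example sentence (illustrative only).
tokens = ["the", "plot", "was", "not", "good"]
causal_mask = attention_mask(len(tokens), causal=True)
bidirectional_mask = attention_mask(len(tokens), causal=False)

# Under the causal mask, "not" (index 3) cannot see "good", the word it negates.
print(context_for(causal_mask, tokens, 3))
# Under the bidirectional mask, "not" sees the whole sentence.
print(context_for(bidirectional_mask, tokens, 3))
```

In this toy setting the causal mask hides "good" from "not", while the bidirectional mask exposes the full sentence to every token, which is the property the summary credits for the stronger results of autoencoding models on classification.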