A sentiment analysis model for car review texts based on adversarial training and whole word mask BERT

Xingchen Liu, Yawen Li, Yingxia Shao, Ang Li, Jian Liang
DOI: https://doi.org/10.48550/arXiv.2206.02389
2022-06-06
Abstract: In the field of car evaluation, more and more netizens express their opinions on Internet platforms, and these comments affect buyers' decisions and the trend of car word-of-mouth. As an important branch of natural language processing (NLP), sentiment analysis provides an effective method for analyzing the sentiment types of massive car review texts. However, because review texts in the automotive field contain specialized vocabulary and a large amount of noise, a general sentiment analysis model applied to car reviews achieves poor accuracy. To overcome these challenges, we focus on the sentiment analysis task for car review texts. From the perspective of word vectors, pre-training is carried out by whole word masking of proprietary vocabulary in the automotive field, and the model is then trained with an adversarial training strategy. Based on this, we propose a car review text sentiment analysis model based on adversarial training and whole word mask BERT (ATWWM-BERT).
Computation and Language, Information Retrieval
### What problems does this paper attempt to solve?

This paper aims to solve several key problems in sentiment analysis of automobile review texts:

1. **Handling of specialized vocabulary**:
   - Automobile review texts contain many technical terms and domain-specific words. General-purpose sentiment analysis models handle these words poorly, which lowers classification accuracy.
   - The paper proposes pre-training the BERT model with the Whole Word Mask (WWM) technique so that it handles specialized words in the automotive field better.
2. **Impact of text noise**:
   - Automobile review texts contain a large amount of noise, such as redundant punctuation marks and modal words. This noise interferes with the model's learning process and reduces its robustness.
   - The paper adopts adversarial training. By adding perturbations to the word embeddings and the final fully-connected layer, it improves the model's resistance to interference.
3. **Utilization of context information**:
   - Traditional sentiment analysis methods often fail to make full use of context information and, especially with long texts, tend to overlook important semantic information.
   - By combining whole word masking and adversarial training, the proposed model extracts and uses context information more effectively, which improves the accuracy of sentiment classification.
4. **Improving classification performance**:
   - To improve classification performance on automobile review texts, the paper proposes a sentiment analysis model based on Adversarial Training and Whole Word Mask BERT (ATWWM-BERT).
   - The experimental results show that this model outperforms existing state-of-the-art models in accuracy (ACC) and macro-averaged F1 (Macro-F1), with significant advantages on sentiment analysis tasks in the automotive field.

In summary, this paper addresses three problems in sentiment analysis of automobile review texts: poor handling of specialized vocabulary, the strong influence of text noise, and insufficient use of context information. It significantly improves classification performance by proposing a new model structure.
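The two techniques named above, whole word masking and adversarial perturbation of embeddings, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The `##` sub-word convention, the `domain_terms` lexicon, the rule that domain terms are always masked, and the function names are all hypothetical. The summary also does not say which perturbation rule the paper uses, so the second function assumes an L2-normalized gradient step in the style of the Fast Gradient Method.

```python
import math
import random

MASK = "[MASK]"


def _strip_piece(token):
    """Drop the "##" continuation marker from a WordPiece-style token."""
    return token[2:] if token.startswith("##") else token


def whole_word_mask(tokens, domain_terms, mask_prob=0.15, rng=None):
    """Mask sub-word tokens so that each word is masked in full or not at all.

    tokens: sub-word tokens; a leading "##" marks a continuation piece
        (an assumed convention, borrowed from WordPiece).
    domain_terms: hypothetical lexicon of whole words that are always masked,
        standing in for the proprietary automotive vocabulary.
    """
    rng = rng or random.Random(0)

    # Group token indices into words: a "##" piece joins the previous word.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])

    masked = list(tokens)
    for idxs in words:
        word = "".join(_strip_piece(tokens[i]) for i in idxs)
        # Mask every piece of the word together, never a single piece alone.
        if word in domain_terms or rng.random() < mask_prob:
            for i in idxs:
                masked[i] = MASK
    return masked


def fgm_perturbation(grad, epsilon=1.0):
    """Return epsilon * grad / ||grad||_2 (assumed FGM-style perturbation)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # No gradient signal: leave the embedding unperturbed.
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


tokens = ["the", "turbo", "##charger", "is", "loud"]
print(whole_word_mask(tokens, {"turbocharger"}, mask_prob=0.0))
# prints ['the', '[MASK]', '[MASK]', 'is', 'loud']
print(fgm_perturbation([3.0, 4.0], epsilon=0.5))
```

In an actual training loop the gradient would come from an autograd framework. The perturbation would be added to the embedding vector before a second forward pass, so that the loss is computed on both the clean and the perturbed input.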