Detecting Spam Reviews on Vietnamese E-commerce Websites

Co Van Dinh, Son T. Luu, Anh Gia-Tuan Nguyen
DOI: https://doi.org/10.1007/978-3-031-21743-2_48
2022-12-09
Abstract: Customer reviews play an essential role in online shopping. People often refer to the reviews or comments of previous customers when deciding whether to buy a new product. Exploiting this behavior, some people create untruthful and illegitimate reviews to deceive customers about the quality of products. These are called spam reviews. They confuse consumers on online shopping platforms and negatively affect online shopping behavior. We propose a dataset called ViSpamReviews, built with a strict annotation procedure, for detecting spam reviews on e-commerce platforms. The dataset supports two tasks: a binary classification task for detecting whether a review is spam, and a multi-class classification task for identifying the type of spam. PhoBERT obtained the highest results on both tasks, with macro-average F1 scores of 86.89% and 72.17%, respectively.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the problem of detecting spam reviews on Vietnamese e-commerce websites. The authors build a dataset named ViSpamReviews, containing over 19,000 user reviews that were manually annotated through a rigorous annotation process to indicate whether each review is spam and, if so, which type of spam it is. On this dataset they define two tasks: a binary classification task that decides whether a review is spam, and a multi-class classification task that identifies the specific type of spam review.

The authors evaluate several classification models on the dataset, including deep neural network models (Text-CNN, LSTM, GRU) and Transformer-based models (PhoBERT, BERT4News). PhoBERT performs best on both tasks, with macro-average F1 scores of 86.89% on the binary task and 72.17% on the multi-class task. An analysis of the mispredictions shows that the main difficulty is distinguishing normal reviews from spam reviews, especially when a review is short or concerns only the brand rather than the product itself.

Finally, the authors outline future research directions: expanding the dataset to detect spam paragraphs within reviews, and identifying user opinions on specific product features and related services.
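Both tasks are scored with the macro-average F1, which averages per-class F1 without weighting by class frequency, so rare spam types count as much as the majority "not spam" class. The sketch below illustrates the metric and one way a binary label could be derived from a spam-type label. It is not the authors' evaluation code: the integer encoding (0 = not spam, 1 to 3 = spam types) and the helper names `to_binary` and `macro_f1` are hypothetical assumptions made for illustration.

```python
from collections import Counter

# Hypothetical encoding (assumption, not the dataset's documented format):
# 0 = not spam, any other integer = some spam type.
NOT_SPAM = 0


def to_binary(type_labels):
    """Collapse spam-type labels into the binary task: 0 = not spam, 1 = spam."""
    return [0 if y == NOT_SPAM else 1 for y in type_labels]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred.

    A class with no predicted or no true instances gets precision or recall 0
    instead of dividing by zero.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with made-up labels (not data from ViSpamReviews).
y_true = [0, 0, 1, 2, 3, 0]
y_pred = [0, 1, 1, 2, 0, 0]
print(round(macro_f1(y_true, y_pred), 4))                         # multi-class: 0.5833
print(round(macro_f1(to_binary(y_true), to_binary(y_pred)), 4))   # binary: 0.6667
```

In the toy example, the missed type-3 review scores an F1 of 0 for that class and pulls the multi-class average down sharply. This is why macro-F1 on the multi-class task tends to be lower than on the binary task. On the same inputs, the result should agree with scikit-learn's `f1_score(y_true, y_pred, average="macro")`.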