Online detection and infographic explanation of spam reviews with data drift adaptation

Francisco de Arriba-Pérez, Silvia García-Méndez, Fátima Leal, Benedita Malheiro, J. C. Burguillo
DOI: https://doi.org/10.15388/24-INFOR562
2024-06-21
Abstract: Spam reviews are a pervasive problem on online platforms due to their significant impact on reputation. However, research into spam detection in data streams is scarce. A further concern is the need for transparency in such detection systems. Consequently, this paper addresses both problems by proposing an online solution for identifying and explaining spam reviews that incorporates data drift adaptation. It integrates (i) incremental profiling, (ii) data drift detection & adaptation, and (iii) identification of spam reviews employing Machine Learning. The explainability mechanism displays a visual and textual explanation of each prediction in a dashboard. The best results reached up to 87% spam F-measure.
Machine Learning, Artificial Intelligence, Computation and Language, Social and Information Networks
### What problems does this paper attempt to solve?

This paper aims to solve two main problems in spam review detection on online platforms:

1. **Spam review detection in data streams**: Spam reviews are a common problem on online platforms because they have a significant impact on reputation. However, relatively few studies address spam review detection in data streams. Traditional detection methods are usually based on static data sets, whereas in practice review data changes dynamically. A method that can process these changes and adapt to them in real time is therefore required.
2. **Transparency and interpretability**: For users to trust and understand how the system works, detection results need to be presented in an intuitive, easy-to-understand way. Existing methods often lack sufficient explanation mechanisms, making it difficult for users to understand why a given review is marked as spam.

To this end, the paper proposes an online spam review detection framework with data drift adaptation. The framework addresses both problems through three key modules:

- **Incremental Profiling**: Extract features from user-generated content using natural language processing (NLP) techniques and incrementally update user profiles.
- **Data Drift Detection & Adaptation**: Monitor changes in the input data, then identify and adapt to data drift so that the model maintains its accuracy when the data distribution changes.
- **Identification and Explanation of Spam Reviews**: Use machine learning methods to identify spam reviews and display prediction results on a dashboard through visual and textual explanations.

Finally, the framework achieves a spam F-measure of up to 87%, which indicates its effectiveness in spam review detection while keeping predictions transparent.
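The drift detection and adaptation step can be illustrated with a minimal sketch. The summary does not say which drift detector the paper uses, so the code below is only a stand-in: a hypothetical `WindowDriftDetector` class that compares the error rate in a recent sliding window against a reference window and signals drift when the gap exceeds a threshold. The class name, `window_size`, and `threshold` are assumptions made for illustration, not part of the paper.

```python
from collections import deque


class WindowDriftDetector:
    """Illustrative stand-in detector (not the paper's method).

    Flags drift when the recent error rate exceeds the reference
    error rate by more than `threshold`.
    """

    def __init__(self, window_size=50, threshold=0.2):
        self.window_size = window_size
        self.threshold = threshold
        self.reference = []                      # first full window of errors
        self.recent = deque(maxlen=window_size)  # sliding window of latest errors

    def update(self, error):
        """Record one outcome (1 = misclassified, 0 = correct); return True on drift."""
        if len(self.reference) < self.window_size:
            self.reference.append(error)
            return False
        self.recent.append(error)
        if len(self.recent) < self.window_size:
            return False
        ref_rate = sum(self.reference) / self.window_size
        recent_rate = sum(self.recent) / self.window_size
        if recent_rate - ref_rate > self.threshold:
            # Adaptation (assumed): the recent window becomes the new reference.
            self.reference = list(self.recent)
            self.recent.clear()
            return True
        return False


# Usage: ten correct predictions followed by five errors.
detector = WindowDriftDetector(window_size=5, threshold=0.3)
stream = [0] * 10 + [1] * 5
flags = [detector.update(e) for e in stream]
```

In a full pipeline, a `True` return value would be the point at which the classifier is retrained or reset on recent data. Here the sketch only resets its own reference window.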
### Formula summary

- **Incremental average calculation**:
  \[ favg_{t_k} = \frac{1}{k}\sum_{i=0}^{k} f_{t_i} \]
  where \(f\) represents a feature, and \([f_{t_0}, f_{t_1}, \ldots, f_{t_k}]\) represents the past feature data of each user.
- **Incremental maximum calculation**:
  \[ fmax_{t_k} = \max_{0 \le i \le k} f_{t_i} \]

These formulas are used to calculate the incremental features of users and items, so as to better capture how the data changes over time.
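The two formulas can be computed in a streaming fashion without storing the full history. The sketch below keeps a running mean and running maximum per user. `RunningStats`, `profiles`, and `update_profile` are hypothetical names introduced here for illustration. The running mean divides by the number of observations seen so far, which is an assumption about the intended normalisation.

```python
from collections import defaultdict


class RunningStats:
    """Running mean and maximum of one feature, updated one value at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.maximum = float("-inf")

    def update(self, value):
        """Fold a new observation into the statistics and return (mean, max)."""
        self.count += 1
        # Incremental mean: shift the old mean towards the new value.
        self.mean += (value - self.mean) / self.count
        self.maximum = max(self.maximum, value)
        return self.mean, self.maximum


# One RunningStats per user (hypothetical profile store).
profiles = defaultdict(RunningStats)


def update_profile(user_id, feature_value):
    """Update the given user's profile with a new feature value."""
    return profiles[user_id].update(feature_value)
```

Each call costs constant time and memory per user, which is what makes this style of profiling suitable for a data stream.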