Are Neighbors Alike? A Semi-supervised Probabilistic Ensemble for Online Review Spammers Detection

Zhiang Wu, Guannan Liu, Junjie Wu, Yong Tan
DOI: https://doi.org/10.2139/ssrn.4013130
2022-01-01
SSRN Electronic Journal
Abstract: Review spammers undermine trust in online platforms by deliberately posting inauthentic ratings and comments on products or online merchants in order to gain improper benefits. Although many methods have been proposed for spammer detection, several challenges, such as collusion recognition, label scarcity, and model explanation, persist and call for further investigation. Building on the prevalence of collusive spamming behavior and on network homophily theory, we introduce a reviewer network that captures explicit co-review relations, and then propose a semi-supervised probabilistic ensemble that jointly models reviewers' individual behavioral features and the reviewer network. A distinguishing feature of our model is that it integrates partial label propagation with feature-based learning for reviewer network modeling, which is proved theoretically to be a weighted logistic regression on a network-related synthetic data set. The model's parameters, which characterize the importance of network information, the strength of network homophily, and the value of unlabeled data, make it more transparent. Empirical evaluations on a real-life data set demonstrate the effectiveness of our model and the value of learning from unlabeled data. In particular, the reviewer network, after proper trimming, shows a strong homophily effect and plays a vital role in selecting collusion-related features for high-performance yet interpretable spammer detection.
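To make the co-review idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes reviews arrive as (reviewer, product) pairs, links two reviewers whenever they reviewed the same product, and measures homophily as the share of edges whose two labeled endpoints carry the same label. The function names, the `min_shared` threshold used here as a stand-in for network trimming, and the toy data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_coreview_network(reviews, min_shared=1):
    """Build a weighted co-review network (illustrative assumption, not the paper's procedure).

    reviews: iterable of (reviewer_id, product_id) pairs.
    Returns {(u, v): shared_count} with u < v, keeping only edges whose
    reviewers share at least `min_shared` products (a simple trimming rule).
    """
    reviewers_by_product = defaultdict(set)
    for reviewer, product in reviews:
        reviewers_by_product[product].add(reviewer)

    weights = defaultdict(int)
    for reviewers in reviewers_by_product.values():
        # Every pair of reviewers of the same product gains one shared product.
        for u, v in combinations(sorted(reviewers), 2):
            weights[(u, v)] += 1

    return {edge: w for edge, w in weights.items() if w >= min_shared}


def edge_homophily(edges, labels):
    """Fraction of edges between two labeled reviewers that share a label.

    labels: {reviewer_id: 0 or 1}; unlabeled reviewers are simply absent.
    Returns None when no edge has both endpoints labeled.
    """
    labeled = [(u, v) for u, v in edges if u in labels and v in labels]
    if not labeled:
        return None
    same = sum(labels[u] == labels[v] for u, v in labeled)
    return same / len(labeled)


# Toy data: reviewers "a" and "b" co-review two products; "d" is unlabeled.
reviews = [("a", "p1"), ("b", "p1"), ("a", "p2"), ("b", "p2"),
           ("c", "p2"), ("c", "p3"), ("d", "p3")]
labels = {"a": 1, "b": 1, "c": 0}  # 1 = spammer, 0 = genuine (partial labels)

full = build_coreview_network(reviews)
trimmed = build_coreview_network(reviews, min_shared=2)
```

On this toy data the untrimmed network has four edges, three of them between labeled reviewers, and only one of those joins same-label reviewers. Raising `min_shared` to 2 keeps just the edge between "a" and "b", so measured homophily rises. This only illustrates why trimming can sharpen the homophily signal; the trimming rule actually used in the paper is not specified in the abstract.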